Core vocabulary¶
T_CONST¶
Values of this type are used to hold integral constants. Zwerg constants allow storing signed or unsigned 64-bit numbers. Arithmetic operators handle transitions between signed and unsigned values transparently:
$ dwgrep '0xffffffffffffff00 0xffffffffffffffff sub'
-0xff
Note that division and modulo by zero, as well as overflow, are caught at runtime:
$ dwgrep '5 (2, 0, 3) div'
2
Error: division by zero occured when computing 5/0
1
$ dwgrep '0xffffffffffffffff (-1, 0, 1) add'
0xfffffffffffffffe
0xffffffffffffffff
Error: overflow occured when computing 18446744073709551615+1
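The behavior shown in these examples can be modeled in a few lines of Python. This is only a sketch under stated assumptions, not dwgrep's implementation: it assumes that a result is valid whenever it fits either the signed or the unsigned 64-bit range, and that division truncates toward zero as in C. The names `ZwergError`, `checked` and `for_each` are hypothetical helpers introduced for illustration.

```python
# Model of T_CONST arithmetic; the accepted range is an assumption, not dwgrep's source.
INT64_MIN = -(2 ** 63)      # smallest signed 64-bit value
UINT64_MAX = 2 ** 64 - 1    # largest unsigned 64-bit value


class ZwergError(Exception):
    """Hypothetical stand-in for the runtime error that dwgrep reports."""


def checked(result):
    # A result is representable if it fits either 64-bit interpretation.
    if not INT64_MIN <= result <= UINT64_MAX:
        raise ZwergError("overflow")
    return result


def add(a, b):
    return checked(a + b)


def sub(a, b):
    return checked(a - b)


def div(a, b):
    if b == 0:
        raise ZwergError("division by zero")
    quotient = abs(a) // abs(b)  # truncate toward zero (assumed, as in C)
    return checked(quotient if (a < 0) == (b < 0) else -quotient)


def for_each(op, a, alternatives):
    """Apply op to a and every alternative; an error ends only that branch."""
    results = []
    for b in alternatives:
        try:
            results.append(op(a, b))
        except ZwergError as error:
            results.append(f"Error: {error}")
    return results


print(for_each(div, 5, [2, 0, 3]))
print(hex(sub(0xffffffffffffff00, 0xffffffffffffffff)))
```

The `for_each` helper mirrors how `(2, 0, 3)` produces three independent computations: the failing one reports an error while the other two still yield results.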
T_SEQ¶
Values of this type hold sequences, ordered heterogeneous collections of other values (including other sequences):
$ dwgrep '[[], 2, "yes"] elem type'
T_SEQ
T_CONST
T_STR
T_STR¶
Values of this type hold strings (sequences of characters). No Unicode support is provided as such, though UTF-8 should naturally work. Zwerg strings are not NUL-terminated, so a NUL character is an ordinary character that counts towards the length:
$ dwgrep '"abc \x00 def" (, length)'
abc def
9
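For comparison, Python strings are length-counted as well, so the same literal can be used to show the difference between a counted length and what a NUL-terminated reading would see. This is an analogy in Python, not a statement about how dwgrep stores strings.

```python
text = "abc \x00 def"

# Length-counted view: the NUL is just another character.
counted_length = len(text)

# What a NUL-terminated (C-style) reader would see: everything before the first NUL.
c_style_length = text.index("\x00")

print(counted_length, c_style_length)
```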
!=, !eq, !ge, !gt, !le, !lt, !ne, <, <=, ==, >, >=, ?eq, ?ge, ?gt, ?le, ?lt, ?ne¶
These are comparison operators. The ones with no alphanumeric characters in them, ==, !=, < etc., are for use in infix expressions, such as:
entry (offset == 0x123)
The others are low-level assertions with equivalent behavior.
Two elements are inspected: one below TOS and TOS (A and B, respectively). The assertion holds if A and B satisfy a relation implied by the word.
For example:
$ dwgrep '1 2 ?lt "yep"'
---
yep
2
1
Note that there is both !eq and ?ne, !lt and ?ge, etc. These are mostly for symmetry. For consistency, the first character of any assertion is always either ? or !, and both flavors are always available.
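The way an assertion either keeps a stack or discards it can be sketched in Python. This is a simplified model, not dwgrep's implementation: a stack is a plain list with TOS at the end, and `assertion` and `RELATIONS` are hypothetical names introduced here.

```python
import operator

# Relation implied by the part of the word after the leading ? or !.
RELATIONS = {
    "eq": operator.eq, "ne": operator.ne,
    "lt": operator.lt, "le": operator.le,
    "gt": operator.gt, "ge": operator.ge,
}


def assertion(word):
    """Build a filter for a word such as '?lt' or '!eq'."""
    if word[0] not in "?!":
        raise ValueError("assertions start with ? or !")
    negate = word[0] == "!"
    relation = RELATIONS[word[1:]]

    def check(stack):
        a, b = stack[-2], stack[-1]  # A is below TOS, B is TOS
        holds = relation(a, b)
        # The stack passes through unchanged, or is dropped entirely.
        return [stack] if holds != negate else []

    return check


print(assertion("?lt")([1, 2]))
print(assertion("!lt")([1, 2]))
print(assertion("?ge")([1, 2]))
```

In this model `!lt` and `?ge` always agree, which is the symmetry that the note above describes.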
!empty, ?empty¶
T_SEQ¶
Asserts that a sequence on TOS is empty. This predicate holds both for empty sequence literals and for sequences that just happen to be empty:
$ dwgrep '[] ?empty'
[]
$ dwgrep '[(1, 2, 3) (== 0)] ?empty'
[]
T_STR¶
This predicate holds if the string on TOS is empty:
$ dwgrep '"" ?empty ">%s< is empty"'
>< is empty
$ dwgrep '"\x00" !empty ">%s< is not empty"'
>< is not empty
!ends, ?ends¶
T_SEQ T_SEQ, T_STR T_STR¶
The word ?ends asserts that the value on TOS forms a suffix of the value below it. This would be used e.g. like so:
(hay stack expression) ?("needle" ?ends)
This word is applicable to sequences as well as strings:
[hay stack] ?([needle] ?ends)
For example:
$ dwgrep '"foobar" "bar" ?ends'
---
bar
foobar
!find, ?find¶
T_SEQ T_SEQ, T_STR T_STR¶
?find asserts that the value on top of the stack is contained within the one below it. This would be used e.g. like so:
(hay stack expression) ?("needle" ?find)
This word is applicable to sequences as well as strings:
[hay stack] ?([needle] ?find)
For example:
$ dwgrep '"foobar" "oba" ?find'
---
oba
foobar
To determine whether a sequence (or a string) contains a particular element, you would use the following construct instead:
[that sequence] (elem == something)
E.g.:
[child @AT_name] ?(elem == "foo")
[child] ?(elem @AT_name == "foo")
To filter only those elements that match, you could do the following:
[child] [|L| L elem ?(@AT_name == "foo")]
The above is suitable if the origin of the sequence is out of your control. It is of course preferable to write this sort of thing directly, if possible:
[child ?(@AT_name == "foo")]
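The difference between containment of a run of elements and membership of a single element can be sketched in Python. This is a model only: it assumes that ?find on sequences looks for a contiguous run, which the text implies but does not state, and the function names `find` and `has_elem` are hypothetical.

```python
def find(haystack, needle):
    """True if needle occurs in haystack as a contiguous run (works for str and list)."""
    n = len(needle)
    return any(haystack[i:i + n] == needle
               for i in range(len(haystack) - n + 1))


def has_elem(sequence, value):
    """True if any single element equals value, like `(elem == something)`."""
    return any(item == value for item in sequence)


print(find("foobar", "oba"))
print(find([1, 2, 3, 4], [2, 3]))
print(find([1, 2, 3, 4], [2, 4]))
print(has_elem([1, 2, 3, 4], 4))
```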
!match, !~, =~, ?match¶
T_STR T_STR¶
This asserts that the regular expression on TOS (given as a string) matches the string below TOS. The whole string has to match. If you want to look for matches anywhere in the string, surround your expression with .* on both sides:
"haystack" ?(".*needle.*" ?match)
For example:
$ dwgrep '"foobar" "f.*r" ?match'
---
f.*r
foobar
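The whole-string requirement corresponds to what Python calls a full match, as opposed to a search. The sketch below uses Python's `re` module purely as an illustration; the regular expression dialect that dwgrep accepts may differ from Python's, and the helper name `match` is hypothetical.

```python
import re


def match(haystack, pattern):
    """True only if pattern matches the whole of haystack."""
    return re.fullmatch(pattern, haystack) is not None


print(match("foobar", "f.*r"))       # pattern covers the whole string
print(match("foobar", "oba"))        # only a part would match, so this fails
print(match("foobar", ".*oba.*"))    # wrapped in .* to match anywhere
```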
!starts, ?starts¶
T_SEQ T_SEQ, T_STR T_STR¶
The word ?starts asserts that the value on TOS forms a prefix of the value below it. This would be used e.g. like so:
(hay stack expression) ?("needle" ?starts)
This word is applicable to sequences as well as strings:
[hay stack] ?([needle] ?starts)
For example:
$ dwgrep '"foobar" "foo" ?starts'
---
foo
foobar
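Prefix and suffix tests behave the same way for strings and for sequences, which a short Python model can show. The functions `starts` and `ends` below are hypothetical stand-ins for ?starts and ?ends, written with slicing so that one definition covers both types.

```python
def starts(haystack, needle):
    """True if needle is a prefix of haystack (str or list)."""
    return haystack[:len(needle)] == needle


def ends(haystack, needle):
    """True if needle is a suffix of haystack (str or list)."""
    if len(needle) > len(haystack):
        return False
    return haystack[len(haystack) - len(needle):] == needle


print(starts("foobar", "foo"), ends("foobar", "bar"))
print(starts([1, 2, 3], [1, 2]), ends([1, 2, 3], [2, 3]))
```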
add¶
T_SEQ T_SEQ -> T_SEQ¶
Concatenate two sequences and yield the resulting sequence:
$ dwgrep '[1, 2, 3] [4, 5, 6] add'
[1, 2, 3, 4, 5, 6]
Using sub-expression capture may be a more flexible alternative to using add:
$ dwgrep '[1, 2, 3] [4, 5, 6] [7, 8, 9] [|A B C| (A, B, C) elem]'
[1, 2, 3, 4, 5, 6, 7, 8, 9]
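The two approaches have direct Python counterparts: pairwise concatenation, and building a new list by enumerating the elements of any number of inputs. This is an analogy only; the variable names are illustrative.

```python
a, b, c = [1, 2, 3], [4, 5, 6], [7, 8, 9]

# Counterpart of `add`: joins exactly two sequences.
pairwise = a + b

# Counterpart of the capture `[|A B C| (A, B, C) elem]`:
# visit every element of every input and collect them into one new list.
captured = [item for sequence in (a, b, c) for item in sequence]

print(pairwise)
print(captured)
```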
T_CONST T_CONST ->? T_CONST¶
Zwerg contains a suite of basic operators for integer arithmetic: add, sub, mul, div and mod. Two elements are popped: A and B, with B the original TOS, and A OP B is pushed again. OP is an operation suggested by the operator name.
Integers in Zwerg are 64-bit signed or unsigned quantities. A value of type T_CONST can hold either a signed or an unsigned number; which one it is gets decided automatically. Arithmetic operators handle these cases transparently.
Overflows, as well as division and modulo by zero, produce an error message and abort the current computation, which is why this operation is denoted with the ->? relation:
$ dwgrep '5 (2, 0, 3) div'
2
Error: division by zero occured when computing 5/0
1
T_STR T_STR -> T_STR¶
add concatenates two strings on TOS and yields the resulting string:
$ dwgrep '"foo" "bar" add'
foobar
Using formatting strings may be a better way to concatenate strings:
$ dwgrep '"foo" "bar" "baz" "%s%s%s"'
foobarbaz
bin, dec, hex, oct¶
bin is used for converting constants to base-2, oct to base-8, dec to base-10 and hex to base-16. These operators yield the incoming stack, except that the domain of the constant on TOS is changed.
Examples:
$ dwgrep '64 hex'
0x40
$ dwgrep 'DW_AT_name hex'
0x3
The value remains a constant, only the way it's displayed changes.
You can use "%s" to convert it to a string, in which case it's rendered with the newly-selected domain:
$ dwgrep 'DW_AT_name "=%s="'
=DW_AT_name=
$ dwgrep 'DW_AT_name hex "=%s="'
=0x3=
You can achieve the same effect with the formatting directives %b, %o, %d and %x:
$ dwgrep 'DW_AT_name "=%x="'
=0x3=
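The idea that the value stays the same while only its rendering changes can be modeled with a small Python class. This is a sketch under assumptions: the class `Constant` and its `with_domain` method are hypothetical, named domains such as those of DW_AT_* constants are not modeled, and the binary and octal prefixes are Python's, which may not match dwgrep's output.

```python
from dataclasses import dataclass

# Python format specs used to render each domain (prefix style is Python's).
FORMATS = {"bin": "#b", "oct": "#o", "dec": "d", "hex": "#x"}


@dataclass(frozen=True)
class Constant:
    value: int
    domain: str = "dec"

    def with_domain(self, domain):
        """Return the same value tagged with a different display domain."""
        return Constant(self.value, domain)

    def __str__(self):
        return format(self.value, FORMATS[self.domain])


sixty_four = Constant(64)
as_hex = sixty_four.with_domain("hex")

print(sixty_four, as_hex)
print(sixty_four.value == as_hex.value)
```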
div, mod, mul, sub¶
T_CONST T_CONST ->? T_CONST¶
Zwerg contains a suite of basic operators for integer arithmetic: add, sub, mul, div and mod. Two elements are popped: A and B, with B the original TOS, and A OP B is pushed again. OP is an operation suggested by the operator name.
Integers in Zwerg are 64-bit signed or unsigned quantities. A value of type T_CONST can hold either a signed or an unsigned number; which one it is gets decided automatically. Arithmetic operators handle these cases transparently.
Overflows, as well as division and modulo by zero, produce an error message and abort the current computation, which is why this operation is denoted with the ->? relation:
$ dwgrep '5 (2, 0, 3) div'
2
Error: division by zero occured when computing 5/0
1
drop, dup, over, rot, swap¶
These words reorder elements on the stack according to the following schemes (TOS is the rightmost element):
op | before | after |
---|---|---|
dup | A B C D | A B C D D |
over | A B C D | A B C D C |
swap | A B C D | A B D C |
rot | A B C D | A C D B |
drop | A B C D | A B C |
Realistically, most of what end users should write will be an occasional dup or drop. Most of the stack reorganization effects can be described more clearly using bindings and sub-expressions. But the operators are present for completeness' sake.
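The table can be restated as Python functions over a list whose last item is TOS. This is a model for checking one's reading of the table, not dwgrep code; each function returns a new list and leaves its input untouched.

```python
def dup(stack):
    return stack + [stack[-1]]            # copy TOS


def over(stack):
    return stack + [stack[-2]]            # copy the element below TOS


def swap(stack):
    return stack[:-2] + [stack[-1], stack[-2]]   # exchange the top two


def rot(stack):
    # Move the third element from the top to TOS.
    return stack[:-3] + [stack[-2], stack[-1], stack[-3]]


def drop(stack):
    return stack[:-1]                     # discard TOS


before = ["A", "B", "C", "D"]
for word in (dup, over, swap, rot, drop):
    print(word.__name__, " ".join(word(before)))
```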
elem, relem¶
T_SEQ ->* T_???¶
For each element in the input sequence, which is popped, yield a stack with that element pushed on top.
To zip the contents of the two lists nearest TOS, do:
$ dwgrep '[1, 2, 3] ["a", "b", "c"]
(|A B| A elem B elem) ?((|A B| A pos == B pos)) [|A B| A, B]'
[1, a]
[2, b]
[3, c]
The first parenthesis enumerates all combinations of elements. The second then allows only those that correspond to each other position-wise. At that point we get three stacks, each with two values. The last bracket then packs the values on stacks back to sequences, thus we get three stacks, each with a two-element sequence on top.
The expression could be simplified a bit at the expense of clarity:
[1, 2, 3] ["a", "b", "c"]
(|A B| A elem B elem (pos == drop pos)) [|A B| A, B]
relem operates in the same fashion as elem, but backwards.
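The zip expression above enumerates all combinations and then keeps those whose positions agree. A Python model of that strategy is sketched below; `zip_by_position` is a hypothetical name, and `enumerate` plays the role of pos.

```python
from itertools import product


def zip_by_position(a, b):
    """Pair up elements of a and b that share the same position."""
    # Step 1: every combination of (position, element) from both sequences,
    # like `A elem B elem`.
    combinations = product(enumerate(a), enumerate(b))
    # Step 2: keep combinations whose positions match, like `A pos == B pos`,
    # and pack each surviving pair into a two-element list.
    return [[x, y] for (i, x), (j, y) in combinations if i == j]


print(zip_by_position([1, 2, 3], ["a", "b", "c"]))
```

This does quadratic work, just as the Zwerg expression inspects every combination; in Python itself the built-in `zip` would be the idiomatic choice.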
T_STR ->* T_STR¶
The description at the T_SEQ overload applies. For strings, elem and relem yield individual characters of the string, again as strings:
$ dwgrep '["foo" elem]'
[f, o, o]
false, true¶
length¶
T_SEQ -> T_CONST¶
Yield the number of elements of the sequence on TOS.
E.g. the following tests whether the DIEs all of whose attributes report the same form as their abbreviations suggest make up all DIEs. This test comes from dwgrep's test suite:
[entry ([abbrev attribute label] == [attribute label])] length
== [entry] length
(Note that this isn't anything that should be universally true, though it typically will be, and it is for the particular file that this test is run on. Attributes for which their abbreviation suggests DW_FORM_indirect will themselves have a different form.)
pos¶
Each function numbers the elements that it produces, and stores the number of each element along with the element. That number can be recalled by saying pos:
$ dwgrep ./tests/dwz-partial -e 'unit (|A| A root "%s" A pos)'
---
0
[34] compile_unit
---
1
[a4] compile_unit
---
2
[e1] compile_unit
---
3
[11e] compile_unit
If you wish to know the number of values produced, you have to count them by hand:
[|Die| Die child]
let Len := length;
(|Lst| Lst elem)
At this point, pos and Len can be used to figure out both the absolute and the relative position of a given element.
Note that every function counts its elements anew:
$ dwgrep '"foo" elem pos'
0
1
2
$ dwgrep '"foo" elem type pos'
0
0
0
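Why the second example prints only zeros can be seen in a small Python model in which every value carries the position that the producing function assigned to it. This is a sketch, not dwgrep's implementation: `apply`, `elem` and `type_of` are hypothetical names, and numbering is assumed to restart for each incoming stack.

```python
def apply(function, inputs):
    """Run function on every incoming value; number each call's outputs from 0."""
    outputs = []
    for value, _incoming_pos in inputs:
        # The incoming position is discarded: the new function counts anew.
        for pos, produced in enumerate(function(value)):
            outputs.append((produced, pos))
    return outputs


def elem(value):
    return list(value)            # one output per element


def type_of(value):
    return ["T_STR" if isinstance(value, str) else "T_SEQ"]   # exactly one output


start = [("foo", 0)]
after_elem = apply(elem, start)
after_type = apply(type_of, after_elem)

print([pos for _, pos in after_elem])
print([pos for _, pos in after_type])
```

Because `type_of` produces a single value per input, each of its outputs is the first (and only) one of its call.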
type¶
This produces a constant according to the type of value on TOS (such as T_CONST, T_DIE, T_STR, etc.).