Core vocabulary

T_CONST

Values of this type are used to hold integral constants. Zwerg constants allow storing signed or unsigned 64-bit numbers. Arithmetic operators handle transitions between signed and unsigned values transparently:

$ dwgrep '0xffffffffffffff00 0xffffffffffffffff sub'
-0xff

Note that division and modulo by zero, as well as overflow, are caught at runtime:

$ dwgrep '5 (2, 0, 3) div'
2
Error: division by zero occured when computing 5/0
1

$ dwgrep '0xffffffffffffffff (-1, 0, 1) add'
0xfffffffffffffffe
0xffffffffffffffff
Error: overflow occured when computing 18446744073709551615+1

Applicable words

add, div, mod, mul, sub, value

T_SEQ

Values of this type hold sequences, ordered heterogeneous collections of other values (including other sequences):

$ dwgrep '[[], 2, "yes"] elem type'
T_SEQ
T_CONST
T_STR

T_STR

Values of this type hold strings (sequences of characters). No Unicode support is provided as such, though UTF-8 should naturally work. Zwerg strings are not NUL-terminated:

$ dwgrep '"abc \x00 def" (, length)'
abc  def
9

!=, !eq, !ge, !gt, !le, !lt, !ne, <, <=, ==, >, >=, ?eq, ?ge, ?gt, ?le, ?lt, ?ne

These are comparison operators. The ones with no alphanumeric characters in them, ==, !=, < etc., are for use in infix expressions, such as:

entry (offset == 0x123)

The others are low-level assertions with equivalent behavior.

Two elements are inspected: one below TOS and TOS (A and B, respectively). The assertion holds if A and B satisfy a relation implied by the word.

For example:

$ dwgrep '1 2 ?lt "yep"'
---
yep
2
1
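
One way to picture an assertion is as a filter over stacks: it either passes the incoming stack along unchanged or yields nothing. The Python sketch below is a hypothetical model of that behavior, not dwgrep's implementation. The names push, assert_lt and run are made up for illustration, and a stack is modeled as a plain list with TOS last.

```python
def push(value):
    """Return a word that pushes `value` onto the incoming stack."""
    def word(stack):
        return [stack + [value]]
    return word


def assert_lt(stack):
    """Model of ?lt: keep the stack only if A < B (B is TOS, A is below it)."""
    a, b = stack[-2], stack[-1]
    return [stack] if a < b else []


def run(words, stack=()):
    """Feed every stack produced by one word into the next word."""
    stacks = [list(stack)]
    for word in words:
        stacks = [out for s in stacks for out in word(s)]
    return stacks


passing = run([push(1), push(2), assert_lt, push("yep")])
failing = run([push(2), push(1), assert_lt, push("yep")])
print(passing)  # [[1, 2, 'yep']]
print(failing)  # []
```

dwgrep prints a stack top first, so the list [1, 2, 'yep'] in this model corresponds to the yep / 2 / 1 output above. In the failing case the assertion yields no stack, so nothing is left to print.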

Note that there is both !eq and ?ne, !lt and ?ge, etc. These are mostly for symmetry. For consistency, the first character of any assertion is always either ? or !, and both flavors are always available.

!empty, ?empty

T_SEQ

Asserts that a sequence on TOS is empty. This predicate holds both for empty sequence literals and for sequences that just happen to be empty:

$ dwgrep '[] ?empty'
[]

$ dwgrep '[(1, 2, 3) (== 0)] ?empty'
[]

T_STR

This predicate holds if the string on TOS is empty:

$ dwgrep '"" ?empty ">%s< is empty"'
>< is empty

$ dwgrep '"\x00" !empty ">%s< is not empty"'
>< is not empty

!ends, ?ends

T_SEQ T_SEQ, T_STR T_STR

The word ?ends asserts that the value on TOS forms a suffix of the value below it. This would be used e.g. like so:

(hay stack expression) ?("needle" ?ends)

This word is applicable to sequences as well as strings:

[hay stack] ?([needle] ?ends)

For example:

$ dwgrep '"foobar" "bar" ?ends'
---
bar
foobar
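
The suffix test can be modeled the same way for strings and sequences. The following Python sketch is only an illustration of the idea under the assumption that a sequence suffix is compared element by element; ends_with is a hypothetical helper, not a dwgrep function.

```python
def ends_with(hay, needle):
    """True if `needle` is a suffix of `hay`; works for str and list alike."""
    if len(needle) > len(hay):
        return False
    return hay[len(hay) - len(needle):] == needle


print(ends_with("foobar", "bar"))    # True
print(ends_with("foobar", "foo"))    # False
print(ends_with([1, 2, 3], [2, 3]))  # True
print(ends_with([1, 2, 3], [1, 2]))  # False
```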

!find, ?find

T_SEQ T_SEQ, T_STR T_STR

?find asserts that the value on top of stack is contained within the one below it. This would be used e.g. like so:

(hay stack expression) ?("needle" ?find)

This word is applicable to sequences as well as strings:

[hay stack] ?([needle] ?find)

For example:

$ dwgrep '"foobar" "oba" ?find'
---
oba
foobar

To determine whether a sequence (or a string) contains a particular element, you would use the following construct instead:

[that sequence] (elem == something)

E.g.:

[child @AT_name] ?(elem == "foo")
[child] ?(elem @AT_name == "foo")

To filter only those elements that match, you could do the following:

[child] [|L| L elem ?(@AT_name == "foo")]

The above is suitable if the origin of the sequence is out of your control. It is of course preferable to write this sort of thing directly, if possible:

[child ?(@AT_name == "foo")]
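
The difference between ?find and the elem-based construct can be sketched in Python. This is a hedged model: it assumes that ?find on sequences looks for a contiguous run of elements, by analogy with substring search, and both helper names are hypothetical.

```python
def contains_run(hay, needle):
    """Model of ?find: `needle` occurs in `hay` as a contiguous run."""
    n = len(needle)
    return any(hay[i:i + n] == needle for i in range(len(hay) - n + 1))


def has_element(seq, wanted):
    """Model of ?(elem == wanted): some single element equals `wanted`."""
    return any(item == wanted for item in seq)


print(contains_run("foobar", "oba"))          # True
print(contains_run([1, 2, 3, 4], [2, 3]))     # True
print(contains_run([1, 2, 3, 4], [2, 4]))     # False
print(has_element(["foo", "bar"], "foo"))     # True
print(has_element(["foo", "bar"], ["foo"]))   # False
```

The last two lines show why the two constructs are not interchangeable: membership compares against individual elements, while the containment test compares against a whole sub-sequence.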

!match, !~, =~, ?match

T_STR T_STR

This asserts that TOS (which is a string with a regular expression) matches the string below TOS. The whole string has to match. If you want to look for matches anywhere in the string, just surround your expression with .*'s:

"haystack" ?(".*needle.*" ?match)

For example:

$ dwgrep '"foobar" "f.*r" ?match'
---
f.*r
foobar
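
The whole-string requirement is the same distinction that Python draws between re.fullmatch and re.search. The sketch below uses Python's re module purely as an analogy; the regular expression dialect that dwgrep accepts may differ from Python's, and matches is a hypothetical helper.

```python
import re


def matches(text, pattern):
    """Model of ?match: the pattern has to match the whole string."""
    return re.fullmatch(pattern, text) is not None


print(matches("foobar", "f.*r"))                       # True
print(matches("hay needle stack", "needle"))           # False
print(matches("hay needle stack", ".*needle.*"))       # True
print(re.search("needle", "hay needle stack") is not None)  # True
```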

!starts, ?starts

T_SEQ T_SEQ, T_STR T_STR

The word ?starts asserts that the value on TOS forms a prefix of the value below it. This would be used e.g. like so:

(hay stack expression) ?("needle" ?starts)

This word is applicable to sequences as well as strings:

[hay stack] ?([needle] ?starts)

For example:

$ dwgrep '"foobar" "foo" ?starts'
---
foo
foobar

add

T_SEQ T_SEQ -> T_SEQ

Concatenate two sequences and yield the resulting sequence:

$ dwgrep '[1, 2, 3] [4, 5, 6] add'
[1, 2, 3, 4, 5, 6]

Using sub-expression capture may be a more flexible alternative to using add:

$ dwgrep '[1, 2, 3] [4, 5, 6] [7, 8, 9] [|A B C| (A, B, C) elem]'
[1, 2, 3, 4, 5, 6, 7, 8, 9]

T_CONST T_CONST ->? T_CONST

Zwerg contains a suite of basic operators for integer arithmetic: add, sub, mul, div and mod. Two elements are popped: A and B, with B the original TOS, and A OP B is pushed again. OP is an operation suggested by the operator name.

Integers in Zwerg are 64-bit signed or unsigned quantities. A value of type T_CONST can hold either a signed or an unsigned number; which of the two it is gets decided automatically. Arithmetic operators handle these cases transparently.

Overflow, division by zero and modulo by zero produce an error message and abort the current computation, which is why this operation is denoted with the ->? relation:

$ dwgrep '5 (2, 0, 3) div'
2
Error: division by zero occured when computing 5/0
1

T_STR T_STR -> T_STR

add concatenates two strings on TOS and yields the resulting string:

$ dwgrep '"foo" "bar" add'
foobar

Using formatting strings may be a better way to concatenate strings:

$ dwgrep '"foo" "bar" "baz" "%s%s%s"'
foobarbaz

bin, dec, hex, oct

bin is used for converting constants to base-2, oct to base-8, dec to base-10 and hex to base-16. These operators yield the incoming stack unchanged, except that the domain of the constant on TOS is changed. Examples:

$ dwgrep '64 hex'
0x40

$ dwgrep 'DW_AT_name hex'
0x3

The value remains a constant, only the way it's displayed changes. You can use "%s" to convert it to a string, in which case it's rendered with the newly-selected domain:

$ dwgrep 'DW_AT_name "=%s="'
=DW_AT_name=
$ dwgrep 'DW_AT_name hex "=%s="'
=0x3=

Though you can achieve the same effect with formatting directives %b, %o, %d and %x:

$ dwgrep 'DW_AT_name "=%x="'
=0x3=
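
The separation between a constant's value and its domain can be pictured as a number paired with a display rule. The Python sketch below is a hypothetical model: the Const class, the "named" domain label and to_domain are illustrative names only, and only the decimal and hexadecimal renderings are modeled.

```python
from dataclasses import dataclass, replace

# Rendering rules for the domains modeled here (an assumption of this sketch).
FORMATS = {"dec": "{:d}", "hex": "{:#x}"}


@dataclass(frozen=True)
class Const:
    value: int
    domain: str = "dec"
    name: str = ""  # symbolic name, used only by the "named" domain

    def show(self):
        """Render the constant according to its current domain."""
        if self.domain == "named":
            return self.name
        return FORMATS[self.domain].format(self.value)


def to_domain(const, domain):
    """Return the same value with a different display domain."""
    return replace(const, domain=domain)


at_name = Const(3, "named", "DW_AT_name")
as_hex = to_domain(at_name, "hex")
print(at_name.show())                      # DW_AT_name
print(as_hex.show())                       # 0x3
print(to_domain(Const(64), "hex").show())  # 0x40
print(as_hex.value == at_name.value)       # True
```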

div, mod, mul, sub

T_CONST T_CONST ->? T_CONST

Zwerg contains a suite of basic operators for integer arithmetic: add, sub, mul, div and mod. Two elements are popped: A and B, with B the original TOS, and A OP B is pushed again. OP is an operation suggested by the operator name.

Integers in Zwerg are 64-bit signed or unsigned quantities. A value of type T_CONST can hold either a signed or an unsigned number; which of the two it is gets decided automatically. Arithmetic operators handle these cases transparently.

Overflow, division by zero and modulo by zero produce an error message and abort the current computation, which is why this operation is denoted with the ->? relation:

$ dwgrep '5 (2, 0, 3) div'
2
Error: division by zero occured when computing 5/0
1
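
The pop-two, push-one behavior and the runtime checks can be sketched in Python. This is a minimal model under stated assumptions, not dwgrep's code: it assumes a result is valid if it fits either a signed or an unsigned 64-bit number, it assumes that div truncates toward zero, and binary_op and ZwergArithmeticError are hypothetical names.

```python
INT64_MIN = -(2 ** 63)
UINT64_MAX = 2 ** 64 - 1


class ZwergArithmeticError(Exception):
    """Raised by this model on division by zero or overflow."""


def binary_op(stack, op):
    """Pop B (TOS) and A, then push A OP B; the stack is a list, TOS last."""
    *rest, a, b = stack
    if op not in ("add", "sub", "mul", "div", "mod"):
        raise ValueError(f"unknown operator: {op}")
    if op in ("div", "mod") and b == 0:
        raise ZwergArithmeticError(f"division by zero in {a} {op} {b}")
    if op == "add":
        result = a + b
    elif op == "sub":
        result = a - b
    elif op == "mul":
        result = a * b
    else:
        # Assumption: quotient truncates toward zero, remainder follows from it.
        quotient = abs(a) // abs(b)
        if (a < 0) != (b < 0):
            quotient = -quotient
        result = quotient if op == "div" else a - b * quotient
    if not INT64_MIN <= result <= UINT64_MAX:
        raise ZwergArithmeticError(f"overflow in {a} {op} {b}")
    return rest + [result]


print(binary_op([5, 2], "div"))  # [2]
print(binary_op([0xffffffffffffff00, 0xffffffffffffffff], "sub"))  # [-255]
```

Because each alternative is computed on its own stack, an error in one of them (such as 5 0 div) does not prevent the remaining alternatives from producing results, which matches the output shown above.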

drop, dup, over, rot, swap

These words reorder elements on stack according to the following schemes:

op     before     after
dup    A B C D    A B C D D
over   A B C D    A B C D C
swap   A B C D    A B D C
rot    A B C D    A C D B
drop   A B C D    A B C

Realistically, most of what end users should write will be an occasional dup or drop. Most of the stack reorganization effects can be described more clearly using bindings and sub-expressions. But the operators are present for completeness' sake.
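
The schemes in the table can be reproduced with a few list operations. The Python functions below are an illustrative model only, with the stack written as a list whose last item is TOS.

```python
def dup(s):
    """Copy TOS."""
    return s + [s[-1]]


def over(s):
    """Copy the element below TOS to the top."""
    return s + [s[-2]]


def swap(s):
    """Exchange the two topmost elements."""
    return s[:-2] + [s[-1], s[-2]]


def rot(s):
    """Move the third element from the top to the top."""
    return s[:-3] + [s[-2], s[-1], s[-3]]


def drop(s):
    """Discard TOS."""
    return s[:-1]


stack = list("ABCD")
for word in (dup, over, swap, rot, drop):
    print(word.__name__, " ".join(word(stack)))
```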

elem, relem

T_SEQ ->* T_???

For each element in the input sequence, which is popped, yield a stack with that element pushed on top.

To zip the contents of the two topmost sequences on the stack, do:

$ dwgrep '[1, 2, 3] ["a", "b", "c"]
          (|A B| A elem B elem) ?((|A B| A pos == B pos)) [|A B| A, B]'
[1, a]
[2, b]
[3, c]

The first parenthesis enumerates all combinations of elements. The second then allows only those that correspond to each other position-wise. At that point we get three stacks, each with two values. The last bracket then packs the values on stacks back to sequences, thus we get three stacks, each with a two-element sequence on top.

The expression could be simplified a bit at the expense of clarity:

[1, 2, 3] ["a", "b", "c"]
(|A B| A elem B elem (pos == drop pos)) [|A B| A, B]

relem operates in the same fashion as elem, but backwards.
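
The zip idiom above (enumerate all combinations, keep those whose positions agree, pack each pair into a sequence) can be mirrored in Python. This is a sketch of the same three steps, with zip_by_position as a hypothetical name; it is not how dwgrep evaluates the expression.

```python
from itertools import product


def zip_by_position(first, second):
    """Pair up elements of two sequences that share the same position."""
    # Step 1: every combination of (position, element) from both sequences.
    combos = product(enumerate(first), enumerate(second))
    # Step 2: keep only combinations with equal positions.
    # Step 3: pack each surviving pair into a two-element list.
    return [[a, b] for (pos_a, a), (pos_b, b) in combos if pos_a == pos_b]


pairs = zip_by_position([1, 2, 3], ["a", "b", "c"])
print(pairs)  # [[1, 'a'], [2, 'b'], [3, 'c']]
```

Like the Zwerg expression, this enumerates all nine combinations before filtering, which is quadratic in the sequence length; for short sequences that cost is negligible.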

T_STR ->* T_STR

The description of the T_SEQ overload applies. For strings, elem and relem yield individual characters of the string, again as strings:

$ dwgrep '["foo" elem]'
[f, o, o]

false, true

length

T_SEQ -> T_CONST

Yields the number of elements of the sequence on TOS.

E.g. the following tests whether the DIEs all of whose attributes report the same form as their abbreviations suggest make up all DIEs. This test comes from dwgrep's test suite:

[entry ([abbrev attribute label] == [attribute label])] length
== [entry] length

(Note that this isn't something that should be universally true, though it typically will be, and it is for the particular file that this test is run on. Attributes for which their abbreviation suggests DW_FORM_indirect will themselves have a different form.)

T_STR -> T_CONST

Yields length of string on TOS:

$ dwgrep '"foo" length'
3

pos

Each function numbers the elements that it produces and stores each element's number along with the element. That number can be recalled by saying pos:

$ dwgrep ./tests/dwz-partial -e 'unit (|A| A root "%s" A pos)'
---
0
[34] compile_unit
---
1
[a4] compile_unit
---
2
[e1] compile_unit
---
3
[11e] compile_unit

If you wish to know the number of values produced, you have to count them by hand:

[|Die| Die child]
let Len := length;
(|Lst| Lst elem)

At this point, pos and Len can be used to figure out both absolute and relative position of a given element.

Note that every function counts its elements anew:

$ dwgrep '"foo" elem pos'
0
1
2

$ dwgrep '"foo" elem type pos'
0
0
0
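
Why the second query prints only zeros can be seen in a small Python model. The sketch assumes that a function is run once per incoming value and numbers the outputs of each run from zero; apply, elem and type_of are hypothetical names used for illustration.

```python
def apply(func, items):
    """Run `func` on each incoming (value, pos) pair.

    Outputs of each individual run are numbered from 0 again, and the
    incoming position is discarded.
    """
    results = []
    for value, _old_pos in items:
        for new_pos, out in enumerate(func(value)):
            results.append((out, new_pos))
    return results


def elem(value):
    """Yield each character of a string as its own output."""
    return list(value)


def type_of(value):
    """Yield exactly one output: a type label for the value."""
    return ["T_STR" if isinstance(value, str) else "T_CONST"]


after_elem = apply(elem, [("foo", 0)])
after_type = apply(type_of, after_elem)
print([pos for _, pos in after_elem])  # [0, 1, 2]
print([pos for _, pos in after_type])  # [0, 0, 0]
```

In this model elem produces three outputs from one run, numbered 0 to 2, while type is run three times and produces a single output each time, so every one of them is number 0.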

type

This produces a constant according to the type of value on TOS (such as T_CONST, T_DIE, T_STR, etc.).

value

T_CONST -> T_CONST

Returns the underlying value of the constant, with a plain domain. For example:

$ dwgrep 'DW_FORM_flag value'
12

$ dwgrep '0xc value'
12