Tutorial¶
To get our feet wet, we will develop a query to find functions the
arguments of which are directly of structure or class types. Passing
such values in parameters may incur a potentially expensive copy, and
it might be desirable to audit these uses. If you have a dwgrep
distribution tarball handy, there's a file
tests/nontrivial-types.o
, which you can use to play along with the
tutorial. Or your can compile one yourself, this is the source that
we are using:
struct foo {};
int blah (struct foo f) {
return 0;
}
int main(int argc, char *argv[])
{
return blah ((struct foo) {});
}
First and foremost, a quick word about the command line
interface. dwgrep
is to some extent modeled after grep
utility. The following invokes dwgrep
with an empty query on
aforementioned file:
$ dwgrep '' ./tests/nontrivial-types.o
<Dwarf "./tests/nontrivial-types.o">
Much like with grep
, you can change the order of arguments if you
denote which of them is query:
$ dwgrep ./tests/nontrivial-types.o -e ''
<Dwarf "./tests/nontrivial-types.o">
You can also use -c
if you only care about the number of results,
not the results themselves:
$ dwgrep -c ./tests/nontrivial-types.o -e ''
1
The empty query is not very interesting, but it shows us that the input stack actually contains one value: the Dwarf file that we gave on the command line. So let's look inside!
entry
, unit
--- DIE iteration¶
There are several functions for entering the graph, the most useful
one is likely entry
. This takes [1] a Dwarf, and yields
[2] all DIE's in that Dwarf:
$ dwgrep ./tests/nontrivial-types.o -e 'entry'
[b] compile_unit
producer (strp) GNU C 4.6.3 20120306 (Red Hat 4.6.3-2);
[... more attributes ...]
[2d] structure_type
name (string) foo;
[... more output ...]
[1] | The word "takes" is generally used to mean that the
function expects a stack, whose top value(s) fits the given
description. E.g. Taken values are discarded from the stack, though typically the function in question pushes some other value(s) in their stead. |
[2] | This word is used to mean that the function produces
stack(s) with the outlined property. E.g. if length takes a
string or a sequence, and yields their length, that means that the
top value is popped, its length is determined, and a value
corresponding to that length is pushed back. Such stack is then
sent to output. |
Other words include unit
, which selects compilation and type
units, and abbrev
which selects abbreviation units.
It is impractical to have to repeat the whole dwgrep
command line
every time a code snippet needs to be shown. In the following we will
use the following notation:
$ 'unit'
CU 0
The dollar at the beginning represents the command line prompt, and then the query itself is given in single quotes.
If we don't care about query output (e.g. if it is trivial, predictable or uninteresting), we will just list the query itself:
entry
?DW_TAG_foo
, ?TAG_foo
--- DIE tag assertion¶
The assertion ?DW_TAG_foo
holds [3] for DIE's the tag of
which is DW_TAG_foo
.
[3] | The functions which either yield an unmodified input or nothing at all are called assertions, and are very common in Zwerg expressions. We say that assertion holds if the values on top of stack (TOS) satisfy some property. If the assertion holds, it yields an unchanged incoming stack. If it doesn't hold, it yields nothing at all. Assertions never modify the stack. |
You can write ?DW_TAG_foo
also as ?TAG_foo
, they do the same
thing. In the following, that's how we will refer to this assertion.
In Zwerg, function pipelines are formed by simply placing functions next to each other. The stacks that the left function produces becomes an input of the right function.
For example, to list all subprograms, one would use the following expression:
$ 'entry ?TAG_subprogram'
[35] subprogram
external (flag) true;
name (strp) blah;
[... more attributes ...]
[6c] subprogram
external (flag) true;
name (strp) main;
[... more attributes ...]
?DW_AT_foo
, !DW_AT_foo
--- attribute presence assertion¶
?DW_AT_foo
holds on DIE's that have an attribute DW_AT_foo
,
and on attributes that are DW_AT_foo
. As with tags (and any
Dwarf-related assertions at all), you can shorten this to ?AT_foo
.
!DW_AT_foo
similarly holds on DIE's that do NOT have an attribute
DW_AT_foo
, or on attributes that are NOT DW_AT_foo
.
For assertions, concatenation happens to work as a logical and, so for example, this is how we can get a list of all subprograms that have a declaration attribute:
entry ?TAG_subprogram ?AT_declaration
Similarly, to get subprograms that do not have declaration attribute, we say:
entry ?TAG_subprogram !AT_declaration
child
--- child traversal¶
child
takes a DIE and yields each of its children. (Which is to
say: it yields nothing for child-less DIE's, or it can actually yield
a number of stacks if there are many DIE's. Each of them will have
one of the children on TOS.)
Applied to DIE's in our running example, this will get us to formal arguments of the selected subprograms:
$ 'entry ?TAG_subprogram !AT_declaration child'
[58] formal_parameter
name (string) f;
[... more attributes ...]
[8f] formal_parameter
name (strp) argc;
[... more attributes ...]
[9d] formal_parameter
name (strp) argv;
[... more attributes ...]
If we want to make sure these children are actually formal parameters:
entry ?TAG_subprogram !AT_declaration child ?TAG_formal_parameter
@DW_AT_foo
--- value of attribute DW_AT_foo
¶
This word is used for accessing values of attributes. It always takes a DIE, but what it yields varies by the attribute type. It could be another DIE, a string or a number, a sequence of other values, or whatever value type is deemed best for representing a given piece of Dwarf.
We could for example obtain names of the formal parameters selected above:
$ 'entry ?TAG_subprogram !AT_declaration child ?TAG_formal_parameter @AT_name'
f
argc
argv
This could be used e.g. to select a particular attribute--we'll see later how to do this.
In is not an error to request value of attribute that a DIE doesn't
have. In such case, @AT_*
would simply not yield at all:
$ dwgrep ./tests/aranges.o -c -e 'entry @AT_data_member_location'
0
@AT_*
forms could actually also yield more than once. For example
attributes of location expression types yield once for every covered
address range:
$ dwgrep ./tests/aranges.o -c -e 'entry ?AT_location'
1
$ dwgrep ./tests/aranges.o -e 'entry @AT_location'
0x10000..0x10010:[0:reg5]
0x10010..0x1001a:[0:fbreg<-24>]
There's another use of this same feature: for attributes with reference form, we get the effect of traversing over the edge rooted at given attribute. For example, we could get types of formal parameters:
$ 'entry ?TAG_subprogram !AT_declaration child ?TAG_formal_parameter @AT_type'
[2d] structure_type
name (string) foo;
[... more attributes ...]
[65] base_type
byte_size (data1) 4;
encoding (data1) DW_ATE_signed;
name (string) int;
[ac] pointer_type
byte_size (data1) 8;
type (ref4) [b2];
That's quite a bit more useful--we could find out whether the formal parameters have structure types:
$ 'entry ?TAG_subprogram !AT_declaration child ?TAG_formal_parameter
@AT_type ?TAG_structure_type'
[2d] structure_type
name (string) foo;
[... more attributes ...]
So that will let us know whether there are any offenders like that. That's closer to being interesting, but not quite what we need either. We would like to know about the subprograms themselves, that have this property!
?(EXPR)
, !(EXPR)
--- Sub-expression assertions¶
Some Zwerg expressions are evaluated in what we call a sub-expression context. What happens in sub-expression context, stays there--the stack effects of sub-expression computation never leak back to the surrounding expression.
?(EXPR)
expression is one such case. It asserts that EXPR
produces at least one element. We can use it to get to DIE's that
have arguments that are structures:
$ 'entry ?TAG_subprogram !AT_declaration
?(child ?TAG_formal_parameter @AT_type ?TAG_structure_type)'
[35] subprogram
external (flag) true;
name (strp) blah;
decl_file (data1) /home/petr/proj/dwgrep/x.c;
decl_line (data1) 3;
prototyped (flag) true;
type (ref4) [65];
low_pc (addr) 0x10000;
high_pc (addr) 0x1000b;
frame_base (block1) 0..0xffffffffffffffff:[0:call_frame_cfa];
sibling (ref4) [65];
This asks whether, after going to types of children that are formal parameters, we get a structure. Because the initial two assertions have no stack effects anyway, we might say the same thing thus:
entry ?(?TAG_subprogram !AT_declaration
child ?TAG_formal_parameter @AT_type ?TAG_structure_type)'
The other sub-expression assertion, !(EXPR)
, holds if EXPR
produces no values at all. E.g. to select child-less DIE's in some
query, we would say:
some other query !(child)
EXPR == EXPR
, EXPR != EXPR
--- Infix assertions¶
As you might well know, mere presence of DW_AT_declaration
attribute doesn't tell use whether a DIE is a pure declaration. We
can probably safely assume that when a compiler produces that
attribute, it will have a value of true (and a form of
DW_FORM_flag_present
), so most of the time ?AT_declaration
(and !AT_declaration
) is all you need to write. But if there are
grounds for suspicion that this is not so, or if we simply want to
shield ourselves from the possibility, we need to actually look at
DW_AT_declaration
's value. So instead of ?AT_declaration, we
should be writing this:
(@AT_declaration == true)
This intuitively-looking construct actually deserves a closer
attention. Infix assertions are always evaluated in sub-expression
context. The mode of operation is that each side is evaluated
separately with the same incoming stack. Then if the assertion holds
for any pair of stacks that the two sides yield, the overall assertion
holds. Zwerg has a full suite of these operators--!=
, <
,
<=
, etc. There's also =~
and !~
for matching regular
expressions.
Importantly, infix assertions really are assertions. If they hold,
they produce unchanged incoming stack, otherwise they produce nothing
at all. Thus expressions such as ((A > B) == (C > D))
don't mean
what they seem to. This one for example is just ((A > B) (C >
D))
--i.e. two independent conditions. But consider for example this
snippet:
((A > B) != (C > D)) # WRONG!
If the two >
's hold, the expression reduces to !=
, or
inequality of two nops. Such assertion thus simply never holds
[4].
[4] | If for whatever reason you actually do need an
assertion that never holds, a simple one is !() . |
Precedence of comparison operators is lower than that of
concatenation, so you can write a couple words on each side of the
operator. For example, to look for DIE's where one of the location
expression opcodes is DW_OP_addr
, you could say:
entry (@AT_location elem label == DW_OP_addr)
Due to this precedence setting, comparisons are typically enclosed in
parens (as in the example), so that they don't force too much of your
computation into sub-expression context. The precedence is however
above ,
and ||
that are introduced further, so those need to
be parenthesized further if they should be a comparison operand.
For completeness sake, to check that a flag is false, you would use the following form:
!(@AT_declaration == true)
If there's no DW_AT_declaration
at a given DIE, the left hand side
of the inner expression doesn't yield anything, and the outer !()
succeeds--which is what we want, because flag absence is an implicit
false value. If the attribute is present, then the !()
effectively works as a logical negation. Contrast this with the
following:
(@AT_declaration == false) # WRONG!
You would be probably hard pressed to even find a Dwarf file that actually encodes false flags like this, so the above is useless.
Back to the problem at hand--besides DW_TAG_structure_type, we care
about DW_TAG_class_type
as well! We can express "and" easily
simply by juxtaposing the assertions, but we would like a way of
expressing "or" as well.
EXPR, EXPR
--- ALT-lists¶
An expression like EXPR₁, EXPR₂, ...
evaluates all constituent
EXPRₙ's with the same input, and then yields all values that each
EXPRₙ yields. If the expressions are assertions, this happens to
behave exactly like a logical or. So:
entry ?TAG_subprogram !AT_declaration
?(child ?TAG_formal_parameter @AT_type (?TAG_structure_type, ?TAG_class_type))
But the applicability is wider. Since the semantics are essentially those of a fork, one can for example ask whether an attribute has one of a number of values:
$ 'entry (@AT_name == ("argc", "argv"))'
[8f] formal_parameter
name (strp) argc;
[... more attributes ...]
[9d] formal_parameter
name (strp) argv;
[... more attributes ...]
EXPR || EXPR
--- OR-lists¶
An expression like EXPR₁ || EXPR₂ || ...
works differently. The
input stack is passed to EXPR₁ first, and anything that this yields,
is sent to output. But if nothing is yielded, the same input stack is
passed to EXPR₂. And so on. It thus yields whatever is yielded by
the first expression that actually yields anything. It therefore
operates in a manner similar to the operator ||
in C language.
The typical use would be in fallbacks. For example if we prefer
DW_AT_MIPS_linkage_name
to DW_AT_name
, but can make do with
the latter, that would be encoded as follows:
entry (@AT_MIPS_linkage_name || @AT_name)
For selecting structures and classes, we can use either of these two tools interchangeably.
So this is fine, but it still shows only functions that take structure (or class) arguments directly. But what if they take a const argument? Or if they take a typedef that evaluates to a structure? For these cases we need to keep peeling the fluff until we get to the interesting DIE's. Enter iterators:
EXPR*
, EXPR+
, EXPR?
--- expression iteration¶
EXPR*
yields the incoming stack unchanged, and also passes it to EXPR. It then collects whatever EXPR yields, yields it, and sends again to EXPR. It works similarly to*
in regular expressions.EXPR+
is exactly likeEXPR EXPR*
.EXPR?
is exactly like(, EXPR)
--- it may apply once
We can use this tool to remove DW_TAG_const_type
,
DW_TAG_volatile_type
and DW_TAG_typedef
layers from our
potential structure:
entry ?TAG_subprogram !AT_declaration
?(child ?TAG_formal_parameter
@AT_type ((?TAG_const_type, ?TAG_volatile_type, ?TAG_typedef) @AT_type)*
(?TAG_structure_type, ?TAG_class_type))
Next on, we would like to write a message:
Literals, Strings, Formatting¶
Zwerg has roughly C-like string literals, using \ as an escape character. Hello world program looks like this in Zwerg:
"Hello, world!"
This is an example of a string literal. Literals in Zwerg add
themselves to the stack. There are many types of literals--apart from
strings and usual numeric literals, dwrgep knows about all the named
Dwarf constants--e.g. DW_AT_name
, DW_TAG_array_type
,
DW_FORM_flag
, etc. are all valid forms [5].
[5] | That dwarf constants are recognized is actually not
a feature of Zwerg per se. It's the way that dwgrep
wrapper sets up Zwerg query engine. |
Like C printf, string literals in Zwerg allow formatting directives. To write a nice error message for our running example, we could do for example this:
$ 'entry ?TAG_subprogram !AT_declaration
?(child ?TAG_formal_parameter
@AT_type ((?TAG_const_type, ?TAG_volatile_type, ?TAG_typedef) @AT_type)*
(?TAG_structure_type, ?TAG_class_type))
"%s: one of the parameters has non-trivial type."'
[35] subprogram: one of the parameters has non-trivial type.
It's clear that we'd like to improve on this a bit. We'd like to mention which parameter it is, and we'd like to tell the user the name of the function, not just a DIE offset. We'll address both--later. But first, a bit of background.
When dwgrep sees a string with formatting directives, it converts it
into a function. That function takes one value for each %s
,
substitutes the %s
with values of corresponding stack values, and
then pushes the result to stack. Consequently, to convert anything to
a string in dwgrep, you would just say:
"%s"
E.g.:
$ 'entry ?TAG_subprogram "%s"'
[35] subprogram
[6c] subprogram
When there are more formatting directives, each of them takes one value from the stack, in order from right to left:
$ dwgrep '1 2 "%s %s"'
1 2
We could get the desired format string improvements with these tools in our hands already. But there's a bit of syntax that will make our job easier still.
let X := EXPR;
--- name binding¶
Often you need to refer back to a value that was computed earlier.
Since this is a stack machine, one way to do this is to use stack
shuffling words--dup
, swap
, rot
, over
and drop
--to
move stuff around the way you need it. But keeping track of what is
where when gets old quickly. For this reason, Zwerg allows that you
give value a name. Later on, when that name is mentioned, it acts as
a function that pushes the bound value to stack.
In an expression such as let X Y := EXPR;
, EXPR is evaluated in
a sub-expression context. Top of stack is bound to name Y, and the
value below that to X, and so on in this fashion if there are more
names. E.g.:
$ dwgrep 'let A := 1;
A A add'
2
$ dwgrep 'let A B := 10 2;
A B div'
5
Let's use this tool to first remember the two values that we care about: the subprogram (S) and its naked structure parameter (P):
$ dwgrep ./tests/nontrivial-types.o -f /dev/stdin <<EOF
let S := entry ?TAG_subprogram !AT_declaration;
let P := S child ?TAG_formal_parameter
?(@AT_type ((?TAG_const_type,
?TAG_volatile_type, ?TAG_typedef) @AT_type)*
(?TAG_structure_type, ?TAG_class_type));
S P "%s: %s has non-trivial type."
EOF
---
[35] subprogram: [58] formal_parameter has non-trivial type.
<Dwarf "./tests/nontrivial-types.o">
Which is not too shabby, but having to keep track of which %s
takes which value is perhaps not too comfortable. For that reason,
Zwerg allows splicing of expressions in strings.
%( EXPR %)
--- format string splicing¶
In format strings, code between %(
and the matching %)
is
evaluated in plain context, after which TOS of the result is popped
and inserted in place of the %(...%)
. %s
is then exactly
equivalent to %(%)
.
With this tool, we can make the formatting string clearer:
$ dwgrep ./tests/nontrivial-types.o -f /dev/stdin <<EOF
let S := entry ?TAG_subprogram !AT_declaration;
let P := S child ?TAG_formal_parameter
?(@AT_type ((?TAG_const_type,
?TAG_volatile_type, ?TAG_typedef) @AT_type)*
(?TAG_structure_type, ?TAG_class_type));
"%( S %): %( P %) has non-trivial type."
EOF
---
[35] subprogram: [58] formal_parameter has non-trivial type.
<Dwarf "./tests/nontrivial-types.o">
But the actual output is still not very nice. Ideally we'd mention names and source code coordinates instead of Dwarf offsets and tag names. But with splicing, that's actually quite easy to achieve:
$ dwgrep ./tests/nontrivial-types.o -f /dev/stdin <<"EOF"
let S := entry ?TAG_subprogram !AT_declaration;
let P := S child ?TAG_formal_parameter
?(@AT_type ((?TAG_const_type,
?TAG_volatile_type, ?TAG_typedef) @AT_type)*
(?TAG_structure_type, ?TAG_class_type));
"%( S @AT_decl_file %): %( S @AT_decl_line %): note: in function "\
"`%( S @AT_name %)', parameter `%( P @AT_name %)' type is not trivial."
EOF
---
/home/petr/proj/dwgrep/x.c: 3: note: in function `blah', parameter `f' type is not trivial.
<Dwarf "./tests/nontrivial-types.o">
The message here is already fairly decent, the only thing making it ugly is that we actually yield a two-value stack. We'll deal with this next.
One thing to note here though is the string continuation syntax. Note
how the formatting string is split into two fragments. The former one
then ends with "\
instead of the customary "
, which is a
signal to the lexer that it should concatenate the two fragments
together before handing them further. For all intents and purposes,
these two fragments form a single string literal.
If we are paranoid, we can guard against missing @AT_decl_file
and
@AT_decl_line
. This is actually fairly important, because
requesting a missing attribute is not an error, but merely causes the
computation to stop. If, say, @AT_decl_line
weren't available,
the computation would be silently dropped--right at the point where we
had an offender and were ready to report them. So let's change the
formatting string thus:
"%( S @AT_decl_file || "???" %): %( S @AT_decl_line || "???" %): "\
"note: in function `%( S @AT_name %)', "\
"parameter `%( P @AT_name %)' type is not trivial."
Note how you can use string literals inside %( %)
inside
formatting strings. Not that it would be a good idea to nest layers
and layers of strings, but in principle it is possible, and for a
quick default like this, there's no harm.
Now to get rid of the Dwarf value that's occupying our bottom stack slot. The simplest approach is to drop the value at the point where we don't need it anymore:
$ dwgrep ./tests/nontrivial-types.o -f /dev/stdin <<"EOF"
let S := entry ?TAG_subprogram !AT_declaration;
let P := S child ?TAG_formal_parameter
?(@AT_type ((?TAG_const_type,
?TAG_volatile_type, ?TAG_typedef) @AT_type)*
(?TAG_structure_type, ?TAG_class_type));
drop
"%( S @AT_decl_file || "???" %): %( S @AT_decl_line || "???" %): "\
"note: in function `%( S @AT_name %)', "\
"parameter `%( P @AT_name %)' type is not trivial."
EOF
/home/petr/proj/dwgrep/x.c: 3: note: in function `blah', parameter `f' type is not trivial.
But there's one more way to make this work, and it would allow us to introduce another Zwerg feature.
(|X Y| EXPR)
--- scoped bindings¶
This expression introduces a function that takes one parameter for each name mentioned between the pipes, then passes the remaining stack to EXPR, which is evaluated in plain context. When the bound names are mentioned within EXPR, they recall the bound values. E.g.:
$ dwgrep '1 (|A| A A add)'
2
$ dwgrep '10 2 (|A B| A B div)'
5
$ dwgrep '1 (|A| A A add (|A| A A add))'
4
If we enclose the whole expression into a scope, we can drop the Dwarf from the stack where we don't need it, but still keep it around as a name:
$ dwgrep ./tests/nontrivial-types.o -f /dev/stdin <<"EOF"
(|D|
let S := D entry ?TAG_subprogram !AT_declaration;
let P := S child ?TAG_formal_parameter
?(@AT_type ((?TAG_const_type,
?TAG_volatile_type, ?TAG_typedef) @AT_type)*
(?TAG_structure_type, ?TAG_class_type));
"%( S @AT_decl_file || "???" %): %( S @AT_decl_line || "???" %): "\
"note: in function `%( S @AT_name %)', "\
"parameter `%( P @AT_name %)' type is not trivial."
)
EOF
/home/petr/proj/dwgrep/x.c: 3: note: in function `blah', parameter `f' type is not trivial.
So, that's it. This was a quick tour through the interesting parts of
dwgrep
. You may now want to check out Syntax, and
Core vocabulary or Dwarf vocabulary to learn about
the actual function words that you can use.