[pypy-dev] Full Python Annotation
rhamph at gmail.com
Fri Nov 25 18:13:25 CET 2005
I'm a bit disorganized here, but I don't think delaying will help much
as my brain is turning to mush. Nothing ventured...
Notes for description of value annotator
Iterative. Allows examination of individual steps
Stable. Produces result in finite amount of time
Order Insensitive. Always produces the same result, no matter what
order in which it processes the sourcecode
Specialization of builtins. All functions are given a "space depth".
Multiple calls (at different lines of code) to the same function will,
if it has a deeper depth, create separate copies of the function in
their own "space". If those functions then call a function of a
shallower depth they will use a shared function (rather than
continuing to specialize it). The result is that separate calls to
builtin functions (such as the plus operator) will get separate
annotations, but without risk of an infinite loop in the annotation
Values. A Value is a set (in the mathematical sense) of possible
values, be they integer, float, string, or whatever. In practice it
must store an approximation so that a set of 0..2**64-1 can be
represented. It is acceptable for the set to be expanded to include
values that are not really possible (as the program will still behave
correctly), but not acceptable to be smaller. Note however that as an
iterative process is used to determine values, it will initially be to
small, and later replaced with new Values that cover the full range.
Hypothetical value expansions. A mechanism is needed to expand values
to their full range quickly, rather than iterate it one step at a
time. To do this an integer_add will notice on the second pass (not
the first!) that it had a previous possible output, and that the new
output appears to create a sequence from it. A new value set is
created as normal (increased by only a single step), except a
"hypothetical" part is added to it. The hypothetical contains a Value
set with the predicted sequence of values (not including any already
known), as well as a reference to the node/space that created it. The
value is then set as the new output for the integer_add node and is
propagated through the rest of the code. Other nodes will modify the
hypothetical as if it was a normal value set, including putting a cap
on the the highest value in the case of an if branch, but they will
not explore new code as a result of it. If the hypothetical reaches
the node/space that created it then this indicates there is a loop
which allows the value to expand over the whole range, and the node
will convert it into a normal value (including any caps that were
Branches and bool replacement Values. To effect a cap on the range of
a value the bool value produced by lessthan and similar operations
will contain a pair reduced Value associated with the input variables.
If the bool is used for a branch then either the "True" or "False"
version of the reduced Value associated with the bool's original input
variable will be used to replace the variable thereafter. Note that a
form of Single Static Assignment is used to allow a variable to have
different values at different points in a function.
Comment on SSA and aliases. As the branch that reduces a variable's
range may happen outside the function itself, it is necessary to make
the SSA alias lists globally available and allow modification at any
point. This means they must be copied to every node, with the
resulting cost being quadratic (O(n**2)). However, as most of the
copies would have identical contents I believe it would be possible to
optimize this and get performance that is closer to linear (O(n)).
Finally, a note on what sorts of code this is aimed at. Anything
where all the code can be determined at compiletime, ie no
eval/exec/__import__/reload. Some analysis of code that does use
those features would still be possible however.
Partially annotated test "program":
My sourcecode (requires autopath.py):
Adam Olsen, aka Rhamphoryncus
More information about the Pypy-dev