
I'm a bit disorganized here, but I don't think delaying will help much as my brain is turning to mush. Nothing ventured... Notes for description of value annotator Iterative. Allows examination of individual steps Stable. Produces result in finite amount of time Order Insensitive. Always produces the same result, no matter what order in which it processes the sourcecode Specialization of builtins. All functions are given a "space depth". Multiple calls (at different lines of code) to the same function will, if it has a deeper depth, create separate copies of the function in their own "space". If those functions then call a function of a shallower depth they will use a shared function (rather than continuing to specialize it). The result is that separate calls to builtin functions (such as the plus operator) will get separate annotations, but without risk of an infinite loop in the annotation process. Values. A Value is a set (in the mathematical sense) of possible values, be they integer, float, string, or whatever. In practice it must store an approximation so that a set of 0..2**64-1 can be represented. It is acceptable for the set to be expanded to include values that are not really possible (as the program will still behave correctly), but not acceptable to be smaller. Note however that as an iterative process is used to determine values, it will initially be to small, and later replaced with new Values that cover the full range. Hypothetical value expansions. A mechanism is needed to expand values to their full range quickly, rather than iterate it one step at a time. To do this an integer_add will notice on the second pass (not the first!) that it had a previous possible output, and that the new output appears to create a sequence from it. A new value set is created as normal (increased by only a single step), except a "hypothetical" part is added to it. The hypothetical contains a Value set with the predicted sequence of values (not including any already known), as well as a reference to the node/space that created it. The value is then set as the new output for the integer_add node and is propagated through the rest of the code. Other nodes will modify the hypothetical as if it was a normal value set, including putting a cap on the the highest value in the case of an if branch, but they will not explore new code as a result of it. If the hypothetical reaches the node/space that created it then this indicates there is a loop which allows the value to expand over the whole range, and the node will convert it into a normal value (including any caps that were discovered). Branches and bool replacement Values. To effect a cap on the range of a value the bool value produced by lessthan and similar operations will contain a pair reduced Value associated with the input variables. If the bool is used for a branch then either the "True" or "False" version of the reduced Value associated with the bool's original input variable will be used to replace the variable thereafter. Note that a form of Single Static Assignment is used to allow a variable to have different values at different points in a function. Comment on SSA and aliases. As the branch that reduces a variable's range may happen outside the function itself, it is necessary to make the SSA alias lists globally available and allow modification at any point. This means they must be copied to every node, with the resulting cost being quadratic (O(n**2)). However, as most of the copies would have identical contents I believe it would be possible to optimize this and get performance that is closer to linear (O(n)). Finally, a note on what sorts of code this is aimed at. Anything where all the code can be determined at compiletime, ie no eval/exec/__import__/reload. Some analysis of code that does use those features would still be possible however. Spaces diagram: http://members.shaw.ca/rhamph/python-annvalue/spaces.pdf Values diagram: http://members.shaw.ca/rhamph/python-annvalue/values.pdf Partially annotated test "program": http://members.shaw.ca/rhamph/python-annvalue/annotation.pdf My sourcecode (requires autopath.py): http://members.shaw.ca/rhamph/python-annvalue/basicblock.py -- Adam Olsen, aka Rhamphoryncus

Hi Adam, On Fri, Nov 25, 2005 at 10:13:25AM -0700, Adam Olsen wrote:
Notes for description of value annotator
I know that you expect some answer, but there is really not much that we can discuss about so far. The notes you present are interesting but they don't really touch the problems particular to Python. This is similar to existing approaches (Samuele posted some links on IRC), with aggressive specialization to get very precise results like integer ranges. I agree that, done right, this range analysis can prove that many usages of integers in a program cannot actually overflow; I've seen papers that got good results because user programs typically looked like: i = 0 while i < len(container): ... i += 1 The addition here cannot overflow, because i < len(container) <= sys.maxint so that i+1 <= sys.maxint. The problem, though, is that this kind of consideration is not what we are interested in for PyPy. A more interesting problem to consider would be the unusual Python object model, where the position in the class hierarchy of instance attributes is not declared. However, we already solved this one in our annotator. If you target "full Python" (whatever is really meant by that), there are many problems that we have mentioned to you already but that you don't seem to start worrying about (and no, "exec" alone is not a problem). The more fundamental problems lie in that, on the one hand, real-world programs build some of their own structure indirectly, and on the other hand you cannot be sure that nobody is playing tricks. There are just too many tricks in Python to ensure that any type approximation can be completely safe. You would have to resort to expected-case analysis and run-time guards or some kind of callbacks invalidating code when things go out of hand at run-time. None of that is impossible, but none of that is straightforward either -- more precisely, I think that this would easily swallow many "man-years" of work. While *still* not being what PyPy is about, discussing this kind of Python-oriented issues here wouldn't be completely off-topic :-) Instead of doing this, in PyPy we used the practical approach of RPython to develop PyPy itself in, and then we are about to work on the JIT for user code - "full Python" user code, as in "I type python xyz.py and I get the expected result". A bientot, Armin
participants (2)
-
Adam Olsen
-
Armin Rigo