Introduction
============
This is the fifth of what will hopefully be many summaries of what's
been going on in the world of PyPy in the last week. I'd still like
to remind people that when something worth summarizing happens to
recommend if for "This Week in PyPy" as mentioned on:
http://codespeak.net/pypy/dist/pypy/doc/weekly/
where you can also find old summaries. I note in passing that the
idea of keeping track of IRC conversations in the weekly summary has
pretty much fizzled. Oh well.
There were about 230 commits to the pypy section of codespeak's
repository in the last week (a busy one, it seems :-).
SomePBC-refactoring
===================
We merged the branch at last! Finishing the branch off and getting
translate_pypy running again seemed to mostly involve fighting with
memoized functions and methods, and the "strange details" hinted at in
the last "This Week in PyPy" were not so bad -- indeed once we got to
the point of rtyping finishing, the backend optimizations, source
generation, compilation and resulting binary all worked first time
(there must be something to be said for this Test Driven Development
stuff :).
If you recall from the second This Week in PyPy the thing that
motivated us to start the branch was wanting to support multiple
independent object spaces in the translated binary. After three weeks
of refactoring we hoped we'd made this possible... and so it proved,
though a couple of small tweaks were needed to the PyPy source. The
resulting binary is quite a lot (40%) bigger but only a little (10%)
slower.
CCC papers
==========
As mentioned last week, two PyPy talks have been accepted for the
Chaos Communication Congress. The CCC asks that speakers provide
papers to accompany their talks (they make a proceedings book) so
that's what we've done, and the results are two quite nice pieces of
propaganda for the project:
http://codespeak.net/pypy/extradoc/talk/22c3/agility.pdfhttp://codespeak.net/pypy/extradoc/talk/22c3/techpaper.pdf
It's still possible to attend the conference in Berlin, from December
27th to the 30th:
http://events.ccc.de/congress/2005
A number of PyPy people will be around and innocently mixing with
people from other communities and generally be available for
discussing all things PyPy and the future.
Background EU-related work
==========================
Less visible but still requiring work, organisations funding and
organizing the EU PyPy project are currently preparing a lot of
paperwork and reports. Most of the reports are mostly done by now but
the next Göteborg sprint will start with two (insider only) days of
dotting the 'i's and crossing the 't's. Let's all hope that
everything goes well at our first major EU review at the end of
January.
Meanwhile, Holger was invited to give a talk about PyPy's technical
organisation at a workshop given by the german EU office on the 5th of
December. Also, Bea, Alastair and Holger will talk about PyPy at an
EU workshop on the 8th of December in Bruxelles. Hopefully, this will
enable us to find more opportunities to get PyPy recognized as an
interesting "live" project in the EU's corner of the world.
Where did PyPy-sync go?
=======================
What's a pypy-sync meeting? Apparently::
It's an XP-style meeting that serves to synchronize
development work and let everybody know who is
working on what. It also serves as a decision
board of the PyPy active developers. If discussions
last too long and decisions cannot be reached
they are delegated to a sub-group or get postponed.
pypy-sync meetings usually happen on thursdays at 1pm CET on the
#pypy-sync IRC channel on freenode, with an agenda prepared beforehand
and minutes posted to pypy-dev after the meeting. Except that the
last couple haven't really happened this way -- no agenda and only a
few people have turned up and mostly just the people who are in #pypy
all week anyway.
So after the Göteborg sprint next week we're going to try harder to
prepare and get developers to attend pypy-sync meetings again. This
is especially important as we head towards more varied and less
intrinsically related challenges such as a JIT compiler, integration
of logic programming, GC, higher level backends and much more.
--
Check out the comments in this source file that start with:
# Oh, lord help us.
-- Mark Hammond gets to play with the Outlook object model
I started Pascal backend for a while now. Have a look at:
http://sparcs.kaist.ac.kr/~tinuviel/pypy/
Resulting Pascal source can be compiled with FPC(Free Pascal Compiler),
available from http://www.freepascal.org/
What do you think? Also, if I understood correctly, translation part
is undergoing much change in the branch. What do I need to change?
Seo Sanghyeon
I'm a bit disorganized here, but I don't think delaying will help much
as my brain is turning to mush. Nothing ventured...
Notes for description of value annotator
Iterative. Allows examination of individual steps
Stable. Produces result in finite amount of time
Order Insensitive. Always produces the same result, no matter what
order in which it processes the sourcecode
Specialization of builtins. All functions are given a "space depth".
Multiple calls (at different lines of code) to the same function will,
if it has a deeper depth, create separate copies of the function in
their own "space". If those functions then call a function of a
shallower depth they will use a shared function (rather than
continuing to specialize it). The result is that separate calls to
builtin functions (such as the plus operator) will get separate
annotations, but without risk of an infinite loop in the annotation
process.
Values. A Value is a set (in the mathematical sense) of possible
values, be they integer, float, string, or whatever. In practice it
must store an approximation so that a set of 0..2**64-1 can be
represented. It is acceptable for the set to be expanded to include
values that are not really possible (as the program will still behave
correctly), but not acceptable to be smaller. Note however that as an
iterative process is used to determine values, it will initially be to
small, and later replaced with new Values that cover the full range.
Hypothetical value expansions. A mechanism is needed to expand values
to their full range quickly, rather than iterate it one step at a
time. To do this an integer_add will notice on the second pass (not
the first!) that it had a previous possible output, and that the new
output appears to create a sequence from it. A new value set is
created as normal (increased by only a single step), except a
"hypothetical" part is added to it. The hypothetical contains a Value
set with the predicted sequence of values (not including any already
known), as well as a reference to the node/space that created it. The
value is then set as the new output for the integer_add node and is
propagated through the rest of the code. Other nodes will modify the
hypothetical as if it was a normal value set, including putting a cap
on the the highest value in the case of an if branch, but they will
not explore new code as a result of it. If the hypothetical reaches
the node/space that created it then this indicates there is a loop
which allows the value to expand over the whole range, and the node
will convert it into a normal value (including any caps that were
discovered).
Branches and bool replacement Values. To effect a cap on the range of
a value the bool value produced by lessthan and similar operations
will contain a pair reduced Value associated with the input variables.
If the bool is used for a branch then either the "True" or "False"
version of the reduced Value associated with the bool's original input
variable will be used to replace the variable thereafter. Note that a
form of Single Static Assignment is used to allow a variable to have
different values at different points in a function.
Comment on SSA and aliases. As the branch that reduces a variable's
range may happen outside the function itself, it is necessary to make
the SSA alias lists globally available and allow modification at any
point. This means they must be copied to every node, with the
resulting cost being quadratic (O(n**2)). However, as most of the
copies would have identical contents I believe it would be possible to
optimize this and get performance that is closer to linear (O(n)).
Finally, a note on what sorts of code this is aimed at. Anything
where all the code can be determined at compiletime, ie no
eval/exec/__import__/reload. Some analysis of code that does use
those features would still be possible however.
Spaces diagram:
http://members.shaw.ca/rhamph/python-annvalue/spaces.pdf
Values diagram:
http://members.shaw.ca/rhamph/python-annvalue/values.pdf
Partially annotated test "program":
http://members.shaw.ca/rhamph/python-annvalue/annotation.pdf
My sourcecode (requires autopath.py):
http://members.shaw.ca/rhamph/python-annvalue/basicblock.py
--
Adam Olsen, aka Rhamphoryncus