Accessing the PyPy list via gmane, I have not seen anything for at least a couple of weeks. Is this a reflection of true dormancy? (If so, I will presume it is just temporary.) Terry J. Reedy
Hi Terry, [Terry Reedy Fri, Sep 24, 2004 at 03:06:19PM -0400]
Accessing the PyPy list via gmane, I have not seen anything for at least a couple of weeks. Is this a reflection of true dormancy? (If so, I will presume it is just temporary.)
Indeed, the last pypy-dev mails are from end of August as far as i see. There have been some commits and off-list communication going on, though. Also, we had some discussions on our pypy-funding list regarding EU funding. It seems possible now that PyPy will get funded starting from 1st of November which would clearly allow many of us to work more on pypy. However, Armin has been making quite some progress on a direct C-backend (not based on our Pyrex-approach) lately which reflects on the pypy-svn list. I am sure he is happy to answer questions or talk about the state of how this is going if someone asks him. Hum, hey Armin, what is the current state of the C backend? :-) Moreover, it's not unlikely that the next PyPy sprint will take place in Vilnius, Lithunia, with the help of the POV ("Programmers of Vilnius") people. The projected date is around 15th of November till 21st of November 2004. Btw, we are always very interested in places and possibilities to do coding sprints (not neccessarily in Europe!). Most of the current code base has been developed in sprints so far and it's fun events where one learns a lot. When all goes well i hope that we will have monthly update reports and I guess the key point will be to really put out releases for people to play with. cheers, holger
Hi Terry, hi Holger, hi everyone else, On Fri, Sep 24, 2004 at 09:25:17PM +0200, holger krekel wrote:
he is happy to answer questions or talk about the state of how this is going if someone asks him. Hum, hey Armin, what is the current state of the C backend? :-)
Hum, it is a piece of experimental code that has grown quite large. Here is a large e-mail to explain it and why I'd rather like it to be smaller. It produces a C extension module. The functions' body are, literally, basic blocks in the C sense, with labels, and jumps to each other, following directly the control flow graph's structure. Additionally, the end of each function contains code that decrefs the variables that need to be decrefed in case of error; each operation that can fail will, in case of failure, jump to some label there. The kind of objects supported is enumerated in genc_repr.py: ints, generic PyObjects, tuples, classes, instances, lists, function pointers, method pointers. It all works reasonably well. For example, instances can have both instance and (read-only) class attributes. Methods are just a particular case of class attributes, as in regular Python. (Class attributes are difficult to do in Pyrex; it was one of the motivations to switch to C.) But at the same time I am not fully satisfied with the C backend. I have both small and big concerns. The biggest concern is about its overall structure. It is really like a classical compiler, taking graph blocks, analysing the operations inside, using the annotations computed previously as a guide; then it produces lower-level operations with explicit conversions, and finally a separate pass turns these into real C code. Here is where all this occurs: * typer.py turns the block's SpaceOperations into low-level ops * genc_typeset.py defines which low-level ops exist * genc_op.py classes to write the C code for each low-level op * genc_repr.py maps block's Variables to C types * genc.py calls all other modules; writes the C module * classtyper.py maps user-defined classes to structs Additionally, genc.h is inserted verbatim at the beginning of the C module; it contains macros to do the common operations. These macro definitions are also parsed (!) by genc_typeset.py, so that typer.py can know about their existence. In some sense it is fragile: it depends on the annotations being generated exactly as expected. The (earlier) annotation phase analyses the SpaceOperations and somehow promizes that there is a way to do the given operation and guarantee that it produces some results; e.g. an 'add' when applied to two SomeInteger()s gives another SomeInteger(). But then genc_typeset.py or genc.h must make sure that it actually implements an operation with that signature. There is some duplication there. For example the backend currently expects that a list object is never converted, i.e. if an object is created as a "list of ints" then it will remain a "list of ints" for its whole life. The annotations currently produced have this property, but it's kind of accidental. For the C backend itself it is an implicit external assumption. We could devise by hand annotations that crash the C backend although they are a priori reasonable. Well, another concern is that typer.py is quite confusing and was difficult to get right. I'm not too sure that the code that generates the Py_DECREF() after an error will decref all the correct variables in all cases. Something I don't even dare speak about too much is the way a Variable in the flow graphs maps to potentially a list of unrelated C variables (or fields in a struct). For example, a Variable annotated as a SomeTuple() of two integers will become two distinct C variables. And Variables that are sufficiently constant become zero C variables! This is a nice idea but it makes typer.py, genc_typeset.py and genc_op.py all the more obscure. Of course, I'm always thinking about reorganizing it all... The latest such idea was inspired by Seo who, for the Lisp backend, put in transform.py some code that actually modifies the control flow graph before it is passed to the backend. He replaces the 'newlist' and 'mul' operations corresponding to an expression '[a] * b' with a single custom operation, 'alloc_and_set'. After a lot of unsuccessful efforts in implementing 'list += list' in the C backend I remembered this idea and now transform.py turns a list-based 'inplace_add' into a whole new bunch of control flow blocks: # a = inplace_add(b, c) # becomes the following graph: # # clen = len(c) # growlist(b, clen) # ensure there is enough space for clen new items # | # | (pass all variables to next block, plus i=0) # V # ,--> z = lt(i, clen) # | exitswitch(z): # | | | False # | | True `------------------> ...sequel... # | V # | x = getitem(c, i) # | fastappend(b, x) # | i1 = add(i, 1) # | | # `-----' (pass all variables, with i=i1) So now the backend only has to worry about the two simple operations 'growlist' and 'fastappend'. The latter is an append where we can assume that there is already enough preallocated space. The graph transformation code itself is quite verbose, but with more developed utility routines it could be made simpler. The point is that it looks like a good idea to perform as many optimizations and transformations on the flow graph itself before passing it to the backend. (With hindsight it's obvious.) So I'm now thinking about how more of the typer.py mess could be moved there. One extreme idea would be to say that the flow graph should be transformed much more, step by step, until it eventually contains only operations that have an obvious direct C equivalent. This would make the C backend much simpler again (and also make simple non-C backends fun and quick to implement). This would include typing the operations: the 'add' would be reserved for 'add two PyObject*'; another 'add_i' would add two integers. Conversion operations would be inserted in the flow graph as needed. So typing would be more tightly coupled with the annotation phase, which I think is a good idea. Essentially, the same code that says that adding two SomeIntegers() produces a SomeInteger() would say that to do so the correct operation is 'add_i'. And the code that says that by default 'add' produces a SomeObject() would say that this requires the two inputs to be converted to PyObject*. That's all for now... A bientot, Armin
participants (3)
-
Armin Rigo
-
hpk@trillke.net
-
Terry Reedy