Mailman 3 ast-objects branch created - Python-Dev

ast-objects branch created

"Martin v. Löwis"

1 Dec 2005

4:42 a.m.

I created

http://svn.python.org/projects/python/branches/ast-objects/

You can convert your repository to that branch with

svn switch svn+ssh://pythondev@svn.python.org/python/branches/ast-objects

in the toplevel directory. In particular, this features

http://svn.python.org/projects/python/branches/ast-objects/Parser/asdl_c.py
http://svn.python.org/projects/python/branches/ast-objects/Include/Python-as...
http://svn.python.org/projects/python/branches/ast-objects/Python/Python-ast...

The status is currently this:

- asdl_c generates a type hierarchy: "Sum" productions have one type per constructor, inheriting from a type for the sum; plain products only have a type for the product.
- attributes are in the base type, accessible through o->_base.attr; projections of the product types are accessible directly through member names.
- all projections must be non-NULL. Sequences are represented through potentially empty lists; optional types are potentially represented through Py_None. bool is gone; use Py_True/Py_False. The only primitive type remaining is int (which only occurs in lineno)
- the types currently have only a constructor, a dealloc function, and an _Check macro.
- Naming is this: for cross-object-file visible symbols (functions and global variables), a Py_ prefix is used. Otherwise, I use the type name or constructor name directly. There is a #define for the Py_<type>_New function, so you can also write <type>(params). Parameter order for the types is: projections first, then attributes.
- For compatibility with the current code, the Sum base types also have the _kind enumeration (although that appears not to get initialized right now).

For asdl_c, I see the following things as TODOs:

- add support for traversing the types from C, through tp_members (read-only). Optionally add support for pickling.
- add support for garbage collection. I don't expect this to be necessary right now, but will be if the API is exposed, and it doesn't cost much.

The bigger chunk of necessary changes is in using these, starting with ast.c.

Feel free to commit any changes to that branch that you consider helpful. To avoid duplicated work, posting a note here might also help.

Regards,
Martin
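
[Editorial sketch, not part of the original message.] The layout described in the status list can be pictured roughly as follows. This is only an illustration under stated assumptions: ObjHead is a hypothetical stand-in for the real object header so that the code compiles without Python.h, and the names Expr, Leaf, BinOp, Leaf_New and BinOp_New are invented here; they are not taken from the generated Python-ast files.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the PyObject header, so the sketch
   compiles without Python.h. */
typedef struct {
    long refcnt;
    int type_tag;
} ObjHead;

enum { TAG_LEAF = 1, TAG_BINOP = 2 };

/* Type for the sum "expr": carries the attributes shared by every
   constructor of the sum (here only lineno). */
typedef struct {
    ObjHead head;
    int lineno;
} Expr;

/* One type per constructor; each embeds the sum type as _base. */
typedef struct {
    Expr _base;
} Leaf;

typedef struct {
    Expr _base;
    Expr *left;   /* projections are direct members and must be non-NULL */
    Expr *right;
} BinOp;

/* _Check macros, modelled on a simple type tag for this sketch. */
#define Leaf_Check(o)  (((ObjHead *)(o))->type_tag == TAG_LEAF)
#define BinOp_Check(o) (((ObjHead *)(o))->type_tag == TAG_BINOP)

/* Constructors take projections first, then attributes. */
Leaf *Leaf_New(int lineno)
{
    Leaf *o = malloc(sizeof *o);
    if (o == NULL)
        return NULL;
    o->_base.head.refcnt = 1;
    o->_base.head.type_tag = TAG_LEAF;
    o->_base.lineno = lineno;
    return o;
}

BinOp *BinOp_New(Expr *left, Expr *right, int lineno)
{
    BinOp *o;
    if (left == NULL || right == NULL)
        return NULL;              /* reject NULL projections */
    o = malloc(sizeof *o);
    if (o == NULL)
        return NULL;
    o->_base.head.refcnt = 1;
    o->_base.head.type_tag = TAG_BINOP;
    o->_base.lineno = lineno;
    o->left = left;
    o->right = right;
    return o;
}
```

With this layout an attribute is read as op->_base.lineno, while a projection is read as op->left.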

Neal Norwitz

1 Dec

12:10 p.m.

On 11/30/05, "Martin v. Löwis" wrote:

...

The bigger chunk of necessary changes is in using these, starting with ast.c.

I got a few more files to compile. The following files (all under Python/) need some loving care and are looking for a kind soul to adopt them:

ast.c, compile.c, future.c, symtable.c

Of these, future.c is by far the easiest to get compiling.

n

Brett Cannon

1:10 p.m.

On 11/30/05, "Martin v. Löwis" wrote:

...

I created

http://svn.python.org/projects/python/branches/ast-objects/

You can convert your repository to that branch with

svn switch svn+ssh://pythondev@svn.python.org/python/branches/ast-objects

If you would rather do a separate checkout, do

svn checkout svn+ssh://pythondev@svn.python.org/python/branches/ast-objects

If you want a read-only checkout, see the newly updated entry on checking out projects in the dev FAQ at http://www.python.org/dev/devfaq.html#how-do-i-get-a-checkout-of-the-reposit... .

-Brett

Jeremy Hylton

6:41 p.m.

Martin, I'm not sure what your intent for this work is, but I'd like to create a parallel arena branch and compare the results. I'll start work on that tomorrow.

Jeremy

Neal Norwitz

2 Dec

12:16 a.m.

On 12/1/05, Jeremy Hylton wrote:

...

Martin,

I'm not sure what your intent for this work is, but I'd like to create a parallel arena branch and compare the results. I'll start work on that tomorrow.

I think this is a good thing. It will be much easier to compare implementations if we have some substantial code reflecting each technique.

n

"Martin v. Löwis"

4:08 a.m.

Jeremy Hylton wrote:

...

I'm not sure what your intent for this work is, but I'd like to create a parallel arena branch and compare the results. I'll start work on that tomorrow.

I certainly want the PyObject* branch to become "live" at some time; I think this is the way to go, and that an arena approach is fundamentally flawed.

That said: go ahead and create a branch. This is one of the things that subversion makes convenient, and it allows people to actually judge the results when we are done. I'm personally not worried about the duplicated work: if we actually carry out the experiment of multiple alternative (or perhaps supplementing) implementations, we have much better grounds to pick the approach for the mainline.

Regards,
Martin

Jeremy Hylton

5 Dec

11:07 a.m.

On 12/1/05, "Martin v. Löwis" wrote:

...

Jeremy Hylton wrote:

...
I'm not sure what your intent for this work is, but I'd like to create a parallel arena branch and compare the results. I'll start work on that tomorrow.

I certainly want the PyObject* branch to become "live" at some time; I think this is the way to go, and that an arena approach is fundamentally flawed.

I have implemented a version of the arena API that handles freeing memory in ast.c. It worked out rather like I expected, although I still haven't thought much about how it would extend to the rest of the compiler. It seems like the same approach should apply, although I think the primary concern was the complexity of memory management in ast.c.

The way the arena approach works is to free all the AST nodes at the end of compilation. This approach isn't all that different from the one it replaced. In the trunk, there is a single call to free_mod() at the end of compilation and it recursively frees all the sub-objects. One way to think about the arena changes is to replace a set of recursive function calls based on the tree structure with a flat list of all AST nodes, recorded as the nodes are created. The real advantage is in the error cases, where all the memory gets freed even though all the nodes aren't attached to a single tree.

Can you expand on why you think this approach is fundamentally flawed?
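
[Editorial sketch, not part of the original message.] The "flat list of all AST nodes" idea can be illustrated with a toy arena. This is a minimal sketch under stated assumptions, not the ast-arena branch code: Arena, arena_alloc, arena_free, Node and build_with_error are all hypothetical names. It shows the error case mentioned above, where nodes that were never attached to a tree are still released.

```c
#include <stddef.h>
#include <stdlib.h>

/* Header placed in front of every allocation; the union keeps the
   payload that follows suitably aligned for any type. */
typedef union Block {
    union Block *next;
    max_align_t align;
} Block;

typedef struct {
    Block *head;    /* flat list of every allocation made so far */
    size_t count;
} Arena;

Arena *arena_new(void)
{
    Arena *a = malloc(sizeof *a);
    if (a != NULL) {
        a->head = NULL;
        a->count = 0;
    }
    return a;
}

void *arena_alloc(Arena *a, size_t size)
{
    Block *b = malloc(sizeof(Block) + size);
    if (b == NULL)
        return NULL;
    b->next = a->head;      /* record the allocation in the flat list */
    a->head = b;
    a->count++;
    return b + 1;           /* payload starts right after the header */
}

/* Free everything at once; returns how many allocations were released. */
size_t arena_free(Arena *a)
{
    size_t freed = 0;
    Block *b = a->head;
    while (b != NULL) {
        Block *next = b->next;
        free(b);
        freed++;
        b = next;
    }
    free(a);
    return freed;
}

typedef struct Node {
    int kind;
    struct Node *left, *right;
} Node;

Node *node_new(Arena *a, int kind, Node *left, Node *right)
{
    Node *n = arena_alloc(a, sizeof *n);
    if (n == NULL)
        return NULL;
    n->kind = kind;
    n->left = left;
    n->right = right;
    return n;
}

/* Simulated failure: two children are created, then an error occurs
   before they are attached to any parent. No cleanup code is needed
   here because the arena still knows about both nodes. */
Node *build_with_error(Arena *a)
{
    Node *left = node_new(a, 1, NULL, NULL);
    Node *right = node_new(a, 1, NULL, NULL);
    (void)left;
    (void)right;
    return NULL;
}
```

The point of the design is that the failing function has no cleanup path of its own; a single arena_free() call at the end covers both the success and the error case.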

...

That said: go ahead and create a branch. This is one of the things that subversion makes convenient, and it allows people to actually judge the results when we are done. I'm personally not worried about the duplicated work: if we actually carry out the experiment of multiple alternative (or perhaps supplementing) implementations, we have much better grounds to pick the approach for the mainline.

Sure does. It seems like the code generation from the AST description also makes this kind of experimentation easier. Jeremy

"Martin v. Löwis"

12:21 p.m.

Jeremy Hylton wrote:

...

Can you expand on why you think this approach is fundamentally flawed?

I would expect that you allocate in the process memory that needs to outlive the arena, e.g. python strings. The fundamental problem is that the arena deallocation cannot know whether such memory exists, and what to do with it. So two things may happen:

- you mistakenly allocate long-lived memory from the arena, and then discard it when you discard the arena. This gives you a dangling pointer. The problem here is that at the allocation point, you may not know (yet) whether this is going to survive the arena or not.
- you allocate memory outside of the arena to survive it, and then something goes wrong, and you deallocate the arena. Yet, the outside memory remains as garbage.

IOW, there would be no problem if you were *completely* done when you throw away the arena. This is not the case, though: eventually you end up with byte code that needs to persist.

...

Sure does. It seems like the code generation from the AST description also makes this kind of experimentation easier.

Indeed. I wish there was a way to generate ast.c from a transformation description as well. Regards, Martin

Jeremy Hylton

7:16 p.m.

On 12/5/05, "Martin v. Löwis" wrote:

...

Jeremy Hylton wrote:

...

I would expect that you allocate in the process memory that needs to outlive the arena, e.g. python strings. The fundamental problem is that the arena deallocation cannot know whether such memory exists, and what to do with it.

I can see that problem occurring with an all-or-nothing solution, but not if you have the freedom to allocate from an arena or from some other mechanism. If there are multiple ways to allocate memory, there is some increased programming burden (you have to remember how each pointer was allocated) but you gain flexibility. The ast-arena branch allocates most memory from an arena, but allocates identifiers on the regular heap as PyObjects. It does keep a list of these PyObjects so that it can DECREF them later.

...

IOW, there would be no problem if you were *completely* done when you throw away the arena. This is not the case, though: eventually you end up with byte code that needs to persist.

Right. The bytecode can't be allocated from the arena, but the AST can. The AST is an internal abstraction.

...

...
Sure does. It seems like the code generation from the AST description also makes this kind of experimentation easier.

Indeed. I wish there was a way to generate ast.c from a transformation description as well.

I'm sure there's a way to generate a parser from the description, but that seemed like a bigger project than I wanted to tackle. Given how long it took to finish the AST without a new parser, I think it was a wise choice :-).

Jeremy

James Y Knight

9:22 p.m.

On Dec 5, 2005, at 8:46 AM, Jeremy Hylton wrote:

...

On 12/5/05, "Martin v. Löwis" wrote:

...
Jeremy Hylton wrote:

...

I would expect that you allocate in the process memory that needs to outlive the arena, e.g. python strings. The fundamental problem is that the arena deallocation cannot know whether such memory exists, and what to do with it.

I can see that problem occurring with an all-or-nothing solution, but not if you have the freedom to allocate from an arena or from some other mechanism. If there are multiple ways to allocate memory, there is some increased programming burden (you have to remember how each pointer was allocated) but you gain flexibility. The ast-arena branch allocates most memory from an arena, but allocates identifiers on the regular heap as PyObjects. It does keep a list of these PyObjects so that it can DECREF them later.

ISTM that having to remember which pointers are arena-allocated and which are normally-refcounted-allocated removes the major gain that an arena method is supposed to bring: resistance to mistakes. I'd find having a single way to allocate and track memory easier than multiple. Then you just have to follow the single set of best practices for memory management, and you're all set. (and with PyObjects, the same practices the rest of python uses, another win.)

I'd also like to parrot the concern others have had that if the AST nodes are not made of PyObjects, then a mirror hierarchy of PyObject-ified AST nodes will have to be created, which seems like quite a wasteful duplication. If it is required that there be a collection of AST python objects (which I think it is), is there really any good reason to make the "real" AST objects not be the _only_ AST objects? I've not heard one.

James

Jeremy Hylton

10:06 p.m.

On 12/5/05, James Y Knight wrote:

...

On Dec 5, 2005, at 8:46 AM, Jeremy Hylton wrote:

...
I can see that problem occurring with an all-or-nothing solution, but not if you have the freedom to allocate from an arena or from some other mechanism. If there are multiple ways to allocate memory, there is some increased programming burden (you have to remember how each pointer was allocated) but you gain flexibility. The ast-arena branch allocates most memory from an arena, but allocates identifiers on the regular heap as PyObjects. It does keep a list of these PyObjects so that it can DECREF them later.

ISTM that having to remember which pointers are arena-allocated and which are normally-refcounted-allocated removes the major gain that an arena method is supposed to bring: resistance to mistakes. I'd find having a single way to allocate and track memory easier than multiple. Then you just have to follow the single set of best practices for memory management, and you're all set. (and with PyObjects, the same practices the rest of python uses, another win.)

It's a question of degree, right? If you can find a small number of rules that are easy to understand then you are still likely to avoid mistakes. For example, the current ast-arena branch uses two rules:

- All AST nodes are allocated from the arena.
- All PyObjects attached to an AST node (identifiers and constants) are associated with the arena, i.e. they are DECREFed when it is freed.
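
[Editorial sketch, not part of the original message.] The two rules can be modelled with a toy arena that tracks both raw node memory and reference-counted objects. Everything here is an assumption made for illustration: Obj is a hypothetical stand-in for a PyObject, and arena_alloc, arena_add_object and name_new are invented names, not the ast-arena branch API.

```c
#include <stdlib.h>

/* Toy reference-counted object standing in for a PyObject. */
static int live_objects = 0;

typedef struct {
    int refcnt;
} Obj;

Obj *obj_new(void)
{
    Obj *o = malloc(sizeof *o);
    if (o != NULL) {
        o->refcnt = 1;
        live_objects++;
    }
    return o;
}

void obj_decref(Obj *o)
{
    if (--o->refcnt == 0) {
        live_objects--;
        free(o);
    }
}

/* Each entry records either raw node memory or an attached object. */
typedef struct Entry {
    struct Entry *next;
    void *raw;
    Obj *obj;
} Entry;

typedef struct {
    Entry *head;
} Arena;

static int arena_push(Arena *a, void *raw, Obj *obj)
{
    Entry *e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    e->raw = raw;
    e->obj = obj;
    e->next = a->head;
    a->head = e;
    return 0;
}

/* Rule 1: AST nodes are allocated from the arena. */
void *arena_alloc(Arena *a, size_t size)
{
    void *p = malloc(size);
    if (p == NULL)
        return NULL;
    if (arena_push(a, p, NULL) < 0) {
        free(p);
        return NULL;
    }
    return p;
}

/* Rule 2: objects attached to a node are handed to the arena, which
   takes over the caller's reference. On failure the caller keeps it. */
int arena_add_object(Arena *a, Obj *o)
{
    return arena_push(a, NULL, o);
}

void arena_free(Arena *a)
{
    Entry *e = a->head;
    while (e != NULL) {
        Entry *next = e->next;
        if (e->obj != NULL)
            obj_decref(e->obj);   /* DECREF attached objects */
        free(e->raw);             /* free(NULL) is a no-op */
        free(e);
        e = next;
    }
    a->head = NULL;
}

typedef struct {
    Obj *id;   /* identifier owned by the arena, not by the node */
} NameNode;

NameNode *name_new(Arena *a, Obj *id)
{
    NameNode *n = arena_alloc(a, sizeof *n);
    if (n == NULL)
        return NULL;
    if (arena_add_object(a, id) < 0)
        return NULL;   /* the node memory is still freed with the arena */
    n->id = id;
    return n;
}
```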

...

I'd also like to parrot the concern others have had that if the AST nodes are not made of PyObjects, then a mirror hierarchy of PyObject-ified AST nodes will have to be created, which seems like quite a wasteful duplication. If it is required that there be a collection of AST python objects (which I think it is), is there really any good reason to make the "real" AST objects not be the _only_ AST objects? I've not heard one.

The PyObject-ified AST nodes are only needed if user code requests an AST from the compiler. That is, if we add a new feature that exposes AST, we would need AST objects represented in Python code. I think this feature would be great to add, but it doesn't seem like a primary concern for the internal compiler implementation. There is no need for PyObject-ified AST objects in the internal compiler. (I think this fact is obvious, since the compiler exists but PyObject-ified AST objects don't.)

The question, then, is the simplest way to provide Python code with access to the AST objects. I still think that a set of pure Python classes to represent the AST nodes is a good approach. You define a simple serialization format for ASTs and the serialized AST can be passed from the interpreter to user code and back. The user code gets a mutable tree of AST nodes that it can reserialize for compilation to bytecode. This strategy is exactly like the existing parser module.

One advantage of this approach is the AST objects in each language are simpler to use. The C AST nodes provide an easy API for C programmers and the Python AST nodes provide an easy API for Python programmers.

Put another way, since the AST code is all generated from a high level description, the implementation doesn't matter at all. What matters is the API exposed in each programming language. If the best API happens to admit a shared implementation, that's great. If it doesn't, no loss.

Jeremy

Neal Norwitz

6 Dec

12:54 a.m.

On 12/5/05, Jeremy Hylton wrote:

...

On 12/5/05, James Y Knight wrote:

...
ISTM that having to remember which pointers are arena-allocated and which are normally-refcounted-allocated removes the major gain that an arena method is supposed to bring: resistance to mistakes. I'd find having a single way to allocate and track memory easier than multiple. Then you just have to follow the single set of best practices for memory management, and you're all set. (and with PyObjects, the same practices the rest of python uses, another win.)

It's a question of degree, right? If you can find a small number of rules that are easy to understand then you are still likely to avoid mistakes.

This is my understanding of the two approaches from what I've seen so far (Jeremy or Martin should correct me if I'm wrong).

With current arena impl:
 * need to call PyArena_AddPyObject() for any allocated PyObject
 * need to call PyArena_AddMallocPointer() for any malloc()ed memory (there are currently no manual calls like this, all the calls are in generated code?)

With the PyObject impl:
 * need to init all PyObjects to NULL
 * need to Py_XDECREF() on exit
 * need to goto error if there is any failure

Both impls have a bit more details, but those are the highlights. From what I've seen of both, the arena is easier to deal with even though it is different from the rest of python. There is only one thing to remember.

I didn't look at the changes much, but from what I saw I think it may be better to move the arenas off the branch and onto the head now. It appears to be much easier to get right since there is virtually no error handling code in line. It's all taken care of in a few central places. We can then decide between the arenas in the head vs PyObjects.
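
[Editorial sketch, not part of the original message.] The three rules listed for the PyObject approach correspond to a common C cleanup pattern. The following is a toy model under stated assumptions: Obj, obj_new, obj_xdecref and make_binop are hypothetical names, and the fail parameter only simulates an allocation failure; real code would use PyObject and Py_XDECREF instead.

```c
#include <stdlib.h>

/* Toy reference-counted node standing in for a PyObject-based AST node. */
static int live_objects = 0;

typedef struct Obj {
    int refcnt;
    struct Obj *left, *right;
} Obj;

/* fail != 0 simulates an allocation failure at this step. */
Obj *obj_new(int fail)
{
    Obj *o;
    if (fail)
        return NULL;
    o = malloc(sizeof *o);
    if (o == NULL)
        return NULL;
    o->refcnt = 1;
    o->left = NULL;
    o->right = NULL;
    live_objects++;
    return o;
}

/* Like Py_XDECREF in spirit: tolerates NULL, frees on the last reference. */
void obj_xdecref(Obj *o)
{
    if (o == NULL)
        return;
    if (--o->refcnt == 0) {
        obj_xdecref(o->left);
        obj_xdecref(o->right);
        live_objects--;
        free(o);
    }
}

Obj *make_binop(int fail_at)
{
    /* Rule: initialise every object pointer to NULL. */
    Obj *left = NULL, *right = NULL, *result = NULL;

    left = obj_new(fail_at == 1);
    if (left == NULL)
        goto done;              /* Rule: jump to the cleanup label on failure. */
    right = obj_new(fail_at == 2);
    if (right == NULL)
        goto done;
    result = obj_new(fail_at == 3);
    if (result == NULL)
        goto done;

    /* The result takes its own references to both children. */
    result->left = left;
    left->refcnt++;
    result->right = right;
    right->refcnt++;

done:
    /* Rule: drop the local references on every exit path. */
    obj_xdecref(left);
    obj_xdecref(right);
    return result;              /* NULL if any step failed */
}
```

Compared with an arena, every function that creates objects carries its own cleanup label, which is the in-line error handling referred to above.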

...

...
I'd also like to parrot the concern others have had that if the AST nodes are not made of PyObjects, then a mirror hierarchy of PyObject-ified AST nodes will have to be created, which seems like quite a wasteful duplication. If it is required that there be a collection of AST python objects (which I think it is), is there really any good reason to make the "real" AST objects not be the _only_ AST objects? I've not heard one.

The PyObject-ified AST nodes are only needed if user code requests an AST from the compiler. That is, if we add a new feature that exposes AST, we would need AST objects represented in Python code. I think this feature would be great to add, but it doesn't seem like a primary concern for the internal compiler implementation.

FWIW, I agree with this approach. I don't care that much about the internal AST for its own sake. I want to consume the AST and I only care about the internals insofar as the result is correct and maintainable.

So my view of the best approach is one that is easy to get right and maintain. That's why I think the arena should be moved to the head now. From what I saw it was much easier to get right, it removed a bunch of code and should be more maintainable.

I will also probably work on the PyObject approach, since if that's more maintainable I'd prefer that in the end. I don't know which approach is best.

I also really like Martin's idea about generating a lot more (all?) of the manually written Python/ast.c code. I'd prefer much less C code to maintain.

n

Brett Cannon

2:59 a.m.

On 12/5/05, Neal Norwitz wrote:

...

On 12/5/05, Jeremy Hylton wrote: [SNIP] I didn't look at the changes much, but from what I saw I think it may be better to move the arenas off the branch and onto the head now. It appears to be much easier to get right since there is virtually no error handling code in line. It's all taken care of in a few central places.

We can then decide between the arenas in the head vs PyObjects.

I am also +1 with merging the arena into the trunk. The arena approach compared to the existing solution is a lot easier to use. With almost all calls to the arena in the auto-generated constructor code, one just has to make sure that key places have PyArena_Free() to free the arena and that errors propagate up to those points. But, as Neal is suggesting, this should not prevent the PyObject version from moving forward since it could still turn out to be the better solution.

...

...
...
I'd also like to parrot the concern others have had that if the AST nodes are not made of PyObjects, then a mirror hierarchy of PyObject-ified AST nodes will have to be created, which seems like quite a wasteful duplication. If it is required that there be a collection of AST python objects (which I think it is), is there really any good reason to make the "real" AST objects not be the _only_ AST objects? I've not heard one.

The PyObject-ified AST nodes are only needed if user code requests an AST from the compiler. That is, if we add a new feature that exposes AST, we would need AST objects represented in Python code. I think this feature would be great to add, but it doesn't seem like a primary concern for the internal compiler implementation.

FWIW, I agree with this approach. I don't care that much about the internal AST for its own sake. I want to consume the AST and I only care about the internals insofar as the result is correct and maintainable.

It really comes down to how people expect to use the exposure of the AST. If we try to make sure there is no horrible overhead in getting the AST to Python code and then to the bytecode compiler then it can be used for optimizations (e.g., the existing peepholer could be rewritten in Python and just be a default transformation that the AST is passed through). But if we don't want to make sure that AST access is used for optimization transformation but more for non-performance critical uses (e.g., error checking a la PyChecker or refactoring tools) then the simplest, easiest to maintain solution should win out.

Personally I want the former abilities for academic experimentation reasons. I don't think that a bunch of optimizations are suddenly going to appear out of nowhere for Python code, but I still would like to be able to experiment with some without having to worry about a performance penalty for doing so.

Granted, though, if we byte-compiled scripts passed in on the command-line we would definitely help minimize the performance impact. Interpreter input might be a little slower, but then again since it will be such bite-sized chunks of AST a couple more Python calls shouldn't be that significant. Plus I don't know if serialization will be that much slower than passing the AST itself out since doing a full transformation on an AST might be far more costly than just getting the AST to the Python code in the first place.

...

So my view of the best approach is one that is easy to get right and maintain. That's why I think the arena should be moved to the head now. From what I saw it was much easier to get right, it removed a bunch of code and should be more maintainable.

I will also probably work on the PyObject approach, since if that's more maintainable I'd prefer that in the end. I don't know which approach is best.

I also really like Martin's idea about generating a lot more (all?) of the manually written Python/ast.c code. I'd prefer much less C code to maintain.

A new sprint topic for PyCon for Guido to give us a month deadline on after we have worked on it for three years! =) -Brett

Nick Coghlan

5:14 p.m.

Neal Norwitz wrote:

...

This is my understanding of the two approaches from what I've seen so far (Jeremy or Martin should correct me if I'm wrong).

With current arena impl:
 * need to call PyArena_AddPyObject() for any allocated PyObject
 * need to call PyArena_AddMallocPointer() for any malloc()ed memory (there are currently no manual calls like this, all the calls are in generated code?)

With the PyObject impl:
 * need to init all PyObjects to NULL
 * need to Py_XDECREF() on exit
 * need to goto error if there is any failure

Both impls have a bit more details, but those are the highlights. From what I've seen of both, the arena is easier to deal with even though it is different from the rest of python. There is only one thing to remember.

As Fredrik pointed out a while back, the PyObject approach doesn't *have* to involve manual decref operations - PyObjects come with a ready made arena structure, in the form of PyList.

However, whether the automatic management is done with a list or with Jeremy's arena structure, the style is still different from most of CPython, and either way there's going to be a small learning curve associated with getting used to it.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
---------------------------------------------------------------
http://www.boredomandlaziness.org

"Martin v. Löwis"

7 Dec

3:36 a.m.

Nick Coghlan wrote:

...

As Fredrik pointed out a while back, the PyObject approach doesn't *have* to involve manual decref operations - PyObjects come with a ready made arena structure, in the form of PyList.

That doesn't really work: PyList_Append (which you would have to use) duplicates the reference, so you would still have to decref it explicitly. Of course, you could do so right away, instead of doing it on exit. Regards, Martin
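
[Editorial sketch, not part of the original message.] The ownership point can be shown with a toy list whose append operation takes its own reference, as PyList_Append does. This is an illustration under stated assumptions: Obj, List, list_append, list_clear and new_tracked are hypothetical stand-ins that compile without Python.h, not the CPython API.

```c
#include <stdlib.h>

/* Toy reference-counted object standing in for a PyObject. */
static int live_objects = 0;

typedef struct {
    int refcnt;
} Obj;

Obj *obj_new(void)
{
    Obj *o = malloc(sizeof *o);
    if (o != NULL) {
        o->refcnt = 1;
        live_objects++;
    }
    return o;
}

void obj_decref(Obj *o)
{
    if (--o->refcnt == 0) {
        live_objects--;
        free(o);
    }
}

typedef struct {
    Obj **items;
    size_t len, cap;
} List;

/* Appending does not steal the caller's reference; the list adds
   its own, so the count goes up by one. */
int list_append(List *l, Obj *o)
{
    if (l->len == l->cap) {
        size_t cap = l->cap ? l->cap * 2 : 4;
        Obj **items = realloc(l->items, cap * sizeof *items);
        if (items == NULL)
            return -1;
        l->items = items;
        l->cap = cap;
    }
    o->refcnt++;
    l->items[l->len++] = o;
    return 0;
}

/* Releasing the list drops the list's reference to every item. */
void list_clear(List *l)
{
    size_t i;
    for (i = 0; i < l->len; i++)
        obj_decref(l->items[i]);
    free(l->items);
    l->items = NULL;
    l->len = 0;
    l->cap = 0;
}

/* Create an object whose only owner is the list. */
Obj *new_tracked(List *l)
{
    int rc;
    Obj *o = obj_new();
    if (o == NULL)
        return NULL;
    rc = list_append(l, o);
    obj_decref(o);              /* explicit decref, done right away;
                                   this frees o if the append failed */
    return rc < 0 ? NULL : o;   /* borrowed; valid while the list lives */
}
```

The explicit decref does not disappear; it moves from the exit path into the helper that registers the object.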
