Mailman 3 Optimization of Python ASTs: How should we deal with constant values? - Python-Dev

Optimization of Python ASTs: How should we deal with constant values?

Thomas Lee

April 30, 2008

10:15 a.m.

Hi all, I've been working on optimization of the AST, including the porting of the old bytecode-level optimizations to the AST level. A few questions have come up in the process of doing this, all of which are probably appropriate for discussion on this list. The code I'm referring to here can be found in the tlee-ast-optimize branch. Most of the relevant code is in Python/optimize.c and Python/peephole.c. Nearly all of the bytecode-level optimizations have been moved to the AST optimizer with a few exceptions. Most of those waiting to be ported are stuck in limbo due to the fact we can't yet inject arbitrary PyObject constants into the AST. Examples are tuples of constants and the optimization of "LOAD_GLOBAL/LOAD_NAME None" as "LOAD_CONST None". This leaves us with a few options: 1. A new AST expr node for constant values for types other than Str/Num I imagine this to be something like Const(PyObject* v), which is effectively translated to a "LOAD_CONST v" by the compiler. This trades the purity of the AST for a little practicality. A "Const" node has no real source representation, it would exist solely for the purpose of injecting PyObject constants into the AST. 2. Allow arbitrary annotations to be applied to the AST as compiler hints. For example, each AST node might have an optional dict that contains a set of annotation values. Then when traversing the AST, the compiler might do something along the lines of: if (expr->annotations) { PyObject* constvalue = PyDict_GetItemString(expr->annotations, "constantvalue"); if (constvalue) ADDOP_O(c, LOAD_CONST, constvalue, consts) else VISIT(c, expr, expr) } This is a more general solution if we want to keep other compiler hints down the track, but unless somebody can think of another use case this is probably overkill. 3. Keep these particular optimizations at the bytecode level. It would be great to be able to perform the optimizations at a higher level, but this would require no effort at all. This would mean two passes over the same code at two different levels. If anybody familiar with this stuff could weigh in on the matter, it would be much appreciated. I've got a list of other issues that I need to discuss here, but this would be a great start. Thanks, Tom

Show replies by date

Brett Cannon

April 2008

5:49 p.m.

New subject: Optimization of Python ASTs: How should we deal with constant values?

On Wed, Apr 30, 2008 at 3:15 AM, Thomas Lee <tom@vector-seven.com> wrote:

...

Hi all,

I've been working on optimization of the AST, including the porting of the old bytecode-level optimizations to the AST level. A few questions have come up in the process of doing this, all of which are probably appropriate for discussion on this list. The code I'm referring to here can be found in the tlee-ast-optimize branch. Most of the relevant code is in Python/optimize.c and Python/peephole.c.

Nearly all of the bytecode-level optimizations have been moved to the AST optimizer with a few exceptions. Most of those waiting to be ported are stuck in limbo due to the fact we can't yet inject arbitrary PyObject constants into the AST. Examples are tuples of constants and the optimization of "LOAD_GLOBAL/LOAD_NAME None" as "LOAD_CONST None".

This leaves us with a few options:

1. A new AST expr node for constant values for types other than Str/Num

I imagine this to be something like Const(PyObject* v), which is effectively translated to a "LOAD_CONST v" by the compiler. This trades the purity of the AST for a little practicality. A "Const" node has no real source representation, it would exist solely for the purpose of injecting PyObject constants into the AST.

Slight issue with this is that people like Jython are standardizing on our AST, so adding to it does up their burden if they have no use for it.

...

2. Allow arbitrary annotations to be applied to the AST as compiler hints.

For example, each AST node might have an optional dict that contains a set of annotation values. Then when traversing the AST, the compiler might do something along the lines of:

if (expr->annotations) { PyObject* constvalue = PyDict_GetItemString(expr->annotations, "constantvalue"); if (constvalue) ADDOP_O(c, LOAD_CONST, constvalue, consts) else VISIT(c, expr, expr) }

This is a more general solution if we want to keep other compiler hints down the track, but unless somebody can think of another use case this is probably overkill.

Possibly, but the AST stuff is so new who knows if it truly will be overkill or not down the road. I personally prefer this approach.

...

3. Keep these particular optimizations at the bytecode level.

It would be great to be able to perform the optimizations at a higher level, but this would require no effort at all. This would mean two passes over the same code at two different levels.

This might end up being the case, but I would rather avoid the overhead if possible. The question becomes how many other possible optimizations might come up that would only work reasonably at the bytecode level. And you forgot option 4): ditching the optimizations that only work on the bytecode. I am not advocating this, but it is an option. -Brett

"Martin v. Löwis"

8:36 p.m.

...

This leaves us with a few options:

5. Reuse/Abuse Num(object) for arbitrary constants. AFAICT, this should work out of the box.

...

1. A new AST expr node for constant values for types other than Str/Num

I imagine this to be something like Const(PyObject* v), which is effectively translated to a "LOAD_CONST v" by the compiler. This trades the purity of the AST for a little practicality. A "Const" node has no real source representation, it would exist solely for the purpose of injecting PyObject constants into the AST.

I think this is the way to go. It doesn't violate purity: it is an *abstract* syntax, meaning that there doesn't need to be a 1:1 relationship to source syntax. However, it is still possible to reproduce source code from this Const node. I also don't worry about Jython conflicts. The grammar has a version number precisely so that you can refer to a specific version if you need to. Regards, Martin

Thomas Lee

May 2008

2:21 p.m.

Martin v. Löwis wrote:

...

...
This leaves us with a few options:

5. Reuse/Abuse Num(object) for arbitrary constants. AFAICT, this should work out of the box.

Eek. It *does* seem like Num would work out of the box, but would this be a good idea? What about *replacing* Num with Const? Might make optimizations specifically for numeric values slightly hairier, and semantically I think they might be different enough to warrant separate AST nodes despite the similarity in implementation at the compiler level. FWIW, I read Num as "numeric literal" and Const as "arbitrary constant", but that's probably only because I've seen the immediate need for constants with non-Num values in the process of writing the AST optimizer.

...

...
1. A new AST expr node for constant values for types other than Str/Num

I imagine this to be something like Const(PyObject* v), which is effectively translated to a "LOAD_CONST v" by the compiler. This trades the purity of the AST for a little practicality. A "Const" node has no real source representation, it would exist solely for the purpose of injecting PyObject constants into the AST.

I think this is the way to go. It doesn't violate purity: it is an *abstract* syntax, meaning that there doesn't need to be a 1:1 relationship to source syntax. However, it is still possible to reproduce source code from this Const node.

I'm leaning toward this, too. It's dirt simple and quite clean to implement.

...

I also don't worry about Jython conflicts. The grammar has a version number precisely so that you can refer to a specific version if you need to.

Any Jython folk care to weigh in on this? If there are no major objections I think I'm going to forge ahead with an independant Const() node. Cheers, T

"Martin v. Löwis"

5:38 p.m.

Thomas Lee wrote:

...

Martin v. Löwis wrote:

...
...
This leaves us with a few options:

5. Reuse/Abuse Num(object) for arbitrary constants. AFAICT, this should work out of the box.

Eek. It *does* seem like Num would work out of the box, but would this be a good idea?

No. I suggested it just for completeness.

...

What about *replacing* Num with Const? Might make optimizations specifically for numeric values slightly hairier, and semantically I think they might be different enough to warrant separate AST nodes despite the similarity in implementation at the compiler level.

I think they should be separate. Const would be a mere addition; for compatibility with other uses of the AST, that's actually better than simultaneous removal of Num. Regards, Martin

Jeremy Hylton

3:39 p.m.

New subject: Optimization of Python ASTs: How should we deal with constant values?

On Fri, May 2, 2008 at 1:38 PM, "Martin v. Löwis" <martin@v.loewis.de> wrote:

...

Thomas Lee wrote:

...
Martin v. Löwis wrote:

...
...
This leaves us with a few options:

5. Reuse/Abuse Num(object) for arbitrary constants. AFAICT, this should work out of the box.

Eek. It *does* seem like Num would work out of the box, but would this be a good idea?

No. I suggested it just for completeness.

...
What about *replacing* Num with Const? Might make optimizations specifically for numeric values slightly hairier, and semantically I think they might be different enough to warrant separate AST nodes despite the similarity in implementation at the compiler level.

I think they should be separate. Const would be a mere addition; for compatibility with other uses of the AST, that's actually better than simultaneous removal of Num.

Adding Const sounds good to me. Jeremy

...

Regards, Martin

_______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/jeremy%40alum.mit.edu

Nick Coghlan

2:56 p.m.

New subject: Optimization of Python ASTs: How should we deal with constant values?

Jeremy Hylton wrote:

...

On Fri, May 2, 2008 at 1:38 PM, "Martin v. Löwis" <martin@v.loewis.de> wrote:

...
Thomas Lee wrote:

...
Martin v. Löwis wrote:

...
...
This leaves us with a few options:

5. Reuse/Abuse Num(object) for arbitrary constants. AFAICT, this should work out of the box.

Eek. It *does* seem like Num would work out of the box, but would this be a good idea?

No. I suggested it just for completeness.

...
What about *replacing* Num with Const? Might make optimizations specifically for numeric values slightly hairier, and semantically I think they might be different enough to warrant separate AST nodes despite the similarity in implementation at the compiler level.

I think they should be separate. Const would be a mere addition; for compatibility with other uses of the AST, that's actually better than simultaneous removal of Num.

Adding Const sounds good to me.

As Thomas mentions in a later message, making it possible to annotate nodes would permit Functions to be annotated as being a generator at the AST stage (currently it is left to the bytecode compiler's symtable generation pass to make that determination). Although I guess an alternative solution to that would be to have separate AST nodes for Functions and Generators as well... Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org

Thomas Lee

3:43 p.m.

New subject: Optimization of Python ASTs: How should we deal with constant values?

Nick Coghlan wrote:

...

As Thomas mentions in a later message, making it possible to annotate nodes would permit Functions to be annotated as being a generator at the AST stage (currently it is left to the bytecode compiler's symtable generation pass to make that determination).

Although I guess an alternative solution to that would be to have separate AST nodes for Functions and Generators as well...

I've actually backtracked a little and gone back down the Const path again. I know this is the third time I've changed my mind, but it's primarily because annotations tend to get a little clunky (or my implementation was, at least). Using Const nodes feels a lot more natural inside the optimizer. I think it's going to stick, at least in the short term. Rather than separate FunctionDef and GeneratorDef nodes, I think a new bool attribute (is_generator?) on FunctionDef should do the job nicely. Further, I'm thinking we can move the "generator detection code" from the symtable into Python/ast.c so the generator/function information is available to the optimizer. This is made a little tricky by the absence of the contextual information available that is normally available when flagging generators in the symtable. When generating AST nodes for a suite, we know nothing about the parent node in which the suite resides. Still, it might be doable. If this winds up being ugly, we might need to fall back to the original plan of a separate pass over function bodies to detect yield expressions. I'll look into all this tomorrow night, along with any other crazy suggestions. For now I need to sleep a few hours. :) Thanks for the feedback, it's much appreciated. Cheers, T

Jeremy Hylton

5:20 p.m.

New subject: Optimization of Python ASTs: How should we deal with constant values?

On Wed, May 7, 2008 at 11:43 AM, Thomas Lee <tom@vector-seven.com> wrote:

...

Nick Coghlan wrote:

...
As Thomas mentions in a later message, making it possible to annotate

nodes would permit Functions to be annotated as being a generator at the AST stage (currently it is left to the bytecode compiler's symtable generation pass to make that determination).

...
Although I guess an alternative solution to that would be to have separate

AST nodes for Functions and Generators as well...

...
I've actually backtracked a little and gone back down the Const path again. I know this is the third time I've changed my mind, but it's primarily because annotations tend to get a little clunky (or my implementation was, at least). Using Const nodes feels a lot more natural inside the optimizer. I think it's going to stick, at least in the short term.

Rather than separate FunctionDef and GeneratorDef nodes, I think a new bool attribute (is_generator?) on FunctionDef should do the job nicely. Further, I'm thinking we can move the "generator detection code" from the symtable into Python/ast.c so the generator/function information is available to the optimizer.

Does the optimizer run after the symtable? I'd expect the symbol table information to be essential for almost all analysis, so it should be fine to wait until that pass to mark the generators. Jeremy

...

This is made a little tricky by the absence of the contextual information available that is normally available when flagging generators in the symtable. When generating AST nodes for a suite, we know nothing about the parent node in which the suite resides. Still, it might be doable. If this winds up being ugly, we might need to fall back to the original plan of a separate pass over function bodies to detect yield expressions.

I'll look into all this tomorrow night, along with any other crazy suggestions. For now I need to sleep a few hours. :)

Thanks for the feedback, it's much appreciated.

Cheers, T

Nick Coghlan

noon

New subject: Optimization of Python ASTs: How should we deal with constant values?

Jeremy Hylton wrote:

...

On Wed, May 7, 2008 at 11:43 AM, Thomas Lee <tom@vector-seven.com> wrote:

...
Nick Coghlan wrote:

...
As Thomas mentions in a later message, making it possible to annotate nodes would permit Functions to be annotated as being a generator at the AST stage (currently it is left to the bytecode compiler's symtable generation pass to make that determination). Although I guess an alternative solution to that would be to have separate AST nodes for Functions and Generators as well...

I've actually backtracked a little and gone back down the Const path again. I know this is the third time I've changed my mind, but it's primarily because annotations tend to get a little clunky (or my implementation was, at least). Using Const nodes feels a lot more natural inside the optimizer. I think it's going to stick, at least in the short term.

Rather than separate FunctionDef and GeneratorDef nodes, I think a new bool attribute (is_generator?) on FunctionDef should do the job nicely. Further, I'm thinking we can move the "generator detection code" from the symtable into Python/ast.c so the generator/function information is available to the optimizer.

Does the optimizer run after the symtable? I'd expect the symbol table information to be essential for almost all analysis, so it should be fine to wait until that pass to mark the generators.

There are a lot of micro-optimisations that are actually context independent, so moving them before the symtable pass should be quite feasible - e.g. replacing "return None" with "return", stripping dead code after a return statement, changing a "if not" statement into an "if" statement with the two suites reversed, changing "(1, 2, 3)" into a stored constant, folding "1 + 2" into the constant "3". I believe the goal is to see how many of the current bytecode optimisations can actually be brought forward to the AST generation stage, rather than waiting until after the bytecode symtable calculation and compilation passes. The current structure goes: tokenisation->AST construction->symtable construction->bytecode compilation->bytecode optimisation My understanding of what Thomas is trying to do is to make it look more like this: tokenisation->AST construction->AST optimisation->symtable construction->bytecode compilation There may still be a direct bytecode optimisation step left at the end if it turns out something relies on information that isn't worked out until the symtable construction pass (and that can't easily be worked out earlier, such as adding an 'is_generator' flag to the FunctionDef AST node). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org

Jeremy Hylton

1:18 p.m.

New subject: Optimization of Python ASTs: How should we deal with constant values?

On Thu, May 8, 2008 at 8:00 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:

...

Jeremy Hylton wrote:

...
On Wed, May 7, 2008 at 11:43 AM, Thomas Lee <tom@vector-seven.com> wrote:

...
Nick Coghlan wrote:

...
As Thomas mentions in a later message, making it possible to annotate

nodes would permit Functions to be annotated as being a generator at the AST stage (currently it is left to the bytecode compiler's symtable generation pass to make that determination).

...
Although I guess an alternative solution to that would be to have separate

AST nodes for Functions and Generators as well...

...
I've actually backtracked a little and gone back down the Const path again. I know this is the third time I've changed my mind, but it's primarily because annotations tend to get a little clunky (or my implementation was, at least). Using Const nodes feels a lot more natural inside the optimizer. I think it's going to stick, at least in the short term.

Rather than separate FunctionDef and GeneratorDef nodes, I think a new bool attribute (is_generator?) on FunctionDef should do the job nicely. Further, I'm thinking we can move the "generator detection code" from the symtable into Python/ast.c so the generator/function information is available to the optimizer.

Does the optimizer run after the symtable? I'd expect the symbol table information to be essential for almost all analysis, so it should be fine to wait until that pass to mark the generators.

There are a lot of micro-optimisations that are actually context independent, so moving them before the symtable pass should be quite feasible - e.g. replacing "return None" with "return", stripping dead code after a return statement, changing a "if not" statement into an "if" statement with the two suites reversed, changing "(1, 2, 3)" into a stored constant, folding "1 + 2" into the constant "3".

I believe the goal is to see how many of the current bytecode optimisations can actually be brought forward to the AST generation stage, rather than waiting until after the bytecode symtable calculation and compilation passes.

The current structure goes:

tokenisation->AST construction->symtable construction->bytecode compilation->bytecode optimisation

My understanding of what Thomas is trying to do is to make it look more like this:

tokenisation->AST construction->AST optimisation->symtable construction->bytecode compilation

There may still be a direct bytecode optimisation step left at the end if it turns out something relies on information that isn't worked out until the symtable construction pass (and that can't easily be worked out earlier, such as adding an 'is_generator' flag to the FunctionDef AST node).

The symbol table pass is not important for deciding where to place the optimizations. Either the optimization operates on the AST itself, in which case it needs to run before bytecode compilation or it operators on the lowered bytecode representation, in which case it needs to run after bytecode compilation. Since the symbol table passes doesn't change the representation, you ought to be able to put the optimizations before or after it. But I haven't looked closely at the actual code, so maybe I'm missing something. Jeremy

...

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org

Thomas Lee

11:22 p.m.

New subject: Optimization of Python ASTs: How should we deal with constant values?

Nick Coghlan wrote:

...

There are a lot of micro-optimisations that are actually context independent, so moving them before the symtable pass should be quite feasible - e.g. replacing "return None" with "return", stripping dead code after a return statement, changing a "if not" statement into an "if" statement with the two suites reversed, changing "(1, 2, 3)" into a stored constant, folding "1 + 2" into the constant "3".

I believe the goal is to see how many of the current bytecode optimisations can actually be brought forward to the AST generation stage, rather than waiting until after the bytecode symtable calculation and compilation passes.

That's been the aim so far. It's been largely successful with the exception of a few edge cases (most notably the functions vs. generator stuff). The elimination of unreachable paths (whether they be things like "if 0: ..." or "return; ... more code ...") completely breaks generators since we might potentially be blowing away "yield" statements during the elimination process. The rest of the optimizations, as far as I can see, are much less scary.

...

The current structure goes:

tokenisation->AST construction->symtable construction->bytecode compilation->bytecode optimisation

My understanding of what Thomas is trying to do is to make it look more like this:

tokenisation->AST construction->AST optimisation->symtable construction->bytecode compilation

That's exactly right. I made a quick and dirty attempt at moving the AST optimization step after the symtable generation on the train home last night to see if Jeremy's suggestion gives us anything. Now, it does make the detection of generators a little easier, but it really doesn't give us all that much else. I'm happy to post a patch so you can see what I mean, but for now I think visiting the FunctionDef subtree to check for Yield nodes will be fine. Again, I might be wrong (as I've often been throughout this process!) but let's see how it goes. Obviously a bit less efficient, but function bodies really shouldn't be all that deep anyway. I've got a good idea about how I'm going to go forward with this. Cheers, T

Adam Olsen

11:36 p.m.

New subject: Optimization of Python ASTs: How should we deal with constant values?

On Thu, May 8, 2008 at 5:22 PM, Thomas Lee <tom@vector-seven.com> wrote:

...

Nick Coghlan wrote:

...
There are a lot of micro-optimisations that are actually context independent, so moving them before the symtable pass should be quite feasible - e.g. replacing "return None" with "return", stripping dead code after a return statement, changing a "if not" statement into an "if" statement with the two suites reversed, changing "(1, 2, 3)" into a stored constant, folding "1 + 2" into the constant "3".

I believe the goal is to see how many of the current bytecode optimisations can actually be brought forward to the AST generation stage, rather than waiting until after the bytecode symtable calculation and compilation passes.

That's been the aim so far. It's been largely successful with the exception of a few edge cases (most notably the functions vs. generator stuff). The elimination of unreachable paths (whether they be things like "if 0: ..." or "return; ... more code ...") completely breaks generators since we might potentially be blowing away "yield" statements during the elimination process.

Also breaks various sanity checks relating to the global statement. -- Adam Olsen, aka Rhamphoryncus

Thomas Lee

11:54 p.m.

New subject: Optimization of Python ASTs: How should we deal with constant values?

Adam Olsen wrote:

...

On Thu, May 8, 2008 at 5:22 PM, Thomas Lee <tom@vector-seven.com> wrote:

...
Nick Coghlan wrote:

...
There are a lot of micro-optimisations that are actually context independent, so moving them before the symtable pass should be quite feasible - e.g. replacing "return None" with "return", stripping dead code after a return statement, changing a "if not" statement into an "if" statement with the two suites reversed, changing "(1, 2, 3)" into a stored constant, folding "1 + 2" into the constant "3".

I believe the goal is to see how many of the current bytecode optimisations can actually be brought forward to the AST generation stage, rather than waiting until after the bytecode symtable calculation and compilation passes.

That's been the aim so far. It's been largely successful with the exception of a few edge cases (most notably the functions vs. generator stuff). The elimination of unreachable paths (whether they be things like "if 0: ..." or "return; ... more code ...") completely breaks generators since we might potentially be blowing away "yield" statements during the elimination process.

Also breaks various sanity checks relating to the global statement.

What sanity checks are these exactly? Is this related to the lnotab? Cheers, T

Steve Holden

1:12 a.m.

New subject: Optimization of Python ASTs: How should we deal with constant values?

Thomas Lee wrote:

...

Adam Olsen wrote:

...
On Thu, May 8, 2008 at 5:22 PM, Thomas Lee <tom@vector-seven.com> wrote:

...
Nick Coghlan wrote:

...
There are a lot of micro-optimisations that are actually context independent, so moving them before the symtable pass should be quite feasible - e.g. replacing "return None" with "return", stripping dead code after a return statement, changing a "if not" statement into an "if" statement with the two suites reversed, changing "(1, 2, 3)" into a stored constant, folding "1 + 2" into the constant "3".

I believe the goal is to see how many of the current bytecode optimisations can actually be brought forward to the AST generation stage, rather than waiting until after the bytecode symtable calculation and compilation passes.

That's been the aim so far. It's been largely successful with the exception of a few edge cases (most notably the functions vs. generator stuff). The elimination of unreachable paths (whether they be things like "if 0: ..." or "return; ... more code ...") completely breaks generators since we might potentially be blowing away "yield" statements during the elimination process.

Also breaks various sanity checks relating to the global statement.

What sanity checks are these exactly? Is this related to the lnotab?

Cheers, T _______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/python-python-dev%40m.gman...

While not strictly related to the global statement, perhaps Adam refers to the possibility of optimizing away code with an assignment which would make a name be recognized as local? If you're worried about "yield" disappearing you should also be worried about assignments disappearing, since that might cause names to be interpreted as globals. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC http://www.holdenweb.com/

Nick Coghlan

1:38 a.m.

New subject: Optimization of Python ASTs: How should we deal with constant values?

Steve Holden wrote:

...

While not strictly related to the global statement, perhaps Adam refers to the possibility of optimizing away code with an assignment which would make a name be recognized as local? If you're worried about "yield" disappearing you should also be worried about assignments disappearing, since that might cause names to be interpreted as globals.

And once you start annotating functions as generators or not, and variable names as locals or cell variables or globals, you're starting to build up a substantial fraction of the information that is already collected during the symtable construction pass. Perhaps the initial attempt at this should just focus on identifying those operations which have the potential to alter the results of the symtable construction, and leave those to the bytecode optimisation step for the moment. Doing the symtable pass twice seems fairly undesirable, even if it does let us trim some dead code out of the AST. Cheers, Nick. P.S. Here's a nonsensical example where optimising away the dead code after the return statement would significantly change the code's semantics:

...

...
...
def silly(): ... def inner(): ... print cell_var ... return inner ... cell_var = None ... silly()() Traceback (most recent call last): File "<stdin>", line 1, in <module> File "<stdin>", line 3, in inner NameError: free variable 'cell_var' referenced before assignment in enclosing scope

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org

Thomas Lee

1:51 a.m.

New subject: Optimization of Python ASTs: How should we deal with constant values?

Nick Coghlan wrote:

...

Steve Holden wrote:

...
While not strictly related to the global statement, perhaps Adam refers to the possibility of optimizing away code with an assignment which would make a name be recognized as local? If you're worried about "yield" disappearing you should also be worried about assignments disappearing, since that might cause names to be interpreted as globals.

And once you start annotating functions as generators or not, and variable names as locals or cell variables or globals, you're starting to build up a substantial fraction of the information that is already collected during the symtable construction pass.

Perhaps the initial attempt at this should just focus on identifying those operations which have the potential to alter the results of the symtable construction, and leave those to the bytecode optimisation step for the moment. Doing the symtable pass twice seems fairly undesirable, even if it does let us trim some dead code out of the AST.

Sounds good. We can always come back to it. Cheers, T

Adam Olsen

1:38 a.m.

New subject: Optimization of Python ASTs: How should we deal with constant values?

On Thu, May 8, 2008 at 5:54 PM, Thomas Lee <tom@vector-seven.com> wrote:

...

Adam Olsen wrote:

...
On Thu, May 8, 2008 at 5:22 PM, Thomas Lee <tom@vector-seven.com> wrote:

...
Nick Coghlan wrote:

...
There are a lot of micro-optimisations that are actually context independent, so moving them before the symtable pass should be quite feasible - e.g. replacing "return None" with "return", stripping dead code after a return statement, changing a "if not" statement into an "if" statement with the two suites reversed, changing "(1, 2, 3)" into a stored constant, folding "1 + 2" into the constant "3".

I believe the goal is to see how many of the current bytecode optimisations can actually be brought forward to the AST generation stage, rather than waiting until after the bytecode symtable calculation and compilation passes.

That's been the aim so far. It's been largely successful with the exception of a few edge cases (most notably the functions vs. generator stuff). The elimination of unreachable paths (whether they be things like "if 0: ..." or "return; ... more code ...") completely breaks generators since we might potentially be blowing away "yield" statements during the elimination process.

Also breaks various sanity checks relating to the global statement.

What sanity checks are these exactly? Is this related to the lnotab?

Here we are. In 2.4.4:

...

...
...
def foo(): ... print test ... if 0: ... import test ... foo() Traceback (most recent call last): File "<stdin>", line 1, in ? File "<stdin>", line 2, in foo NameError: global name 'test' is not defined

2.5.1 correctly raises an UnboundLocalError instead. Searching the bug DB for old bugs involving the optimizer showed several variations on this, such as "if 0: yield None" at module scope not raising a SyntaxError (Issue 1875, still open in trunk). Clearly there needs to be a general approach to avoid affecting these checks with optimizations. -- Adam Olsen, aka Rhamphoryncus

Frank Wierzbicki

6:49 p.m.

New subject: Optimization of Python ASTs: How should we deal with constant values?

On Fri, May 2, 2008 at 10:21 AM, Thomas Lee <tom@vector-seven.com> wrote:

...

Any Jython folk care to weigh in on this? If there are no major objections I think I'm going to forge ahead with an independant Const() node. I suspect that having a marker for non-int non-str constants could also be used for some optimization of Jython bytecode emissions as well, though of course it would depend on the details. If the path from raw AST to optimized AST isn't too crazy Jython can just grow the same logic. We have really good AST comparison tests these days, so I bet it won't be a huge problem for us when the time comes.

-Frank

6110

Age (days ago)

6128

Last active (days ago)

List overview

Download

18 comments

8 participants

participants (8)

"Martin v. Löwis"
Adam Olsen
Brett Cannon
Frank Wierzbicki
Jeremy Hylton
Nick Coghlan
Steve Holden
Thomas Lee

Optimization of Python ASTs: How should we deal with constant values?

Thomas Lee

Thomas Lee

Thomas Lee

Thomas Lee

Adam Olsen

Thomas Lee

Thomas Lee

Adam Olsen

tags

participants (8)