[Python-Dev] Optimization of Python ASTs: How should we deal with constant values?

Thu May 8 14:00:35 CEST 2008

Jeremy Hylton wrote:
> On Wed, May 7, 2008 at 11:43 AM, Thomas Lee <tom at vector-seven.com> wrote:
>> Nick Coghlan wrote:
>>
>>> As Thomas mentions in a later message, making it possible to annotate
>> nodes would permit Functions to be annotated as being a generator at the AST
>> stage (currently it is left to the bytecode compiler's symtable generation
>> pass to make that determination).
>>> Although I guess an alternative solution to that would be to have separate
>> AST nodes for Functions and Generators as well...
>>>
>>  I've actually backtracked a little and gone back down the Const path again.
>> I know this is the third time I've changed my mind, but it's primarily
>> because annotations tend to get a little clunky (or my implementation was,
>> at least). Using Const nodes feels a lot more natural inside the optimizer.
>> I think it's going to stick, at least in the short term.
>>
>>  Rather than separate FunctionDef and GeneratorDef nodes, I think a new bool
>> attribute (is_generator?) on FunctionDef should do the job nicely. Further,
>> I'm thinking we can move the "generator detection code" from the symtable
>> into Python/ast.c so the generator/function information is available to the
>> optimizer.
> 
> Does the optimizer run after the symtable?  I'd expect the symbol
> table information to be essential for almost all analysis, so it
> should be fine to wait until that pass to mark the generators.

There are a lot of micro-optimisations that are actually context 
independent, so moving them before the symtable pass should be quite 
feasible - e.g. replacing "return None" with "return", stripping dead 
code after a return statement, changing a "if not" statement into an 
"if" statement with the two suites reversed, changing "(1, 2, 3)" into a 
stored constant, folding "1 + 2" into the constant "3".

I believe the goal is to see how many of the current bytecode 
optimisations can actually be brought forward to the AST generation 
stage, rather than waiting until after the bytecode symtable calculation 
and compilation passes.

The current structure goes:

tokenisation->AST construction->symtable construction->bytecode 
compilation->bytecode optimisation

My understanding of what Thomas is trying to do is to make it look more 
like this:

tokenisation->AST construction->AST optimisation->symtable 
construction->bytecode compilation

There may still be a direct bytecode optimisation step left at the end 
if it turns out something relies on information that isn't worked out 
until the symtable construction pass (and that can't easily be worked 
out earlier, such as adding an 'is_generator' flag to the FunctionDef 
AST node).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org