[Python-Dev] Re: Magic number needs upgrade

Raymond Hettinger python@rcn.com
Tue, 22 Apr 2003 18:01:05 -0400

> > Now that we have new bytecode optimizations, the pyc file magic
> > number needs to be changed.

We have several options:

1. Change the magic number to accommodate NOP.

2. Install an additional step that eliminates the NOPs from
    the bytecode (they are not strictly necessary).  This will make
    the code even shorter and faster without a need to change the
    magic number.  I've got this in my hip pocket if we decide that
    this is the way to go.  The generated code is beautiful.

3. Eliminate the last two optimizations, which were the only ones
    that needed a NOP:

    a)  compare_op (is, in, is not, not in)  unary_not -->
                  compare_op (is not, not in, is, in)  nop
    b)  unary_not  jump_if_false (tgt) -->
                  nop  jump_if_true (tgt)
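
To make the two NOP-based rewrites concrete, here is a rough sketch
over a toy instruction list.  It is not the actual optimizer code:
the (opname, arg) tuple format, the peephole() name, and the INVERTED
table are all made up for illustration, and the sketch skips the
basic block check and any stack-effect concerns that a real pass
has to handle.

```python
# Toy model (an assumption for illustration): an instruction is an
# (opname, arg) tuple.  The names mirror the opcodes discussed above,
# but this is not CPython's real optimizer.
INVERTED = {"is": "is not", "is not": "is", "in": "not in", "not in": "in"}


def peephole(code):
    """Return a new instruction list with the two NOP-based rewrites applied."""
    out = list(code)
    for i in range(len(out) - 1):
        op, arg = out[i]
        next_op, next_arg = out[i + 1]
        if op == "COMPARE_OP" and arg in INVERTED and next_op == "UNARY_NOT":
            # (a) fold the negation into the comparison itself
            out[i] = ("COMPARE_OP", INVERTED[arg])
            out[i + 1] = ("NOP", None)
        elif op == "UNARY_NOT" and next_op == "JUMP_IF_FALSE":
            # (b) drop the negation and flip the sense of the jump
            out[i] = ("NOP", None)
            out[i + 1] = ("JUMP_IF_TRUE", next_arg)
    return out
```

Each rewrite keeps the instruction count the same, so existing jump
targets stay valid.  That is the reason a NOP is needed at all unless
a later step removes the NOPs and fixes up the targets.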

> I wonder what
> the wisdom is of adding more code complexity.

Part of the benefit is that there will no longer be any need to re-arrange
branches and conditionals in order to avoid 'not'.  As of now, 'not' has
near-zero cost in most situations (except when used with and/or).
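
As an illustration of the kind of re-arrangement meant here, the two
functions below are equivalent.  The second swaps its branches only
to avoid 'not'.  The function names are made up, and the disassembly
you get depends on the interpreter version, so no output is shown.

```python
import dis


def with_not(seq):
    # Direct spelling: test the negated condition.
    if not seq:
        return "empty"
    return "non-empty"


def rearranged(seq):
    # Hand-rearranged spelling that swaps the branches to avoid 'not'.
    if seq:
        return "non-empty"
    return "empty"


if __name__ == "__main__":
    # Compare the bytecode generated for the two spellings.
    dis.dis(with_not)
    dis.dis(rearranged)
```

If the two disassemblies differ only in the sense of the conditional
jump, the direct spelling costs nothing extra.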

> We're still holding off on Ping and Aahz's changes (see the
> cache-attr-branch) and Thomas and Brett's CALL_ATTR optimizations, for
> similar reasons (inconclusive evidence of speedups in real programs).
> What makes Raymond's changes different?

* They are thoroughly tested.

* They are decoupled from the surrounding code and
   will survive changes to ceval.c and newcompile.c.

* They provide some benefits without hurting anything else.

* They provide a framework for others to build upon.
   The scanning loop and basic block tester make it
    a piece of cake to add/change/remove new code transformations.

CALL_ATTR ought to go in when it is ready.  It certainly provides a
measurable speed-up in the targeted behavior.  It just needs more
polish so that it doesn't slow down other pathways.  The benefit
is real, but in real programs it is being offset by reduced performance
in non-targeted behavior.  With some more work, it ought to be a
real gem.  Unfortunately, it is tightly coupled to the implementation
of new-style and old-style classes.  Still, it looks like a winner.

What we're seeing is a consequence of Amdahl's law and Python's
broad scope.  Instead of a single hotspot, Python exercises many
different types of code and each needs to be optimized separately.
People have taken on many of these and collectively they are having
a great effect.  The proposals by Ping, Aahz, Brett, and Thomas
are important steps to address untouched areas.

I took on the task of making sure that the basic pure python code
slithers along quickly.  The basics like "while", "for", "if", "not"
have all been improved.  Lowering the cost of those constructs
will result in less effort spent bypassing them with vectorized
code (map, etc).  Code in something like sets.py won't show much
benefit because so much effort had been directed at using filter,
map, dict.update, and other high volume c-coded functions and

Any one person's optimizations will likely help by a few percent 
at most.  But, taken together, they will be a big win.

> I also wonder why this is done unconditionally, rather than only with
> -O.

Neal, Brett, and I had discussed this a bit and I came to the conclusion
that these code transformations are like the ones already built into the
compiler -- they have some benefit, but cost almost nothing (two passes
over the code string at compile time).  The -O option makes sense for
optimizations that have a high time overhead, throw away debugging
information, change semantics, or reduce feature access.  IOW, -O is
for when you're trading something away in return for a bit of speed
in production code.

There is essentially no benefit to not using the optimized bytecode.

Raymond Hettinger