[Patches] [ python-Patches-707257 ] Improve code generation

SourceForge.net noreply@sourceforge.net
Sun, 23 Mar 2003 14:57:54 -0800


Patches item #707257, was opened at 2003-03-20 20:03
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707257&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Raymond Hettinger (rhettinger)
Assigned to: Tim Peters (tim_one)
Summary: Improve code generation

Initial Comment:
Adds a single function to improve generated bytecode.

Has a two line attachment point, so it is completely 
de-coupled from both the compiler and ceval.c.

The first pass looks for the sequence LOAD_CONST 1, 
JUMP_IF_FALSE xx, POP_TOP.  It replaces the first 
instruction with JUMP_FORWARD +4.

The second pass looks for jumps to an unconditional 
jump.  The first jump target is replaced with the 
second jump target.

Both are safe, general purpose optimizations.  
Together, they eliminate 100% of the "while 1" loop 
overhead.

The structure of the code allows for other code 
improvements to be easily added.  This one focuses 
on low hanging fruit. It takes a simple, safe approach 
that does not change bytecode size or order and does 
not need a basic block analysis.

Improves timings on pybench, pystone, and two of 
my real applications.  timeit.py shows dramatic 
improvement to code using "while 1".

python timeit.py "while 1: break"

python timeit.py -s "i=0" "while 1:" "    if i==1: 
break" "    else: i=1"


----- Example -----

Disassembly of

def f(x):
    while 1:
        x -= 1
        if x == 0:
            break

shows two lines changing from:

  3 LOAD_CONST               1 (1)
38 JUMP_ABSOLUTE            3

and improving to:

3 JUMP_FORWARD             4 (to 10)
38 JUMP_ABSOLUTE           10

All of the other lines are left unchanged.

----------------------------------------------------------------------

>Comment By: Raymond Hettinger (rhettinger)
Date: 2003-03-23 17:57

Message:
Logged In: YES 
user_id=80475

Thanks for the thorough code review and for being positive 
on the inclusion of the patch.  Attached is a revision that 
delays PyString_Size and bypasses situations with 
extended arguments.

For the dead code fragment, I'm more comfortable with 
the DUP_TOP POP_TOP than use of STOP_CODE but it is 
probably a matter of taste.  A more sophisticated 
approach would not have any dead code but I've aimed for 
the simplest thing that could possibly work.

The unconditional jump test is performed on a different 
opcode than the test for equality to JUMP_ABSOLUTE, so
the two tests cannot be combined.  The first operates on 
codestr[tgt] and the second on codestr[i].

I had tried a single big loop instead of three little loops but 
there was a loss of clarity.  Since the recognizers quickly 
skip over mismatches, the total loop time is very small.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2003-03-23 10:23

Message:
Logged In: YES 
user_id=33168

Generally I think I'd like to see only one loop over the
code (should scale better than having N loops--1 per
optimization).  Perhaps making each optimization into it's
own function--e.g., opt_while_1, opt_swap, opt_jump_jump.

* In optimize_code, PyString_Size() is called before
verifying code is a string.  If the code isn't a string, an
exception will be left-over.  Suggest setting clen after the
string check.

* I don't think the code works with EXTENDED_ARGS.  This can
happen if there are more than 64k variables etc.  Perhaps if
you get an EXTENDED_ARG you should just bail?

* the DUP_TOP and POP_TOP are never supposed to be executed,
right?  I would use STOP_CODE to indicate the ops were
invalid.  I can also see where others would find this
suggestion objectionable.  There is no NOP though.  Ideally,
we would remove the dead code, rather than have the JUMP,
etc.  This would mean possibly changing all subsequent
JUMP_ABSOLUTEs though.  I don't recommend changing this,
just lamenting.

(I particularly like the BUILD/UNPACK of 2 becoming a
ROT_TWO, BTW :-)

* Why in the jumps to jumps loop don't you set codestr[i] =
opcode if opcode == JUMP_FORWARD, then do away with the if
(opcode != JUMP_ABSOLUTE)?  The check for UNCONDITIONAL_JUMP
already guarantees you have either JUMP_FORWARD or
JUMP_ABSOLUTE.

* same problem with EXTENDED_ARG for SETARG though.  You
probably need a check before the SETARG to make sure tgttgt
< 64k.
Other than the EXTENDED_ARG and string size issues, the code
looks fine and makes sense.  In general, I'm positive on the
idea of doing this.  However, I'm not sure this change is
appropriate for 2.3, partially because the beta is coming. 
I'm a little (very little) concerned the speed penalty for
compiling.  I realize this is a one-time (at most) cost, so
it's almost definitely insignificant.

I'd like Tim or Guido to approve the approach for
acceptance.  Assigning to Tim.  Regardless of whether this
patch is accepted for 2.3, I think all of these should be
implemented in 2.4!  Hopefully at that time there will be
the new AST compiler which we can modify more easily and
make even more optimizations.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2003-03-21 18:50

Message:
Logged In: YES 
user_id=357491

Ah, forgot about the planned refactoring for 2.4.  Oops.  =)

OK, I will keep this in the back of my head until the refactor gets done.

And in case it wasn't clear, I am all for getting this patch in.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2003-03-21 18:04

Message:
Logged In: YES 
user_id=80475

Not really.  There is no need to go wild before the compiler 
is refactored.

Loading another update that includes theller's idea to 
handle all constants evaluating to true.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2003-03-21 17:43

Message:
Logged In: YES 
user_id=357491

Do I hear a PEP coming?  =)

If anyone is serious about coming up with a hook for peephole optimizing (I am thinking of something similar to how import hooks are handled; a list kept in sys that contains functions that get passed opcode about to be written out to a .pyc file) then email me (unless starting a feature request would be better?).  I am up to writing a PEP and trying to get this to work.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2003-03-21 15:28

Message:
Logged In: YES 
user_id=80475

Right, it takes a LOAD_GLOBAL to fetch True using a 
dictionary lookup.  In constast, 2 is quickly fetched with 
LOAD_CONST.

Adding a hook is easy enough, but I'll leave that for 
another day (I've already exceeded my quota of API 
change requests).  This patch focuses on "the simplest 
thing that could possibly work".

----------------------------------------------------------------------

Comment By: Thomas Heller (theller)
Date: 2003-03-21 15:09

Message:
Logged In: YES 
user_id=11105

Looks better now.
So it seems 'while True:' or 'while 2:' is worse than 'while
1:' ;-) ?

I like Brett's suggestion about adding an (additional) hook
here which allows to pass the code to Python (?) code for
further peephole optimizing.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2003-03-21 14:18

Message:
Logged In: YES 
user_id=80475

Attached a revised patch:
* Adds PyMem_Free   (theller's review comment)
* Applies macro form of string/tuple operations
* All exits now return a new reference
* Attach point is now a single line

Walter, until GvR moves to prevent shadowing of globals, 
it would be unsafe to optimize "while True".

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2003-03-21 03:21

Message:
Logged In: YES 
user_id=89016

"while True:" should be optimized too.

----------------------------------------------------------------------

Comment By: Thomas Heller (theller)
Date: 2003-03-21 03:02

Message:
Logged In: YES 
user_id=11105

Isn't there a PyMem_Free missing at the end?

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2003-03-21 02:43

Message:
Logged In: YES 
user_id=357491

OK, fair enough.  I buy the argument.  =)

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2003-03-20 21:11

Message:
Logged In: YES 
user_id=80475

The -O option was useful when the optimization involved a 
trade-off.  It used to be that you lost line numbering when -
O was turned on.  In contrast, this patch is a pure win and 
does not affect anything else including dis and pdb.

Other bytecode optimizations have been implemented 
directly in the compiler code (for instance, negatives 
before a constant) and those were not linked to the -O 
option.  IOW, I recommend against attaching this to a 
command line switch.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2003-03-20 20:56

Message:
Logged In: YES 
user_id=357491

Perhaps this should be made something that is done with the -O option?  Since this is changing the outputted bytecode from what the parser spits out I think it is classified as an optimization and thus should be made an optional optimization instead of a required one.

Love the idea, though.  Personally, I would love to see some pluggable system developed for -O that allows for easy adding of peephole optimizations.  This patch seems to be taking the initial steps toward a setup like that.

Besides, the poor -O option isn't worth much of anything these days thanks to Michael.  =)

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=707257&group_id=5470