[Patches] [ python-Patches-910929 ] Optimize list comprehensions

SourceForge.net noreply at sourceforge.net
Mon Mar 8 17:48:07 EST 2004


Patches item #910929, was opened at 2004-03-06 08:22
Message generated for change (Comment added) made by rhettinger
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=910929&group_id=5470

Category: Core (C code)
Group: Python 2.4
Status: Closed
Resolution: Accepted
Priority: 5
Submitted By: Raymond Hettinger (rhettinger)
Assigned to: Nobody/Anonymous (nobody)
Summary: Optimize list comprehensions

Initial Comment:
Saves about 35% of the per-pass overhead of list
comprehensions.

Adds a new opcode, LIST_APPEND, which is faster than
the current call to CALL_FUNCTION 1 on the bound
method, list.append(), and the subsequent call to
POP_TOP to clear the returned None value.

The resulting disassembled code is surprisingly light
and concise.

----------------------------------------------------------------------

>Comment By: Raymond Hettinger (rhettinger)
Date: 2004-03-08 17:48

Message:
Logged In: YES 
user_id=80475

I am currently working on building out the bytecode
optimizer.

Let me know if you would like to participate. 

The three infrastructure components needed for further
buildouts are:
* Check a range to see if it is a basic block
* Allow a temporary NOP opcode to be eliminated on a final
pass that fixes up jump targets
* Fix up all of the line number references

I made the first two but not the last.
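The first component might be sketched along these lines with the modern `dis` module; the function name and its two-rule block criterion are my assumptions, not the patch's code:

```python
import dis

def is_basic_block(code, start, end):
    """Check whether the instructions with offsets in [start, end)
    form a basic block: control enters only at the top and leaves
    only through the final instruction."""
    instrs = [i for i in dis.get_instructions(code)
              if start <= i.offset < end]
    if not instrs:
        return False
    jump_opcodes = set(dis.hasjrel) | set(dis.hasjabs)
    for i in instrs:
        # Control must not enter in the middle of the block...
        if i.is_jump_target and i.offset != instrs[0].offset:
            return False
        # ...and must not branch away before the final instruction.
        if i.opcode in jump_opcodes and i is not instrs[-1]:
            return False
    return True
```

A straight-line function body passes this check, while a body containing an `if` fails once the conditional jump appears mid-range.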

For example,
  A basic block containing:
    [BUILD_TUPLE 2 UNPACK_SEQUENCE 2]
  or
    [BUILD_LIST 2 UNPACK_SEQUENCE 2]
  should be directly replaced with
    [ROT_TWO NOP NOP NOP NOP NOP]
  and be compressed on a final pass to
    [ROT_TWO]
  This gives a threefold speedup for: a,b=b,a 

The other candidate transformations are:

  [BUILD_TUPLE 3 UNPACK_SEQUENCE 3] --> [ROT_THREE ROT_TWO]

  [LOAD_CONST 2  BINARY_MULTIPLY] --> [DUP BINARY_ADD]

  [RETURN_VALUE anyLoadInstruction RETURN_VALUE] -->
[RETURN_VALUE]

  [STORE_FAST n LOAD_FAST n] --> [DUP STORE_FAST n]

  [UNARY_NOT  JUMP_IF_FALSE tgt  POP_TOP ... tgt: POP_TOP]
--> [JUMP_IF_TRUE tgt  POP_TOP ... tgt: POP_TOP]

  [UNARY_NOT  JUMP_IF_TRUE tgt  POP_TOP ... tgt: POP_TOP]
--> [JUMP_IF_FALSE tgt  POP_TOP ... tgt: POP_TOP]
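The replace-then-compress scheme above can be modeled on a symbolic instruction list. The tuple representation and helper names here are illustrative assumptions; real CPython bytecode would also need the jump-target fixups mentioned above:

```python
# Toy model of the two-phase peephole pass: replace a matched pattern
# with an equal-length sequence padded with NOPs, then squeeze the NOPs
# out in a final pass.  Instructions are modeled as (opname, arg) tuples.

NOP = ("NOP", None)

def replace_pattern(instrs, pattern, replacement):
    """Replace every occurrence of `pattern`, padding with NOPs so
    that later offsets are preserved until the compression pass."""
    out = list(instrs)
    n = len(pattern)
    i = 0
    while i <= len(out) - n:
        if out[i:i + n] == pattern:
            out[i:i + n] = replacement + [NOP] * (n - len(replacement))
            i += n
        else:
            i += 1
    return out

def compress_nops(instrs):
    """Final pass: drop the temporary NOPs (jump fixups omitted here)."""
    return [ins for ins in instrs if ins != NOP]

# The a,b=b,a example: BUILD_TUPLE 2 / UNPACK_SEQUENCE 2 --> ROT_TWO.
code = [("LOAD_FAST", "b"), ("LOAD_FAST", "a"),
        ("BUILD_TUPLE", 2), ("UNPACK_SEQUENCE", 2),
        ("STORE_FAST", "a"), ("STORE_FAST", "b")]
code = replace_pattern(code,
                       [("BUILD_TUPLE", 2), ("UNPACK_SEQUENCE", 2)],
                       [("ROT_TWO", None)])
code = compress_nops(code)
```

After both passes the swap compiles down to a single ROT_TWO between the loads and the stores.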

----------------------------------------------------------------------

Comment By: Armin Rigo (arigo)
Date: 2004-03-08 15:35

Message:
Logged In: YES 
user_id=4771

No, Neal's comments made me realize I didn't think about nested loops. In these, you have several iterators lying around on the stack on top of the list. Definitely not clean.

I guess your patch is the cleanest so far. More subtle optimizations should be left to a bytecode optimizer. (btw I wonder why we don't consider these more.)

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2004-03-08 12:51

Message:
Logged In: YES 
user_id=80475

Armin, do you have a working patch (which handles nested
list comps) that implements:  LIST_APPEND [list, ignored,
value]  ->  [list, ignored]?

If you can get it to work, it would simplify and speed up
the code a bit.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2004-03-08 12:38

Message:
Logged In: YES 
user_id=33168

My last comment about "problems" referred to removing the
DELETE_FAST by Armin.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2004-03-08 12:36

Message:
Logged In: YES 
user_id=33168

You may have problems with inner list comps:  
[x+y for x in list1 for y in list2]

If you only have a single list comp, and LIST_APPEND didn't
pop the list off the stack, I think you could do:
          0 BUILD_LIST               0
          3 LOAD_FAST                0 (iter)
          6 GET_ITER
    >>    7 FOR_ITER                 7 (to 17)
         10 STORE_FAST               2 (elem)
         13 LIST_APPEND
         14 JUMP_ABSOLUTE            7
    >>   17 RETURN_VALUE
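The opcode's presence can be verified with the `dis` module; this recursive helper is my own sketch (on some CPython versions the comprehension compiles to a nested code object, so nested constants are searched too):

```python
import dis
from types import CodeType

def opnames(code):
    """Collect opcode names from a code object, recursing into any
    nested code objects stored among its constants."""
    names = {i.opname for i in dis.get_instructions(code)}
    for const in code.co_consts:
        if isinstance(const, CodeType):
            names |= opnames(const)
    return names

code = compile("result = [x * 2 for x in range(5)]", "<demo>", "exec")
```

On CPython 3.x, `opnames(code)` includes LIST_APPEND whether or not the comprehension is inlined into the enclosing code.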


----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2004-03-08 12:31

Message:
Logged In: YES 
user_id=80475

The actual checkin kept the final DELETE_FAST to allow all
references to the list to be removable.

Will take a look at the proposed variants.  Off-hand, they
do not have a clean feel to them, but they do save a trip
around the eval loop.

----------------------------------------------------------------------

Comment By: Armin Rigo (arigo)
Date: 2004-03-08 05:58

Message:
Logged In: YES 
user_id=4771

The patch also removes the final DELETE_FAST. Why? I think deleting this old reference to the list is a good idea.

Another faster (but slightly less clear) approach would be to change the behavior of LIST_APPEND so that the '_' variable can be completely avoided. In stack manipulation notation:

LIST_APPEND [list, ignored, value]  ->  [list, ignored]

where 'ignored' is the iterator still on the stack.  Admittedly this is quite an unexpected behavior, so maybe LIST_APPEND should be called LIST_COMPREHEND or somesuch.

An intermediate solution with no change in LIST_APPEND would introduce a DUP_OVER stack manipulation opcode:

DUP_OVER  [a, b]  ->  [a, b, a]

which could replace the LOAD_FAST '_' at the beginning of the loop, getting the list object from over the iterator.
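The stack effects proposed here can be sanity-checked with a toy stack model; DUP_OVER and the modified LIST_APPEND are hypothetical opcodes from this discussion, simulated with plain lists:

```python
# Toy simulation of the proposed stack effects.  DUP_OVER and the
# modified LIST_APPEND are hypothetical opcodes from this discussion,
# not part of real CPython.

def dup_over(stack):
    # [a, b] -> [a, b, a]
    stack.append(stack[-2])

def list_append_variant(stack):
    # [list, ignored, value] -> [list, ignored]: append the top value
    # to the list two slots down, leaving the iterator in place.
    value = stack.pop()
    stack[-2].append(value)

# One loop iteration: the list, the live iterator, and a produced value.
stack = [[], "iterator", 42]
list_append_variant(stack)
```

After the call, the value has landed in the list and the iterator is back on top, ready for the next FOR_ITER.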

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2004-03-07 02:41

Message:
Logged In: YES 
user_id=80475

Applied to:

Python/ceval.c 2.379
Python/compile.c 2.299
Include/opcode.h 2.44
Lib/opcode.py 1.5


----------------------------------------------------------------------



