[Patches] [ python-Patches-670367 ] Micro-optimizations for ceval.c

Sat, 18 Jan 2003 21:12:06 -0800

Patches item #670367, was opened at 2003-01-18 14:07
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=670367&group_id=5470

Category: None
Group: Python 2.3
>Status: Closed
Resolution: Accepted
Priority: 5
Submitted By: Raymond Hettinger (rhettinger)
Assigned to: Raymond Hettinger (rhettinger)
Summary: Micro-optimizations for ceval.c

Initial Comment:
Make the code slightly shorter, faster, and easier to 
read.

* Eliminate unused DUP_TOPX code for x==1.  
compile.c always generates DUP_TOP instead.

* Since only two cases remain for DUP_TOPX, replace 
the switch-case with if-elseif.

* The in-lined integer compare does a CheckExact on 
both arguments.  Since the second is a little more 
likely to fail, test it first.

* The switch-case for IS/IS_NOT and IN/NOT_IN can 
separate the regular and inverted cases with no 
additional work.  For all four paths, saves a test and 
jump.

----------------------------------------------------------------------

>Comment By: Raymond Hettinger (rhettinger)
Date: 2003-01-19 00:12

Message:
Logged In: YES 
user_id=80475

Applied as Python/ceval.c 2.347.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2003-01-18 22:50

Message:
Logged In: YES 
user_id=31435

Small non-obscure changes that reduce instruction count 
(etc) can be made even if timing shows a slowdown.  This is 
extremely dependent on platform-specific and compiler 
release-specific compiler optimization, and ceval.c is so 
complex it kicks every linear-time optimizer into 
unpredictable behavior,  It's the accumulation, over time, of 
many small changes "in the right direction" that pays off 
over time.  On the way there, timing will bounce around 
seemingly at random, and in different ways under different 
compilers and architectures.  Don't sweat it -- I don't, and 
I've done as much to speed Python as anyone <wink>.

Marked Accepted and back to Raymond.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2003-01-18 20:50

Message:
Logged In: YES 
user_id=80475

Typo in the last message:

The savings for "5 in []" is 4%.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2003-01-18 20:40

Message:
Logged In: YES 
user_id=80475

The attached timing code shows about a 7% speed-up for 
the IS test.  I didn't expect this much but then IS is a low 
overhead test.

The same test for IS NOT shows 5%.

The same test for "5 in []" shows only 1%.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2003-01-18 20:07

Message:
Logged In: YES 
user_id=80475

These micro-optimizations are so small that they don't 
warrant setting-up timing code to show a 1% change.  
Neal's micro-optimizations were not individually tested and 
timed.  Likewise, Tim commented that these are all steps 
in the right direction even if they can't be individually 
timed. OTOH, if any of these look like a step backwards, 
they ought to be zapped. All that being said, I would still 
make a timing suite if weren't so difficult to get consistent 
timings on WindowsME.

The benefit for the switch-case is observable in the 
assembly output which shows one-fewer test jump on 
every path and no other changes.

The elimination of an unused case for dup_topx is a 
continuation of what GvR okayed on python-dev.  There is 
a slight savings in code volume.  There will be no time 
saved because the code was never called.

The conversion from switch-case to if-else follows the 
recommendations from Intel's Software Optimization 
Cookbook.  It says that the if-else form has a chance at 
branch prediction, while the calculated jump does not.  
More importantly, the code is just a little bit clearer than 
the 2 case switch.

Changing the order of the PyInt_CheckExact tests came 
from the observation that "anInt in aContainer" would pass 
the first test but not the second.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2003-01-18 17:52

Message:
Logged In: YES 
user_id=21627

What kind of speed-up have you observed for what test case?

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=670367&group_id=5470