[Patches] [ python-Patches-943898 ] A simple 3-4% speed-up for PCs

SourceForge.net noreply at sourceforge.net
Wed May 12 11:27:10 EDT 2004


Patches item #943898, was opened at 2004-04-28 13:33
Message generated for change (Comment added) made by rhettinger
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=943898&group_id=5470

Category: Core (C code)
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: Armin Rigo (arigo)
>Assigned to: Tim Peters (tim_one)
Summary: A simple 3-4% speed-up for PCs

Initial Comment:
The result of a few experiments looking at the assembler produced by gcc for eval_frame():

* on PCs, reading the arguments as an unsigned short instead of two bytes is a good win.

* oparg is more "local" with this patch: its value doesn't need to be saved across an iteration of the main loop, allowing it to live in a register only.

* added an explicit "case STOP_CODE:" so that the switch starts at 0 instead of 1 -- that's one instruction less with gcc.

* it does not seem to pay off to move the reading of the argument to the start of each case for an operation that expects one, even though that removes the unpredictable branch "if (HAS_ARG(op))".

This patch should be timed on other platforms to make sure that it doesn't slow things down.  If it does, then only the reading of the arg as an unsigned short could be checked in -- that part is compiled conditionally, only where shorts are 2 bytes in little-endian order.
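
A rough sketch of the unsigned-short idea above, not the patch itself.  The helper names (arg_from_bytes, arg_from_short, shorts_are_2_bytes_little_endian) are made up for illustration.  The 16-bit fetch is written with memcpy, which is an assumption of this sketch and not necessarily how the patch reads the value; it keeps the code valid C at odd addresses.  The byte-order check is done at run time here only to keep the sketch portable, whereas the patch decides at compile time.

```c
#include <string.h>

/* Hypothetical helper: decode a 16-bit argument one byte at a time,
   low byte first, which is how the bytecode stores it. */
static unsigned int arg_from_bytes(const unsigned char *p)
{
    return ((unsigned int)p[1] << 8) | p[0];
}

/* Hypothetical helper: fetch both bytes in one go.  This only agrees
   with arg_from_bytes() when unsigned short is 2 bytes, little endian. */
static unsigned int arg_from_short(const unsigned char *p)
{
    unsigned short v;
    memcpy(&v, p, sizeof v);   /* safe even when p is an odd address */
    return v;
}

/* Hypothetical run-time version of the condition the patch tests
   at compile time. */
static int shorts_are_2_bytes_little_endian(void)
{
    unsigned short probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return sizeof(unsigned short) == 2 && first == 1;
}
```

On a machine where the check returns true, both helpers give the same value for any argument; elsewhere only arg_from_bytes() is meaningful.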

By the way, does anyone know why 'stack_pointer' isn't a 'register' local?  I bet it would make a difference on PowerPC, for example, with compilers that care about this keyword.

----------------------------------------------------------------------

>Comment By: Raymond Hettinger (rhettinger)
Date: 2004-05-12 10:27

Message:
Logged In: YES 
user_id=80475

Tim, I remember you having some opinions about these sorts of
optimizations.  Will you take a brief look at Armin's latest
patch?

----------------------------------------------------------------------

Comment By: Armin Rigo (arigo)
Date: 2004-05-10 09:26

Message:
Logged In: YES 
user_id=4771

Tested on a MacOSX box, the patch also gives a 5% speed-up
there.  Allowing stack_pointer to be in a register is a very
good idea.  (all tests with Pystone)

----------------------------------------------------------------------

Comment By: Armin Rigo (arigo)
Date: 2004-05-10 05:48

Message:
Logged In: YES 
user_id=4771

The short trick might be a bit fragile.  For example, the current patch would incorrectly use it on machines where unaligned accesses are forbidden.

I isolated the other issue I talked about (making stack_pointer a register variable) in a separate patch.  This patch alone is clearly safe.  It should give a bit of speed-up on any machine but it definitely gives 5% on PCs with gcc by forcing the two most important local variables into specific registers.  (If someone knows the corresponding syntax for other compilers, it can be added in the #if.)

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2004-04-28 18:45

Message:
Logged In: YES 
user_id=80475

With MSVC++ 6.0 under WinME on a Pentium III, there is no
change in timing (measurements accurate within 0.25%):

I wonder if the speedup from retrieving the unsigned short
is offset by alignment penalties when the starting address
is odd.


----------------------------------------------------------------------

Comment By: Armin Rigo (arigo)
Date: 2004-04-28 16:02

Message:
Logged In: YES 
user_id=4771

stack_pointer isn't a register because its address is taken in two places.  Taking its address is really bad for optimization, because the variable must then live in memory.  Instead of &stack_pointer, we should do:

PyObject **sp = stack_pointer;
... use &sp ...
stack_pointer = sp;

I'm pretty sure this simple change along with a 'register' declaration of stack_pointer gives a good speed-up on all architectures with plenty of registers.
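
A self-contained toy version of the copy-out/copy-back idea above, assuming a plain int stack in place of PyObject pointers.  The functions drop_items and sum_after_drop are hypothetical and exist only to show the pattern.

```c
/* Hypothetical helper that has to update the caller's stack pointer,
   so it takes the address of a pointer. */
static void drop_items(int **p_sp, int n)
{
    *p_sp -= n;
}

/* Drops `drop` items from a stack of `depth` ints, then sums the rest.
   The address of stack_pointer itself is never taken, so it can be
   declared 'register' and the compiler may keep it out of memory. */
static int sum_after_drop(int *stack_base, int depth, int drop)
{
    register int *stack_pointer = stack_base + depth;
    int total = 0;

    {
        int *sp = stack_pointer;   /* copy out */
        drop_items(&sp, drop);     /* only the copy's address escapes */
        stack_pointer = sp;        /* copy back */
    }

    while (stack_pointer > stack_base)
        total += *--stack_pointer;
    return total;
}
```

Writing &stack_pointer in this function would be rejected by a C compiler because of the 'register' declaration, which makes it easy to check that no address-taking use is left.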

For PCs I've experimented with forcing one or two locals into specific registers, with the gcc syntax  asm("esi"), asm("ebx"), etc.  Forcing stack_pointer and next_instr gives another 3-4% of improvement.

The next step is to see if this can be done with #if's for common compilers besides gcc.
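
One possible shape for such an #if, as a sketch only.  The macro name PIN_TO_REG and the toy function push_all are hypothetical, and the choice of "esi" and "edi" is arbitrary here.  The sketch spells the keyword __asm__, gcc's alternate form of asm that is also accepted in strict ISO modes.  On anything other than gcc on 32-bit x86 the macro expands to nothing and the compiler picks registers itself.

```c
/* Hypothetical macro: gcc register binding on 32-bit x86, nothing elsewhere. */
#if defined(__GNUC__) && defined(__i386__)
#define PIN_TO_REG(name) __asm__(name)
#else
#define PIN_TO_REG(name)   /* no known syntax: let the compiler choose */
#endif

/* Toy loop standing in for the interpreter's main loop: pushes each
   byte of `code` onto `stack` and returns the resulting stack depth. */
static int push_all(const unsigned char *code, int len, int *stack)
{
    register const unsigned char *next_instr PIN_TO_REG("esi") = code;
    register int *stack_pointer PIN_TO_REG("edi") = stack;
    const unsigned char *end = code + len;

    while (next_instr < end)
        *stack_pointer++ = *next_instr++;
    return (int)(stack_pointer - stack);
}
```

Other compilers could get their own branch in the #if if they offer an equivalent syntax; none is assumed here.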

----------------------------------------------------------------------
