[Patches] [ python-Patches-602191 ] single shared ticker

noreply@sourceforge.net noreply@sourceforge.net
Tue, 03 Sep 2002 11:46:54 -0700


Patches item #602191, was opened at 2002-08-29 22:01
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=602191&group_id=5470

Category: Core (C code)
Group: None
Status: Open
Resolution: Accepted
Priority: 5
Submitted By: Skip Montanaro (montanaro)
Assigned to: Skip Montanaro (montanaro)
Summary: single shared ticker

Initial Comment:
Per discussion on python-dev, here's a patch that gets rid of the 
per-thread ticker, instead sharing a single one amongst all threads 
(and long ints).


----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2002-09-03 14:46

Message:
Logged In: YES 
user_id=31435

I get a 1.5% speedup on Win2K, baseline versus whatever 
this patch does.  Go for it,

Skip, as I said before,

"""
_Py_CheckInterval needn't be delcared volatile, BTW; i.e., 
take the "volatile" out of

+ PyAPI_DATA(volatile int) _Py_CheckInterval;
"""

The patch won't compile under MSVC6 unless this is done 
(as is, the declaration conflicts with the definition).

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-09-03 14:21

Message:
Logged In: YES 
user_id=6380

FWIW, I measure a similar slowdown with the latest patch,
compared to current CVS.

But I still think it's good to check this in now, Skip. Then
in a separate patch we'll increment Py_CheckInterval to 100.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-08-30 18:36

Message:
Logged In: YES 
user_id=44345

I just went back and re-ran everything.  Here are the results.



methodology:
  * adjust source
  * recompile (always gcc 2.96 w/ -O3)
  * run pystones twice, ignoring values
  * run pystones three times, reporting values


------------------------------------------------------------------------------
baseline (none of this stuff)

Pystone(1.1) time for 50000 passes = 7.68
This machine benchmarks at 6510.42 pystones/second
Pystone(1.1) time for 50000 passes = 7.68
This machine benchmarks at 6510.42 pystones/second
Pystone(1.1) time for 50000 passes = 7.67
This machine benchmarks at 6518.9 pystones/second


------------------------------------------------------------------------------
baseline + check interval set to 100

Pystone(1.1) time for 50000 passes = 7.64
This machine benchmarks at 6544.5 pystones/second
Pystone(1.1) time for 50000 passes = 7.63
This machine benchmarks at 6553.08 pystones/second
Pystone(1.1) time for 50000 passes = 7.62
This machine benchmarks at 6561.68 pystones/second


------------------------------------------------------------------------------
global, volatile _Py_Ticker == 10
global, nonvolatile _Py_CheckInterval == 10
Jeremy's Py_AddPendingCall shortcut enabled

Pystone(1.1) time for 50000 passes = 7.67
This machine benchmarks at 6518.9 pystones/second
Pystone(1.1) time for 50000 passes = 7.7
This machine benchmarks at 6493.51 pystones/second
Pystone(1.1) time for 50000 passes = 7.68
This machine benchmarks at 6510.42 pystones/second


------------------------------------------------------------------------------
global, volatile _Py_Ticker == 100
global, nonvolatile _Py_CheckInterval == 100
Jeremy's Py_AddPendingCall shortcut enabled

Pystone(1.1) time for 50000 passes = 7.67
This machine benchmarks at 6518.9 pystones/second
Pystone(1.1) time for 50000 passes = 7.6
This machine benchmarks at 6578.95 pystones/second
Pystone(1.1) time for 50000 passes = 7.62
This machine benchmarks at 6561.68 pystones/second


------------------------------------------------------------------------------
global, nonvolatile _Py_Ticker == 100
global, nonvolatile _Py_CheckInterval == 100
Jeremy's Py_AddPendingCall shortcut disabled

Pystone(1.1) time for 50000 passes = 7.67
This machine benchmarks at 6518.9 pystones/second
Pystone(1.1) time for 50000 passes = 7.66
This machine benchmarks at 6527.42 pystones/second
Pystone(1.1) time for 50000 passes = 7.64
This machine benchmarks at 6544.5 pystones/second



----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-08-30 16:58

Message:
Logged In: YES 
user_id=44345

Sorry, compared to the previous version (_Py_CheckInterval == 10, 
Jeremy's shortcut installed, but _Py_Ticker not yet declared volatile).  My 
first submission didn't include any performance tests.  That was the first 
thing Guido asked for, so I started running pystones with each change.

Let me know what, if anything, you'd like me to report performance-wise.

Skip


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-08-30 16:02

Message:
Logged In: YES 
user_id=31435

A performance hit compared to what?  There are half a 
dozen variations stacked up now.   _Py_CheckInterval is 
back to 10 in the latest patch, and perhaps that has 
something to do with it.

_Py_CheckInterval needn't be delcared volatile, BTW; i.e., 
take the "volatile" out of

+ PyAPI_DATA(volatile int) _Py_CheckInterval;

I can't time this today, but you should be just as keen to 
get x-platform verification when claiming a performance hit 
as when claiming a performance boost.  Chances are that it 
will slobber all over the place across compilers; ceval is 
extremely touchy.  I'm sitting on a major slowdown under 
MSCV6 after the SET_LINENO thing, and I'm not panicked 
about that <wink>.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-08-30 15:08

Message:
Logged In: YES 
user_id=44345

This version declares the ticker volatile.  Obvious performance hit.  Does it 
need to be volatile if the Jeremy's shortcut is removed?


----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-08-30 14:26

Message:
Logged In: YES 
user_id=44345

bump initial check interval to 100, per request from Tim & Guido.


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-08-30 13:10

Message:
Logged In: YES 
user_id=31435

Note that the ticker has to be declared volatile.  Also

"""
 Py_MakePendingCalls should also be changed then to
reset ticker to 0 in its "early exit because the coincidences 
I'm relying on haven't happened yet" cases.
"""

You can't predict whether ceval will slow down or speed up, 
so don't bother with being confused <wink>.  Get the code 
right first.  Good Things will follow.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-08-30 12:53

Message:
Logged In: YES 
user_id=44345

Here's a new version that includes Jeremy's shortcut.  With 
_Py_CheckInterval initialized to 10 here are the pystones numbers I get:

with my initial patch & Jeremy's ticker shortcut:

Pystone(1.1) time for 50000 passes = 7.67
This machine benchmarks at 6518.9 pystones/second
Pystone(1.1) time for 50000 passes = 7.67
This machine benchmarks at 6518.9 pystones/second
Pystone(1.1) time for 50000 passes = 7.65
This machine benchmarks at 6535.95 pystones/second


back to just my initial patch without the shortcut:

Pystone(1.1) time for 50000 passes = 7.59
This machine benchmarks at 6587.62 pystones/second
Pystone(1.1) time for 50000 passes = 7.56
This machine benchmarks at 6613.76 pystones/second
Pystone(1.1) time for 50000 passes = 7.56
This machine benchmarks at 6613.76 pystones/second

I'm perplexed by the performance difference.  Again, I think these 
performance numbers should be checked by some other people.  BTW, I 
configured with

    OPT=-O3 ../configure

in my build directory.  I'm using gcc 2.96 and glibc 2.2.4.


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-08-30 12:01

Message:
Logged In: YES 
user_id=31435

I'd be more interested in a patch that also implemented 
Jeremy's suggestion.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-08-30 11:43

Message:
Logged In: YES 
user_id=44345

minor correction to the patch - initialize both _Py_CheckInterval and 
_Py_Ticker to 10 so direct pystones comparisons can be made.


----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-08-30 00:15

Message:
Logged In: YES 
user_id=44345

Here's an updated patch.  It creates two internal globals, _Py_Ticker and 
_Py_CheckInterval.  They are accessed from sysmodule.c, ceval.c and 
longobject.c


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=602191&group_id=5470