[Patches] [ python-Patches-602191 ] single shared ticker
noreply@sourceforge.net
noreply@sourceforge.net
Tue, 03 Sep 2002 11:46:54 -0700
Patches item #602191, was opened at 2002-08-29 22:01
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=602191&group_id=5470
Category: Core (C code)
Group: None
Status: Open
Resolution: Accepted
Priority: 5
Submitted By: Skip Montanaro (montanaro)
Assigned to: Skip Montanaro (montanaro)
Summary: single shared ticker
Initial Comment:
Per discussion on python-dev, here's a patch that gets rid of the
per-thread ticker, instead sharing a single one amongst all threads
(and long ints).
----------------------------------------------------------------------
>Comment By: Tim Peters (tim_one)
Date: 2002-09-03 14:46
Message:
Logged In: YES
user_id=31435
I get a 1.5% speedup on Win2K, baseline versus whatever
this patch does. Go for it,
Skip, as I said before,
"""
_Py_CheckInterval needn't be delcared volatile, BTW; i.e.,
take the "volatile" out of
+ PyAPI_DATA(volatile int) _Py_CheckInterval;
"""
The patch won't compile under MSVC6 unless this is done
(as is, the declaration conflicts with the definition).
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum)
Date: 2002-09-03 14:21
Message:
Logged In: YES
user_id=6380
FWIW, I measure a similar slowdown with the latest patch,
compared to current CVS.
But I still think it's good to check this in now, Skip. Then
in a separate patch we'll increment Py_CheckInterval to 100.
----------------------------------------------------------------------
Comment By: Skip Montanaro (montanaro)
Date: 2002-08-30 18:36
Message:
Logged In: YES
user_id=44345
I just went back and re-ran everything. Here are the results.
methodology:
* adjust source
* recompile (always gcc 2.96 w/ -O3)
* run pystones twice, ignoring values
* run pystones three times, reporting values
------------------------------------------------------------------------------
baseline (none of this stuff)
Pystone(1.1) time for 50000 passes = 7.68
This machine benchmarks at 6510.42 pystones/second
Pystone(1.1) time for 50000 passes = 7.68
This machine benchmarks at 6510.42 pystones/second
Pystone(1.1) time for 50000 passes = 7.67
This machine benchmarks at 6518.9 pystones/second
------------------------------------------------------------------------------
baseline + check interval set to 100
Pystone(1.1) time for 50000 passes = 7.64
This machine benchmarks at 6544.5 pystones/second
Pystone(1.1) time for 50000 passes = 7.63
This machine benchmarks at 6553.08 pystones/second
Pystone(1.1) time for 50000 passes = 7.62
This machine benchmarks at 6561.68 pystones/second
------------------------------------------------------------------------------
global, volatile _Py_Ticker == 10
global, nonvolatile _Py_CheckInterval == 10
Jeremy's Py_AddPendingCall shortcut enabled
Pystone(1.1) time for 50000 passes = 7.67
This machine benchmarks at 6518.9 pystones/second
Pystone(1.1) time for 50000 passes = 7.7
This machine benchmarks at 6493.51 pystones/second
Pystone(1.1) time for 50000 passes = 7.68
This machine benchmarks at 6510.42 pystones/second
------------------------------------------------------------------------------
global, volatile _Py_Ticker == 100
global, nonvolatile _Py_CheckInterval == 100
Jeremy's Py_AddPendingCall shortcut enabled
Pystone(1.1) time for 50000 passes = 7.67
This machine benchmarks at 6518.9 pystones/second
Pystone(1.1) time for 50000 passes = 7.6
This machine benchmarks at 6578.95 pystones/second
Pystone(1.1) time for 50000 passes = 7.62
This machine benchmarks at 6561.68 pystones/second
------------------------------------------------------------------------------
global, nonvolatile _Py_Ticker == 100
global, nonvolatile _Py_CheckInterval == 100
Jeremy's Py_AddPendingCall shortcut disabled
Pystone(1.1) time for 50000 passes = 7.67
This machine benchmarks at 6518.9 pystones/second
Pystone(1.1) time for 50000 passes = 7.66
This machine benchmarks at 6527.42 pystones/second
Pystone(1.1) time for 50000 passes = 7.64
This machine benchmarks at 6544.5 pystones/second
----------------------------------------------------------------------
Comment By: Skip Montanaro (montanaro)
Date: 2002-08-30 16:58
Message:
Logged In: YES
user_id=44345
Sorry, compared to the previous version (_Py_CheckInterval == 10,
Jeremy's shortcut installed, but _Py_Ticker not yet declared volatile). My
first submission didn't include any performance tests. That was the first
thing Guido asked for, so I started running pystones with each change.
Let me know what, if anything, you'd like me to report performance-wise.
Skip
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2002-08-30 16:02
Message:
Logged In: YES
user_id=31435
A performance hit compared to what? There are half a
dozen variations stacked up now. _Py_CheckInterval is
back to 10 in the latest patch, and perhaps that has
something to do with it.
_Py_CheckInterval needn't be delcared volatile, BTW; i.e.,
take the "volatile" out of
+ PyAPI_DATA(volatile int) _Py_CheckInterval;
I can't time this today, but you should be just as keen to
get x-platform verification when claiming a performance hit
as when claiming a performance boost. Chances are that it
will slobber all over the place across compilers; ceval is
extremely touchy. I'm sitting on a major slowdown under
MSCV6 after the SET_LINENO thing, and I'm not panicked
about that <wink>.
----------------------------------------------------------------------
Comment By: Skip Montanaro (montanaro)
Date: 2002-08-30 15:08
Message:
Logged In: YES
user_id=44345
This version declares the ticker volatile. Obvious performance hit. Does it
need to be volatile if the Jeremy's shortcut is removed?
----------------------------------------------------------------------
Comment By: Skip Montanaro (montanaro)
Date: 2002-08-30 14:26
Message:
Logged In: YES
user_id=44345
bump initial check interval to 100, per request from Tim & Guido.
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2002-08-30 13:10
Message:
Logged In: YES
user_id=31435
Note that the ticker has to be declared volatile. Also
"""
Py_MakePendingCalls should also be changed then to
reset ticker to 0 in its "early exit because the coincidences
I'm relying on haven't happened yet" cases.
"""
You can't predict whether ceval will slow down or speed up,
so don't bother with being confused <wink>. Get the code
right first. Good Things will follow.
----------------------------------------------------------------------
Comment By: Skip Montanaro (montanaro)
Date: 2002-08-30 12:53
Message:
Logged In: YES
user_id=44345
Here's a new version that includes Jeremy's shortcut. With
_Py_CheckInterval initialized to 10 here are the pystones numbers I get:
with my initial patch & Jeremy's ticker shortcut:
Pystone(1.1) time for 50000 passes = 7.67
This machine benchmarks at 6518.9 pystones/second
Pystone(1.1) time for 50000 passes = 7.67
This machine benchmarks at 6518.9 pystones/second
Pystone(1.1) time for 50000 passes = 7.65
This machine benchmarks at 6535.95 pystones/second
back to just my initial patch without the shortcut:
Pystone(1.1) time for 50000 passes = 7.59
This machine benchmarks at 6587.62 pystones/second
Pystone(1.1) time for 50000 passes = 7.56
This machine benchmarks at 6613.76 pystones/second
Pystone(1.1) time for 50000 passes = 7.56
This machine benchmarks at 6613.76 pystones/second
I'm perplexed by the performance difference. Again, I think these
performance numbers should be checked by some other people. BTW, I
configured with
OPT=-O3 ../configure
in my build directory. I'm using gcc 2.96 and glibc 2.2.4.
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2002-08-30 12:01
Message:
Logged In: YES
user_id=31435
I'd be more interested in a patch that also implemented
Jeremy's suggestion.
----------------------------------------------------------------------
Comment By: Skip Montanaro (montanaro)
Date: 2002-08-30 11:43
Message:
Logged In: YES
user_id=44345
minor correction to the patch - initialize both _Py_CheckInterval and
_Py_Ticker to 10 so direct pystones comparisons can be made.
----------------------------------------------------------------------
Comment By: Skip Montanaro (montanaro)
Date: 2002-08-30 00:15
Message:
Logged In: YES
user_id=44345
Here's an updated patch. It creates two internal globals, _Py_Ticker and
_Py_CheckInterval. They are accessed from sysmodule.c, ceval.c and
longobject.c
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=602191&group_id=5470