[Python-Dev] proposal+patch: sys.gettickeraccumulation()
Ralf W. Grosse-Kunstleve
rwgk at cci.lbl.gov
Sat Nov 13 05:20:03 CET 2004
Proposal:
>>> sys.gettickeraccumulation.__doc__
'gettickeraccumulation() -> current count of bytecodes processed by the interpreter.'
Target environments:
Python-based systems with (many) extension modules.
Motivation:
A fast, easy, and non-intrusive method for estimating how much time
is spent in Python code and how much time in extension modules.
E.g. if numarray is used it is often difficult to estimate how much
time is spent in the numarray extension and how much in
application-specific Python code. Is it worth investing time
reimplementing the application-specific code in C or C++? If most of
the time is already spent in the numarray extension the answer will
be no.
Method:
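Determination of "micro-seconds per tick":
elapsed_seconds / sys.gettickeraccumulation() * 1.e6
Here elapsed_seconds is the run time so far, e.g. the difference between
two time.time() readings (time.time() alone returns seconds since the
epoch).
A "tick" is the processing of one "Python virtual instruction."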
See also "setcheckinterval" here:
http://www.python.org/doc/current/lib/module-sys.html
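
The calculation above can be sketched in a few lines of Python. This is
only an illustration: microseconds_per_tick and measure are hypothetical
helper names, and the tick counter is passed in as a callable because
sys.gettickeraccumulation() exists only in an interpreter built with the
patch in this message.

```python
import time


def microseconds_per_tick(elapsed_seconds, ticks):
    """Return elapsed time per tick, in micro-seconds."""
    if ticks <= 0:
        raise ValueError("need at least one tick to form the ratio")
    return elapsed_seconds / ticks * 1.0e6


def measure(workload, get_ticks, clock=time.perf_counter):
    """Run workload() and return (elapsed_seconds, ticks, us_per_tick).

    get_ticks is any zero-argument callable returning a running tick
    count. On a patched interpreter this would be
    sys.gettickeraccumulation (an assumption; stock Python lacks it).
    """
    ticks_before = get_ticks()
    time_before = clock()
    workload()
    elapsed = clock() - time_before
    ticks = get_ticks() - ticks_before
    return elapsed, ticks, microseconds_per_tick(elapsed, ticks)
```

Taking differences before and after the workload keeps interpreter
start-up out of the ratio. With the figures from the last row of the
first table below (0.659 seconds, 4446620 ticks) the helper gives about
0.148 micro-seconds per tick.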
A pure Python program will spend the bulk of its time interpreting
bytecode. This will lead to a small value for "micro-seconds per
tick." In contrast, if almost all the time is spent in extension
modules, a calculation may run a long time without increasing
sys.gettickeraccumulation(). For example:
                 time (s)     ticks   time/tick in micro-seconds
   0.00% Python    0.008         20      405.908
  10.00% Python    0.072     444680        0.163
  20.00% Python    0.138     889340        0.155
  30.00% Python    0.203    1334000        0.152
  ...
 100.00% Python    0.659    4446620        0.148
With higher resolution:
                 time (s)     ticks   time/tick in micro-seconds
   0.00% Python    6.758         20   337893.748
   0.10% Python    0.728     444680        1.638
   0.20% Python    0.795     889340        0.894
   0.30% Python    0.837    1334000        0.627
   0.40% Python    0.923    1778660        0.519
   0.50% Python    0.978    2223320        0.440
   0.60% Python    1.091    2667980        0.409
   0.70% Python    1.111    3112640        0.357
   0.80% Python    1.191    3557300        0.335
   0.90% Python    1.236    4001960        0.309
   1.00% Python    1.315    4446620        0.296
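
The contrast between interpreted and compiled work can be reproduced
roughly on an unpatched interpreter by counting per-opcode trace events
as a stand-in for ticks. This is a sketch under assumptions: it needs
Python 3.7 or later (frame.f_trace_opcodes), the names count_opcodes,
python_loop and builtin_sum are hypothetical, and tracing is far too
intrusive for timing, so it only illustrates the tick counts.

```python
import sys


def count_opcodes(func, *args):
    """Count bytecode instructions executed in Python frames under func."""
    count = 0

    def tracer(frame, event, arg):
        nonlocal count
        # Request per-opcode events for every Python frame we enter.
        frame.f_trace_opcodes = True
        if event == "opcode":
            count += 1
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)
    return count


def python_loop(n):
    # The summation is driven by the bytecode interpreter.
    total = 0
    for i in range(n):
        total += i
    return total


def builtin_sum(n):
    # The same summation, carried out inside compiled code.
    return sum(range(n))
```

Both functions compute the same result, but count_opcodes(python_loop,
1000) grows with n while count_opcodes(builtin_sum, 1000) stays at a
handful of instructions, which is the effect the tables above show as a
rising time per tick.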
In real applications we observed times/tick of around 10 micro-seconds
on the same platform, i.e. most of the time was already spent in
extension modules, indicating that significant performance increases
could only be achieved through extensive recoding in a compiled
language. On the other hand, if we see values smaller than 1
micro-second we know that significant performance increases are
achievable with a reasonable effort.
Implementation:
A full patch based on Python-2.4b2 is attached. The essence is this
additional line in Python/ceval.c:
+ _Py_TickerAccumulation += _Py_CheckInterval - _Py_Ticker;
Note that _Py_Ticker and _Py_CheckInterval exist already in the
Python distribution. The impact of the additional code on the runtime
performance of a pure Python program is minute. On Xeon/Linux the
slowdown factor is smaller than 1.00005. The factor is even smaller if
extension modules are used.
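
The bookkeeping behind that line can be modelled in Python to check that
the reported count equals the number of ticker decrements. This is a
simplified sketch, not the interpreter's code: TickerModel and its
method names are hypothetical, and it ignores the fast paths in ceval.c
that skip the ticker check.

```python
class TickerModel:
    """Toy model of _Py_Ticker, _Py_CheckInterval and the accumulation."""

    def __init__(self, check_interval=100):
        self.check_interval = check_interval  # models _Py_CheckInterval
        self.ticker = check_interval          # models _Py_Ticker
        self.accumulation = 0                 # models _Py_TickerAccumulation

    def execute_one_opcode(self):
        # Mirrors "if (--_Py_Ticker < 0) { ... }" in the eval loop.
        self.ticker -= 1
        if self.ticker < 0:
            # The line added by the patch: bank the ticks used up
            # since the last reset, then reset the countdown.
            self.accumulation += self.check_interval - self.ticker
            self.ticker = self.check_interval

    def get_ticker_accumulation(self):
        # Mirrors sys_gettickeraccumulation(): banked ticks plus the
        # part of the current countdown that has already elapsed.
        return self.accumulation + self.check_interval - self.ticker


def ticks_after(n, check_interval=100):
    """Return the reported count after n modelled opcodes."""
    model = TickerModel(check_interval)
    for _ in range(n):
        model.execute_one_opcode()
    return model.get_ticker_accumulation()
```

In this model ticks_after(n) returns n for any n, so the accumulation
only has to be updated once per check interval rather than once per
bytecode, which is why the added cost is so small.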
The full patch, the patched files, and a complete Python distribution
including the patched files can be found here:
http://cci.lbl.gov/~rwgk/python/Python-2.4b2_ticker_patch
http://cci.lbl.gov/~rwgk/python/Python-2.4b2_ticker_patched_files.tar.gz
http://cci.lbl.gov/~rwgk/python/Python-2.4b2_ticker.tar.gz
diff -u -r Python-2.4b2/Include/ceval.h Python-2.4b2_ticker/Include/ceval.h
--- Python-2.4b2/Include/ceval.h 2004-10-10 19:40:35.000000000 -0700
+++ Python-2.4b2_ticker/Include/ceval.h 2004-11-12 18:16:28.000000000 -0800
@@ -68,6 +68,11 @@
/* this used to be handled on a per-thread basis - now just two globals */
PyAPI_DATA(volatile int) _Py_Ticker;
+#ifndef HAVE_LONG_LONG
+PyAPI_DATA(volatile long) _Py_TickerAccumulation;
+#else
+PyAPI_DATA(volatile PY_LONG_LONG) _Py_TickerAccumulation;
+#endif
PyAPI_DATA(int) _Py_CheckInterval;
/* Interface for threads.
diff -u -r Python-2.4b2/Objects/longobject.c Python-2.4b2_ticker/Objects/longobject.c
--- Python-2.4b2/Objects/longobject.c 2004-09-19 23:14:54.000000000 -0700
+++ Python-2.4b2_ticker/Objects/longobject.c 2004-11-12 18:20:53.000000000 -0800
@@ -38,6 +38,7 @@
#define SIGCHECK(PyTryBlock) \
if (--_Py_Ticker < 0) { \
+ _Py_TickerAccumulation += _Py_CheckInterval - _Py_Ticker; \
_Py_Ticker = _Py_CheckInterval; \
if (PyErr_CheckSignals()) { PyTryBlock; } \
}
diff -u -r Python-2.4b2/PC/os2emx/python24.def Python-2.4b2_ticker/PC/os2emx/python24.def
--- Python-2.4b2/PC/os2emx/python24.def 2004-10-10 19:40:50.000000000 -0700
+++ Python-2.4b2_ticker/PC/os2emx/python24.def 2004-11-12 18:16:47.000000000 -0800
@@ -743,6 +743,7 @@
"_Py_CheckRecursionLimit"
"_Py_CheckInterval"
"_Py_Ticker"
+ "_Py_TickerAccumulation"
; From python24_s.lib(compile)
"PyCode_New"
diff -u -r Python-2.4b2/Python/ceval.c Python-2.4b2_ticker/Python/ceval.c
--- Python-2.4b2/Python/ceval.c 2004-10-10 19:40:50.000000000 -0700
+++ Python-2.4b2_ticker/Python/ceval.c 2004-11-12 18:22:55.000000000 -0800
@@ -373,6 +373,7 @@
pendinglast = j;
_Py_Ticker = 0;
+ _Py_TickerAccumulation = 0;
things_to_do = 1; /* Signal main loop */
busy = 0;
/* XXX End critical section */
@@ -476,6 +477,11 @@
per thread, now just a pair o' globals */
int _Py_CheckInterval = 100;
volatile int _Py_Ticker = 100;
+#ifndef HAVE_LONG_LONG
+volatile long _Py_TickerAccumulation = 0;
+#else
+volatile PY_LONG_LONG _Py_TickerAccumulation = 0;
+#endif
PyObject *
PyEval_EvalCode(PyCodeObject *co, PyObject *globals, PyObject *locals)
@@ -776,6 +782,7 @@
a try: finally: block uninterruptable. */
goto fast_next_opcode;
}
+ _Py_TickerAccumulation += _Py_CheckInterval - _Py_Ticker;
_Py_Ticker = _Py_CheckInterval;
tstate->tick_counter++;
#ifdef WITH_TSC
diff -u -r Python-2.4b2/Python/sysmodule.c Python-2.4b2_ticker/Python/sysmodule.c
--- Python-2.4b2/Python/sysmodule.c 2004-08-12 11:19:17.000000000 -0700
+++ Python-2.4b2_ticker/Python/sysmodule.c 2004-11-12 18:51:14.000000000 -0800
@@ -442,6 +442,20 @@
"getcheckinterval() -> current check interval; see setcheckinterval()."
);
+static PyObject *
+sys_gettickeraccumulation(PyObject *self, PyObject *args)
+{
+#ifndef HAVE_LONG_LONG
+ return PyInt_FromLong(_Py_TickerAccumulation + _Py_CheckInterval - _Py_Ticker);
+#else
+ return PyLong_FromLongLong(_Py_TickerAccumulation + _Py_CheckInterval - _Py_Ticker);
+#endif
+}
+
+PyDoc_STRVAR(gettickeraccumulation_doc,
+"gettickeraccumulation() -> current count of bytecodes processed by the interpreter."
+);
+
#ifdef WITH_TSC
static PyObject *
sys_settscdump(PyObject *self, PyObject *args)
@@ -763,6 +777,8 @@
setcheckinterval_doc},
{"getcheckinterval", sys_getcheckinterval, METH_NOARGS,
getcheckinterval_doc},
+ {"gettickeraccumulation", sys_gettickeraccumulation, METH_NOARGS,
+ gettickeraccumulation_doc},
#ifdef HAVE_DLOPEN
{"setdlopenflags", sys_setdlopenflags, METH_VARARGS,
setdlopenflags_doc},