tuple[int] is 30% slower than list[int]. Please let me explain the problem.

(1, 2, 3)[1] is compiled as a BINARY_SUBSCR operation. ceval.c has an optimization for list:

    case BINARY_SUBSCR:
        w = POP();
        v = TOP();
        if (PyList_CheckExact(v) && PyInt_CheckExact(w)) {
            /* INLINE: list[int] */
            Py_ssize_t i = PyInt_AsSsize_t(w);
            if (i < 0)
                i += PyList_GET_SIZE(v);
            if (i >= 0 && i < PyList_GET_SIZE(v)) {
                x = PyList_GET_ITEM(v, i);
                Py_INCREF(x);
            }
            else
                goto slow_get;
        }
        else
          slow_get:
            x = PyObject_GetItem(v, w);
        Py_DECREF(v);
        Py_DECREF(w);
        SET_TOP(x);
        if (x != NULL) continue;
        break;

For tuples, the code follows the slow path via the tp_as_mapping slot and calls tuplesubscript from Objects/tupleobject.c.

Tuple looks like a good candidate for the same optimization list has: tuples are used very often in Python programs, and getting an element from a tuple by index is as common an operation as list[int].

Are there reasons not to special-case tuples in BINARY_SUBSCR?
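For illustration, the index handling that the inlined fast path performs (negative-index wrap-around, then a bounds check before the unchecked PyList_GET_ITEM) can be sketched in Python. This is a hypothetical rendering of the C logic above, not code from CPython; fast_subscr is an invented name:

```python
def fast_subscr(seq, i):
    """Hypothetical Python rendering of the inlined C fast path."""
    n = len(seq)
    if i < 0:
        i += n            # negative indices count from the end
    if 0 <= i < n:
        return seq[i]     # stands in for PyList_GET_ITEM + Py_INCREF
    # Out of range: the C code jumps to the generic slow_get path,
    # which for a plain sequence ends in an IndexError.
    raise IndexError("index out of range")

print(fast_subscr([10, 20, 30], -1))  # → 30
```

The same normalization would apply unchanged to tuples, which is why the special case looks so cheap to extend.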
On Sun, 2011-07-24 at 01:20 +0300, Andrew Svetlov wrote:
> tuple[int] is 30% slower than list[int]. Please let me explain the problem.
> (1, 2, 3)[1] has compiled as BINARY_SUBSCR operation. ceval.c has optimization for list:
> [...]
> For tuple code follows slow way via tp_as_mapping slot and calls tuplesubscript from Objects/tupleobject.c
> Looks like tuple is goot applicant to be optimized as list does. tuples is used in python programs very often and getting element from tuple by index as common operation as list[int].
> Is there reasons to prevent special case for tuples in BINARY_SUBSCR?
In latest trunk this optimisation seems to have gone away; the code is now:

    TARGET(BINARY_SUBSCR)
        w = POP();
        v = TOP();
        x = PyObject_GetItem(v, w);
        Py_DECREF(v);
        Py_DECREF(w);
        SET_TOP(x);
        if (x != NULL) DISPATCH();
        break;

The implementation of PyObject_GetItem doesn't appear to have changed, though. Maybe this optimisation was no longer worth it in practice?

  Ryan

-- 
Ryan Kelly
http://www.rfk.id.au |  This message is digitally signed. Please visit
ryan@rfk.id.au       |  http://www.rfk.id.au/ramblings/gpg/ for details
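That list and tuple subscripting both funnel through the one subscript opcode can be checked from Python itself with the dis module (the opcode name varies between CPython versions, so the check below accepts either spelling):

```python
import dis

# A generic subscript expression compiles to a single subscript
# opcode; any fast path has to live inside that one handler.
ops = {ins.opname for ins in dis.get_instructions("x[i]")}
print(sorted(ops))
```

Whatever the opcode is called in a given release, there is only one handler for it, which is why the list special case had to sit inline in the interpreter loop.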
On Sun, 24 Jul 2011 09:13:07 +1000 Ryan Kelly <ryan@rfk.id.au> wrote:
> In latest trunk this optimisation seems to have gone away, the code is now:
> TARGET(BINARY_SUBSCR)
>     w = POP();
>     v = TOP();
>     x = PyObject_GetItem(v, w);
>     Py_DECREF(v);
>     Py_DECREF(w);
>     SET_TOP(x);
>     if (x != NULL) DISPATCH();
>     break;
> The implementation of PyObject_GetItem doesn't appear to have changed though. Maybe this optimisation was no longer worth it in practice?
The optimization was probably removed because PyInt objects don't exist anymore. There's a related but more ambitious patch at http://bugs.python.org/issue10044.

In practice, however, such micro-optimizations usually have little or no effect.

Regards

Antoine.
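The PyInt guard in the old fast path has no direct Python 3 equivalent: the separate machine-word integer type that PyInt_CheckExact tested for is gone, leaving a single arbitrary-precision int. A quick check:

```python
# Python 3 has one int type; the 2.x PyInt (machine-word) /
# PyLong (arbitrary-precision) split no longer exists, so the
# old guard PyInt_CheckExact has nothing cheap to test for.
small, big = 1, 10**30
assert type(small) is type(big) is int
print(type(small).__name__)  # → int
```

Any revived fast path therefore has to pay for a PyLong size check instead of the old one-word PyInt access, which is part of why the trade-off changed.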
You're right, sorry; I missed the changes in ceval.c for py3k. But please note, a simple test like:

    from timeit import timeit
    print('list', timeit("l[0]", "l = [1]"))
    print('tuple', timeit("l[0]", "l = (1,)"))

has these results:

    andrew@ocean ~/p/cpython> python2.7 z.py
    ('list', 0.03479599952697754)
    ('tuple', 0.046610116958618164)
    andrew@ocean ~/p/cpython> python3.2 z.py
    list 0.04870104789733887
    tuple 0.04825997352600098

For python2.7 the list[int] micro-optimization saves 25-30%, while 3.2 (and trunk) is very close to the "unoptimized" 2.7 version.

On Sun, Jul 24, 2011 at 2:27 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
> On Sun, 24 Jul 2011 09:13:07 +1000 Ryan Kelly <ryan@rfk.id.au> wrote:
>> In latest trunk this optimisation seems to have gone away, the code is now:
>> [...]
>> The implementation of PyObject_GetItem doesn't appear to have changed though. Maybe this optimisation was no longer worth it in practice?
> The optimization was probably removed because PyInt objects don't exist anymore. There's a related but more ambitious patch at http://bugs.python.org/issue10044.
> In practice however, such micro-optimizations usually have little or no effect.
> Regards
> Antoine.
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/andrew.svetlov%40gmail.com
On Sunday, 24 July 2011 at 03:15 +0300, Andrew Svetlov wrote:
> You right. Sorry, I missed changes in ceval.c for py3k. Please note, simple test like:
> from timeit import timeit
> print('list', timeit("l[0]", "l = [1]"))
> print('tuple', timeit("l[0]", "l = (1,)"))
> Has results:
> andrew@ocean ~/p/cpython> python2.7 z.py
> ('list', 0.03479599952697754)
> ('tuple', 0.046610116958618164)
> andrew@ocean ~/p/cpython> python3.2 z.py
> list 0.04870104789733887
> tuple 0.04825997352600098
> For python2.7 list[int] microoptimization saves 25-30%, while 3.2 (and trunk) very close to "unoptimized" 2.7 version.
My point is that on non-trivial benchmarks, the savings are almost zero. If you look at the (much more complex) patch I linked to, the savings are at most 10% on a couple of select benchmarks, with other benchmarks showing almost no difference.

Regards

Antoine.
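As an aside, single-run timeit numbers like those above are noisy. A slightly more careful variant of the same micro-benchmark (best, an invented helper here, takes the minimum over several repeats to filter out scheduler noise) might look like:

```python
from timeit import repeat

def best(stmt, setup, number=1_000_000):
    # The minimum over several runs estimates the true cost better
    # than a single run, which includes interpreter warm-up and
    # whatever else the machine was doing at the time.
    return min(repeat(stmt, setup, number=number, repeat=5))

print('list ', best("l[0]", "l = [1]"))
print('tuple', best("t[0]", "t = (1,)"))
```

Even with a stable measurement, the gap it reports is for the subscript in isolation; real programs dilute it, which is the point made above.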
I understand your point. Thank you for the explanation.

On Sun, Jul 24, 2011 at 3:18 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
> On Sunday, 24 July 2011 at 03:15 +0300, Andrew Svetlov wrote:
>> You right. Sorry, I missed changes in ceval.c for py3k. Please note, simple test like:
>> [...]
>> For python2.7 list[int] microoptimization saves 25-30%, while 3.2 (and trunk) very close to "unoptimized" 2.7 version.
> My point is that on non-trivial benchmarks, the savings are almost zero. If you look at the (much more complex) patch I linked to, the savings are at most 10% on a couple of select benchmarks, other benchmarks showing almost no difference.
> Regards
> Antoine.
On Jul 23, 2011, at 5:18 PM, Antoine Pitrou wrote:
> My point is that on non-trivial benchmarks, the savings are almost zero.
That could be said of any optimization in Python. Typical Python scripts exercise many features at a time, so any one optimization by itself is almost useless. Collectively, however, we've gotten nice speed-ups on real programs.

Raymond
participants (4)
- Andrew Svetlov
- Antoine Pitrou
- Raymond Hettinger
- Ryan Kelly