[New-bugs-announce] [issue30049] Don't cache tp_iternext

Serhiy Storchaka report at bugs.python.org
Wed Apr 12 04:39:23 EDT 2017


New submission from Serhiy Storchaka:

Some operations cache the value of the tp_iternext slot and call it in a loop. But calling tp_iternext can run arbitrary code, release GIL and change the tp_iternext slot in the same or other thread. This can leads to visible behavior difference, such as different iterated values, different raised exceptions or infinite loop. In the past this even could cause a crash (see issue3720), but seems now this is impossible.

Examples (list constructor caches tp_iternext, tuple constructor doesn't):

>>> def make_iter():
...     class Iterator:
...         def __iter__(self):
...             return self
...         def __next__(self):
...             del Iterator.__next__
...             return 1
...     return Iterator()
... 
>>> tuple(make_iter())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'Iterator' object is not iterable
>>> list(make_iter())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: __next__
>>> 
>>> def make_iter2():
...     it2 = iter((2,))
...     def subiter():
...         Iterator.__next__ = Iterator.next2
...         yield 1
...     class Iterator(filter):
...         def next2(self):
...             return next(it2)
...     return Iterator(lambda x: True, subiter())
... 
>>> tuple(make_iter2())
(1, 2)
>>> list(make_iter2())
[1]

The tp_iternext is cached for performance, and removing the caching can cause performance regression. But actually the difference is very small. I found a measurable difference (up to 5%) only in following artificial examples in which tp_iternext is called in very tight and long loops:

$ ./python -m perf timeit --compare-to ./python-orig -s 'a = [0]*1000000+[1]' -- 'next(filter(None, a))'python-orig: ..................... 47.2 ms +- 0.2 ms
python: ..................... 49.7 ms +- 0.3 ms

Mean +- std dev: [python-orig] 47.2 ms +- 0.2 ms -> [python] 49.7 ms +- 0.3 ms: 1.05x slower (+5%)

$ ./python -m perf timeit --compare-to ./python-orig -s 'from itertools import repeat, islice' -- 'next(islice(repeat(1), 1000000, None))'
python-orig: ..................... 15.4 ms +- 0.1 ms
python: ..................... 16.0 ms +- 0.2 ms

Mean +- std dev: [python-orig] 15.4 ms +- 0.1 ms -> [python] 16.0 ms +- 0.2 ms: 1.04x slower (+4%)

$ ./python -m perf timeit --compare-to ./python-orig -s 'from itertools import repeat; from collections import deque' -- 'deque(repeat(1, 1000000), 0)'
python-orig: ..................... 14.2 ms +- 0.1 ms
python: ..................... 14.8 ms +- 0.2 ms

Mean +- std dev: [python-orig] 14.2 ms +- 0.1 ms -> [python] 14.8 ms +- 0.2 ms: 1.05x slower (+5%)

In all other other cases, when involved creation of a collection (list, bytearray, deque (with maxlen != 0) constructors, ''.join), or calling other code (builtins all, max, map, itertools functions), or for shorter loops the difference is hardly distinguished from the random noise.

$ ./python -m perf timeit --compare-to ./python-orig -s 'a = [0]*1000' -- 'list(iter(a))'
python-orig: ..................... 31.8 us +- 0.3 us
python: ..................... 31.8 us +- 0.4 us

Mean +- std dev: [python-orig] 31.8 us +- 0.3 us -> [python] 31.8 us +- 0.4 us: 1.00x faster (-0%)
Not significant!

$ ./python -m perf timeit --compare-to ./python-orig -s 'a = [1]*1000' -- 'all(a)'
python-orig: ..................... 47.4 us +- 0.2 us
python: ..................... 48.0 us +- 0.3 us

Mean +- std dev: [python-orig] 47.4 us +- 0.2 us -> [python] 48.0 us +- 0.3 us: 1.01x slower (+1%)

$ ./python -m perf timeit --compare-to ./python-orig -s 'a = [1]*1000' -- 'max(a)'
python-orig: ..................... 108 us +- 1 us
python: ..................... 108 us +- 1 us

Mean +- std dev: [python-orig] 108 us +- 1 us -> [python] 108 us +- 1 us: 1.00x faster (-0%)
Not significant!

$ ./python -m perf timeit --compare-to ./python-orig -s 'a = [0]*1000000+[1]' -- 'next(filter(lambda x: x, a))'
python-orig: ..................... 527 ms +- 8 ms
python: ..................... 528 ms +- 2 ms

Mean +- std dev: [python-orig] 527 ms +- 8 ms -> [python] 528 ms +- 2 ms: 1.00x slower (+0%)
Not significant!

$ ./python -m perf timeit --compare-to ./python-orig -s 'from itertools import repeat, islice' -- 'next(islice(repeat(1), 100, None))'
python-orig: ..................... 4.72 us +- 0.05 us
python: ..................... 4.72 us +- 0.04 us

Mean +- std dev: [python-orig] 4.72 us +- 0.05 us -> [python] 4.72 us +- 0.04 us: 1.00x faster (-0%)
Not significant!

$ ./python -m perf timeit --compare-to ./python-orig -s 'from itertools import repeat; from collections import deque' -- 'deque(repeat(1, 100), 0)'
python-orig: ..................... 4.16 us +- 0.11 us
python: ..................... 4.11 us +- 0.05 us

Mean +- std dev: [python-orig] 4.16 us +- 0.11 us -> [python] 4.11 us +- 0.05 us: 1.01x faster (-1%)

----------
components: Extension Modules, Interpreter Core
messages: 291530
nosy: rhettinger, serhiy.storchaka
priority: normal
severity: normal
stage: patch review
status: open
title: Don't cache tp_iternext
type: behavior
versions: Python 3.7

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue30049>
_______________________________________


More information about the New-bugs-announce mailing list