[Cython] AddTraceback() slows down generators

Thu Jan 26 21:19:27 CET 2012

2012/1/27 Stefan Behnel <stefan_ml at behnel.de>:
> Robert Bradshaw, 21.01.2012 23:09:
>> On Sat, Jan 21, 2012 at 10:50 AM, Stefan Behnel wrote:
>>> I did some callgrind profiling on Cython's generators and was surprised to
>>> find that AddTraceback() represents a serious performance penalty for short
>>> running generators.
>>>
>>> I profiled a compiled Python implementation of itertools.groupby(), which
>>> yields (key, group) tuples where the group is an iterator again. I ran this
>>> code in Python for benchmarking:
>>>
>>> """
>>> L = sorted(range(1000)*5)
>>>
>>> all(list(g) for k,g in groupby(L))
>>> """
>>>
>>> Groups tend to be rather short in real code, often just one or a couple of
>>> items, so unpacking the group iterator into a list will usually be a quick
>>> loop and then the generator raises StopIteration on termination and builds
>>> a traceback for it. According to callgrind (which, I should note, tends to
>>> overestimate the amount of time spent in memory allocation), the iteration
>>> during the group unpacking takes about 30% of the overall runtime of the
>>> all() loop, and the AddTraceback() call at the end of each group traversal
>>> takes up to 25% (!) on my side. That means that more than 80% of the group
>>> unpacking time goes into raising StopIteration from the generators. I
>>> attached the call graph with the relative timings.
>>>
>>> About half of the exception raising time is eaten by PyString_FromFormat()
>>> that builds the function-name + line-position string (which, I may note, is
>>> basically a convenience feature). This string is a constant for a
>>> generator's StopIteration exception, at least for each final return point
>>> in a generator, but here it is being recreated over and over again, for
>>> each exception that gets raised.
>>>
>>> Even if we keep creating a new frame instance each time (which should be ok
>>> because CPython has a frame instance cache already and we'd only create one
>>> during the generator lifetime), the whole code object could actually be
>>> cached after the first creation, preferably bound to the lifetime of the
>>> generator creator function/method. Or, more generally, one code object per
>>> generator termination point, which will be a single point in the majority
>>> of cases. For the specific code above, that should shave off almost 20% of
>>> the overall runtime of the all() loop.
>>>
>>> I think that's totally worth doing.
>>
>> Makes sense to me. I did some caching like this for profiling.
>
> Here's a ticket for now:
>
> http://trac.cython.org/cython_trac/ticket/760
>

I think that could be easily fixed. CPython doesn't add any traceback
info for generator's ending.

https://github.com/vitek/cython/commit/63620bc2a29f3064bbdf7a49eefffaae4e3c369d

-- 
vitja.