[Cython] Generators & closure optimization

Vitja Makarov vitja.makarov at gmail.com
Mon Dec 26 20:07:58 CET 2011

2011/12/25 Stefan Behnel <stefan_ml at behnel.de>:
> Stefan Behnel, 21.12.2011 21:17:
>> Vitja Makarov, 21.12.2011 19:48:
>>> Some time ago we were talking about generators optimization by copying
>>> local variables from closure into local scope.
>> Yes, I think that will make it easier for the C compiler to make
>> optimistic
>> assumptions about external values.
>>> Now I think that should be a good idea to implement this for both
>>> generators and regular closure functions. So local var will be used
>>> for reference and assignment should be made to both copies. Of course
>>> there are some variables that shouldn't be copied: non-local vars,
>>> arrays, C++ classes and structures.
>> Basically, anything that external code can modify. That makes it a bit
>> tricky to do it also for 'normal' closure functions - the whole idea is
>> that there is more than one function that can refer to a variable.
>>> Also it may be a good idea to move outer scope pointer into local
>>> variable.
>>> So I'm wondering what is a good test to measure actual speedup?
>> http://blog.behnel.de/index.php?p=163
>> Just take the plain Python versions of the iterparse functions and compare
>> them before and after the change. The raw C implementation in CPython
>> gives
>> a good baseline.
>> Actually, it would be generally interesting to run the Cython versions
>> through callgrind to see where the time is actually being spent.
> Another useful test is the "nqueens" benchmark from the CPython test suite.
> It's regularly run on Jenkins in Py2.7 and 3.3.
> https://sage.math.washington.edu:8091/hudson/view/bench/
> Note that it mostly uses generator expressions, which could easily benefit
> from a couple of further optimisations by specialising them, e.g. by
> providing a length hint.
> http://trac.cython.org/cython_trac/ticket/756

I have implemented local variable copying, it could be found here:


I didn't noticed significant speedup running nqueens test. Indeed I'm
not sure it's speedup.
It's all about <+-2%. Perhaps some better test is required.

Btw, I got ~8% speedup for this dummy test:
def foo():
    cdef int i, r
    cdef list o
    o = []
    def bar():
        return len(o)
    for i in range(10000000):
        r += len(o)
    return r

What's iterparse()?


More information about the cython-devel mailing list