[Cython] Generators & closure optimization

Stefan Behnel stefan_ml at behnel.de
Mon Dec 26 21:05:49 CET 2011

Vitja Makarov, 26.12.2011 20:07:
> 2011/12/25 Stefan Behnel:
>> Stefan Behnel, 21.12.2011 21:17:
>>> Vitja Makarov, 21.12.2011 19:48:
>>>> Some time ago we were talking about generators optimization by copying
>>>> local variables from closure into local scope.
>>> Yes, I think that will make it easier for the C compiler to make
>>> optimistic assumptions about external values.
>>>> Now I think that should be a good idea to implement this for both
>>>> generators and regular closure functions. So local var will be used
>>>> for reference and assignment should be made to both copies. Of course
>>>> there are some variables that shouldn't be copied: non-local vars,
>>>> arrays, C++ classes and structures.
>>> Basically, anything that external code can modify. That makes it a bit
>>> tricky to do it also for 'normal' closure functions - the whole idea is
>>> that there is more than one function that can refer to a variable.
>>>> Also it may be a good idea to move outer scope pointer into local
>>>> variable.
>>>> So I'm wondering what is a good test to measure actual speedup?
>>> http://blog.behnel.de/index.php?p=163
>>> Just take the plain Python versions of the iterparse functions and compare
>>> them before and after the change. The raw C implementation in CPython
>>> gives a good baseline.
>>> Actually, it would be generally interesting to run the Cython versions
>>> through callgrind to see where the time is actually being spent.
>> Another useful test is the "nqueens" benchmark from the CPython test suite.
>> It's regularly run on Jenkins in Py2.7 and 3.3.
>> https://sage.math.washington.edu:8091/hudson/view/bench/
>> Note that it mostly uses generator expressions, which could easily benefit
>> from a couple of further optimisations by specialising them, e.g. by
>> providing a length hint.
>> http://trac.cython.org/cython_trac/ticket/756
> I have implemented local variable copying, it could be found here:
> https://github.com/vitek/cython/tree/_copy_closure


> I didn't noticed significant speedup running nqueens test. Indeed I'm
> not sure it's speedup.
> It's all about<+-2%. Perhaps some better test is required.

Well, yes, I'd guess that generator expressions are really not the target 
of this optimisation. They simply don't do enough in between two iterations.

> Btw, I got ~8% speedup for this dummy test:
> def foo():
>      cdef int i, r
>      cdef list o
>      o = []
>      def bar():
>          return len(o)
>      for i in range(10000000):
>          bar()
>          r += len(o)
>      return r

Any speedup will help someone, I guess. And it's good to know that it makes 
a difference if the closure code isn't plain trivial.

> What's iterparse()?

Ah, sorry, my bad. Too much lxml stuff lately, I guess. I really meant 
"itertools". See the blog link above.

Also, consider using callgrind for performance analysis. For one, it'll 
print the number of executed instructions after a test run. For Cython 
generated ode, it's usually a good sign if that decreases after a change. 
For a deeper analysis of such a profiling run, you can then use kcachegrind.


More information about the cython-devel mailing list