[Cython] Generators & closure optimization

Vitja Makarov vitja.makarov at gmail.com
Tue Dec 27 19:34:52 CET 2011


2011/12/27 Stefan Behnel <stefan_ml at behnel.de>:
> Vitja Makarov, 26.12.2011 20:07:
>>
>> 2011/12/25 Stefan Behnel:
>>
>>> Stefan Behnel, 21.12.2011 21:17:
>>>>
>>>> Vitja Makarov, 21.12.2011 19:48:
>>>>>
>>>>>
>>>>> Some time ago we were talking about generators optimization by copying
>>>>> local variables from closure into local scope.
>>>>
>>>>
>>>> Yes, I think that will make it easier for the C compiler to make
>>>> optimistic assumptions about external values.
>>>>
>>>>> Now I think that should be a good idea to implement this for both
>>>>> generators and regular closure functions. So local var will be used
>>>>> for reference and assignment should be made to both copies. Of course
>>>>> there are some variables that shouldn't be copied: non-local vars,
>>>>> arrays, C++ classes and structures.
>>>>
>>>>
>>>>
>>>> Basically, anything that external code can modify. That makes it a bit
>>>> tricky to do it also for 'normal' closure functions - the whole idea is
>>>> that there is more than one function that can refer to a variable.
>>>>
>>>>> Also it may be a good idea to move outer scope pointer into local
>>>>> variable.
>>>>>
>>>>> So I'm wondering what is a good test to measure actual speedup?
>>>>
>>>>
>>>>
>>>> http://blog.behnel.de/index.php?p=163
>>>>
>>>> Just take the plain Python versions of the iterparse functions and
>>>> compare
>>>> them before and after the change. The raw C implementation in CPython
>>>> gives a good baseline.
>>>>
>>>> Actually, it would be generally interesting to run the Cython versions
>>>> through callgrind to see where the time is actually being spent.
>>>
>>>
>>> Another useful test is the "nqueens" benchmark from the CPython test
>>> suite.
>>> It's regularly run on Jenkins in Py2.7 and 3.3.
>>>
>>> https://sage.math.washington.edu:8091/hudson/view/bench/
>>>
>>> Note that it mostly uses generator expressions, which could easily
>>> benefit
>>> from a couple of further optimisations by specialising them, e.g. by
>>> providing a length hint.
>>>
>>> http://trac.cython.org/cython_trac/ticket/756
>>>
>>
>> I have implemented local variable copying, it could be found here:
>>
>> https://github.com/vitek/cython/tree/_copy_closure
>
>
> Cool.
>
>
>
>> I didn't noticed significant speedup running nqueens test. Indeed I'm
>> not sure it's speedup.
>> It's all about<+-2%. Perhaps some better test is required.
>
>
> Well, yes, I'd guess that generator expressions are really not the target of
> this optimisation. They simply don't do enough in between two iterations.
>
>
>
>> Btw, I got ~8% speedup for this dummy test:
>> def foo():
>>     cdef int i, r
>>     cdef list o
>>     o = []
>>     def bar():
>>         return len(o)
>>     for i in range(10000000):
>>         bar()
>>         r += len(o)
>>     return r
>
>
> Any speedup will help someone, I guess. And it's good to know that it makes
> a difference if the closure code isn't plain trivial.
>
>
>> What's iterparse()?
>
>
> Ah, sorry, my bad. Too much lxml stuff lately, I guess. I really meant
> "itertools". See the blog link above.
>
> Also, consider using callgrind for performance analysis. For one, it'll
> print the number of executed instructions after a test run. For Cython
> generated ode, it's usually a good sign if that decreases after a change.
> For a deeper analysis of such a profiling run, you can then use kcachegrind.
>
>

I tried count() generator from your blog: optimized version is about 1% slower.

I think this optimization would make sens for regular closures and
generators that do some real work inside.

-- 
vitja.


More information about the cython-devel mailing list