[Cython] speed.pypy.org

Stefan Behnel stefan_ml at behnel.de
Sat Apr 16 10:20:12 CEST 2011


Robert Bradshaw, 16.04.2011 08:53:
> On Fri, Apr 15, 2011 at 1:20 PM, Stefan Behnel wrote:
>> Stefan Behnel, 11.04.2011 15:08:
>>>
>>> I'm currently discussing with Maciej Fijalkowski (PyPy) how to get Cython
>>> running on speed.pypy.org (that's what I wrote "cythonrun" for). If it
>>> works out well, we may have it up in a couple of days.
>>>
>>> I would expect that Cython won't be a big winner in this game, given that
>>> it will only compile plain untyped Python code. It's also going to fail
>>> entirely in some of the benchmarks. But I think it's worth having it up
>>> there, simply as a way for us to see where we are performance-wise and to
>>> get quick (nightly) feed-back about optimisations we try. The benchmark
>>> suite is also a nice set of real-world Python code that will allow us to
>>> find compliance issues.
>>
>> Ok, here's what I have so far. I fixed a couple of bugs in Cython and got at
>> least some of the benchmarks running. Note that they are actually simple
>> ones, only a single module. Basically all complex benchmarks fail due to
>> known bugs, such as Cython def functions not accepting attribute assignments
>> (e.g. on wrapping). There's also a problem with code that uses platform
>> specific names conditionally, such as WindowsError when running on Windows.
>> Cython complains about non-builtin names here. I'm considering to turn that
>> into a visible warning instead of an error, so that the name would instead
>> be looked up dynamically to let the code fail at runtime *iff* it reaches
>> the name lookup.
>
> Given the usefulness of the error, and the (relative) lack of issues
> with it so far, I'd rather not turn it into only a warning by default
> (though an option might be nice). Another option would be to whitelist
> the presumably small, finite set of names that are platform-dependent.

I agree, this has caught countless bugs in the past. I think a whitelist
makes sense, but note that this does not obey Python semantics, either. In
Python, any unknown name is just fine as long as it's not being looked up,
even though the use cases for this are clearly less common than the cases
where it bites users.
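
As a minimal illustration of that Python behaviour (the function names
below are made up for this example and are not taken from the benchmark
suite): a name is only resolved when execution actually reaches it, so
code that mentions a platform-specific or entirely undefined name still
imports and runs fine on other code paths.

```python
def pick_error_type(on_windows):
    # WindowsError only exists on Windows. On other platforms this
    # function is still valid as long as the first branch is not taken,
    # because the name is looked up only when that line executes.
    if on_windows:
        return WindowsError
    return OSError


def lookup_undefined():
    # Hypothetical undefined name: defining this function is fine,
    # calling it raises NameError at the point of the lookup.
    return name_that_is_defined_nowhere


print(pick_error_type(False))

try:
    lookup_undefined()
except NameError as exc:
    print("failed only at runtime:", exc)
```

On a non-Windows platform, calling pick_error_type(True) would raise
NameError in the same way, but only at the moment of the call.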

I'm currently changing the builtins caching support to simply not cache 
unknown names, so that they will be looked up at runtime at the point where 
they are used (even though, of course, they are compile-time errors by 
default). In combination with a whitelist and with an option to make 
unknown builtins a warning instead of an error, this will give us a pretty 
good default trade-off between Python semantics, safety and performance, 
with an easy option for better Python compatibility.
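
The resulting policy can be sketched roughly as follows. This is only an
illustration under stated assumptions, not Cython's actual implementation:
the function name, the option name, the returned labels and the contents
of the whitelist are all hypothetical.

```python
import builtins

# Illustrative whitelist of platform-dependent names (assumed contents).
PLATFORM_DEPENDENT_NAMES = frozenset({"WindowsError"})


def classify_name(name, unknown_is_error=True):
    """Decide how a non-local name would be treated at compile time."""
    if hasattr(builtins, name):
        # Known builtin: safe to cache the lookup.
        return "cache"
    if name in PLATFORM_DEPENDENT_NAMES:
        # Whitelisted: not cached, looked up at runtime where it is used.
        return "runtime-lookup"
    # Unknown name: an error by default, optionally a warning plus
    # a runtime lookup for better Python compatibility.
    return "error" if unknown_is_error else "warn-and-runtime-lookup"


for name in ("len", "WindowsError", "no_such_builtin_xyz"):
    print(name, "->", classify_name(name))
```

Passing unknown_is_error=False models the compatibility option: the
unknown name is then only reported and left to fail at runtime if the
lookup is ever reached.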


>> Anyway, here are the numbers. I got them with "auto_cpdef" enabled, although
>> that doesn't even seem to make that a big difference. The baseline is a
>> self-compiled Python 2.7.1+ (about a month old).
>
> Cool.  So basically everything is faster, usually somewhere between a
> 50-100% improvement. There's lots of room for improvement, though a
> JIT has a significant advantage that we don't get for untyped code.

Sure, we won't be as fast as PyPy for plain untyped Python code. But the 
benchmark suite gives us a clear target, both in terms of performance and 
compatibility.

Stefan

