[Cython] speed.pypy.org

Sat Apr 16 10:30:31 CEST 2011

On Sat, Apr 16, 2011 at 1:20 AM, Stefan Behnel <stefan_ml at behnel.de> wrote:
> Robert Bradshaw, 16.04.2011 08:53:
>>
>> On Fri, Apr 15, 2011 at 1:20 PM, Stefan Behnel wrote:
>>>
>>> Stefan Behnel, 11.04.2011 15:08:
>>>>
>>>> I'm currently discussing with Maciej Fijalkowski (PyPy) how to get Cython
>>>> running on speed.pypy.org (that's what I wrote "cythonrun" for). If it
>>>> works out well, we may have it up in a couple of days.
>>>>
>>>> I would expect that Cython won't be a big winner in this game, given that
>>>> it will only compile plain untyped Python code. It's also going to fail
>>>> entirely in some of the benchmarks. But I think it's worth having it up
>>>> there, simply as a way for us to see where we are performance-wise and to
>>>> get quick (nightly) feedback about optimisations we try. The benchmark
>>>> suite is also a nice set of real-world Python code that will allow us to
>>>> find compliance issues.
>>>
>>> Ok, here's what I have so far. I fixed a couple of bugs in Cython and got
>>> at least some of the benchmarks running. Note that these are the simple
>>> ones, each consisting of only a single module. Basically all complex
>>> benchmarks fail due to known bugs, such as Cython def functions not
>>> accepting attribute assignments (e.g. when wrapping them). There's also a
>>> problem with code that uses platform-specific names conditionally, such
>>> as WindowsError, which is only used when running on Windows. Cython
>>> complains about non-builtin names here. I'm considering turning that into
>>> a visible warning instead of an error, so that the name would instead be
>>> looked up dynamically, letting the code fail at runtime *iff* it reaches
>>> the name lookup.
>>
>> Given the usefulness of the error, and the (relative) lack of issues
>> with it so far, I'd rather not turn it into only a warning by default
>> (though an option might be nice). Another option would be to whitelist
>> the presumably small, finite set of names that are platform-dependent.
>
> I agree, this has caught countless bugs in the past. I think a whitelist
> makes sense, but note that this does not obey Python semantics, either. In
> Python, any unknown name is just fine as long as it's not being looked up.
> That said, the use cases for this are clearly less common than the cases
> where it bites users.
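For illustration, here is a minimal plain-Python sketch of the two kinds of code described above. The function names are hypothetical and not taken from the benchmark suite. CPython runs all of it without complaint; per the thread, Cython at the time failed on the attribute assignment to a def function and rejected WindowsError as an unknown builtin at compile time (when compiling on a non-Windows platform).

```python
import functools
import sys


# Pattern 1: attribute assignment on a def function, as done when wrapping.
# functools.wraps copies __name__, __doc__ etc. onto the wrapper; the
# `calls` counter is a hypothetical example of the same kind of assignment.
def counted(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


@counted
def add(a, b):
    """Return a + b."""
    return a + b


# Pattern 2: a platform-specific builtin used conditionally.
# WindowsError only exists on Windows (where, in Python 3, it is an alias
# of OSError). On other platforms the branch is never taken, so the name
# is never looked up and no error occurs.
def platform_error_type():
    if sys.platform == "win32":
        return WindowsError
    return OSError


# Python resolves global/builtin names only when the code runs:
# defining this function is fine, calling it raises NameError.
def uses_undefined_name():
    return NameThatIsNeverDefined
```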

Well, we certainly want to provide a way for users to disable it, and we
could trigger it with the -pedantic flag.

> I'm currently changing the builtins caching support to simply not cache
> unknown names, so that they will be looked up at runtime at the point where
> they are used (even though, of course, they are compile-time errors by
> default). In combination with a whitelist and with an option to make unknown
> builtins a warning instead of an error, this will give us a pretty good
> default trade-off between Python semantics, safety and performance, with an
> easy option for better Python compatibility.

+1
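The trade-off agreed on above can be summarised as a small decision function. This is a hypothetical Python sketch, not Cython's actual compiler code: `resolve_builtin`, `NameDecision`, `PLATFORM_BUILTINS` and the `unknown_is_warning` switch are all invented for illustration.

```python
import builtins
from dataclasses import dataclass
from typing import Optional

# Hypothetical whitelist of names that only exist on some platforms.
PLATFORM_BUILTINS = frozenset({"WindowsError"})


@dataclass(frozen=True)
class NameDecision:
    cache_at_init: bool        # look the name up once when the module loads
    diagnostic: Optional[str]  # None, "warning" or "error" at compile time


def resolve_builtin(name, unknown_is_warning=False):
    """Decide how generated code should treat a global name that is not
    defined in the module itself."""
    if hasattr(builtins, name):
        # Known builtin on the compiling Python: safe to cache.
        return NameDecision(cache_at_init=True, diagnostic=None)
    if name in PLATFORM_BUILTINS:
        # Whitelisted: no complaint, but never cached; looked up where used.
        return NameDecision(cache_at_init=False, diagnostic=None)
    # Unknown: never cached; an error by default, a warning if requested.
    level = "warning" if unknown_is_warning else "error"
    return NameDecision(cache_at_init=False, diagnostic=level)


def runtime_lookup(name):
    """What an uncached lookup does when execution reaches it: raise
    NameError, like Python, only if the name really is missing."""
    try:
        return getattr(builtins, name)
    except AttributeError:
        raise NameError(f"name {name!r} is not defined") from None
```

Note that the check runs against the builtins of the *compiling* interpreter, which is exactly why a name like WindowsError looks unknown when compiling on Linux and needs either the whitelist or the runtime fallback.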

>>> Anyway, here are the numbers. I got them with "auto_cpdef" enabled,
>>> although that doesn't seem to make much of a difference. The baseline is
>>> a self-compiled Python 2.7.1+ (about a month old).
>>
>> Cool. So basically everything is faster, usually by somewhere between 50%
>> and 100%. There's lots of room for improvement, though a JIT has a
>> significant advantage that we don't get for untyped code.
>
> Sure, we won't be as fast as PyPy for plain untyped Python code. But the
> benchmark suite gives us a clear target, both in terms of performance and
> compatibility.

My thoughts as well. With good control flow in place, we can also look
into generating multiple branches for code (e.g. a likely, fast one that
assumes no overflows, etc., bailing out to the slow, safe one when those
assumptions fail).

- Robert