[Python-Dev] VM and Language summit info for those not at Pycon (and those that are!)

Stefan Behnel stefan_ml at behnel.de
Sun Mar 20 21:59:13 CET 2011


[warning, long post ahead]

Guido van Rossum, 20.03.2011 17:17:
> Hi Stefan,

Hi!

> I'm glad to see Cython picking up steam and trying to compete with
> CPython, PyPy, and possibly others.

We do, although our main focus is much more on targeted manual optimisation 
than on compiling whole applications. For example, we currently bootstrap 
only seven out of some 60 modules of Cython itself into C code, as those are the 
most critical parts (mostly related to parsing and syntax tree traversal) 
and the rest simply takes too long to run through the C compiler for too 
little overall benefit. Compiling only the top-4 pure Python modules of the 
Cython compiler, optimised with externally provided type annotations, lets 
the overall compiler run about twice as fast.


> It's true that few in the core
> development group know much about Cython -- essentially my own
> understanding is still that it's like Pyrex, which was a
> mostly-Python-compatible syntax for writing C extensions, but
> certainly not able to compile all existing Python code successfully,
> due to small syntactic incompatibilities (e.g. new reserved words).

We decide the available syntax/keywords based on the file extension (.py vs 
.pyx), and you can compile (most) Python 3 code by putting a comment at the 
top of your source file that specifies the language level (or by passing 
"-3" on the command line).
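
To illustrate the language level comment: a minimal sketch of a plain .py 
module that asks for Python 3 semantics when compiled. The directive line is 
the spelling Cython uses for this setting; the function names are made up for 
the example. Plain CPython treats the first line as an ordinary comment, so 
the same file still runs uncompiled.

```python
# cython: language_level=3
# The directive above has to come before any code in the file.


def halve(n):
    # With Python 3 semantics "/" is true division, also for two ints.
    return n / 2


def describe(values):
    # Hypothetical helper, only here to give the module some content.
    return ", ".join(str(v) for v in values)
```

Running "cython -3 module.py" would select the same semantics from the 
command line instead of from inside the file.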

A nice feature is that we support CPython 2.3 through 3.2 with the same 
generated C code file, so you can ship the quality assured generated C code 
and then put your trust into a C compiler to do the right thing for a given 
runtime on a given machine, without requiring normal users to have (the 
right version of) Cython installed on their side.

Cython always works at a per-module level, does a bit of function/closure 
local static type inference and optimistic optimisations and has a few 
intentional semantic differences compared to Python, such as mostly fixed 
builtins (unless re-assigned directly inside of a module, or deemed 
uninteresting, such as "open"). This allows the compiler to apply a huge 
set of optimisations to the syntax tree and the generated C code that 
substantially speed up the usage of builtin types as well as common idioms 
for looping and iteration.

We still can't compile "all existing Python code", though, as there are 
several known bugs regarding Python semantics. However, we recently managed 
to get pretty close to Python language feature completeness and hope to 
finish that up soon.

In short: if it works, it works, and it usually only gets better. ;-)


> I also thought that Cython was mostly popular in scientific computing
> circles. But I may well be wrong on all counts. Please enlighten me.

It does have a large and quickly growing user base in scientific computing. 
I would even say that Cython is growing into one of the main features that 
Python has in that field, right next to NumPy/SciPy, which are also 
starting to migrate parts of their code to Cython to improve its 
portability and maintainability. Cython is a great way to get numeric code 
up to speed without descending all the way down into a low-level language.

Cython is IMHO also the most advanced FFI that CPython has, much faster 
than any other wrapper tool and also more natural to use than ctypes. We 
natively support interaction with C and C++ code, and there's external 
support (fwrap) for Fortran as well, as that's extremely important in the 
high-performance computing area.

Finally, Cython is a programming language in its own right, that aims to 
close the not-so-small gap between Python and C/C++, both in terms of 
performance and usability. For example, we support many Python programming 
idioms on C data types, such as for-loop iteration over sliced pointers, 
and tightly adapt the code generated for a given language construct to the 
data types it operates on. While code can often be rewritten in a more 
C-ish way if performance truly demands it, we rather try to avoid that need 
by constantly improving and tailoring the generated code instead.

'nough sold? :-)

There are also Cython language semantics that we are still working on, such 
as support for some tricky C++ features. We generally try to follow Python 
semantics whenever possible, but need to accept in some cases that C/C++ 
have language semantics that we cannot hide from the user or that are 
actually helpful for users (yes, that really happens :). Advancing the 
language in these fields is pretty interesting business.


> I'd like to hear more about Cython's compatibility -- e.g. does it
> compile Django?

Never been that ambitious. As I said, Cython works at the module level, and 
you'd likely only compile some performance critical modules of an 
application anyway, in order to keep the overall code development more 
flexible.

A common approach is to profile an application, take the top-k modules and 
try to compile them with Cython. If they fail to compile, adapt the code as 
needed to get them compiling, then profile the application again to see 
what that gave you (Cython supports cProfile). Optimise the top-j 
functions/classes/methods by adding static type declarations to drop the 
code deeper into C, until it's fast enough. If you can't get it fast enough 
that way, change the code in well selected places to make it more C-ish. If 
that's not enough either, rewrite the critical part of it in C or maybe 
Fortran, then call that from Cython code. The last step obviously gives up 
compatibility with plain Python, but you can usually get away with a separate 
wrapper module and a conditional import, which is simple enough.
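
The first and last steps of that workflow can be sketched in plain Python. 
This is only an illustration under assumptions: rank_files_by_time(), 
workload() and the module name _myapp_fastpath are hypothetical, and the 
ranking reads the "stats" dictionary that pstats.Stats objects carry.

```python
import cProfile
import pstats
from collections import defaultdict


def rank_files_by_time(func, top_k=3):
    """Profile func() and return the top_k (filename, seconds) pairs,
    ranked by the time spent inside each file's own functions."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()

    per_file = defaultdict(float)
    # Keys are (filename, lineno, funcname); the third value is "tottime",
    # the time spent in the function itself, excluding callees.
    for key, value in pstats.Stats(profiler).stats.items():
        filename, tottime = key[0], value[2]
        per_file[filename] += tottime
    ranked = sorted(per_file.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


def workload():
    # Stand-in for the real application entry point.
    return sum(i * i for i in range(200000))


# Conditional import: use the compiled wrapper module when it is available
# and fall back to the pure Python version otherwise.
try:
    from _myapp_fastpath import hot_loop  # hypothetical compiled module
except ImportError:
    def hot_loop(values):
        return sum(v * v for v in values)
```

Calling rank_files_by_time(workload) gives a short list of candidate files to 
try compiling first; built-in functions show up under the file name "~".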

OTOH, you can also use "pyximport" to integrate Cython as a JIT-like 
compiler that tries to compile a Python module on import and if that works, 
use the compiled version instead of the plain Python version. I might try 
that with the Django benchmark once Cython's current feature branches are 
merged into mainline.
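
A minimal sketch of enabling that, assuming Cython may or may not be 
installed. The function name is made up; pyximport.install() and its 
"pyimport" flag are the existing API, and compiling .py modules this way 
should be considered experimental.

```python
def enable_import_time_compilation():
    """Try to install pyximport's import hooks; return True on success."""
    try:
        import pyximport
    except ImportError:
        # Cython is not installed: keep importing plain Python modules.
        return False
    # pyimport=True also tries to compile .py modules, not only .pyx files.
    pyximport.install(pyimport=True)
    return True
```

This would be called once, early in the application's start-up code, before 
the modules of interest get imported.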


> How does it do on the benchmark suite used by
> PyPy (originated with Unladen Swallow)?

As I already mentioned, I only tried some of the simpler modules so far. 
Cython is usually quite a bit faster than CPython for what I tried. In 
particular, the numeric computation benchmarks from Debian's shootout can be 
made to run a couple of hundred times faster (more or less as fast as C 
code), *if* you 
apply manual code modifications or at least externally provide static types 
and drop Python classes into optimised extension types (which can also be 
done externally). So it's usually the required manual work that stops us 
from getting better results (and it also smells like cheating if you change 
the benchmarked code).
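
To give an idea of what providing static types looks like, here is a small 
sketch that uses the cython.locals() decorator inside the source file (the 
external variant would put the same declarations into a separate .pxd file 
next to the module). The function is a made-up example, not one of the 
benchmarks, and the fallback decorator is only there so that the code also 
runs where Cython is not installed.

```python
try:
    import cython
    # Declares C types for the argument and the local variables. When the
    # module runs uncompiled, this decorator leaves the function unchanged.
    typed = cython.locals(n=cython.int, i=cython.int, total=cython.double)
except ImportError:
    def typed(func):
        # Fallback for plain CPython without Cython installed.
        return func


@typed
def sum_of_squares(n):
    total = 0.0
    for i in range(n):
        total += i * i
    return total
```

Uncompiled, this behaves like ordinary Python code; compiled, the loop can 
run on C variables instead of Python objects.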

Without manual interaction, speed-ups commonly range from only 10% to 30% 
compared to CPython, with the lower speed-ups often due to an extensive 
usage of Python classes and CPython specific optimisation tricks that 
Cython could do better if it understood their intention.


> IMO it's up to Cython to prove its worth.

It also depends on what you consider it worth *for*. CPython could start 
using Cython gradually in very fine steps, and I'd argue that the benefits 
for CPython are far beyond plain "run&win" performance improvements. I 
think the main selling point for Cython code in CPython is that it opens up 
an extremely wide field of code optimisations without requiring C code to 
be written and maintained.

Even for non-CPython runtimes, Cython code would likely be easier to port 
than C code, as it has a much better signal-to-noise ratio.

Stefan


