[Python-Dev] VM and Language summit info for those not at Pycon (and those that are!)
Stefan Behnel
stefan_ml at behnel.de
Sun Mar 20 21:59:13 CET 2011
[warning, long post ahead]
Guido van Rossum, 20.03.2011 17:17:
> Hi Stefan,
Hi!
> I'm glad to see Cython picking up steam and trying to compete with
> CPython, PyPy, and possibly others.
We do, although our main focus is much more on targeted manual optimisation
than on whole applications. For example, we currently only bootstrap
seven out of some 60 modules of Cython itself into C code, as those are the
most critical parts (mostly related to parsing and syntax tree traversal);
the rest simply takes too long to run through the C compiler for too
little overall benefit. Compiling only the top-4 pure Python modules of the
Cython compiler, optimised with externally provided type annotations, lets
the overall compiler run about twice as fast.
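
A rough way to see why compiling only the hottest modules can pay off is
Amdahl's law. The sketch below is illustrative only: the 60% runtime share
and the 6x per-module speed-up are made-up numbers, not measurements of the
Cython compiler, and the function names are hypothetical.

```python
def overall_speedup(hot_fraction, hot_speedup):
    """Amdahl's law: whole-program speed-up when only the part that takes
    `hot_fraction` of the runtime gets `hot_speedup` times faster."""
    return 1.0 / ((1.0 - hot_fraction) + hot_fraction / hot_speedup)


def min_hot_fraction(target_speedup):
    """Smallest runtime share the compiled part must have to reach
    `target_speedup` overall, even if that part became infinitely fast."""
    return 1.0 - 1.0 / target_speedup


# Hypothetical numbers: compiled modules account for 60% of the runtime
# and become 6x faster once compiled.
print(round(overall_speedup(0.6, 6.0), 2))

# A 2x overall gain requires at least half the runtime in the compiled part.
print(min_hot_fraction(2.0))
```

Read the other way round: an overall factor of two from a handful of
modules implies that those modules account for at least half of the
total runtime.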
> It's true that few in the core
> development group know much about Cython -- essentially my own
> understanding is still that it's like Pyrex, which was a
> mostly-Python-compatible syntax for writing C extensions, but
> certainly not able to compile all existing Python code successfully,
> due to small syntactic incompatibilities (e.g. new reserved words).
We decide the available syntax/keywords based on the file extension (.py vs
.pyx), and you can compile (most) Python 3 code by putting a comment at the
top of your source file that specifies the language level (or by passing
"-3" on the command line).
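
As a minimal sketch, a plain .py module that opts into Python 3 semantics
could look like the following. The function name is made up for the
example; the directive comment is ignored by CPython, so the same file
still runs uncompiled.

```python
# cython: language_level=3
# The directive above must come before any code. CPython treats it as an
# ordinary comment, while Cython reads it to select Python 3 semantics.


def halves(n):
    # With Python 3 semantics, "/" is true division and "//" floor division.
    return n / 2, n // 2


print(halves(7))  # (3.5, 3) when run under Python 3
```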
A nice feature is that we support CPython 2.3 through 3.2 with the same
generated C code file, so you can ship the quality-assured generated C code
and then put your trust into a C compiler to do the right thing for a given
runtime on a given machine, without requiring normal users to have (the
right version of) Cython installed on their side.
Cython always works at a per-module level, does a bit of function/closure
local static type inference and optimistic optimisations and has a few
intentional semantic differences compared to Python, such as mostly fixed
builtins (unless re-assigned directly inside of a module, or deemed
uninteresting, such as "open"). This allows the compiler to apply a huge
set of optimisations to the syntax tree and the generated C code that
substantially speed up the usage of builtin types as well as common idioms
for looping and iteration.
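
To illustrate the kind of dynamic behaviour that fixed builtins give up,
here is a small sketch of what plain CPython does. The function names are
hypothetical. In the interpreter, "len" is looked up at call time, so
patching the builtins module changes what the function does. A compiler
that treats builtins as fixed would, under the assumption described above,
keep calling the original "len".

```python
import builtins


def size(items):
    # In CPython, "len" is resolved at call time: module globals first,
    # then the builtins module.
    return len(items)


def size_with_patched_builtin(items):
    """Call size() while builtins.len is temporarily replaced."""
    original = builtins.len
    builtins.len = lambda obj: -1
    try:
        return size(items)
    finally:
        # Always restore the real builtin.
        builtins.len = original


print(size([1, 2, 3]))                       # 3
print(size_with_patched_builtin([1, 2, 3]))  # -1 under plain CPython
```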
We still can't compile "all existing Python code", though, as there are
several known bugs regarding Python semantics. However, we recently managed
to get pretty close to Python language feature completeness and hope to
finish that up soon.
In short: if it works, it works, and it usually only gets better. ;-)
> I also thought that Cython was mostly popular in scientific computing
> circles. But I may well be wrong on all counts. Please enlighten me.
It does have a large and quickly growing user base in scientific computing.
I would even say that Cython is growing into one of the main features that
Python has in that field, right next to NumPy/SciPy, which are also
starting to migrate parts of their code to Cython to improve its
portability and maintainability. Cython is a great way to get numeric code
up to speed without descending all the way down into a low-level language.
Cython is IMHO also the most advanced FFI that CPython has, much faster
than any other wrapper tool and also more natural to use than ctypes. We
natively support interaction with C and C++ code, and there's external
support (fwrap) for Fortran as well, as that's extremely important in the
high-performance computing area.
Finally, Cython is a programming language in its own right that aims to
close the not-so-small gap between Python and C/C++, both in terms of
performance and usability. For example, we support many Python programming
idioms on C data types, such as for-loop iteration over sliced pointers,
and tightly adapt the code generated for a given language construct to the
data types it operates on. While code can often be rewritten in a more
C-ish way if performance truly demands it, we rather try to avoid that need
by constantly improving and tailoring the generated code instead.
'nough sold? :-)
There are also Cython language semantics that we are still working on, such
as support for some tricky C++ features. We generally try to follow Python
semantics whenever possible, but need to accept in some cases that C/C++
have language semantics that we cannot hide from the user or that are
actually helpful for users (yes, that really happens :). Advancing the
language in these fields is pretty interesting business.
> I'd like to hear more about Cython's compatibility -- e.g. does it
> compile Django?
Never been that ambitious. As I said, Cython works at the module level, and
you'd likely only compile some performance critical modules of an
application anyway, in order to keep the overall code development more
flexible.
A common approach is to profile an application, take the top-k modules and
try to compile them in Cython. If they fail to compile, adapt the code as
needed to get them compiling, then profile the application again to see
what that gave you (Cython supports cProfile). Optimise the top-j
functions/classes/methods by adding static type declarations to drop the
code deeper into C, until it's fast enough. If you can't get it fast enough
that way, change the code in well selected places to make it more C-ish. If
that's not enough either, rewrite the critical part of it in C or maybe
Fortran, then call that from Cython code. The last step obviously gives up
compatibility with plain Python, but you can usually get away with a
separate wrapper module and a conditional import, which is simple enough.
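
The first and last steps of this workflow can be sketched in plain Python.
This is only an illustration under stated assumptions: "top_modules",
"workload", "dot" and the module name "_hotpath_compiled" are hypothetical
names, and the profiling helper simply sums cProfile's per-function
self time by source file.

```python
import cProfile
import pstats
from collections import defaultdict


def top_modules(func, k=3):
    """Profile func() and return the k source files with the most self time."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    stats = pstats.Stats(profiler)
    per_file = defaultdict(float)
    # stats.stats maps (filename, lineno, funcname) to
    # (primitive calls, total calls, self time, cumulative time, callers).
    for (filename, _lineno, _name), entry in stats.stats.items():
        per_file[filename] += entry[2]
    ranked = sorted(per_file.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Conditional import: prefer a compiled implementation when it is
# available, otherwise fall back to the pure Python version.
try:
    from _hotpath_compiled import dot  # hypothetical compiled wrapper module
except ImportError:
    def dot(xs, ys):
        return sum(x * y for x, y in zip(xs, ys))


def workload():
    return dot(list(range(1000)), list(range(1000)))


for filename, seconds in top_modules(workload):
    print(filename, seconds)
```

Callers only ever import "dot" from this one place, so swapping the
compiled version in or out does not touch the rest of the application.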
OTOH, you can also use "pyximport" to integrate Cython as a JIT-like
compiler that tries to compile a Python module on import and if that works,
use the compiled version instead of the plain Python version. I might try
that with the Django benchmark once Cython's current feature branches are
merged into mainline.
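
A minimal sketch of enabling this, assuming Cython is installed and that
pyximport.install() accepts the "pyimport" option for .py files (which is
how I understand its API; treat that as an assumption and check it against
your Cython version). The helper name is hypothetical.

```python
def enable_compile_on_import():
    """Install pyximport's import hook for .py files, if Cython is present.

    Returns True when the hook was installed, False when Cython is missing.
    """
    try:
        import pyximport
    except ImportError:
        return False
    # pyimport=True asks pyximport to also try compiling plain .py modules;
    # this is assumed to be experimental and may slow down imports.
    pyximport.install(pyimport=True)
    return True
```

You would call enable_compile_on_import() once, early in the application's
start-up code, before importing the modules you want compiled.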
> How does it do on the benchmark suite used by
> PyPy (originated with Unladen Swallow)?
As I already mentioned, I only tried some of the simpler modules so far.
Cython is usually quite a bit faster than CPython for what I tried. In
particular, the numeric computation benchmarks from Debian's shootout can
be made to run a couple of hundred times faster (more or less as fast as C
code), *if* you apply manual code modifications or at least externally
provide static types and drop Python classes into optimised extension types
(which can also be done externally). So it's usually the required manual
work that stops us from getting better results (and it also smells like
cheating if you change the benchmarked code).
Without manual interaction, speed-ups commonly range from only 10% to 30%
compared to CPython, with the lower speed-ups often due to extensive use
of Python classes and CPython-specific optimisation tricks that Cython
could handle better if it understood their intention.
> IMO it's up to Cython to prove its worth.
It also depends on what you consider it worth *for*. CPython could start
using Cython gradually in very fine steps, and I'd argue that the benefits
for CPython are far beyond plain "run&win" performance improvements. I
think the main selling point for Cython code in CPython is that it opens up
an extremely wide field of code optimisations without requiring C code to
be written and maintained.
Even for non-CPython runtimes, Cython code would likely be easier to port
than C code, as it has a much better signal-to-noise ratio.
Stefan