[Python-ideas] A General Outline for Just-in-Time Acceleration of Python
Nick Coghlan
ncoghlan at gmail.com
Sun Jun 15 07:51:02 CEST 2014
On 14 June 2014 17:53, David Mertz <mertz at gnosis.cx> wrote:
> Is there ever a case where removing all the type annotations from Cython
> code does not produce code that can run in PyPy? I don't know Cython well
> enough to be certain the answer is 'no', but I think so. So a function a
> little like my 'silly()' function--but that did something actually
> interesting in the loop--might run faster by removing the annotation and
> running it in PyPy. Or it might NOT, of course; the answer is not obvious
> without looking at the exact code in question, and probably not without
> actually timing it.
>
> But the idea is that let's say I have some code with a loop and some numeric
> operations inside that loop that I'm currently running using CPython. There
> are at least two ways I might speed up that code:
>
> A) Edit the code to contain some type annotations, and compile it with
> Cython. However, if I do this, I *might* have to modify some other
> constructs in the overall code block to get it to compile (i.e. if there's
> any polymorphism about variable types).
>
> B) Run the unchanged code using PyPy.
C) Compile the unchanged code with Cython
There's a myth that Cython requires type annotations to speed up
Python code. This is not accurate: just bypassing the main eval loop
and some aspects of function call handling can provide a respectable
speed-up, even when Cython is still performing all the operations
through the abstract object API rather than being able to drop back to
platform native types. The speed increases aren't as significant as
those on offer in PyPy, but they're not trivial, and they don't come
at the cost of potential incompatibility with other C extensions (see
https://github.com/cython/cython/wiki/FAQ#is-cython-faster-than-cpython
for more details).
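To make the three options concrete, here's a hypothetical numeric loop of the kind being discussed (similar in spirit to David's 'silly()' example; the function name and body are mine, not from the thread). The same source runs under CPython, runs unchanged under PyPy (option B), and compiles unchanged with Cython (option C); option A would add static annotations such as "cdef long i, total" in a .pyx file.

```python
# A hypothetical numeric loop (illustrative, not from the thread).
# Under option B, run this file unchanged with PyPy.
# Under option C, compile this file unchanged with Cython.
# Under option A, a .pyx version might add "cdef long i, total"
# so Cython can drop to platform native integer arithmetic.

def sum_of_squares(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

print(sum_of_squares(1000))
```

The semantics are identical under all three approaches; only the execution strategy (and hence the speed) differs.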
The difference I see between the PyPy approach and the Cython approach
to optimisation is really a difference in goals. PyPy's aim is to
"just make Python fast, even if it means breaking compatibility with C
extensions" - call this the "Python as application programming
language" perspective, which puts PyPy in a head-to-head contest with
the JVM and the .NET CLR more so than with CPython. Cython's aim is to
"make CPython CPU bottlenecks fast, even if doing so requires some
additional static annotations" - call this the "CPython as
orchestration platform" approach, which leverages the rich CPython C
API and the ubiquity of C dynamic linking support at the operating
system level to interoperate with existing software components, rather
than treating reliance on "native" software as something to be
avoided, as the application programming runtimes tend to do. Those
differences in perspective can create significant barriers to
productive communication between different communities of developers
and users: for folks that use Python as an orchestration language,
PyPy's weaker orchestration support is a dealbreaker, while for PyPy
developers focused on application programming use cases, the lack of
interest from system integrators can be intensely frustrating.
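To illustrate what the "orchestration" role looks like in practice, here's a minimal sketch (mine, not from the thread) of Python driving an existing native library via the standard library's ctypes module; cffi plays a comparable role for PyPy. It assumes a Unix-like system where the C maths library is loadable.

```python
# Illustrative sketch of Python as an orchestration language: calling
# a function from an existing native library (the C maths library).
# Assumes a Unix-like system; on Windows the library name differs.
import ctypes
import ctypes.util

# Locate and load libm (e.g. "libm.so.6" on Linux). If find_library
# returns None on Linux, CDLL(None) falls back to the symbols already
# loaded into the process, which also include sqrt.
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature: double sqrt(double)
libm.sqrt.argtypes = [ctypes.c_double]
libm.sqrt.restype = ctypes.c_double

print(libm.sqrt(9.0))
```

This kind of direct reuse of OS-level shared libraries (and, by extension, the CPython C API) is exactly the interoperability that the "orchestration platform" perspective treats as essential rather than as something to route around.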
cffi is a potential path to improving PyPy's handling of the
"orchestration platform" use case, but it still has a long way to go
to catch up to CPython on that front. NumPyPy in particular still has
a fair bit of work to do in catching up to NumPy
(http://buildbot.pypy.org/numpy-status/latest.html is an excellent
resource for checking in on the progress of that effort).
In all areas of Python optimisation, though, there's a lot of work to
be done in cracking the discoverability and distribution channel
problem. I assume redistributors are still wary of offering PyPy
support because it's a completely new way of building language
runtimes and they aren't sure they understand it yet. Customer demand
can overcome that wariness, but if existing Python users are using
Python in an orchestration role rather than primarily as an
applications programming language, then that demand may not be there.
If customers aren't even aware that these optimisation tools exist in
the first place, then that will also hinder the generation of demand.
This is hinted at by the fact that even Cython (let alone PyPy) isn't
as well supported by redistributors as Python 3, suggesting that
customers and redistributors may not be looking far enough outside
python.org and python-dev for opportunities to enhance their Python
environments.
To use a Red Hat specific example, CPython itself is available as a
core supported part of the operating system (with 2.3, 2.4, 2.6 and
2.7 all still supported), while CPython 2.7 and 3.3 are also available
as explicitly supported offerings through Red Hat Software
Collections. Both PyPy and Cython are also available for Red Hat
Enterprise Linux & derivatives, but only through the community
provided "Extra Packages for Enterprise Linux" repositories. The newer
Numba JIT compiler for CPython isn't even in EPEL - you have to build
it from source yourself, or acquire it via other means (likely conda).
From a redistribution perspective, engineering staff can certainly
suggest "Hey, these would be good things to offer our customers", but
that's never going to be as compelling as customers coming to us (or
other Python vendors) and asking "Hey, what can you do for me to make
my Python code run faster?". Upstream has good answers for a lot of
these problems, but commercial redistributors usually aren't going to
bite unless they can see a clear business case for it.
Lowering the *cost* of redistribution is also at the heart of a lot of
the work going on around metadata 2.0 on distutils-sig. At the moment,
the repackaging process (getting from PyPI formats to redistributor
formats) can be incredibly manual, not only when a project is first
repackaged, but sometimes even on subsequent updates. That's one of
the reasons repackaged Python projects tend to be measured in the
dozens, or at best hundreds, compared to the tens of thousands
available upstream, and why there tend to be significant lag times
between upstream updates and updates of repackaged versions.
Regards,
Nick.
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia