[Python-ideas] A General Outline for Just-in-Time Acceleration of Python
Victor Stinner
victor.stinner at gmail.com
Fri Jun 13 11:36:15 CEST 2014
Hi,
2014-06-13 6:07 GMT+02:00 Neil Girdhar <mistersheik at gmail.com>:
> I was wondering what work is being done on Python to make it faster.
PyPy is highly compatible with CPython and it is much faster.
Numba is also fast, maybe faster than PyPy in some cases (I read that
it can use a GPU), but it's specialized for numerical computation.
> I understand that cpython is incrementally improved. I'm not sure, but I
> think that pypy acceleration works by compiling a restricted set of Python.
> And I think I heard something about Guido working on a different model for
> accelerating Python. I apologize in advance that I didn't look into these
> projects in a lot of detail. My number one dream about computer languages
> is for me to be able to write in a language as easy as Python and have it
> run as quickly as if it were written in C++.
I started to take notes about how CPython can be made faster:
http://haypo-notes.readthedocs.org/faster_cpython.html
See for example my section "Why Python is slow?":
http://haypo-notes.readthedocs.org/faster_cpython.html#why-python-is-slow
In short: because Python is a dynamic language (a single variable can
store values of different types, and almost everything, including the
code itself, can be modified at runtime), the compiler cannot make
many assumptions about the Python code, so it's very hard to emit fast
code (bytecode).
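Here is a tiny example of what "dynamic" means in practice (a toy
illustration, the function name is made up):

    def ham():
        return len("abc")

    print(ham())    # 3: "len" is the builtin here

    import builtins
    builtins.len = lambda obj: 42   # anything can be rebound at runtime

    print(ham())    # 42: the same bytecode now calls the new "len"

The compiler cannot prove that "len" is still the builtin when ham()
runs, so it has to emit a full dynamic lookup every time.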
> I do believe that this is possible
> (since in theory someone could look at my Python code and port it to C++).
There are projects to compile Python to C++. See for example pythran:
http://pythonhosted.org/pythran/
But these projects only support a subset of Python, because the C++
language is much less dynamic than Python.
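If I remember pythran's conventions correctly, you declare the types
of exported functions in a special comment, and pythran generates
typed C++ from the function body (a rough illustration, not taken
from their docs):

    #pythran export double_values(int list)
    def double_values(values):
        # Plain Python, but the export comment gives pythran the
        # argument type, so it can generate typed C++ for this function.
        return [v * 2 for v in values]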
> What I'm suggesting instead is for every iteration of a "code block", the
> runtime stochastically decides whether to collect statistics about that
> iteration. Those statistics include the time running the block, the
> time performing attribute accesses including type method lookups, and so on.
> Basically, the runtime is trying to guess the potential savings of
> optimizing this block.
You should really take a look at PyPy. It implements a *very
efficient* tracing JIT. The problem is to not make the program slower
while you trace it. PyPy makes some compromises to avoid this
overhead; for example, it only optimizes loops that run more than N
iterations (1000?).
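The threshold idea is roughly this (a minimal sketch; the threshold
value and the function names are made up, this is not PyPy's actual
code):

    HOT_LOOP_THRESHOLD = 1000   # illustrative value only

    loop_counters = {}

    def maybe_trace(loop_id):
        # Called once per loop iteration by a hypothetical interpreter:
        # a cheap counter bump; tracing only starts once the loop is hot.
        count = loop_counters.get(loop_id, 0) + 1
        loop_counters[loop_id] = count
        if count == HOT_LOOP_THRESHOLD:
            start_tracing(loop_id)

    def start_tracing(loop_id):
        # Hypothetical: record the next iteration and compile it.
        print("loop %s is hot, tracing next iteration" % loop_id)

Cold loops only pay for the counter bump; the expensive tracing
machinery never runs for them.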
> If the block is run many times and the potential savings are large, then
> stochastically again, the block is promoted to a second-level statistics
> collection. This level collects statistics about all of the external
> couplings of the block, like the types and values of the passed-in and
> returned values.
Sorry, this is not the real technical problem :-) No, the real problem
is to detect environment changes, remove the specialized code
(optimized for the old environment), and maybe re-optimize the code
later.
The environment includes modules, classes (types), functions,
"constants", etc. If anything in it is modified, the specialized code
must be regenerated.
Specialized code is a compiled version of your Python code that is
based on extra assumptions so that it runs faster. For example, if
your function calls the builtin function "len", you can make the
assumption that len returns an int. But if the builtin "len" function
is replaced by something else, you must call the new len function
instead.
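Here is what the guard could look like for the "len" example (a
minimal sketch, the function names are invented for illustration):

    import builtins

    _original_len = builtins.len

    def spam_specialized(items):
        # Specialized code: assumes builtins.len is the original len,
        # so the result is known to be an int.
        return _original_len(items) * 2

    def spam_generic(items):
        # Generic code: looks up "len" dynamically, always correct.
        return len(items) * 2

    def spam(items):
        # Guard: fall back to the generic version if the assumption
        # no longer holds (someone replaced builtins.len).
        if builtins.len is _original_len:
            return spam_specialized(items)
        return spam_generic(items)

Checking the guard at every call has a cost, which is where the
timestamp trick below helps.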
With a JIT, you can detect changes of the environment and regenerate
optimized functions during the execution of the application. For
example, you can add a "timestamp" (a counter incremented at each
change) to dictionaries and check whether the timestamp has changed.
My notes about that:
http://haypo-notes.readthedocs.org/faster_cpython.html#learn-types
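A minimal sketch of the timestamp trick in pure Python (a toy model:
dict.update(), pop(), etc. are not covered, and a real implementation
would live in C inside the dict type):

    class VersionedDict(dict):
        # Bump a counter on every mutation.
        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            self.version = 0

        def __setitem__(self, key, value):
            super().__setitem__(key, value)
            self.version += 1

        def __delitem__(self, key):
            super().__delitem__(key)
            self.version += 1

    namespace = VersionedDict(len=len)
    cached_version = namespace.version

    def guard_ok():
        # Specialized code is only valid while the namespace is unchanged.
        return namespace.version == cached_version

    namespace["len"] = lambda obj: 42   # an environment change...
    print(guard_ok())                   # ...invalidates the guard: False

One integer comparison replaces re-checking every assumption
individually.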
> The saving is that the code block
> * can be transformed into a faster bytecode, which includes straight
> assembly instructions in some sections since types or values can now be
> assumed,
My plan is to add the infrastructure to support specialized code in CPython:
- support multiple code objects in a single function (see the sketch below)
- attach to each code object the environment assumptions used to
decide whether it can run
- notify (or at least detect) changes of the environment (notify when
the Python code is changed: modules, classes, functions)
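To make the first point concrete, here is a toy model (this is not
the actual design, only an illustration of the "multiple code objects
per function" idea):

    class SpecializedFunction:
        def __init__(self, generic):
            self.generic = generic
            self.versions = []   # list of (guard, func) pairs

        def add_version(self, guard, func):
            # Each specialized version carries a guard that checks the
            # environment assumptions it was compiled under.
            self.versions.append((guard, func))

        def __call__(self, *args, **kwargs):
            for guard, func in self.versions:
                if guard():                       # assumptions still valid?
                    return func(*args, **kwargs)
            return self.generic(*args, **kwargs)  # fall back to generic code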
It should work well for functions, but I don't yet see how to
implement these things for instances of classes, because you can also
override methods on an individual instance.
> * can use data structures that make type or access assumptions (for example
> a list that always contains ints can use a flattened representation; a large
> set that is repeatedly having membership checked with many negative results
> might benefit from an auxiliary bloom filter, etc.)
Again, please see PyPy: it has very efficient data structures. I don't
think that such changes can be made in CPython. The CPython code base
is too old, and too many users rely on the current implementation
through the "C API". There is, for example, a PyList_GET_ITEM() macro
to directly access an item of a list. This macro is not part of the
stable ABI, but I guess that most C extensions use such macros (and so
depend on the C structures).