A General Outline for Just-in-Time Acceleration of Python

I was wondering what work is being done on Python to make it faster. I understand that CPython is incrementally improved. I'm not sure, but I think that PyPy acceleration works by compiling a restricted set of Python. And I think I heard something about Guido working on a different model for accelerating Python. I apologize in advance that I didn't look into these projects in a lot of detail. My number one dream about computer languages is to be able to write in a language as easy as Python and have it run as quickly as if it were written in C++. I do believe that this is possible (since in theory someone could look at my Python code and port it to C++).

Unfortunately, I don't have time to work on this goal, but I still wanted to get feedback about some ideas I have for reaching it.

First, I don't think it's important for a "code block" (say, a small section of code with less coupling to statements outside the block than to those within it) to run quickly on its first iteration.

What I'm suggesting instead is that on every iteration of a "code block", the runtime stochastically decides whether to collect statistics about that iteration. Those statistics include the time spent running the block, the time spent performing attribute accesses (including type method lookups), and so on. Basically, the runtime is trying to guess the potential savings of optimizing this block.

If the block is run many times and the potential savings are large, then stochastically again, the block is promoted to a second-level statistics collection. This level collects statistics about all of the external couplings of the block, like the types and values of the passed-in and returned values.

Using the second-level statistics, the runtime can now guess whether the block should be promoted to a third level whereby any consistencies are exploited. For example, if the passed-in parameter types and return value type of the "min" function are (int, int, int) for 40% of the statistics and (float, float, float) for 59%, with other types making up the remaining 1%, then two precompiled versions of min are generated: one for int and one for float.

These precompiled code blocks have different costs than regular Python blocks. They need to pay the following costs:

* a check for the required invariants (parameter types above, but it could be parameter values, or other invariants);
* they need to install hooks on objects that must remain invariant during the execution of the block; if the invariants are ever violated during the execution of the block, then all of the computations done during this execution of the block must be discarded;
* therefore a third cost is the probability of discarding the computation times the average cost of the wasted computation.

The saving is that the code block:

* can be transformed into faster bytecode, which includes straight assembly instructions in some sections, since types or values can now be assumed;
* can use data structures that make type or access assumptions (for example, a list that always contains ints can use a flattened representation; a large set that repeatedly has membership checked with many negative results might benefit from an auxiliary Bloom filter; etc.).

In summary, the runtime performs stochastic, incremental promotion of code blocks from first-level, to second-level, to multiple precompiled versions. It can also demote a code block. The difference between the costs of the different levels is statistically estimated.
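To make the scheme concrete, here is a toy Python sketch of the promotion machinery described above (every name in it, such as TieredFunction and make_specialized, and both thresholds are invented for illustration; a real implementation would live inside the interpreter rather than in a Python-level wrapper):

import random
import time
from collections import Counter

SAMPLE_RATE = 0.01    # probability that any given call is measured
PROMOTE_AFTER = 100   # samples collected before trying to specialize

class TieredFunction:
    """Wrap a function: cheaply sample statistics about its calls, then
    dispatch calls whose argument types match an observed hot signature
    to a "precompiled" version guarded by that type check."""

    def __init__(self, func):
        self.func = func
        self.samples = 0
        self.sampled_time = 0.0      # first-level statistics
        self.type_stats = Counter()  # second-level statistics
        self.specialized = {}        # arg-type tuple -> fast version

    def __call__(self, *args):
        key = tuple(type(a) for a in args)
        fast = self.specialized.get(key)
        if fast is not None:               # the invariant check (guard)
            return fast(*args)
        if random.random() < SAMPLE_RATE:  # stochastic stats collection
            start = time.perf_counter()
            result = self.func(*args)
            self.sampled_time += time.perf_counter() - start
            self.type_stats[key] += 1
            self.samples += 1
            if self.samples == PROMOTE_AFTER:
                self._promote()
            return result
        return self.func(*args)

    def _promote(self):
        # Promote the dominant type signatures to specialized versions.
        for key, count in self.type_stats.most_common(2):
            if count / self.samples > 0.3:
                self.specialized[key] = make_specialized(self.func, key)

def make_specialized(func, key):
    # Placeholder: a real runtime would emit type-specialized bytecode
    # or machine code here; this sketch just reuses the original.
    return func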
Examples of optimizations that can commonly be accomplished using such a system are:

* global variables are folded into code as constants (even if they change rarely, you pay the discarding penalty described above plus the recompilation cost; the benefit of inline use of the constant, and any constant folding, might outweigh these costs);
* lookup of member functions, which almost never change;
* flattening of homogeneously-typed lists (see the sketch below).

Best,
Neil
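The flattening idea in the last bullet can be illustrated with nothing but the standard library; this is a toy demonstration of the representation change, not a runtime optimization:

from array import array

boxed = list(range(1000000))  # a list of boxed int objects (pointers)
flat = array('l', boxed)      # the same values as contiguous machine words

assert sum(boxed) == sum(flat)  # identical contents, much smaller footprint

A specializing runtime could switch a list to such a representation once it has observed that the list only ever holds ints, and switch back (paying the discarding penalty) if a non-int is ever inserted.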

Other than a sprinkling of the word "stochastic" around this post (why that word, not the more obvious "random"?), this basically describes exactly what PyPy does.

On Thu, Jun 12, 2014 at 9:07 PM, Neil Girdhar <mistersheik@gmail.com> wrote:
I was wondering what work is being done on Python to make it faster. [...]

Well that's great to hear :) I thought PyPy only worked on a restricted set of Python. Does PyPy save the optimization statistics between runs?

On Fri, Jun 13, 2014 at 1:52 AM, David Mertz <mertz@gnosis.cx> wrote:
Other than a sprinkling of the word "stochastic" around this post (why that word, not the more obvious "random"?), this basically describes exactly what PyPy does. [...]

There are many people with a better knowledge of PyPy than me; I've only looked at it and read some general white papers from the team. But I do know with certainty that PyPy executes ALL Python code, not a restricted subset[*]. You might be thinking of Cython in terms of an "annotated, restricted, version of Python."

However, PyPy itself is *written* in a restricted subset called RPython, which also might be what you are thinking of. Well, much of PyPy is written that way; I'm pretty sure some of it is regular unrestricted Python too. Other than the fact that PyPy isn't named, e.g., "PyRPy", this is just an implementation detail though--Jython is written in Java, IronPython is written in C#, CPython is written in C, and PyPy is written in (R)Python. They all run user programs the same though[**].

[*] Yes, possibly there is some weird buggy corner case where it does the wrong thing, but if so, file a ticket to get it fixed.

[**] For a suitably fuzzy meaning of "the same"--obviously performance characteristics are going to differ, as are things like access to external library calls, etc. It's the same in terms of taking the same identical source files to run, in any case.

On Thu, Jun 12, 2014 at 10:53 PM, Neil Girdhar <mistersheik@gmail.com> wrote:
Well that's great to hear :) I thought PyPy only worked on a restricted set of Python. Does PyPy save the optimization statistics between runs? [...]

On 13 Jun 2014 16:22, "David Mertz" <mertz@gnosis.cx> wrote:
[...] You might be thinking of Cython in terms of an "annotated, restricted, version of Python."

There's also Numba, which allows particular functions to be flagged for JIT compilation with LLVM (and vectorisation if using NumPy). That technically only supports a Python subset, but it's in the context of the normal CPython interpreter, so it's generally possible to just skip accelerating the code that Numba can't handle.

Cheers,
Nick.
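For anyone who hasn't seen the workflow Nick is describing, a minimal example looks like this (the dot function is made up for illustration; @jit and its nopython flag are Numba's public API):

import numpy as np
from numba import jit

@jit(nopython=True)   # flag this function for LLVM compilation
def dot(xs, ys):
    total = 0.0
    for i in range(xs.shape[0]):
        total += xs[i] * ys[i]
    return total

a = np.random.rand(1000)
b = np.random.rand(1000)
print(dot(a, b))      # first call triggers compilation; later calls are fast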

Thanks for the explanation. It's really good to know that PyPy executes all Python code. The idea of using a restricted subset is really annoying to me.

The only thing I don't understand is why PyPy would be written in RPython. If it's doing a good job analyzing source code, why do people need to annotate the source code instead of letting the runtime work it out (by collecting statistics)?

Best,
Neil

On Fri, Jun 13, 2014 at 2:21 AM, David Mertz <mertz@gnosis.cx> wrote:
There are many people with a better knowledge of PyPy than me; I've only looked at it and read some general white papers from the team. [...]

On 13 June 2014 19:05, Neil Girdhar <mistersheik@gmail.com> wrote:
[...] The only thing I don't understand is why PyPy would be written in RPython. If it's doing a good job analyzing source code, why do people need to annotate the source code instead of letting the runtime work it out (by collecting statistics)?
Start reading from here: http://pypy.readthedocs.org/en/latest/coding-guide.html#our-runtime-interpre...

Tim Delaney

2014-06-13 11:05 GMT+02:00 Neil Girdhar <mistersheik@gmail.com>:
Thanks for the explanation. It's really good to know that PyPy executes all Python code. The idea of using a restricted subset is really annoying to me.
PyPy is 100% compatible with CPython; it implements very tricky implementation details just to be 100% compatible. (PyPy's support for the CPython C API is only partial, but the C API is not part of the "Python language".)

In short, the JIT compiler is written in a different language called RPython. This language can be compiled to C, but other backends are or were available: Java, .NET, Javascript, etc. (You should check, I'm not sure.)

The whole "PyPy project" is more than just a fast Python interpreter. There are also fast interpreters for Ruby, PHP, and some other languages.

Victor

On 13/06/2014 08:21, David Mertz wrote:

[...] You might be thinking of Cython in terms of an "annotated, restricted, version of Python."

Cython compiles all Python; it is not restricted. On the other hand, PyPy is not compatible with the C API of CPython, and thus cannot run compiled modules as-is.

On Fri, Jun 13, 2014 at 11:54 PM, Joseph Martinot-Lagarde <joseph.martinot-lagarde@m4x.org> wrote:

Cython compiles all Python; it is not restricted.

Well, kinda yes and no. You are correct, of course, that anything you can execute with 'python someprog' you can compile with 'cython someprog'. However, there is an obvious sense in which adding an annotation (which is, of course, a syntax error for Python itself) "restricts" the code in Cython. E.g.:

def silly():
    cdef int n, i
    for i in range(10):
        if i < 5:
            n = i + 1
        else:
            n = str(i)

This *silly* function isn't really Python code at all, of course. But if you ignore the annotation, it would be--pointless code, but valid. As soon as you add the annotation, you *restrict* the type of code you can write in the scope of the annotation.

On 14/06/2014 09:30, David Mertz wrote:

[...] As soon as you add the annotation, you *restrict* the type of code you can write in the scope of the annotation.

Yeah, the point is that *you* restrict, not Cython. From your previous post I understood that you meant "PyPy runs all Python but Cython doesn't, it is restricted".

I use NumPy regularly, and in this case it is the other way around: I can optimize my code using Cython, but I can't run it with PyPy at all.

Is there ever a case where removing all the type annotations from Cython code does not produce code that can run in PyPy? I don't know Cython well enough to be certain the answer is 'no', but I think so. So a function a little like my 'silly()' function--but that did something actually interesting in the loop--might run faster by removing the annotation and running it in PyPy. Or it might NOT, of course; the answer is not obvious without looking at the exact code in question, and probably not without actually timing it.

But the idea is that let's say I have some code with a loop and some numeric operations inside that loop that I'm currently running using CPython. There are at least two ways I might speed up that code:

A) Edit the code to contain some type annotations, and compile it with Cython. However, if I do this, I *might* have to modify some other constructs in the overall code block to get it to compile (i.e. if there's any polymorphism about variable types).

B) Run the unchanged code using PyPy.

Well, in this description, PyPy sounds better... but it's not better if option (A) makes faster code for *your* specific code, of course. And moreover, (B) is not true if your existing code relies on C extensions, such as NumPy, which mostly aren't going to run on PyPy.

However, I do know about https://bitbucket.org/pypy/numpy. At least some substantial part of NumPy has been ported to PyPy. This may or may not support the code *you* need to run.

On Sat, Jun 14, 2014 at 12:38 AM, Joseph Martinot-Lagarde <joseph.martinot-lagarde@m4x.org> wrote:
Yeah, the point is that *you* restrict, not Cython. [...]

On 14/06/2014 09:53, David Mertz wrote:

Is there ever a case where removing all the type annotations from Cython code does not produce code that can run in PyPy? I don't know Cython well enough to be certain the answer is 'no', but I think so.

Cython without annotations is just Python, so it can be run in PyPy.
[...] Well, in this description, PyPy sounds better... but it's not better if option (A) makes faster code for *your* specific code, of course. And moreover, (B) is not true if your existing code relies on C extensions, such as NumPy, which mostly aren't going to run on PyPy.
Actually I don't see it as PyPy vs. Cython. The only common point between these projects is that they both try to optimize Python code, but in such different ways that the outcome completely depends on the code. For my problems Cython is an obvious choice; for others it could be useless.
However, I do know about https://bitbucket.org/pypy/numpy. At least some substantial part of NumPy has been ported to PyPy. This may or may not support the code *you* need to run.
Right now it doesn't! ;) I have the same problem with Numba and numexpr: they rarely seem to be compatible with my real-world use cases.

On 14 June 2014 17:53, David Mertz <mertz@gnosis.cx> wrote:
[...]
A) Edit the code to contain some type annotations, and compile it with Cython. However, if I do this, I *might* have to modify some other constructs in the overall code block to get it to compile (i.e. if there's any polymorphism about variable types).
B) Run the unchanged code using PyPy.
C) Compile the unchanged code with Cython.

There's a myth that Cython requires type annotations to speed up Python code. This is not accurate: just bypassing the main eval loop and some aspects of function call handling can provide a respectable speed-up, even when Cython is still performing all the operations through the abstract object API rather than being able to drop back to platform native types. The speed increases aren't as significant as those on offer in PyPy, but they're not trivial, and they don't come at the cost of potential incompatibility with other C extensions. (See https://github.com/cython/cython/wiki/FAQ#is-cython-faster-than-cpython for more details.)

The difference I see between the PyPy approach and the Cython approach to optimisation is between a desire to "just make Python fast, even if it means breaking compatibility with C extensions" (you could call this the "Python as application programming language" perspective, which puts PyPy in a head-to-head contest with the JVM and the .NET CLR, more so than with CPython) and the desire to "make CPython CPU bottlenecks fast, even if doing so requires some additional static annotations" (you could call this the "CPython as orchestration platform" approach, which leverages the rich CPython C API and the ubiquity of C dynamic linking support at the operating system level to interoperate with existing software components, rather than treating reliance on "native" software as something to be avoided, as the application programming runtimes tend to do).

Those differences in perspective can then create significant barriers to productive communication between different communities of developers and users (for folks that use Python as an orchestration language, PyPy's poorer orchestration support is a dealbreaker, while for PyPy developers focused on applications programming use cases, the lack of interest from system integrators can be intensely frustrating). cffi is a potential path to improving PyPy's handling of the "orchestration platform" use case, but it still has a long way to go to catch up to CPython on that front. NumPyPy in particular still has a fair bit of work to do in catching up to NumPy (http://buildbot.pypy.org/numpy-status/latest.html is an excellent resource for checking in on the progress of that effort).

In all areas of Python optimisation, though, there's a lot of work to be done in cracking the discoverability and distribution channel problem. I assume redistributors are still wary of offering PyPy support because it's a completely new way of building language runtimes and they aren't sure they understand it yet. Customer demand can overcome that wariness, but if existing Python users are using Python in an orchestration role rather than primarily as an applications programming language, then that demand may not be there. If customers aren't even aware that these optimisation tools exist in the first place, then that will also hinder the generation of demand. This is hinted at by the fact that even Cython (let alone PyPy) isn't as well supported by redistributors as Python 3, suggesting that customers and redistributors may not be looking far enough outside python.org and python-dev for opportunities to enhance their Python environments.

To use a Red Hat specific example, CPython itself is available as a core supported part of the operating system (with 2.3, 2.4, 2.6 and 2.7 all still supported), while CPython 2.7 and 3.3 are also available as explicitly supported offerings through Red Hat Software Collections. Both PyPy and Cython are also available for Red Hat Enterprise Linux & derivatives, but only through the community provided "Extra Packages for Enterprise Linux" repositories. The newer Numba JIT compiler for CPython isn't even in EPEL - you have to build it from source yourself, or acquire it via other means (likely conda).

From a redistribution perspective, engineering staff can certainly suggest "Hey, these would be good things to offer our customers", but that's never going to be as compelling as customers coming to us (or other Python vendors) and asking "Hey, what can you do for me to make my Python code run faster?". Upstream has good answers for a lot of these problems, but commercial redistributors usually aren't going to bite unless they can see a clear business case for it.

Lowering the *cost* of redistribution is also at the heart of a lot of the work going on around metadata 2.0 on distutils-sig - at the moment, the repackaging process (getting from PyPI formats to redistributor formats) can be incredibly manual (not only when the project is first repackaged, but sometimes even on subsequent updates), which is one of the reasons repackaged Python projects tend to be measured in the dozens, or at best hundreds, compared to the tens of thousands available upstream, and why there tend to be significant lag times between upstream updates and updates of repackaged versions.

Regards,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

David Mertz, 14.06.2014 09:53:
moreover, (B) is not true if your existing code relies on C extensions, such as NumPy, which mostly aren't going to run on PyPy.
However, I do know about https://bitbucket.org/pypy/numpy. At least some substantial part of NumPy has been ported to PyPy. This may or may not support the code *you* need to run.
Usually, when people say "my code uses NumPy", what they mean is "NumPy and parts of the surrounding ecosystem", which often includes SciPy and other specialised number crunching libraries. Porting all of that to PyPy and its numpypy reimplementation would take a while.

Stefan

David Mertz, 14.06.2014 09:30:
On Fri, Jun 13, 2014 at 11:54 PM, Joseph Martinot-Lagarde wrote:
Cython compiles all Python; it is not restricted.

Well, kinda yes and no. You are correct, of course, that anything you can execute with 'python someprog' you can compile with 'cython someprog'. However, there is an obvious sense in which adding an annotation (which is, of course, a syntax error for Python itself) "restricts" the code in Cython. E.g.:

def silly():
    cdef int n, i

You can rewrite this as

import cython

@cython.locals(n=int, i=int)
def silly():

which makes it valid Python but has the same semantics as your cdef declaration when compiled in Cython.

    for i in range(10):
        if i < 5:
            n = i + 1
        else:
            n = str(i)
This *silly* function isn't really Python code at all, of course. But if you ignore the annotation, it would be--pointless code, but valid. As soon as you add the annotation, you *restrict* the type of code you can write in the scope of the annotation.
When compiled with Cython, you will get a TypeError at i == 5 (because you said so), whereas it will run through the whole loop in Python.

Stefan

On Mon, Jun 16, 2014 at 5:05 PM, Stefan Behnel <stefan_ml@behnel.de> wrote:
You can rewrite this as
import cython
@cython.locals(n=int, i=int)
def silly():
which makes it valid Python but has the same semantics as your cdef declaration when compiled in Cython.
Syntactically valid, yes. Is there a dummy decorator class cython.locals for the case where it's running under Python?

ChrisA

Chris Angelico, 16.06.2014 09:15:
Syntactically valid, yes. Is there a dummy decorator class cython.locals for the case where it's running under Python?
Wouldn't make much sense otherwise, would it? :)

https://github.com/cython/cython/blob/master/Cython/Shadow.py

Here are some details: http://docs.cython.org/src/tutorial/pure.html

Stefan

On Mon, Jun 16, 2014 at 5:34 PM, Stefan Behnel <stefan_ml@behnel.de> wrote:
Wouldn't make much sense otherwise, would it? :)
https://github.com/cython/cython/blob/master/Cython/Shadow.py
Heh, I kinda figured it'd have to exist. Incidentally, I flipped through that source file and didn't see it - had to actually search before I found this tiny two-line function that does the job. Naturally I assumed I was looking for a class, but a function that returns an identity function of course does just as well.

ChrisA
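For reference, the pure-Python stand-in being discussed has roughly this shape (a sketch from memory, not a verbatim copy of Shadow.py):

def _empty_decorator(func):
    # Under plain Python, the declared types are simply ignored.
    return func

def locals(**arg_types):
    # Accept any keyword "declarations" and change nothing.
    return _empty_decorator

When the module is compiled with Cython, the same decorator is recognised by the compiler and turned into real C type declarations; under plain Python it costs one extra call at decoration time and nothing afterwards.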

Hi,

2014-06-13 6:07 GMT+02:00 Neil Girdhar <mistersheik@gmail.com>:
I was wondering what work is being done on Python to make it faster.
PyPy is 100% compatible with CPython and it is much faster. Numba is also fast, maybe faster than PyPy in some cases (I read that it can use a GPU), but it's more specialized to numerical computation.
I understand that CPython is incrementally improved. I'm not sure, but I think that PyPy acceleration works by compiling a restricted set of Python. And I think I heard something about Guido working on a different model for accelerating Python. I apologize in advance that I didn't look into these projects in a lot of detail. My number one dream about computer languages is to be able to write in a language as easy as Python and have it run as quickly as if it were written in C++.
I started to take notes about how CPython can be made faster: http://haypo-notes.readthedocs.org/faster_cpython.html

See for example my section "Why Python is slow?": http://haypo-notes.readthedocs.org/faster_cpython.html#why-python-is-slow

In short: because Python is a dynamic language (the code can be modified at runtime, a single variable can store different types, almost everything can be modified at runtime), the compiler cannot make many assumptions about the Python code, and so it's very hard to emit fast code (bytecode).
I do believe that this is possible (since in theory someone could look at my Python code and port it to C++).
There are projects to compile Python to C++. See for example Pythran: http://pythonhosted.org/pythran/

But these projects only support a subset of Python. The C++ language is less dynamic than Python.
What I'm suggesting instead is that on every iteration of a "code block", the runtime stochastically decides whether to collect statistics about that iteration. Those statistics include the time spent running the block, the time spent performing attribute accesses (including type method lookups), and so on. Basically, the runtime is trying to guess the potential savings of optimizing this block.
You should really take a look at PyPy. It implements a *very efficient* tracing JIT. The problem is not making the program slower while you trace it. PyPy makes some compromises to avoid this overhead; for example, it only optimizes loops that run more than N iterations (1000?).
If the block is run many times and the potential savings are large, then stochastically again, the block is promoted to a second-level statistics collection. This level collects statistics about all of the external couplings of the block, like the types and values of the passed-in and returned values.
Sorry, this is not the real technical problem. :-) No, the real problem is to detect environment changes, remove the specialized code (optimized for the old environment), and maybe re-optimize the code later.

Environment: modules, classes (types), functions, "constants", etc. If anything is modified, the code must be regenerated. A specialized code is a compiled version of your Python code which is based on different assumptions to run faster. For example, if your function calls the builtin function "len", you can make the assumption that the len function returns an int. But if the builtin "len" function is replaced by something else, you must call the new len function.

With a JIT, you can detect changes of the environment and regenerate optimized functions during the execution of the application. You can for example add a "timestamp" (a counter incremented at each change) in dictionaries and check if the timestamp changed.

My notes about that: http://haypo-notes.readthedocs.org/faster_cpython.html#learn-types
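To illustrate the timestamp idea, here is a sketch with invented names (VersionedNamespace, specialize_len); none of this is existing CPython API:

class VersionedNamespace:
    """A namespace that bumps a version counter on every change."""

    def __init__(self, **names):
        self._names = names
        self.version = 0

    def set(self, key, value):
        self._names[key] = value
        self.version += 1        # any change invalidates outstanding guards

    def get(self, key):
        return self._names[key]

def specialize_len(env):
    expected = env.version
    fast_len = env.get("len")    # baked-in assumption: len stays as it is now
    def specialized(seq):
        if env.version != expected:    # guard: did the environment change?
            return env.get("len")(seq) # deoptimize: look it up again
        return fast_len(seq)
    return specialized

env = VersionedNamespace(len=len)
f = specialize_len(env)
print(f([1, 2, 3]))              # 3, via the fast path
env.set("len", lambda seq: 42)   # someone replaces the builtin
print(f([1, 2, 3]))              # 42, the guard caught the change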
The saving is that the code block

* can be transformed into faster bytecode, which includes straight assembly instructions in some sections, since types or values can now be assumed,
My plan is to add the infrastructure to support specialized code in CPython:

- support multiple codes in a single function
- each code has an environment to decide if it can be used or not
- notify (or at least detect) changes of the environment (notify when the Python code is changed: modules, classes, functions)

It should work well for functions, but I don't see yet how to implement these things for instances of classes, because you can also override methods in an instance.
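A sketch of what "multiple codes in a single function" could look like, seen from Python (MultiCodeFunction is an invented name; the actual infrastructure would live inside CPython):

class MultiCodeFunction:
    def __init__(self, generic):
        self.generic = generic
        self.variants = []       # list of (guard, compiled) pairs

    def add_variant(self, guard, compiled):
        self.variants.append((guard, compiled))

    def __call__(self, *args):
        for guard, compiled in self.variants:
            if guard():                 # do the baked-in assumptions hold?
                return compiled(*args)
        return self.generic(*args)      # fall back to the unspecialized code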
* can use data structures that make type or access assumptions (for example, a list that always contains ints can use a flattened representation; a large set that repeatedly has membership checked with many negative results might benefit from an auxiliary Bloom filter; etc.)
Again, please see PyPy: it has very efficient data structures. I don't think that such changes can be made in CPython. The CPython code base is too old, and too many users rely on the current implementation: they rely on the "C API". There is for example a PyList_GET_ITEM() macro to access an item of a list directly. This macro is not part of the stable API, but I guess that most C modules use such macros (which depend on C structures).
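As a concrete footnote to the Bloom filter idea quoted above, here is a toy pure-Python version of such an auxiliary structure (BloomSet is an invented name; a production filter would size its bit array from the expected cardinality and use better hash mixing):

class BloomSet:
    """A set wrapper that answers most negative membership tests from a
    small bit array before touching the real set."""

    def __init__(self, items, bits=1 << 16, probes=3):
        self.bits = bits
        self.probes = probes
        self.items = set(items)
        self.filter = 0                  # a big int used as a bit array
        for item in self.items:
            self.filter |= self._mask(item)

    def _mask(self, item):
        mask = 0
        for salt in range(self.probes):
            mask |= 1 << (hash((salt, item)) % self.bits)
        return mask

    def __contains__(self, item):
        mask = self._mask(item)
        if self.filter & mask != mask:
            return False                 # definitely absent: cheap answer
        return item in self.items        # possibly present: confirm for real

words = BloomSet(["spam", "eggs"])
print("spam" in words, "ham" in words)   # True False (barring a false positive)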
participants (8)
- Chris Angelico
- David Mertz
- Joseph Martinot-Lagarde
- Neil Girdhar
- Nick Coghlan
- Stefan Behnel
- Tim Delaney
- Victor Stinner