[Python-ideas] A General Outline for Just-in-Time Acceleration of Python
Neil Girdhar
mistersheik at gmail.com
Fri Jun 13 06:07:56 CEST 2014
I was wondering what work is being done on Python to make it faster. I
understand that cpython is incrementally improved. I'm not sure, but I
think that pypy acceleration works by compiling a restricted set of Python.
And I think I heard something about Guido working on a different model for
accelerating Python. I apologize in advance that I didn't look into these
projects in a lot of detail. My number one dream about computer languages
is for me to be able to write in a language as easy as Python and have it
run as quickly as if it were written. I do believe that this is possible
(since in theory someone could look at my Python code and port it to C++).
Unfortunately, I don't have time to work on this goal, but I still wanted
to get feedback about some ideas I have about reaching this goal.
First, I don't think it's important for a "code block" (say, a small
section of code with less coupling to statements outside the block than to
within the block) to run quickly on its first iteration.
What I'm suggesting instead is for every iteration of a "code block", the
runtime stochastically decides whether to collect statistics about that
iteration. Those statistics include the the time running the block, the
time perform attribute accesses including type method lookups and so on.
Basically, the runtime is trying to guess the potential savings of
optimizing this block.
If the block is run many times and the potential savings are large, then
stochastically again, the block is promoted to a second-level statistics
collection. This level collects statistics about all of the external
couplings of the block, like the types and values of the passed-in and
returned values.
Using the second-level statistics, the runtime can now guess whether the
block should be promoted to a third level whereby any consistencies are
exploited. For example, if the passed-in parameter types and return value
type of the "min" function are (int, int, int) for 40% of the statistics
and (float, float, float) for 59%, and other types for the remaining 1%,
then two precompiled versions of min are generated: one for int and one for
float.
These precompiled code blocks have different costs than regular Python
blocks. They need to pay the following costs:
* a check for the required invariants (parameter types above, but it could
be parameter values, or other invariants)
* they need to install hooks on objects that must remain invariant during
the execution of the block; if the invariants are ever violated during the
execution of the block, then all of the computations done during this
execution of the block must be discarded
* therefore a third cost is the probability of discarded the computation
times the average cost of the doing the wasted computation.
The saving is that the code block
* can be transformed into a faster bytecode, which includes straight
assembly instructions in some sections since types or values can now be
assumed,
* can use data structures that make type or access assumptions (for example
a list that always contains ints can use a flattened representation; a
large set that is repeatedly having membership checked with many negative
results might benefit from an auxiliary bloom filter, etc.)
In summary the runtime performs stochastic, incremental promotion of code
blocks from first-level, to second-level, to multiple precompiled versions.
It can also demote a code block. The difference between the costs of the
different levels is statistically estimated.
Examples of optimizations that can be commonly accomplished using such a
system are:
* global variables are folded into code as constants. (Even if they change
rarely, you pay the discarding penalty described above plus the
recompilation cost; the benefit of inline use of the constant (and any
constant folding) might outweigh these costs.)
* lookup of member functions, which almost never change
* flattening of homogeneously-typed lists
Best,
Neil
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140612/523c77fe/attachment.html>
More information about the Python-ideas
mailing list