A General Outline for Just-in-Time Acceleration of Python

I was wondering what work is being done on Python to make it faster. I understand that CPython is incrementally improved. I'm not sure, but I think that PyPy acceleration works by compiling a restricted set of Python. And I think I heard something about Guido working on a different model for accelerating Python. I apologize in advance that I didn't look into these projects in a lot of detail. My number one dream about computer languages is to be able to write in a language as easy as Python and have it run as quickly as if it were written in C++. I do believe that this is possible (since in theory someone could look at my Python code and port it to C++).

Unfortunately, I don't have time to work on this goal, but I still wanted to get feedback about some ideas I have for reaching it.

First, I don't think it's important for a "code block" (say, a small section of code with less coupling to statements outside the block than to those within it) to run quickly on its first iteration.

What I'm suggesting instead is that on every iteration of a "code block", the runtime stochastically decides whether to collect statistics about that iteration. Those statistics include the time spent running the block, the time spent performing attribute accesses (including type method lookups), and so on. Basically, the runtime is trying to guess the potential savings of optimizing this block.

If the block is run many times and the potential savings are large, then stochastically again, the block is promoted to a second-level statistics collection. This level collects statistics about all of the external couplings of the block, like the types and values of the passed-in and returned values.

Using the second-level statistics, the runtime can now guess whether the block should be promoted to a third level whereby any consistencies are exploited. For example, if the passed-in parameter types and return value type of the "min" function are (int, int, int) for 40% of the statistics and (float, float, float) for 59%, with other types making up the remaining 1%, then two precompiled versions of min are generated: one for int and one for float.

These precompiled code blocks have different costs than regular Python blocks. They need to pay the following costs:

* a check for the required invariants (parameter types above, but it could be parameter values, or other invariants);
* they need to install hooks on objects that must remain invariant during the execution of the block; if the invariants are ever violated during the execution of the block, then all of the computations done during this execution of the block must be discarded;
* therefore a third cost is the probability of discarding the computation times the average cost of the wasted computation.

The saving is that the code block:

* can be transformed into faster bytecode, which includes straight assembly instructions in some sections, since types or values can now be assumed;
* can use data structures that make type or access assumptions (for example, a list that always contains ints can use a flattened representation; a large set that repeatedly has membership checked with many negative results might benefit from an auxiliary Bloom filter; etc.).

In summary, the runtime performs stochastic, incremental promotion of code blocks from first-level, to second-level, to multiple precompiled versions. It can also demote a code block. The difference between the costs of the different levels is statistically estimated.
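To make the scheme concrete, here is a toy Python sketch of the promotion machinery described above (every name in it, such as TieredFunction and make_specialized, and both thresholds are invented for illustration; a real implementation would live inside the interpreter rather than in a Python-level wrapper):

import random
import time
from collections import Counter

SAMPLE_RATE = 0.01    # probability that any given call is measured
PROMOTE_AFTER = 100   # samples collected before trying to specialize

class TieredFunction:
    """Wrap a function: cheaply sample statistics about its calls, then
    dispatch calls whose argument types match an observed hot signature
    to a "precompiled" version guarded by that type check."""

    def __init__(self, func):
        self.func = func
        self.samples = 0
        self.sampled_time = 0.0      # first-level statistics
        self.type_stats = Counter()  # second-level statistics
        self.specialized = {}        # arg-type tuple -> fast version

    def __call__(self, *args):
        key = tuple(type(a) for a in args)
        fast = self.specialized.get(key)
        if fast is not None:               # the invariant check (guard)
            return fast(*args)
        if random.random() < SAMPLE_RATE:  # stochastic stats collection
            start = time.perf_counter()
            result = self.func(*args)
            self.sampled_time += time.perf_counter() - start
            self.type_stats[key] += 1
            self.samples += 1
            if self.samples == PROMOTE_AFTER:
                self._promote()
            return result
        return self.func(*args)

    def _promote(self):
        # Promote the dominant type signatures to specialized versions.
        for key, count in self.type_stats.most_common(2):
            if count / self.samples > 0.3:
                self.specialized[key] = make_specialized(self.func, key)

def make_specialized(func, key):
    # Placeholder: a real runtime would emit type-specialized bytecode
    # or machine code here; this sketch just reuses the original.
    return func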
Examples of optimizations that can commonly be accomplished using such a system are:

* global variables are folded into code as constants (even if they change rarely, you pay the discarding penalty described above plus the recompilation cost; the benefit of inline use of the constant, and any constant folding, might outweigh these costs);
* lookup of member functions, which almost never change;
* flattening of homogeneously-typed lists (see the sketch below).

Best,
Neil
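The flattening idea in the last bullet can be illustrated with nothing but the standard library; this is a toy demonstration of the representation change, not a runtime optimization:

from array import array

boxed = list(range(1000000))  # a list of boxed int objects (pointers)
flat = array('l', boxed)      # the same values as contiguous machine words

assert sum(boxed) == sum(flat)  # identical contents, much smaller footprint

A specializing runtime could switch a list to such a representation once it has observed that the list only ever holds ints, and switch back (paying the discarding penalty) if a non-int is ever inserted.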

Other than a sprinkling of the word "stochastic" around this post (why that word, not the more obvious "random"?), this basically describes exactly what PyPy does.

On Thu, Jun 12, 2014 at 9:07 PM, Neil Girdhar <mistersheik@gmail.com> wrote:
I was wondering what work is being done on Python to make it faster. [...]

Well that's great to hear :) I thought PyPy only worked on a restricted set of Python. Does PyPy save the optimization statistics between runs?

On Fri, Jun 13, 2014 at 1:52 AM, David Mertz <mertz@gnosis.cx> wrote:
Other than a sprinkling of the word "stochastic" around this post (why that word, not the more obvious "random"?), this basically describes exactly what PyPy does. [...]

There are many people with a better knowledge of PyPy than me; I've only looked at it and read some general white papers from the team. But I do know with certainty that PyPy executes ALL Python code, not a restricted subset[*]. You might be thinking of Cython in terms of an "annotated, restricted, version of Python."

However, PyPy itself is *written* in a restricted subset called RPython, which also might be what you are thinking of. Well, much of PyPy is written that way; I'm pretty sure some of it is regular unrestricted Python too. Other than the fact that PyPy isn't named, e.g., "PyRPy", this is just an implementation detail though--Jython is written in Java, IronPython is written in C#, CPython is written in C, and PyPy is written in (R)Python. They all run user programs the same though[**].

[*] Yes, possibly there is some weird buggy corner case where it does the wrong thing, but if so, file a ticket to get it fixed.

[**] For a suitably fuzzy meaning of "the same"--obviously performance characteristics are going to differ, as are things like access to external library calls, etc. It's the same in terms of taking the same identical source files to run, in any case.

On Thu, Jun 12, 2014 at 10:53 PM, Neil Girdhar <mistersheik@gmail.com> wrote:
Well that's great to hear :) I thought PyPy only worked on a restricted set of Python. Does PyPy save the optimization statistics between runs? [...]

On 13 Jun 2014 16:22, "David Mertz" <mertz@gnosis.cx> wrote:
[...] You might be thinking of Cython in terms of an "annotated, restricted, version of Python."

There's also Numba, which allows particular functions to be flagged for JIT compilation with LLVM (and vectorisation if using NumPy). That technically only supports a Python subset, but it's in the context of the normal CPython interpreter, so it's generally possible to just skip accelerating the code that Numba can't handle.

Cheers,
Nick.
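For anyone who hasn't seen the workflow Nick is describing, a minimal example looks like this (the dot function is made up for illustration; @jit and its nopython flag are Numba's public API):

import numpy as np
from numba import jit

@jit(nopython=True)   # flag this function for LLVM compilation
def dot(xs, ys):
    total = 0.0
    for i in range(xs.shape[0]):
        total += xs[i] * ys[i]
    return total

a = np.random.rand(1000)
b = np.random.rand(1000)
print(dot(a, b))      # first call triggers compilation; later calls are fast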

Thanks for the explanation. It's really good to know that PyPy executes all Python code. The idea of using a restricted subset is really annoying to me.

The only thing I don't understand is why PyPy would be written in RPython. If it's doing a good job analyzing source code, why do people need to annotate the source code instead of letting the runtime work it out (by collecting statistics)?

Best,
Neil

On Fri, Jun 13, 2014 at 2:21 AM, David Mertz <mertz@gnosis.cx> wrote:
There are many people with a better knowledge of PyPy than me; I've only looked at it and read some general white papers from the team. [...]

On 13 June 2014 19:05, Neil Girdhar <mistersheik@gmail.com> wrote:
[...] The only thing I don't understand is why PyPy would be written in RPython. If it's doing a good job analyzing source code, why do people need to annotate the source code instead of letting the runtime work it out (by collecting statistics)?
Start reading from here: http://pypy.readthedocs.org/en/latest/coding-guide.html#our-runtime-interpre...

Tim Delaney

2014-06-13 11:05 GMT+02:00 Neil Girdhar <mistersheik@gmail.com>:
Thanks for the explanation. It's really good to know that PyPy executes all Python code. The idea of using a restricted subset is really annoying to me.
PyPy is 100% compatible with CPython; it implements very tricky implementation details just to be 100% compatible. (PyPy's support for the CPython C API is only partial, but the C API is not part of the "Python language".)

In short, the JIT compiler is written in a different language called RPython. This language can be compiled to C, but other backends are or were available: Java, .NET, Javascript, etc. (You should check, I'm not sure.)

The whole "PyPy project" is more than just a fast Python interpreter. There are also fast interpreters for Ruby, PHP, and some other languages.

Victor

On 13/06/2014 08:21, David Mertz wrote:

[...] You might be thinking of Cython in terms of an "annotated, restricted, version of Python."

Cython compiles all Python; it is not restricted. On the other hand, PyPy is not compatible with the C API of CPython, and thus cannot run compiled modules as-is.

On Fri, Jun 13, 2014 at 11:54 PM, Joseph Martinot-Lagarde <joseph.martinot-lagarde@m4x.org> wrote:

Cython compiles all Python; it is not restricted.

Well, kinda yes and no. You are correct, of course, that anything you can execute with 'python someprog' you can compile with 'cython someprog'. However, there is an obvious sense in which adding an annotation (which is, of course, a syntax error for Python itself) "restricts" the code in Cython. E.g.:

def silly():
    cdef int n, i
    for i in range(10):
        if i < 5:
            n = i + 1
        else:
            n = str(i)

This *silly* function isn't really Python code at all, of course. But if you ignore the annotation, it would be--pointless code, but valid. As soon as you add the annotation, you *restrict* the type of code you can write in the scope of the annotation.

On 14/06/2014 09:30, David Mertz wrote:

[...] As soon as you add the annotation, you *restrict* the type of code you can write in the scope of the annotation.

Yeah, the point is that *you* restrict, not Cython. From your previous post I understood that you meant "PyPy runs all Python but Cython doesn't, it is restricted".

I use NumPy regularly, and in this case it is the other way around: I can optimize my code using Cython, but I can't run it with PyPy at all.

Is there ever a case where removing all the type annotations from Cython code does not produce code that can run in PyPy? I don't know Cython well enough to be certain the answer is 'no', but I think so. So a function a little like my 'silly()' function--but that did something actually interesting in the loop--might run faster by removing the annotation and running it in PyPy. Or it might NOT, of course; the answer is not obvious without looking at the exact code in question, and probably not without actually timing it.

But the idea is that let's say I have some code with a loop and some numeric operations inside that loop that I'm currently running using CPython. There are at least two ways I might speed up that code:

A) Edit the code to contain some type annotations, and compile it with Cython. However, if I do this, I *might* have to modify some other constructs in the overall code block to get it to compile (i.e. if there's any polymorphism about variable types).

B) Run the unchanged code using PyPy.

Well, in this description, PyPy sounds better... but it's not better if option (A) makes faster code for *your* specific code, of course. And moreover, (B) is not true if your existing code relies on C extensions, such as NumPy, which mostly aren't going to run on PyPy.

However, I do know about https://bitbucket.org/pypy/numpy. At least some substantial part of NumPy has been ported to PyPy. This may or may not support the code *you* need to run.

On Sat, Jun 14, 2014 at 12:38 AM, Joseph Martinot-Lagarde <joseph.martinot-lagarde@m4x.org> wrote:
Yeah, the point is that *you* restrict, not Cython. [...]

On 14/06/2014 09:53, David Mertz wrote:

Is there ever a case where removing all the type annotations from Cython code does not produce code that can run in PyPy? I don't know Cython well enough to be certain the answer is 'no', but I think so.

Cython without annotations is just Python, so it can be run in PyPy.
[...] Well, in this description, PyPy sounds better... but it's not better if option (A) makes faster code for *your* specific code, of course. And moreover, (B) is not true if your existing code relies on C extensions, such as NumPy, which mostly aren't going to run on PyPy.
Actually I don't see it as PyPy vs. Cython. The only common point between these projects is that they both try to optimize Python code, but in such different ways that the outcome completely depends on the code. For my problems Cython is an obvious choice; for others it could be useless.
However, I do know about https://bitbucket.org/pypy/numpy. At least some substantial part of NumPy has been ported to PyPy. This may or may not support the code *you* need to run.
Right now it doesn't! ;) I have the same problem with Numba and numexpr: they rarely seem to be compatible with my real-world use cases.

On 14 June 2014 17:53, David Mertz <mertz@gnosis.cx> wrote:
[...]
A) Edit the code to contain some type annotations, and compile it with Cython. However, if I do this, I *might* have to modify some other constructs in the overall code block to get it to compile (i.e. if there's any polymorphism about variable types).
B) Run the unchanged code using PyPy.
C) Compile the unchanged code with Cython.

There's a myth that Cython requires type annotations to speed up Python code. This is not accurate: just bypassing the main eval loop and some aspects of function call handling can provide a respectable speed-up, even when Cython is still performing all the operations through the abstract object API rather than being able to drop back to platform native types. The speed increases aren't as significant as those on offer in PyPy, but they're not trivial, and they don't come at the cost of potential incompatibility with other C extensions. (See https://github.com/cython/cython/wiki/FAQ#is-cython-faster-than-cpython for more details.)

The difference I see between the PyPy approach and the Cython approach to optimisation is between a desire to "just make Python fast, even if it means breaking compatibility with C extensions" (you could call this the "Python as application programming language" perspective, which puts PyPy in a head-to-head contest with the JVM and the .NET CLR, more so than with CPython) and the desire to "make CPython CPU bottlenecks fast, even if doing so requires some additional static annotations" (you could call this the "CPython as orchestration platform" approach, which leverages the rich CPython C API and the ubiquity of C dynamic linking support at the operating system level to interoperate with existing software components, rather than treating reliance on "native" software as something to be avoided, as the application programming runtimes tend to do).

Those differences in perspective can then create significant barriers to productive communication between different communities of developers and users (for folks that use Python as an orchestration language, PyPy's poorer orchestration support is a dealbreaker, while for PyPy developers focused on applications programming use cases, the lack of interest from system integrators can be intensely frustrating). cffi is a potential path to improving PyPy's handling of the "orchestration platform" use case, but it still has a long way to go to catch up to CPython on that front. NumPyPy in particular still has a fair bit of work to do in catching up to NumPy (http://buildbot.pypy.org/numpy-status/latest.html is an excellent resource for checking in on the progress of that effort).

In all areas of Python optimisation, though, there's a lot of work to be done in cracking the discoverability and distribution channel problem. I assume redistributors are still wary of offering PyPy support because it's a completely new way of building language runtimes and they aren't sure they understand it yet. Customer demand can overcome that wariness, but if existing Python users are using Python in an orchestration role rather than primarily as an applications programming language, then that demand may not be there. If customers aren't even aware that these optimisation tools exist in the first place, then that will also hinder the generation of demand. This is hinted at by the fact that even Cython (let alone PyPy) isn't as well supported by redistributors as Python 3, suggesting that customers and redistributors may not be looking far enough outside python.org and python-dev for opportunities to enhance their Python environments.

To use a Red Hat specific example, CPython itself is available as a core supported part of the operating system (with 2.3, 2.4, 2.6 and 2.7 all still supported), while CPython 2.7 and 3.3 are also available as explicitly supported offerings through Red Hat Software Collections. Both PyPy and Cython are also available for Red Hat Enterprise Linux & derivatives, but only through the community provided "Extra Packages for Enterprise Linux" repositories. The newer Numba JIT compiler for CPython isn't even in EPEL - you have to build it from source yourself, or acquire it via other means (likely conda).

From a redistribution perspective, engineering staff can certainly suggest "Hey, these would be good things to offer our customers", but that's never going to be as compelling as customers coming to us (or other Python vendors) and asking "Hey, what can you do for me to make my Python code run faster?". Upstream has good answers for a lot of these problems, but commercial redistributors usually aren't going to bite unless they can see a clear business case for it.

Lowering the *cost* of redistribution is also at the heart of a lot of the work going on around metadata 2.0 on distutils-sig - at the moment, the repackaging process (getting from PyPI formats to redistributor formats) can be incredibly manual (not only when the project is first repackaged, but sometimes even on subsequent updates), which is one of the reasons repackaged Python projects tend to be measured in the dozens, or at best hundreds, compared to the tens of thousands available upstream, and why there tend to be significant lag times between upstream updates and updates of repackaged versions.

Regards,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

David Mertz, 14.06.2014 09:53:
moreover, (B) is not true if your existing code relies on C extensions, such as NumPy, which mostly aren't going to run on PyPy.
However, I do know about https://bitbucket.org/pypy/numpy. At least some substantial part of NumPy has been ported to PyPy. This may or may not support the code *you* need to run.
Usually, when people say "my code uses NumPy", what they mean is "NumPy and parts of the surrounding ecosystem", which often includes SciPy and other specialised number crunching libraries. Porting all of that to PyPy and its numpypy reimplementation would take a while.

Stefan

David Mertz, 14.06.2014 09:30:
On Fri, Jun 13, 2014 at 11:54 PM, Joseph Martinot-Lagarde wrote:
Cython compiles all Python; it is not restricted.

Well, kinda yes and no. You are correct, of course, that anything you can execute with 'python someprog' you can compile with 'cython someprog'. However, there is an obvious sense in which adding an annotation (which is, of course, a syntax error for Python itself) "restricts" the code in Cython. E.g.:

def silly():
    cdef int n, i

You can rewrite this as

import cython

@cython.locals(n=int, i=int)
def silly():

which makes it valid Python but has the same semantics as your cdef declaration when compiled in Cython.

    for i in range(10):
        if i < 5:
            n = i + 1
        else:
            n = str(i)
This *silly* function isn't really Python code at all, of course. But if you ignore the annotation, it would be--pointless code, but valid. As soon as you add the annotation, you *restrict* the type of code you can write in the scope of the annotation.
When compiled with Cython, you will get a TypeError at i == 5 (because you said so), whereas it will run through the whole loop in Python.

Stefan

On Mon, Jun 16, 2014 at 5:05 PM, Stefan Behnel <stefan_ml@behnel.de> wrote:
You can rewrite this as
import cython
@cython.locals(n=int, i=int)
def silly():
which makes it valid Python but has the same semantics as your cdef declaration when compiled in Cython.
Syntactically valid, yes. Is there a dummy decorator class cython.locals for the case where it's running under Python?

ChrisA

Chris Angelico, 16.06.2014 09:15:
Syntactically valid, yes. Is there a dummy decorator class cython.locals for the case where it's running under Python?
Wouldn't make much sense otherwise, would it? :)

https://github.com/cython/cython/blob/master/Cython/Shadow.py

Here are some details: http://docs.cython.org/src/tutorial/pure.html

Stefan

On Mon, Jun 16, 2014 at 5:34 PM, Stefan Behnel <stefan_ml@behnel.de> wrote:
Wouldn't make much sense otherwise, would it? :)
https://github.com/cython/cython/blob/master/Cython/Shadow.py
Heh, I kinda figured it'd have to exist. Incidentally, I flipped through that source file and didn't see it - had to actually search before I found this tiny two-line function that does the job. Naturally I assumed I was looking for a class, but a function that returns an identity function of course does just as well.

ChrisA
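For reference, the pure-Python stand-in being discussed has roughly this shape (a sketch from memory, not a verbatim copy of Shadow.py):

def _empty_decorator(func):
    # Under plain Python, the declared types are simply ignored.
    return func

def locals(**arg_types):
    # Accept any keyword "declarations" and change nothing.
    return _empty_decorator

When the module is compiled with Cython, the same decorator is recognised by the compiler and turned into real C type declarations; under plain Python it costs one extra call at decoration time and nothing afterwards.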

Hi,

2014-06-13 6:07 GMT+02:00 Neil Girdhar <mistersheik@gmail.com>:
I was wondering what work is being done on Python to make it faster.
PyPy is 100% compatible with CPython and it is much faster. Numba is also fast, maybe faster than PyPy in some cases (I read that it can use a GPU), but it's more specialized to numerical computation.
I understand that CPython is incrementally improved. I'm not sure, but I think that PyPy acceleration works by compiling a restricted set of Python. And I think I heard something about Guido working on a different model for accelerating Python. I apologize in advance that I didn't look into these projects in a lot of detail. My number one dream about computer languages is to be able to write in a language as easy as Python and have it run as quickly as if it were written in C++.
I started to take notes about how CPython can be made faster: http://haypo-notes.readthedocs.org/faster_cpython.html

See for example my section "Why Python is slow?": http://haypo-notes.readthedocs.org/faster_cpython.html#why-python-is-slow

In short: because Python is a dynamic language (the code can be modified at runtime, a single variable can store different types, almost everything can be modified at runtime), the compiler cannot make many assumptions about the Python code, and so it's very hard to emit fast code (bytecode).
I do believe that this is possible (since in theory someone could look at my Python code and port it to C++).
There are projects to compile Python to C++. See for example Pythran: http://pythonhosted.org/pythran/

But these projects only support a subset of Python. The C++ language is less dynamic than Python.
What I'm suggesting instead is that on every iteration of a "code block", the runtime stochastically decides whether to collect statistics about that iteration. Those statistics include the time spent running the block, the time spent performing attribute accesses (including type method lookups), and so on. Basically, the runtime is trying to guess the potential savings of optimizing this block.
You should really take a look at PyPy. It implements a *very efficient* tracing JIT. The problem is not making the program slower while you trace it. PyPy makes some compromises to avoid this overhead; for example, it only optimizes loops that run more than N iterations (1000?).
If the block is run many times and the potential savings are large, then stochastically again, the block is promoted to a second-level statistics collection. This level collects statistics about all of the external couplings of the block, like the types and values of the passed-in and returned values.
Sorry, this is not the real technical problem. :-) No, the real problem is to detect environment changes, remove the specialized code (optimized for the old environment), and maybe re-optimize the code later.

Environment: modules, classes (types), functions, "constants", etc. If anything is modified, the code must be regenerated. A specialized code is a compiled version of your Python code which is based on different assumptions to run faster. For example, if your function calls the builtin function "len", you can make the assumption that the len function returns an int. But if the builtin "len" function is replaced by something else, you must call the new len function.

With a JIT, you can detect changes of the environment and regenerate optimized functions during the execution of the application. You can for example add a "timestamp" (a counter incremented at each change) in dictionaries and check if the timestamp changed.

My notes about that: http://haypo-notes.readthedocs.org/faster_cpython.html#learn-types
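To illustrate the timestamp idea, here is a sketch with invented names (VersionedNamespace, specialize_len); none of this is existing CPython API:

class VersionedNamespace:
    """A namespace that bumps a version counter on every change."""

    def __init__(self, **names):
        self._names = names
        self.version = 0

    def set(self, key, value):
        self._names[key] = value
        self.version += 1        # any change invalidates outstanding guards

    def get(self, key):
        return self._names[key]

def specialize_len(env):
    expected = env.version
    fast_len = env.get("len")    # baked-in assumption: len stays as it is now
    def specialized(seq):
        if env.version != expected:    # guard: did the environment change?
            return env.get("len")(seq) # deoptimize: look it up again
        return fast_len(seq)
    return specialized

env = VersionedNamespace(len=len)
f = specialize_len(env)
print(f([1, 2, 3]))              # 3, via the fast path
env.set("len", lambda seq: 42)   # someone replaces the builtin
print(f([1, 2, 3]))              # 42, the guard caught the change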
The saving is that the code block

* can be transformed into faster bytecode, which includes straight assembly instructions in some sections, since types or values can now be assumed,
My plan is to add the infrastructure to support specialized code in CPython:

- support multiple codes in a single function
- each code has an environment to decide if it can be used or not
- notify (or at least detect) changes of the environment (notify when the Python code is changed: modules, classes, functions)

It should work well for functions, but I don't see yet how to implement these things for instances of classes, because you can also override methods in an instance.
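A sketch of what "multiple codes in a single function" could look like, seen from Python (MultiCodeFunction is an invented name; the actual infrastructure would live inside CPython):

class MultiCodeFunction:
    def __init__(self, generic):
        self.generic = generic
        self.variants = []       # list of (guard, compiled) pairs

    def add_variant(self, guard, compiled):
        self.variants.append((guard, compiled))

    def __call__(self, *args):
        for guard, compiled in self.variants:
            if guard():                 # do the baked-in assumptions hold?
                return compiled(*args)
        return self.generic(*args)      # fall back to the unspecialized code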
* can use data structures that make type or access assumptions (for example, a list that always contains ints can use a flattened representation; a large set that repeatedly has membership checked with many negative results might benefit from an auxiliary Bloom filter; etc.)
Again, please see PyPy: it has very efficient data structures. I don't think that such changes can be made in CPython. The CPython code base is too old, and too many users rely on the current implementation: they rely on the "C API". There is for example a PyList_GET_ITEM() macro to access an item of a list directly. This macro is not part of the stable API, but I guess that most C modules use such macros (which depend on C structures).
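As a concrete footnote to the Bloom filter idea quoted above, here is a toy pure-Python version of such an auxiliary structure (BloomSet is an invented name; a production filter would size its bit array from the expected cardinality and use better hash mixing):

class BloomSet:
    """A set wrapper that answers most negative membership tests from a
    small bit array before touching the real set."""

    def __init__(self, items, bits=1 << 16, probes=3):
        self.bits = bits
        self.probes = probes
        self.items = set(items)
        self.filter = 0                  # a big int used as a bit array
        for item in self.items:
            self.filter |= self._mask(item)

    def _mask(self, item):
        mask = 0
        for salt in range(self.probes):
            mask |= 1 << (hash((salt, item)) % self.bits)
        return mask

    def __contains__(self, item):
        mask = self._mask(item)
        if self.filter & mask != mask:
            return False                 # definitely absent: cheap answer
        return item in self.items        # possibly present: confirm for real

words = BloomSet(["spam", "eggs"])
print("spam" in words, "ham" in words)   # True False (barring a false positive)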
participants (8)
- Chris Angelico
- David Mertz
- Joseph Martinot-Lagarde
- Neil Girdhar
- Nick Coghlan
- Stefan Behnel
- Tim Delaney
- Victor Stinner