
CPython should get a tracing JIT that turns slow bytecode into fast bytecode. A JIT doesn't have to produce machine code. bytecode-to-bytecode compilation is still compilation. bytecode-to-bytecode compilation works on iOS, and doesn't require deviating from C. (This "internal bytecode" should do things like know that 2 variables necessarily hold integers, doing just "x = y + z" in C in an IADD instruction as opposed to all those middle-of-function typechecks and overhead. You can typecheck once at the start of the function and run separate traces on that. Since this "internal bytecode" is extremely unsafe, it should be considered an implementation detail and never exposed to external code.)

On Fri, Jun 30, 2017 at 12:09:52PM -0300, "Soni L." <fakedme+py@gmail.com> wrote:
CPython should get a
You're welcome to create one. Go on, send your pull requests! Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.

On Jun 30, 2017 5:16 PM, "Oleg Broytman" <phd@phdru.name> wrote: On Fri, Jun 30, 2017 at 12:09:52PM -0300, "Soni L." <fakedme+py@gmail.com> wrote:
CPython should get a
You're welcome to create one. Go on, send your pull requests! But if you are planning to do that, it is still a good idea to ask for feedback here first. That will increase the chances of acceptance by a lot. Also, it doesn't necessarily need to be your own idea :) -- Koos

On Fri, Jun 30, 2017, 10:38 Koos Zevenhoven, <k7hoven@gmail.com> wrote:
On Jun 30, 2017 5:16 PM, "Oleg Broytman" <phd@phdru.name> wrote:
On Fri, Jun 30, 2017 at 12:09:52PM -0300, "Soni L." <fakedme+py@gmail.com> wrote:
CPython should get a
You're welcome to create one. Go on, send your pull requests!
But if you are planning to do that, it is still a good idea to ask for feedback here first. That will increase the chances of acceptance by a lot. Also, it doesn't necessarily need to be your own idea :)
I think Oleg was more responding to the fact that Soni said "CPython should" do something. Phrasing it that way comes off as demanding instead of just sharing an idea. Oleg tried to turn it around and point out that if Soni thinks this should happen then he should be ready to contribute the work to see it happen. -brett
-- Koos
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/

Hi, All! On Sat, Jul 01, 2017 at 04:22:31PM +0000, Brett Cannon <brett@python.org> wrote:
On Fri, Jun 30, 2017, 10:38 Koos Zevenhoven, <k7hoven@gmail.com> wrote:
On Jun 30, 2017 5:16 PM, "Oleg Broytman" <phd@phdru.name> wrote:
On Fri, Jun 30, 2017 at 12:09:52PM -0300, "Soni L." <fakedme+py@gmail.com> wrote:
CPython should get a
You're welcome to create one. Go on, send your pull requests!
But if you are planning to do that, it is still a good idea to ask for feedback here first. That will increase the chances of acceptance by a lot. Also, it doesn't necessarily need to be your own idea :)
I think Oleg was more responding to the fact that Soni said "CPython should" do something. Phrasing it that way comes off as demanding instead
Exactly!
of just sharing an idea. Oleg tried to turn it around and point out that if Soni thinks this should happen then he should be ready to contribute the work to see it happen.
I think the sentence "Python should have <some big and complex to implement feature>" should be ;-) forbidden if it is not followed with "I'm in the middle of development. Expect the 1st PR in <a short timeframe>." Python can only have features that You, the <User>, implemented (or paid for) and submitted.
-brett
-- Koos
Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.

On Sat, Jul 1, 2017 at 1:17 PM, Oleg Broytman <phd@phdru.name> wrote:
I think the sentence "Python should have <some big and complex to implement feature>" should be ;-) forbidden if it is not followed with "I'm in the middle of development. Expect the 1st PR in <a short timeframe>."
Python can only have features that You, the <User>, implemented (or paid for) and submitted.
Devil's advocate: why prepare a patch and submit it if it is going to be dismissed out of hand. Trying to gauge support for the idea is a reasonable first-step. Devil's devil's advocate: if it has value, it could stand on it's own and gain it's own group of supporters as a CPython fork. Nick

On 1 July 2017 at 18:35, Nick Timkovich <prometheus235@gmail.com> wrote:
Devil's advocate: why prepare a patch and submit it if it is going to be dismissed out of hand. Trying to gauge support for the idea is a reasonable first-step.
That's perfectly OK, but it's important to phrase the email in a way that makes that clear - "I'm considering putting together a PR for Python to implement X. Does that sound like a good idea, or does anyone have suggestions for potential issues I might consider? Also, is there any prior work in this area that I should look into?" "Python should have X" implies (a) that you are criticising the python developers for missing that feature out, (b) that you consider your position self-evident, and (c) that you expect someone to implement it. People have different ways of expressing themselves, so we should all be prepared to allow some leeway in how people put their ideas across. But the writer has some responsibility for the tone, too. Paul

PyPy does basically this. So does the tentative project Pyjion. Also Numba, but on a pre-function basis. It's not a bad ideas, and one that currently exists with varying degrees of refinement in several projects. I may have forgotten a few others. I suppose Brython in a sense. This is very unlikely to make it into core CPython because code simplicity is one of its goals. But all those other projects would welcome your help. On Jun 30, 2017 8:22 AM, "Steven D'Aprano" <steve@pearwood.info> wrote: On Fri, Jun 30, 2017 at 12:09:52PM -0300, Soni L. wrote:
CPython should get a tracing JIT that turns slow bytecode into fast bytecode.
Are you volunteering to do the work? -- Steve _______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/

On 30/06/2017 16:09, Soni L. wrote:
CPython should get a tracing JIT that turns slow bytecode into fast bytecode.
A JIT doesn't have to produce machine code. bytecode-to-bytecode compilation is still compilation. bytecode-to-bytecode compilation works on iOS, and doesn't require deviating from C.
(This "internal bytecode" should do things like know that 2 variables necessarily hold integers, doing just "x = y + z" in C in an IADD instruction as opposed to all those middle-of-function typechecks and overhead. You can typecheck once at the start of the function and run separate traces on that. Since this "internal bytecode" is extremely unsafe, it should be considered an implementation detail and never exposed to external code.)
Patches are always welcome. When do you intend delivering yours? -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence --- This email has been checked for viruses by AVG. http://www.avg.com

2017-06-30 17:09 GMT+02:00 Soni L. <fakedme+py@gmail.com>:
CPython should get a tracing JIT that turns slow bytecode into fast bytecode.
A JIT doesn't have to produce machine code. bytecode-to-bytecode compilation is still compilation. bytecode-to-bytecode compilation works on iOS, and doesn't require deviating from C.
Optimizations require to make assumptions on the code, and deoptimize if an assumption becomes wrong. I call these things "guards". If I understood correctly, PyPy is able to deoptimize a function in the middle of the function, while executing it. In my FAT Python project, I tried something simpler: add guards at the function entry point, and decide at the entry which version of the code should be run (FAT Python allows to have more than 2 versions of the code for the same function). I described my implementation in the PEP 510: https://www.python.org/dev/peps/pep-0510/ I agree that you *can* emit more efficient bytecode using assumptions. But I'm not sure that the best speedup will be worth it. For example, if your maximum speedup is 20% but the JIT compiler increases the startup time and uses more memory, I'm not sure that users will use it. The design will restrict indirectly the maximum speed. At the bytecode level, you cannot specialize bytecode for 1+2 (x+y with x=1 and y=2) for example. The BINARY_ADD instruction calls PyNumber_Add(), but a previous experience showed that the dispatch inside PyNumber_Add() to reach long_add() is expensive. I'm trying to find a solution to not make CPython 20% faster, but 2x faster. See my talk at the recent Python Language Summit (at Pycon US): https://github.com/haypo/conf/raw/master/2017-PyconUS/summit.pdf https://lwn.net/Articles/723949/ My mid-term/long-term plan for FAT Python is to support multiple optimizers, and allow developers to choose between bytecode ("Python" code) and machine code ("C" code). For example, work on an optimizer reusing Cython rather than writing a new compiler from scratch. My current optimizer works at the AST level and emits more efficient bytecode by rewriting the AST. But another major design choice in FAT Python is to run the optimizer ahead-of-time (AoT), rather than just-in-time (JIT). Maybe it will not work. We will see :-) I suggest you to take a look at my notes to make CPython faster: http://faster-cpython.readthedocs.io/ FAT Python homepage: http://faster-cpython.readthedocs.io/fat_python.html -- You may also be interested by my Pycon US talk about CPython optimization in 3.5, 3.6 and 3.7: https://lwn.net/Articles/725114/ Victor

On 2017-06-30 07:17 PM, Victor Stinner wrote:
2017-06-30 17:09 GMT+02:00 Soni L. <fakedme+py@gmail.com>:
CPython should get a tracing JIT that turns slow bytecode into fast bytecode.
A JIT doesn't have to produce machine code. bytecode-to-bytecode compilation is still compilation. bytecode-to-bytecode compilation works on iOS, and doesn't require deviating from C. Optimizations require to make assumptions on the code, and deoptimize if an assumption becomes wrong. I call these things "guards". If I understood correctly, PyPy is able to deoptimize a function in the middle of the function, while executing it. In my FAT Python project, I tried something simpler: add guards at the function entry point, and decide at the entry which version of the code should be run (FAT Python allows to have more than 2 versions of the code for the same function).
I described my implementation in the PEP 510: https://www.python.org/dev/peps/pep-0510/
I agree that you *can* emit more efficient bytecode using assumptions. But I'm not sure that the best speedup will be worth it. For example, if your maximum speedup is 20% but the JIT compiler increases the startup time and uses more memory, I'm not sure that users will use it. The design will restrict indirectly the maximum speed.
At the bytecode level, you cannot specialize bytecode for 1+2 (x+y with x=1 and y=2) for example. The BINARY_ADD instruction calls PyNumber_Add(), but a previous experience showed that the dispatch inside PyNumber_Add() to reach long_add() is expensive.
If you can assert that the sum(s) never overflow an int, you can avoid hitting long_add() entirely, and avoid all the checks around it. IADD would be IADD as opposed to NADD because it would add two ints specifically, not two numbers. And it would do no overflow checks because the JIT already told it no overflow can happen.
I'm trying to find a solution to not make CPython 20% faster, but 2x faster. See my talk at the recent Python Language Summit (at Pycon US): https://github.com/haypo/conf/raw/master/2017-PyconUS/summit.pdf https://lwn.net/Articles/723949/
My mid-term/long-term plan for FAT Python is to support multiple optimizers, and allow developers to choose between bytecode ("Python" code) and machine code ("C" code). For example, work on an optimizer reusing Cython rather than writing a new compiler from scratch. My current optimizer works at the AST level and emits more efficient bytecode by rewriting the AST.
But another major design choice in FAT Python is to run the optimizer ahead-of-time (AoT), rather than just-in-time (JIT). Maybe it will not work. We will see :-)
I suggest you to take a look at my notes to make CPython faster: http://faster-cpython.readthedocs.io/
FAT Python homepage: http://faster-cpython.readthedocs.io/fat_python.html
--
You may also be interested by my Pycon US talk about CPython optimization in 3.5, 3.6 and 3.7: https://lwn.net/Articles/725114/
Victor

Let's say that you have a function "def mysum (x; y): return x+y", do you always want to use your new IADD instruction here? What if I call mysum ("a", "b")? Victor

On 2017-07-01 07:34 PM, Victor Stinner wrote:
Let's say that you have a function "def mysum (x; y): return x+y", do you always want to use your new IADD instruction here? What if I call mysum ("a", "b")?
Victor
Let's say that you do. Given how short it is, it would just get inlined. Your call of mysum ("a", "b") would indeed not use IADD, nor would it be a call. It would potentially not invoke any operators, but instead get replaced with "ab". When you have a tracing JIT, you can do away with a lot of overhead. You can inline functions, variables, do away with typechecks, and many other things. This holds true even if that JIT never emits a single byte of machine code.

On Sun, Jul 2, 2017 at 8:52 AM, Soni L. <fakedme+py@gmail.com> wrote:
On 2017-07-01 07:34 PM, Victor Stinner wrote:
Let's say that you have a function "def mysum (x; y): return x+y", do you always want to use your new IADD instruction here? What if I call mysum ("a", "b")?
Victor
Let's say that you do. Given how short it is, it would just get inlined. Your call of mysum ("a", "b") would indeed not use IADD, nor would it be a call. It would potentially not invoke any operators, but instead get replaced with "ab".
When you have a tracing JIT, you can do away with a lot of overhead. You can inline functions, variables, do away with typechecks, and many other things. This holds true even if that JIT never emits a single byte of machine code.
Let's try a more complicated example. # demo.py def mysum(x, y): return x + y def do_stuff(a, b): print(mysum("foo", "bar")) print(mysum(5, 7)) print(mysum(a, 42)) print(mysum(b, "spam")) What can you optimize here? Now let's look at a file that might call it: # cruel.py import demo def nasty(x, y): demo.mysum = random.choice([ lambda x, y: x + y, lambda x, y: f"{x} + f{y}", lambda x, y: "muahahaha", ]) return Ellipsis demo.mysum = nasty demo.do_stuff("what", "now?") Unless you can prove that this doesn't happen, you can't really optimize much of mysum away. That's where a tracing JIT compiler has the advantage: it can notice *at run time* that you're not doing this kind of thing, and in effect, forfeit the optimizations when you're running your tests (since test suites are where this kind of monkey-patching tends to happen). ChrisA

On Sat, Jul 01, 2017 at 07:52:55PM -0300, Soni L. wrote:
On 2017-07-01 07:34 PM, Victor Stinner wrote:
Let's say that you have a function "def mysum (x; y): return x+y", do you always want to use your new IADD instruction here? What if I call mysum ("a", "b")?
Victor
Let's say that you do. Given how short it is, it would just get inlined. Your call of mysum ("a", "b") would indeed not use IADD, nor would it be a call. It would potentially not invoke any operators, but instead get replaced with "ab".
What you are describing sounds more like the output of a keyhole optimizer that folds constants, only extended to look inside functions. I expect that it would have to be a VERY clever optimizer, since it would have to do a complete whole-of-program static analysis to be sure that mysum has not been replaced, shadowed or redefined by the time it is called. I won't say that is outright impossible, but it would be *extremely* hard to do something like that at compile time.
When you have a tracing JIT, you can do away with a lot of overhead. You can inline functions, variables, do away with typechecks, and many other things. This holds true even if that JIT never emits a single byte of machine code.
What you are describing sounds more like an "Ahead Of Time" (AOT) compiler to me. Especially the part about doing away with typechecks. As far as I know you can really only do away with typechecks or other guards if you know ahead of time (at compile time) what the types of values are, and that requires static typing. -- Steve

On Sun, Jul 2, 2017 at 3:41 PM, Steven D'Aprano <steve@pearwood.info> wrote:
Let's say that you do. Given how short it is, it would just get inlined. Your call of mysum ("a", "b") would indeed not use IADD, nor would it be a call. It would potentially not invoke any operators, but instead get replaced with "ab".
What you are describing sounds more like the output of a keyhole optimizer that folds constants, only extended to look inside functions. I expect that it would have to be a VERY clever optimizer, since it would have to do a complete whole-of-program static analysis to be sure that mysum has not been replaced, shadowed or redefined by the time it is called.
I won't say that is outright impossible, but it would be *extremely* hard to do something like that at compile time.
Isn't that the sort of thing that the "versioned globals dictionary" was supposed to do? If your globals haven't changed, you know that the optimizer was correct. But that's still a hard problem. Or at very least, it's decidedly non-trivial, and the costs are significant, so the net benefits aren't proven. ChrisA

On Sun, Jul 02, 2017 at 03:52:34PM +1000, Chris Angelico wrote:
On Sun, Jul 2, 2017 at 3:41 PM, Steven D'Aprano <steve@pearwood.info> wrote:
Let's say that you do. Given how short it is, it would just get inlined. Your call of mysum ("a", "b") would indeed not use IADD, nor would it be a call. It would potentially not invoke any operators, but instead get replaced with "ab".
What you are describing sounds more like the output of a keyhole optimizer that folds constants, only extended to look inside functions. I expect that it would have to be a VERY clever optimizer, since it would have to do a complete whole-of-program static analysis to be sure that mysum has not been replaced, shadowed or redefined by the time it is called.
I won't say that is outright impossible, but it would be *extremely* hard to do something like that at compile time.
Isn't that the sort of thing that the "versioned globals dictionary" was supposed to do? If your globals haven't changed, you know that the optimizer was correct.
That only solves the problem of mysum being modified, not whether the arguments are ints. You still need to know whether it is safe to call some low-level (fast) integer addition routine, or whether you have to go through the (slow) high-level Python code. In any case, guards are a kind of runtime check. It might not be an explicit isinstance() check, but it is logically implies one. If x was an int, and nothing has changed, then x is still an int. If Victor is around, he might like to comment on how his FAT Python handles this.
But that's still a hard problem. Or at very least, it's decidedly non-trivial, and the costs are significant, so the net benefits aren't proven.
In fairness, they are proven for other languages, and they certainly worked for things like Psyco. So this isn't completely pie-in-the-sky dreaming. -- Steve

On Sun, Jul 2, 2017 at 10:13 PM, Steven D'Aprano <steve@pearwood.info> wrote:
But that's still a hard problem. Or at very least, it's decidedly non-trivial, and the costs are significant, so the net benefits aren't proven.
In fairness, they are proven for other languages, and they certainly worked for things like Psyco. So this isn't completely pie-in-the-sky dreaming.
Yeah. It's the realm of "let's put in some solid research, then do some proof of concept work, and maybe it'll be worth going ahead with" - it's not "Python should be able to optimize this away". ChrisA

2017-07-02 14:13 GMT+02:00 Steven D'Aprano <steve@pearwood.info>:
That only solves the problem of mysum being modified, not whether the arguments are ints. You still need to know whether it is safe to call some low-level (fast) integer addition routine, or whether you have to go through the (slow) high-level Python code.
In any case, guards are a kind of runtime check. It might not be an explicit isinstance() check, but it is logically implies one. If x was an int, and nothing has changed, then x is still an int.
If Victor is around, he might like to comment on how his FAT Python handles this.
FAT Python design is generic, you are free to implement any kind of check. A check is an object which provides a C callback. FAT Python provides the following guards: GuardArgType GuardBuiltins GuardDict GuardFunc GuardGlobals About "mysym being modified", it's handled by this guard: http://fatoptimizer.readthedocs.io/en/latest/fat.html#GuardFunc Right now, only the func.__code__ is watched. It's not enought, but it's a compromise to keep my implementation simple :-) Tomorrow, if FAT Python becomes a real thing, the builtin function type can be modified to add a version as we have for dictionaries, and the version will be increased for any function modification: argument defaults, arguments, name, etc. We would only have to modify GuardFunc implementation, not users of this guard. To really respect the Python semantics, guards became more complex than expected. GuardBuiltins doesn't only check that len() is still the same function in builtins. It only has to the globals name of the current frame, globals()[name] and builtins of the current frame. Python allows crazy stuff like running a single function with custom builtin functions: see exec() function. Victor

On 2017-07-02 02:41 AM, Steven D'Aprano wrote:
On Sat, Jul 01, 2017 at 07:52:55PM -0300, Soni L. wrote:
On 2017-07-01 07:34 PM, Victor Stinner wrote:
Let's say that you have a function "def mysum (x; y): return x+y", do you always want to use your new IADD instruction here? What if I call mysum ("a", "b")?
Victor Let's say that you do. Given how short it is, it would just get inlined. Your call of mysum ("a", "b") would indeed not use IADD, nor would it be a call. It would potentially not invoke any operators, but instead get replaced with "ab".
What you are describing sounds more like the output of a keyhole optimizer that folds constants, only extended to look inside functions. I expect that it would have to be a VERY clever optimizer, since it would have to do a complete whole-of-program static analysis to be sure that mysum has not been replaced, shadowed or redefined by the time it is called.
Runtime. Not static. This is the same kind of stuff LuaJIT (and any other JIT) does.
I won't say that is outright impossible, but it would be *extremely* hard to do something like that at compile time.
When you have a tracing JIT, you can do away with a lot of overhead. You can inline functions, variables, do away with typechecks, and many other things. This holds true even if that JIT never emits a single byte of machine code. What you are describing sounds more like an "Ahead Of Time" (AOT) compiler to me. Especially the part about doing away with typechecks. As far as I know you can really only do away with typechecks or other guards if you know ahead of time (at compile time) what the types of values are, and that requires static typing.
You can do that at runtime with a JIT and flushing the JIT cache when your assumptions (guards) change. (altho in reality you wouldn't flush the whole JIT cache because that'd be expensive.)
participants (11)
-
Brett Cannon
-
Chris Angelico
-
David Mertz
-
Koos Zevenhoven
-
Mark Lawrence
-
Nick Timkovich
-
Oleg Broytman
-
Paul Moore
-
Soni L.
-
Steven D'Aprano
-
Victor Stinner