Saw recent discussion: https://mail.python.org/pipermail/python-dev/2016-February/143013.html

I remember trying WPython; it was fast. Unfortunately, it feels like it came at the wrong time, when development was invested in getting py3k out the door. It also had a lot of other ideas, like *_INT instructions, which allowed oparg to be a constant int rather than needing to LOAD_CONST one.

Anyways, I'll stop reminiscing. abarnert has started an experiment with wordcode: https://github.com/abarnert/cpython/blob/c095a32f2a68ac708466b9c64906cc4d0f5...

I've personally benchmarked this fork with positive results. This experiment seeks to be conservative: it doesn't seek to introduce new opcodes or combine the BINARY_OPs into a single op where the currently unused-in-wordcode arg then states the kind of binary op (à la COMPARE_OP). I've submitted a pull request which is working on fixing tests and updating peephole.c.

Bringing this up on the list to figure out if there's interest in a basic wordcode change. It feels like there are no real downsides: faster code, smaller bytecode, simpler interpretation of bytecode (the Nth instruction starts at the 2Nth byte if you count EXTENDED_ARG as an instruction). The only cost is the transition.

What'd be necessary for this to be pulled upstream?
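To make the "Nth instruction starts at the 2Nth byte" point concrete, here is a minimal sketch of a decoder for such a stream. It assumes each instruction is one opcode byte followed by one argument byte, and that EXTENDED_ARG contributes the high bits of the next instruction's argument. The opcode numbers and the function name are illustrative only and are not taken from the fork.

```python
# Illustrative opcode numbers (assumptions); real values depend on the CPython version.
EXTENDED_ARG = 144
LOAD_CONST = 100


def decode_wordcode(code):
    """Yield (index, opcode, arg) for a stream of fixed 2-byte instructions."""
    if len(code) % 2:
        raise ValueError("wordcode length must be even")
    extended = 0
    for index in range(len(code) // 2):
        opcode = code[2 * index]             # Nth instruction starts at byte 2N
        arg = code[2 * index + 1] | extended
        # An EXTENDED_ARG prefix supplies the high bits of the following argument.
        extended = arg << 8 if opcode == EXTENDED_ARG else 0
        yield index, opcode, arg


# LOAD_CONST 5, then EXTENDED_ARG 1 + LOAD_CONST 2 (argument 1 * 256 + 2 = 258)
sample = bytes([LOAD_CONST, 5, EXTENDED_ARG, 1, LOAD_CONST, 2])
decoded = list(decode_wordcode(sample))
for entry in decoded:
    print(entry)
```

Because every instruction has the same width, no per-opcode "does this take an argument?" check is needed to find the next instruction, which is the simplification being claimed.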
I think it's probably too soon to discuss on python-dev, but I do think that something like this could be attempted in 3.6 or (more likely) 3.7, if it really is faster.

An unfortunate issue however is that many projects seem to make a hobby of hacking bytecode. All those projects would have to be totally rewritten in order to support the new wordcode format (as opposed to just having to be slightly adjusted to support the occasional new bytecode opcode). Those projects of course don't work with PyPy or Jython either, but they do work for mainstream CPython, and it's unacceptable to just leave them all behind.

As an example, AFAIK coverage.py interprets bytecode. This is an important piece of infrastructure that we wouldn't want to leave behind. I think py.test's assert-rewrite code also generates or looks at bytecode. Also important.

All of which means that it's more likely to make it into 3.7. See you on python-ideas!

--Guido

On Sun, Feb 14, 2016 at 4:20 PM, Demur Rumed <gunkmute@gmail.com> wrote:

--
--Guido van Rossum (python.org/~guido)
On Feb 14, 2016, at 19:05, Guido van Rossum <guido@python.org> wrote:
This is part of why I suggested, on -ideas, that we should add a mutating/assembling API to the dis module. People argued that such an API would make the bytecode format more fragile, but the exact opposite is true. At the dis level, everything is unchanged by wordcode. Or by Serhiy's args-packed-in-opcode.

So, if the dis module could do everything for people that, say, the third-party byteplay module does (which wouldn't take much), then things like coverage.py, or the various special-case optimizer decorators on PyPI and ActiveState, etc. could all be written to deal with the dis module format rather than raw bytecode, and we could make changes like this without risking nearly as much breakage.

Anyway, this obviously wouldn't help the transition for 3.6. But improving dis in 3.6, with a warning that raw bytecode might start changing more frequently and/or radically in the future now that there's less reason to depend on it, might help if wordcode were to go into 3.7.
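As a small illustration of what "working at the dis level" looks like today, the sketch below reads a function through dis.get_instructions (available since Python 3.4) instead of indexing into co_code. This covers only the read-only half; the mutating/assembling API proposed above is not shown, and the function `add` is just a stand-in.

```python
import dis


def add(a, b):
    return a + b


# Decoded Instruction objects hide how opcodes and arguments are packed in co_code.
instructions = list(dis.get_instructions(add))
names = [ins.opname for ins in instructions]

for ins in instructions:
    print(ins.offset, ins.opname, ins.argrepr)
```

A tool written against opname and argval keeps working when the byte layout changes, although it still has to cope with opcodes being added or renamed between versions.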
Despite the name (and inspiration), my fork has very little to do with WPython. I'm just focused on simpler (hopefully = faster) fetch code; he started with that, but ended up going the exact opposite direction, accepting more complicated (and much slower) fetch code as a reasonable cost for drastically reducing the number of instructions. (If you double the 30% fetch-and-parse overhead per instruction, but cut the number of instructions to 40%, the net is a huge win.)
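The parenthetical arithmetic can be worked through as a back-of-the-envelope model. The sketch below takes the figures from the sentence above (30% fetch-and-parse share, doubled fetch cost, 40% as many instructions); the two cost models and all names are assumptions added for illustration, not measurements.

```python
FETCH_SHARE = 0.30        # fraction of per-instruction time spent on fetch/parse
FETCH_SLOWDOWN = 2.0      # fetch/parse becomes twice as expensive
INSTRUCTION_RATIO = 0.40  # only 40% as many instructions are executed


def relative_time_same_cost_per_instruction():
    """Assume non-fetch work per instruction is unchanged (optimistic)."""
    per_instruction = FETCH_SHARE * FETCH_SLOWDOWN + (1 - FETCH_SHARE)
    return INSTRUCTION_RATIO * per_instruction


def relative_time_same_total_work():
    """Assume total non-fetch work is unchanged; only fetch scales with count."""
    fetch = INSTRUCTION_RATIO * FETCH_SHARE * FETCH_SLOWDOWN
    return fetch + (1 - FETCH_SHARE)


optimistic = relative_time_same_cost_per_instruction()
pessimistic = relative_time_same_total_work()
print(f"optimistic: {optimistic:.2f}, pessimistic: {pessimistic:.2f}")
```

Under the first assumption the run time drops to roughly half of the baseline (about 0.52); under the second, where superinstructions still have to do all the original work, it drops only slightly (about 0.94). Real behaviour would presumably fall somewhere between the two.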
2016-02-15 8:14 GMT+01:00 Andrew Barnert via Python-Dev <python-dev@python.org>:
I don't know why you consider WPython's code for fetching the more complicated instructions to be slower. On the contrary, I structured such "superinstructions" so as to simplify their decoding. Arguments are decoded only at the moment they are needed, in order to reduce or completely avoid the use of temporary variables to hold such values. Can you provide an example supporting your claim?

Regarding WPython's goal, it wasn't only about introducing simpler instructions. As I also wrote in my presentation, it's a hybrid VM: stack- and register-based. I introduced a new instruction format for the existing CPython instructions, which are now easier to fetch, decode, and execute, and which provide better density too (for the most common case: arguments with a maximum value/index of 255). However, I also added superinstructions to pack more "useful work" into each instruction, which provides more code density, and they are primarily responsible for improving the execution speed.

Regards,
Cesare
Guido van Rossum wrote:
Maybe this argues for having an assembly-language-like intermediate form between the AST and the actual code used by the interpreter? Done properly it could make things easier for bytecode-hacking projects as well as providing some insulation from implementation details.

--
Greg
Demur Rumed <gunkmute <at> gmail.com> writes:
> I've personally benchmarked this fork with positive results.

I'm skeptical of claims like this. What did you benchmark exactly, and with which results? I don't think changing the opcode encoding per se will bring any large benefit...

Regards

Antoine.
2016-02-17 12:04 GMT+01:00 Antoine Pitrou <antoine@python.org>:
With WPython I introduced several optimizations which improved the execution speed a lot (+25% with PyStone, at the time, compared to CPython 2.6), but most of the benefits came from the new opcode format.

Regards,
Cesare
2016-02-15 1:20 GMT+01:00 Demur Rumed <gunkmute@gmail.com>:
Not only that. IMO the primary problem was that the "patch" was too big to be reviewed. Unfortunately it was my first attempt, and having worked alone I introduced too many optimizations and (heavy) changes to the code. An incremental approach would have worked better, although I believe that such a drastic move from the established bytecodes to the new wordcodes would have produced strong resistance anyway.
> It also had a lot of other ideas like *_INT instructions which allowed having oparg to be a constant int rather than needing to LOAD_CONST one.

This, specifically, was an experiment that I made with WPython 1.1, and one which I recommend not following. There are other, more general, ways to speed up execution when dealing with integers.
participants (7): Andrew Barnert, Antoine Pitrou, Cesare Di Mauro, Demur Rumed, Greg Ewing, Guido van Rossum, Maciej Fijalkowski