Is there a reference manual for Python bytecode?

Hi.
Looking at ceval.c and peephole.c, there is - of course - lots of specific hard-coded knowledge of the bytecode (e.g., number of operands and other attributes). I'd like to experiment at this level, but I can't seem to find a reference for the bytecode.
Is there the equivalent of something like the ARM ARM(*) for Python bytecode? I can read Python or C code if it's encoded that way, but I'm looking for something that's a bit more immediate than deciphering what an interpreter or optimizer is trying to do (i.e., some sort of table layout or per-opcode set of attributes).
BR, E.
(*) http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0406c/index.h...

The number and meaning of the arguments are documented in the dis module: https://docs.python.org/3.6/library/dis.html On Sat, Dec 26, 2015 at 5:20 PM, Erik <python@lucidity.plus.com> wrote:

Also there's a great talk by Allison Kaptur on YouTube about this topic: https://www.youtube.com/watch?v=HVUTjQzESeo -- --Guido van Rossum (python.org/~guido)

Hi Joe, On 26/12/15 22:36, Joe Jevnik wrote:
The number and meaning of the arguments are documented in the dis module: https://docs.python.org/3.6/library/dis.html
OK - I *did* find that, but perhaps didn't immediately understand what it was telling me. So, something documented as "OP_CODE" is a 1-byte op, something documented as "OP_CODE(foo)" is a 2-byte op - and unless I missed one, there are no 3-byte ops? Thanks, E.
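
The instruction sizes asked about here can be checked empirically rather than read off the docs. The sketch below derives each instruction's size from the gap between consecutive offsets reported by dis.get_instructions(). As far as I know, CPython 3.5 and earlier used 1 byte for an opcode without an argument and 3 bytes for one with an argument, while 3.6 and later use 2 bytes for every instruction; on 3.11+ the gap also includes any inline cache entries. The helper name instruction_sizes and the function example are made up for illustration.

```python
import dis


def instruction_sizes(func):
    """Return (opname, size_in_bytes) pairs for each instruction in func."""
    code = func.__code__
    instructions = list(dis.get_instructions(code))
    # Offsets of every instruction, plus the total length as a sentinel so
    # the last instruction also gets a size.
    offsets = [ins.offset for ins in instructions] + [len(code.co_code)]
    return [
        (ins.opname, following - ins.offset)
        for ins, following in zip(instructions, offsets[1:])
    ]


def example(a, b):
    # Hypothetical function, used only to have some bytecode to inspect.
    return a + b


for name, size in instruction_sizes(example):
    print("{:<20} {} byte(s)".format(name, size))
```

The output depends on the interpreter version, so it is worth running this on the exact build being modified.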

Ned also neglected to mention his byterun project which is a pure Python implementation of the CPython eval loop: https://github.com/nedbat/byterun On Sat, 26 Dec 2015, 16:38 Ned Batchelder <ned@nedbatchelder.com> wrote:

On 27 December 2015 at 12:23, Guido van Rossum <guido@python.org> wrote:
It occurred to me that "byterun" would make a good see-also link from the dis module docs, and looking into that idea brought me to this article Allison wrote about it for the "500 lines" project: http://aosabook.org/en/500L/a-python-interpreter-written-in-python.html For a detailed semantic reference, byterun's eval loop is likely one of the most readable sources of information: https://github.com/nedbat/byterun/blob/master/byterun/pyvm2.py In terms of formal documentation, the main problem with providing reference bytecode tables is keeping them up to date as the eval loop changes. However, it would theoretically be possible to create a custom Sphinx directive that uses the dis module to generate the tables automatically during the docs build process, rather than maintaining them by hand - something like that could be experimented with outside CPython, and potentially incorporated into the dis module docs if folks are able to figure out something that works well. Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
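
A rough sketch of the table-generation half of that idea, assuming nothing beyond the public attributes of the dis module. It is not a Sphinx directive and not anything that exists in CPython's docs build; the function names opcode_rows and render_rst_table are invented here. It emits a reStructuredText simple table that such a directive could, in principle, include.

```python
import dis


def opcode_rows():
    """Yield (number, name, takes_argument, kind) for each real opcode."""
    # dis.hasarg exists only on newer CPython versions; otherwise fall back
    # to the classic rule that opcodes >= HAVE_ARGUMENT take an argument.
    hasarg = getattr(dis, "hasarg", None)
    kinds = {
        "const": dis.hasconst,
        "name": dis.hasname,
        "local": dis.haslocal,
        "free": dis.hasfree,
        "jump": list(dis.hasjrel) + list(getattr(dis, "hasjabs", ())),
        "compare": dis.hascompare,
    }
    for name, number in sorted(dis.opmap.items(), key=lambda item: item[1]):
        if number > 255:
            continue  # skip pseudo-instructions on newer versions
        if hasarg is not None:
            takes_arg = number in hasarg
        else:
            takes_arg = number >= dis.HAVE_ARGUMENT
        kind = [label for label, ops in kinds.items() if number in ops]
        yield (number, name, "yes" if takes_arg else "no",
               ", ".join(kind) or "-")


def render_rst_table(rows):
    """Render rows as a reStructuredText simple table."""
    header = ("Opcode", "Name", "Argument", "Kind")
    body = [tuple(str(cell) for cell in row) for row in rows]
    widths = [max(len(row[i]) for row in [header] + body) for i in range(4)]
    rule = "  ".join("=" * width for width in widths)

    def fmt(row):
        return "  ".join(c.ljust(w) for c, w in zip(row, widths)).rstrip()

    return "\n".join([rule, fmt(header), rule] + [fmt(r) for r in body] + [rule])


print(render_rst_table(opcode_rows()))
```

Because the table is computed from whichever interpreter runs the build, it would track eval loop changes automatically, which is the maintenance problem raised above.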

On 12/26/15 10:19 PM, Nick Coghlan wrote:
And: it doesn't work completely. There are things it doesn't handle properly, and I turned to other projects some time ago. If someone wants to pick it up, that would be cool.
Recently though, I've started a new implementation of branch coverage based on the ast rather than the bytecode. This was prompted by the "async" keywords in 3.5. "async for" and "for" compile very differently to bytecode, which caused headaches for a bytecode-based understanding of flow, so I'm trying out an ast-based understanding. --Ned.
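
To illustrate why the ast is a more uniform starting point, here is a minimal sketch that finds candidate branch points by node type. It is not the coverage implementation mentioned above, only an assumption-laden illustration; branch_lines, BRANCH_NODES and SAMPLE are made-up names. Note that "for" and "async for" show up as sibling node types regardless of how differently they compile.

```python
import ast

# Node types treated as branch points in this sketch (an illustrative
# choice, not an exhaustive list of control-flow constructs).
BRANCH_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While)

SAMPLE = """\
async def f(xs):
    async for x in xs:
        if x:
            pass
    for y in range(3):
        pass
"""


def branch_lines(source):
    """Return sorted (lineno, node type name) pairs for branching statements."""
    tree = ast.parse(source)
    return sorted(
        (node.lineno, type(node).__name__)
        for node in ast.walk(tree)
        if isinstance(node, BRANCH_NODES)
    )


print(branch_lines(SAMPLE))
```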

Thanks for your help so far (I'm experimenting with the peephole optimizer - hence my question before as I was trying to work out how to know what the small integer hard-coded offsets should be when looking ahead in the bytecode).
I've successfully added a new opcode (generated by the optimizer and understood by the interpreter loop) but when adding a second I unexpectedly got the following error. I'm not doing anything different to what I did with the first opcode as far as I can tell (I have a TARGET(FOO) in ceval.c and have obviously defined the new opcode's value in opcode.h).
"""
./python -E -S -m sysconfig --generate-posix-vars ;\
if test $? -ne 0 ; then \
    echo "generate-posix-vars failed" ; \
    rm -f ./pybuilddir.txt ; \
    exit 1 ; \
fi
XXX lineno: 241, opcode: 1
Fatal Python error: Py_Initialize: can't import _frozen_importlib
Traceback (most recent call last):
  File "<frozen importlib._bootstrap>", line 698, in <module>
  File "<frozen importlib._bootstrap>", line 751, in BuiltinImporter
  File "<frozen importlib._bootstrap>", line 241, in _requires_builtin
SystemError: unknown opcode
Aborted (core dumped)
generate-posix-vars failed
make: *** [pybuilddir.txt] Error 1
"""
If I #ifdef out the code in peephole.c which generates my new (2nd) opcode, then the error does not occur. I tried a "make clean" first, but that didn't help (I realise that does not necessarily rule out a makefile dependency issue).
Does anyone know if this is a well-known symptom of forgetting to add something somewhere when adding a new opcode, or do I need to track it down some more myself? I did not have this problem when introducing my first new opcode.
Thanks, E.

You can look at https://docs.python.org/devguide/compiler.html to see if you missed something. As for the _frozen_importlib problem, that typically manifests itself when you have invalid bytecode (that module is frozen bytecode that gets compiled into the interpreter and is the first bit of Python code that gets run). On Sun, 27 Dec 2015, 16:41 Guido van Rossum <gvanrossum@gmail.com> wrote:

On 28/12/15 00:41, Guido van Rossum wrote:
Can you show the diffs you have so far? Somebody's got to look at your code.
Sounds like it's not a well-known symptom then. I agree, but that Somebody should be me (initially, at least) - I don't want to waste other people's time if I made a silly mistake. I'm happy to post my diffs once I'm done (if only to document that what I tried is not worth spending time on). E.

On 28 December 2015 at 11:00, Erik <python@lucidity.plus.com> wrote:
The symptom is well known (at least to folks that have worked on the compiler and eval loop since the switch to importlib as the import system implementation), but the circumstances where it can arise are *very* limited. Specifically, being unable to load the import system while working on CPython is usually a sign that:
1. The interpreter's bytecode generation is inconsistent with the implementation of the eval loop
2. importlib._bootstrap includes code that triggers the inconsistent bytecode processing path
3. Freezing importlib._bootstrap to create _frozen_importlib thus produces a frozen module that won't load with the given eval loop implementation
If you're not hacking on bytecode generation or the eval loop (1), or your changes to the bytecode generator and/or eval loop don't impact the code in importlib._bootstrap (2), then you won't see this kind of bug (3).
In this particular case, it's hard to help debug the error without being able to see both the new code generation changes and the corresponding eval loop changes. It's also the case that to rule out the bootstrapping cycle as a potential source of problems, you can try the following manual dance:
1. Revert to a clean checkout and rebuild
2. Apply the eval loop changes, and rebuild
3. Apply the code generation changes, and rebuild
That generally *shouldn't* be necessary (it's why there's a separate build step to freeze the import system), but it can be a useful exercise to help figure out the source of the "unknown opcode" problem.
Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Hi Nick, On 29/12/15 02:46, Nick Coghlan wrote:
1. The interpreter's bytecode generation is inconsistent with the implementation of the eval loop
Essentially, this was my problem. I'd neglected to add the reference to TARGET_NEW_OP2 to Python/opcode_targets.h (so staring hard at the op generation and ceval implementation did not help me: they were both fine). I'd forgotten that I had added the first op to that array, and section 24.8 of https://docs.python.org/devguide/compiler.html doesn't mention that file either. I will look at raising a docs bug on that.
Thanks - this is useful to know. It's a bit chicken-and-egg if one has introduced a bug which stops the Python auto-generation scripts from running correctly at build time :) E.
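
The kind of mismatch described above (an opcode defined in opcode.h with no matching entry in Python/opcode_targets.h) can be caught mechanically. The sketch below assumes the 3.5-era file layout, where opcode.h contains "#define NAME number" lines and opcode_targets.h is an array of "&&TARGET_NAME" labels indexed by opcode number, with "&&_unknown_opcode" in unused slots. The function missing_targets and the sample strings are hypothetical; NEW_OP1 and NEW_OP2 are placeholders in the spirit of the names used in this thread.

```python
import re

# "#define NAME 123" lines; HAVE_ARGUMENT is a threshold, not an opcode.
DEFINE_RE = re.compile(r"^#define[ \t]+([A-Z_0-9]+)[ \t]+(\d+)[ \t]*$",
                       re.MULTILINE)
TARGET_RE = re.compile(r"&&(\w+)")


def missing_targets(opcode_h, opcode_targets_h, ignore=("HAVE_ARGUMENT",)):
    """Return (number, name, found_entry) for opcodes lacking a jump target."""
    targets = TARGET_RE.findall(opcode_targets_h)
    problems = []
    for name, value in DEFINE_RE.findall(opcode_h):
        number = int(value)
        if name in ignore or number > 255:
            continue  # not a real one-byte opcode
        found = targets[number] if number < len(targets) else None
        if found != "TARGET_" + name:
            problems.append((number, name, found))
    return problems


# Tiny stand-ins for the real headers, for illustration only.
SAMPLE_OPCODE_H = """\
#define POP_TOP 1
#define NEW_OP1 2
#define NEW_OP2 3
#define HAVE_ARGUMENT 90
"""
SAMPLE_TARGETS_H = """\
static void *opcode_targets[256] = {
    &&_unknown_opcode,
    &&TARGET_POP_TOP,
    &&TARGET_NEW_OP1,
    &&_unknown_opcode,
};
"""

print(missing_targets(SAMPLE_OPCODE_H, SAMPLE_TARGETS_H))
```

In practice the two strings would be read from the checkout's Include/opcode.h and Python/opcode_targets.h; those paths are taken from the 3.5-era tree and may differ in other versions.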

Brett Cannon <brett@python.org> wrote:
I would also encourage you to take a look at Numba. It is an LLVM-based JIT compiler for Python bytecode, written for hardcore numerical algorithms in Python. It can often achieve the same performance as -O2 in C after a short burn-in while inferring the types of the arguments and variables. Using it is mostly as easy as adding an @numba.jit decorator to the function we want to accelerate. Numba is rapidly becoming what Google's long-dead Unladen Swallow should have been. :-) Sturla

participants (8)
- Brett Cannon
- Erik
- Guido van Rossum
- Guido van Rossum
- Joe Jevnik
- Ned Batchelder
- Nick Coghlan
- Sturla Molden