proto-pep: How to change Python's bytecode
After implementing over 10 new opcodes for my thesis, I figured I should write down the basic steps in an info PEP so that this PEP and PEP 306 together provide enough guidelines to cover the bases on changes to the language itself. To go along with this I also plan to write some benchmarks for individual opcodes that could possibly lead to a testing suite for the opcodes themselves (I will probably do this piecemeal and put it up on SF initially since there are a lot of opcodes).

Anyway, let me know if I seem to be missing anything or have something to add. After a reasonable time of non-response to this I will request a PEP number (assuming people don't think this PEP is stupid).

------------------------------------------

PEP: XXX
Title: How to change Python's bytecode
Version: $Revision: 1.4 $
Last-Modified: $Date: 2003/09/22 04:51:50 $
Author: Brett Cannon <brett@python.org>
Status: Draft
Type: Informational
Content-Type: text/x-rst
Created: XX-XXX-XXXX
Post-History: XX-XXX-XXXX


Abstract
========

Python source code is compiled down to something called bytecode. This bytecode (which can be viewed as sequences of opcodes) defines what Python is capable of. As such, knowing how to add, remove, or change the bytecode properly is important when changing the abilities of the Python language.


Rationale
=========

While changing Python's bytecode is not a frequent occurrence, it still happens. Having the required steps documented in a single location should make experimentation with the bytecode easier, since it is not necessarily obvious what the steps are to change the bytecode. This PEP, paired with PEP 306 [#PEP-306]_, should provide enough basic guidelines for handling any syntactic changes to the Python language itself that introduce new semantics.


Checklist
=========

This is a rough checklist of what files need to change and how they are involved with the bytecode.
All paths are given from the viewpoint of ``/cvsroot/python/dist/src`` (from CVS). This list should not be considered exhaustive, nor is it meant to cover all possible situations.

- ``Include/opcode.h``
    This include file lists all known opcodes and associates each opcode name with a unique number. When adding a new opcode it is important to take note of the ``HAVE_ARGUMENT`` value. This ``#define`` marks the dividing line between opcodes: all opcodes whose value is greater than or equal to ``HAVE_ARGUMENT`` are expected to take an argument.

- ``Lib/opcode.py``
    Lists all of the opcodes and their associated values. Used by the dis module [#dis]_ to map bytecode values to their names.

- ``Python/ceval.c``
    Contains the main interpreter loop. The code that handles the evaluation of an opcode goes here.

- ``Python/compile.c``
    All bytecode is emitted here, so this file must be altered to make sure a new opcode is actually used.

- ``Lib/compiler/pyassem.py``, ``Lib/compiler/pycodegen.py``
    The 'compiler' package [#compiler]_ needs to be altered to also reflect any changes to the bytecode.

- ``Doc/lib/libdis.tex``
    The documentation [#dis-docs]_ for the dis module contains a complete list of all the opcodes.

- ``Python/import.c``
    Defines the magic word (named ``MAGIC``) stored in .pyc files to detect whether the bytecode they contain matches the bytecode used by the running version of Python. This number needs to be changed to make sure that the running interpreter does not try to execute bytecode that it does not know about.


Suggestions for bytecode development
====================================

A few things can be done to make sure that development goes smoothly when experimenting with Python's bytecode. One is to delete all .py(c|o) files after each semantic change to ``Python/compile.c``; that way all files will be recompiled with any bytecode changes. Make sure to run the entire testing suite [#test-suite]_.
Since the ``regrtest.py`` driver recompiles all source code before a test is run, it acts as a good test to make sure that no existing semantics are broken. Running parrotbench [#parrotbench]_ is also a good way to make sure existing semantics are not broken; this benchmark is practically a compliance test.


References
==========

.. [#PEP-306] PEP 306, How to Change Python's Grammar, Hudson
   (http://www.python.org/peps/pep-0306.html)

.. [#dis] XXX

.. [#test-suite] XXX

.. [#parrotbench] XXX

.. [#dis-docs] XXX


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   End:
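The two stale-bytecode concerns in this draft (clearing compiled files after a change to ``Python/compile.c``, and the ``MAGIC`` check described for ``Python/import.c``) can be sketched in Python. This is a minimal, non-authoritative sketch assuming a modern Python 3 (3.4 or later), where ``importlib.util.MAGIC_NUMBER`` exposes the running interpreter's magic number; in recent CPython releases that value is no longer kept in ``Python/import.c``. The helper names ``remove_bytecode_files`` and ``pyc_matches_interpreter`` are hypothetical. Only ``.pyc`` and ``.pyo`` files are removed; ``.py`` and ``.pyw`` files are source code and are left alone.

```python
import importlib.util
import pathlib

# Suffixes of compiled bytecode files. .py and .pyw are source files
# and must never be deleted.
BYTECODE_SUFFIXES = {".pyc", ".pyo"}


def remove_bytecode_files(root):
    """Delete every .pyc/.pyo file under *root* and return the removed paths.

    Hypothetical helper. Directories (including __pycache__) are left in place.
    """
    removed = []
    # Materialize the listing first so we never delete while walking.
    for path in sorted(pathlib.Path(root).rglob("*")):
        if path.is_file() and path.suffix in BYTECODE_SUFFIXES:
            path.unlink()
            removed.append(path)
    return removed


def pyc_matches_interpreter(pyc_path):
    """Return True if the file starts with this interpreter's magic number.

    A .pyc whose header does not match is rejected by the import system,
    which is why the magic number must change whenever the bytecode does.
    """
    magic = importlib.util.MAGIC_NUMBER
    with open(pyc_path, "rb") as f:
        header = f.read(len(magic))
    return header == magic
```

In modern CPython, compiled files usually live in ``__pycache__`` directories under names such as ``mod.cpython-312.pyc``; the recursive search in ``remove_bytecode_files`` finds those too, but leaves the emptied directories behind.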
"Brett C." <bac@ocf.berkeley.edu> wrote in message news:41CC7F67.9070009@ocf.berkeley.edu... At to the title, bytecodes are a property of the CPython implementation, not of Python itself. Since I think the distinction is quite important to maintain, I would insert the missing 'C' and everywhere else as appropriate.
Over the last several years, various people have reported experimenting with CPython's bytecodes. I wonder if it would be helpful to have a repository of the results, in one place, for new experimenters and curious people to peruse.
> Anyway, let me know if I seem to be missing anything or have something to add.
As I said, the important 'C' qualifier.
> PEP: XXX
> Title: How to change Python's bytecode
/P/CP/
> Python source code is compiled down to something called bytecode.
Suggested replacements in quotes: "CPython compiles Python source code to something called bytecode."
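Terry's phrasing (CPython compiles Python source code to bytecode) is easy to observe with the ``dis`` module. In this minimal sketch, ``dis.get_instructions`` yields the instructions of a compiled function and the hypothetical helper ``opcode_names`` keeps only their names. The exact names differ between CPython releases (for example, ``a + b`` compiles to ``BINARY_ADD`` before 3.11 and to ``BINARY_OP`` from 3.11 on), so no particular output is claimed here.

```python
import dis


def add(a, b):
    return a + b


def opcode_names(func):
    """Return, in order, the opcode names CPython compiled *func* into."""
    return [instr.opname for instr in dis.get_instructions(func)]


if __name__ == "__main__":
    # Human-readable disassembly first, then just the opcode names.
    dis.dis(add)
    print(opcode_names(add))
```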
> This bytecode (which can be viewed as sequences of opcodes) defines what Python is capable of.
This is backwards. "The language, as defined in the Reference Manual, determines what bytecodes are needed (collectively, not one by one) for a bytecode implementation."
"Therefore, changes in the language may require changes in the set of bytecodes. In addition, changing the bytecode set for a given definition may result in desireable changes in the interpreter behavior. This document describes how to do so for either reason."
/P/CP/

Experiments are much more frequent than committed changes -- all of which start as experiments.

...

Terry J. Reedy
>> Over the last several years, various people have reported
>> experimenting with CPython's bytecodes. I wonder if it would be
>> helpful to have a repository of the results, in one place, for new
>> experimenters and curious people to peruse.

Brett> Wouldn't hurt. Adding that section would not be bad, but I don't
Brett> have the inclination to hunt them down. What do others think
Brett> about having this section?

How about just references in the PEP? I presented a paper several years ago at a Python workshop on a peephole optimizer for Python. Also, Michael Hudson has his no-longer-active bytecodehacks stuff:

http://www.foretec.com/python/workshops/1998-11/proceedings/papers/montanaro...
http://bytecodehacks.sourceforge.net/bch-docs/bch/index.html

I imagine there's other stuff as well.

Skip
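Skip's paper concerns a peephole optimizer, i.e. a pass that rewrites bytecode into cheaper equivalent bytecode. As an illustration of that kind of rewrite only (this is CPython's own compiler at work, not Skip's optimizer), the sketch below checks that a constant expression such as ``60 * 60 * 24`` ends up as a single precomputed constant in the code object. Which compiler stage performs this folding has changed across CPython versions, and ``folded_constants`` is a hypothetical helper.

```python
import dis


def folded_constants(source):
    """Compile *source* as a module and return its code object's constants."""
    code = compile(source, "<example>", "exec")
    return code.co_consts


if __name__ == "__main__":
    source = "SECONDS_PER_DAY = 60 * 60 * 24"
    # The product is stored as one constant rather than computed at run time.
    print(folded_constants(source))
    # The disassembly shows the folded value being loaded directly.
    dis.dis(compile(source, "<example>", "exec"))
```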
OK, latest update with all suggested revisions (mention this is for CPython, section for known previous bytecode work). If no one has any revisions I will submit it to David for official PEP acceptance this weekend.

----------------------------------

PEP: XXX
Title: How to change CPython's bytecode
Version: $Revision: 1.4 $
Last-Modified: $Date: 2003/09/22 04:51:50 $
Author: Brett Cannon <brett@python.org>
Status: Draft
Type: Informational
Content-Type: text/x-rst
Created: XX-XXX-XXXX
Post-History: XX-XXX-XXXX


Abstract
========

Python source code is compiled down to something called bytecode. This bytecode must implement enough semantics to perform the actions required by the Language Reference [#lang-ref]_. As such, knowing how to add, remove, or change the bytecode properly is important when changing the abilities of the Python language. This PEP covers how to accomplish this in the CPython implementation of the language (referred to as simply "Python" for the rest of this PEP).


Rationale
=========

While changing Python's bytecode is not a frequent occurrence, it still happens. Having the required steps documented in a single location should make experimentation with the bytecode easier, since it is not necessarily obvious what the steps are to change the bytecode. This PEP, paired with PEP 306 [#PEP-306]_, should provide enough basic guidelines for handling any syntactic changes to the Python language itself that introduce new semantics.


Checklist
=========

This is a rough checklist of what files need to change and how they are involved with the bytecode. All paths are given from the viewpoint of ``/cvsroot/python/dist/src`` (from CVS). This list should not be considered exhaustive, nor is it meant to cover all possible situations.

- ``Include/opcode.h``
    This include file lists all known opcodes and associates each opcode name with a unique number. When adding a new opcode it is important to take note of the ``HAVE_ARGUMENT`` value.
    This ``#define`` marks the dividing line between opcodes: all opcodes whose value is greater than or equal to ``HAVE_ARGUMENT`` are expected to take an argument.

- ``Lib/opcode.py``
    Lists all of the opcodes and their associated values. Used by the dis module [#dis]_ to map bytecode values to their names.

- ``Python/ceval.c``
    Contains the main interpreter loop. The code that handles the evaluation of an opcode goes here.

- ``Python/compile.c``
    All bytecode is emitted here, so this file must be altered to make sure a new opcode is actually used.

- ``Lib/compiler/pyassem.py``, ``Lib/compiler/pycodegen.py``
    The 'compiler' package [#compiler]_ needs to be altered to also reflect any changes to the bytecode.

- ``Doc/lib/libdis.tex``
    The documentation [#opcode-list]_ for the dis module contains a complete list of all the opcodes.

- ``Python/import.c``
    Defines the magic word (named ``MAGIC``) stored in .pyc files to detect whether the bytecode they contain matches the bytecode used by the running version of Python. This number needs to be changed to make sure that the running interpreter does not try to execute bytecode that it does not know about.


Suggestions for bytecode development
====================================

A few things can be done to make sure that development goes smoothly when experimenting with Python's bytecode. One is to delete all .py(c|o) files after each semantic change to ``Python/compile.c``; that way all files will be recompiled with any bytecode changes. Make sure to run the entire testing suite [#test-suite]_. Since the ``regrtest.py`` driver recompiles all source code before a test is run, it acts as a good test to make sure that no existing semantics are broken. Running parrotbench [#parrotbench]_ is also a good way to make sure existing semantics are not broken; this benchmark is practically a compliance test.


Previous experiments
====================

Skip Montanaro presented a paper at a Python workshop on a peephole optimizer [#skip-peephole]_.
Michael Hudson has a non-active SourceForge project named Bytecodehacks [#Bytecodehacks]_ that provides functionality for playing with bytecode directly.


References
==========

.. [#lang-ref] Python Language Reference, van Rossum & Drake
   (http://docs.python.org/ref/ref.html)

.. [#PEP-306] PEP 306, How to Change Python's Grammar, Hudson
   (http://www.python.org/peps/pep-0306.html)

.. [#dis] dis Module
   (http://docs.python.org/lib/module-dis.html)

.. [#test-suite] 'test' package
   (http://docs.python.org/lib/module-test.html)

.. [#parrotbench] Parrotbench
   (ftp://ftp.python.org/pub/python/parrotbench/parrotbench.tgz,
   http://mail.python.org/pipermail/python-dev/2003-December/041527.html)

.. [#opcode-list] Python Byte Code Instructions
   (http://docs.python.org/lib/bytecodes.html)

.. [#skip-peephole] Peephole optimizer for Python, Montanaro
   (http://www.foretec.com/python/workshops/1998-11/proceedings/papers/montanaro...)

.. [#Bytecodehacks] Bytecodehacks, Hudson
   (http://bytecodehacks.sourceforge.net/bch-docs/bch/index.html)


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   End:
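The ``HAVE_ARGUMENT`` rule from the checklist can be inspected from Python through the ``opcode`` module, the modern descendant of the ``Lib/opcode.py`` file listed above. This is a sketch assuming a current CPython 3 rather than the 2003 tree: since 3.6 every instruction occupies two bytes and opcodes below ``HAVE_ARGUMENT`` simply ignore their argument, and since 3.12 ``opcode.opmap`` also contains pseudo-instructions numbered 256 and above, for which the comparison does not apply. The helpers ``takes_argument`` and ``split_by_argument`` are hypothetical names.

```python
import opcode

# Real opcodes fit in one byte; larger values in opmap are pseudo-instructions.
MAX_REAL_OPCODE = 255


def takes_argument(name):
    """Return True if opcode *name* is numbered at or above HAVE_ARGUMENT.

    This is the same test as the HAS_ARG macro in older Include/opcode.h:
    ((op) >= HAVE_ARGUMENT).
    """
    value = opcode.opmap[name]
    if value > MAX_REAL_OPCODE:
        raise ValueError(f"{name} is a pseudo-instruction")
    return value >= opcode.HAVE_ARGUMENT


def split_by_argument():
    """Partition the real opcodes into (without_argument, with_argument)."""
    real = {n: v for n, v in opcode.opmap.items() if v <= MAX_REAL_OPCODE}
    without_arg = sorted(n for n, v in real.items() if v < opcode.HAVE_ARGUMENT)
    with_arg = sorted(n for n, v in real.items() if v >= opcode.HAVE_ARGUMENT)
    return without_arg, with_arg


if __name__ == "__main__":
    print("HAVE_ARGUMENT =", opcode.HAVE_ARGUMENT)
    print("LOAD_CONST takes an argument:", takes_argument("LOAD_CONST"))
```

Keeping the numeric split in one place is what lets the interpreter decide, from the opcode value alone, whether an argument follows; a new opcode placed on the wrong side of ``HAVE_ARGUMENT`` is therefore a classic source of confusing bugs.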
participants (5)

- Brett C.
- Brett C.
- Scott David Daniels
- Skip Montanaro
- Terry Reedy