Do more at compile time; less at runtime
Hi all, The current CPython bytecode interpreter is rather more complex than it needs to be. A number of bytecodes could be eliminated and a few more simplified by moving the work involved in handling compound statements (loops, try-blocks, etc) from the interpreter to the compiler. This simplest example of this is the while loop... while cond: body This currently compiled as start: if not cond goto end body goto start end: but it could be compiled as goto test: start: body if cond goto start which eliminates one instruction per iteration. A more complex example is a return in a try-finally block. try: part1 if cond: return X part2 finally: part3 Currently, handling the return is complex and involves "pseudo exceptions", but if part3 were duplicated by the compiler, then the RETURN bytecode could just perform a simple return. The code above would be compiled thus... PUSH_BLOCK try part1 if not X goto endif push X POP_BLOCK part3 <<< duplicated RETURN_VALUE endif: part2 POP_BLOCK part3 <<< duplicated The changes I am proposing are: Allow negative line deltas in the lnotab array (bytecode deltas would remain non-negative) Remove the SETUP_LOOP, BREAK and CONTINUE bytecodes Simplify the RETURN bytecode Eliminate "pseudo exceptions" from the interpreter Simplify (or perhaps eliminate) SETUP_TRY, END_FINALLY, END_WITH. Reverse the sense of the FOR_ITER bytecode (ie. jump on not-exhausted) The net effect of these changes would be: Reduced code size and reduced code complexity. A small (1-5%)? increase in speed, due the simplification of the bytecodes and a very small change in the number of bytecodes executed. A small change in the static size of the bytecodes (-2% to +2%)? Although this is a quite intrusive change, I think it is worthwhile as it simplifies ceval.c considerably. The interpreter has become rather convoluted and any simplification has to be a good thing. I've already implemented negative line deltas and the transformed while loop: https://bitbucket.org/markshannon/cpython-lnotab-signed I'm currently working on the block unwinding. So, Good idea? Bad idea? Should I write a PEP or is the bug tracker sufficient? Cheers, Mark.
Sounds good to me. No PEP needed, just a tracker item, tests, review etc... --Guido van Rossum (sent from Android phone) On Dec 9, 2012 2:24 PM, "Mark Shannon" wrote:
Hi all,
The current CPython bytecode interpreter is rather more complex than it needs to be. A number of bytecodes could be eliminated and a few more simplified by moving the work involved in handling compound statements (loops, try-blocks, etc) from the interpreter to the compiler.
This simplest example of this is the while loop... while cond: body
This currently compiled as
start: if not cond goto end body goto start end:
but it could be compiled as
goto test: start: body if cond goto start
which eliminates one instruction per iteration.
A more complex example is a return in a try-finally block.
try: part1 if cond: return X part2 finally: part3
Currently, handling the return is complex and involves "pseudo exceptions", but if part3 were duplicated by the compiler, then the RETURN bytecode could just perform a simple return. The code above would be compiled thus...
PUSH_BLOCK try part1 if not X goto endif push X POP_BLOCK part3 <<< duplicated RETURN_VALUE endif: part2 POP_BLOCK part3 <<< duplicated
The changes I am proposing are:
Allow negative line deltas in the lnotab array (bytecode deltas would remain non-negative) Remove the SETUP_LOOP, BREAK and CONTINUE bytecodes Simplify the RETURN bytecode Eliminate "pseudo exceptions" from the interpreter Simplify (or perhaps eliminate) SETUP_TRY, END_FINALLY, END_WITH. Reverse the sense of the FOR_ITER bytecode (ie. jump on not-exhausted)
The net effect of these changes would be: Reduced code size and reduced code complexity. A small (1-5%)? increase in speed, due the simplification of the bytecodes and a very small change in the number of bytecodes executed. A small change in the static size of the bytecodes (-2% to +2%)?
Although this is a quite intrusive change, I think it is worthwhile as it simplifies ceval.c considerably. The interpreter has become rather convoluted and any simplification has to be a good thing.
I've already implemented negative line deltas and the transformed while loop: https://bitbucket.org/**markshannon/cpython-lnotab-**signedhttps://bitbucket.org/markshannon/cpython-lnotab-signed I'm currently working on the block unwinding.
So, Good idea? Bad idea? Should I write a PEP or is the bug tracker sufficient?
Cheers, Mark.
______________________________**_________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/**mailman/listinfo/python-devhttp://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/**mailman/options/python-dev/** guido%40python.orghttp://mail.python.org/mailman/options/python-dev/guido%40python.org
Interesting idea, main challenge is to keep stepping through the code with pdb sane, and being clear on what differences in behaviour will be visible to the runtime execution hooks. -- Sent from my phone, thus the relative brevity :)
On 2012-12-09 22:22, Mark Shannon wrote:
Hi all,
The current CPython bytecode interpreter is rather more complex than it needs to be. A number of bytecodes could be eliminated and a few more simplified by moving the work involved in handling compound statements (loops, try-blocks, etc) from the interpreter to the compiler.
This simplest example of this is the while loop... while cond: body
This currently compiled as
start: if not cond goto end body goto start end:
but it could be compiled as
goto test: start: body if cond goto start
which eliminates one instruction per iteration.
A more complex example is a return in a try-finally block.
try: part1 if cond: return X part2 finally: part3
Currently, handling the return is complex and involves "pseudo exceptions", but if part3 were duplicated by the compiler, then the RETURN bytecode could just perform a simple return. The code above would be compiled thus...
PUSH_BLOCK try part1 if not X goto endif push X POP_BLOCK part3 <<< duplicated RETURN_VALUE endif: part2 POP_BLOCK part3 <<< duplicated
The changes I am proposing are:
[snip] Is it necessary to duplicate part3? Is it possible to put it into a subroutine if it's long? (And I do mean a simple cheap subroutine.)
Do know my registervm project? I try to emit better bytecode. See the implementation on http://hg.python.org/sandbox/registervm See for example the documentation: hg.python.org/sandbox/registervm/file/tip/REGISTERVM.txt http://hg.python.org/sandbox/registervm I implemented other optimisations. See also WPython project which uses also more efficient bytecode. http://code.google.com/p/wpython/ Victor Le 9 déc. 2012 23:23, "Mark Shannon" a écrit :
Hi all,
The current CPython bytecode interpreter is rather more complex than it needs to be. A number of bytecodes could be eliminated and a few more simplified by moving the work involved in handling compound statements (loops, try-blocks, etc) from the interpreter to the compiler.
This simplest example of this is the while loop... while cond: body
This currently compiled as
start: if not cond goto end body goto start end:
but it could be compiled as
goto test: start: body if cond goto start
which eliminates one instruction per iteration.
A more complex example is a return in a try-finally block.
try: part1 if cond: return X part2 finally: part3
Currently, handling the return is complex and involves "pseudo exceptions", but if part3 were duplicated by the compiler, then the RETURN bytecode could just perform a simple return. The code above would be compiled thus...
PUSH_BLOCK try part1 if not X goto endif push X POP_BLOCK part3 <<< duplicated RETURN_VALUE endif: part2 POP_BLOCK part3 <<< duplicated
The changes I am proposing are:
Allow negative line deltas in the lnotab array (bytecode deltas would remain non-negative) Remove the SETUP_LOOP, BREAK and CONTINUE bytecodes Simplify the RETURN bytecode Eliminate "pseudo exceptions" from the interpreter Simplify (or perhaps eliminate) SETUP_TRY, END_FINALLY, END_WITH. Reverse the sense of the FOR_ITER bytecode (ie. jump on not-exhausted)
The net effect of these changes would be: Reduced code size and reduced code complexity. A small (1-5%)? increase in speed, due the simplification of the bytecodes and a very small change in the number of bytecodes executed. A small change in the static size of the bytecodes (-2% to +2%)?
Although this is a quite intrusive change, I think it is worthwhile as it simplifies ceval.c considerably. The interpreter has become rather convoluted and any simplification has to be a good thing.
I've already implemented negative line deltas and the transformed while loop: https://bitbucket.org/**markshannon/cpython-lnotab-**signedhttps://bitbucket.org/markshannon/cpython-lnotab-signed I'm currently working on the block unwinding.
So, Good idea? Bad idea? Should I write a PEP or is the bug tracker sufficient?
Cheers, Mark.
______________________________**_________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/**mailman/listinfo/python-devhttp://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/**mailman/options/python-dev/** victor.stinner%40gmail.comhttp://mail.python.org/mailman/options/python-dev/victor.stinner%40gmail.com
On 12/9/2012 5:22 PM, Mark Shannon wrote:
The current CPython bytecode interpreter is rather more complex than it needs to be. A number of bytecodes could be eliminated and a few more simplified by moving the work involved in handling compound statements (loops, try-blocks, etc) from the interpreter to the compiler.
As with all suggestions to optimize the bytecode generation, I'd like to re-iterate the need for a way to disable all optimization, for the sake of reasoning about the program. For example, debugging, coverage measurement, etc. This idea was misunderstood and defeated in http://bugs.python.org/issue2506, but I strongly believe it is important. --Ned.
+1
On Dec 11, 2012 8:47 AM, "Ned Batchelder"
On 12/9/2012 5:22 PM, Mark Shannon wrote:
The current CPython bytecode interpreter is rather more complex than it needs to be. A number of bytecodes could be eliminated and a few more simplified by moving the work involved in handling compound statements (loops, try-blocks, etc) from the interpreter to the compiler.
As with all suggestions to optimize the bytecode generation, I'd like to re-iterate the need for a way to disable all optimization, for the sake of reasoning about the program. For example, debugging, coverage measurement, etc. This idea was misunderstood and defeated in http://bugs.python.org/* *issue2506 http://bugs.python.org/issue2506, but I strongly believe it is important.
--Ned. ______________________________**_________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/**mailman/listinfo/python-devhttp://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/**mailman/options/python-dev/** guido%40python.orghttp://mail.python.org/mailman/options/python-dev/guido%40python.org
participants (6)
-
Guido van Rossum
-
Mark Shannon
-
MRAB
-
Ned Batchelder
-
Nick Coghlan
-
Victor Stinner