Do more at compile time; less at runtime
Hi all, The current CPython bytecode interpreter is rather more complex than it needs to be. A number of bytecodes could be eliminated and a few more simplified by moving the work involved in handling compound statements (loops, try-blocks, etc) from the interpreter to the compiler. This simplest example of this is the while loop... while cond: body This currently compiled as start: if not cond goto end body goto start end: but it could be compiled as goto test: start: body if cond goto start which eliminates one instruction per iteration. A more complex example is a return in a try-finally block. try: part1 if cond: return X part2 finally: part3 Currently, handling the return is complex and involves "pseudo exceptions", but if part3 were duplicated by the compiler, then the RETURN bytecode could just perform a simple return. The code above would be compiled thus... PUSH_BLOCK try part1 if not X goto endif push X POP_BLOCK part3 <<< duplicated RETURN_VALUE endif: part2 POP_BLOCK part3 <<< duplicated The changes I am proposing are: Allow negative line deltas in the lnotab array (bytecode deltas would remain non-negative) Remove the SETUP_LOOP, BREAK and CONTINUE bytecodes Simplify the RETURN bytecode Eliminate "pseudo exceptions" from the interpreter Simplify (or perhaps eliminate) SETUP_TRY, END_FINALLY, END_WITH. Reverse the sense of the FOR_ITER bytecode (ie. jump on not-exhausted) The net effect of these changes would be: Reduced code size and reduced code complexity. A small (1-5%)? increase in speed, due the simplification of the bytecodes and a very small change in the number of bytecodes executed. A small change in the static size of the bytecodes (-2% to +2%)? Although this is a quite intrusive change, I think it is worthwhile as it simplifies ceval.c considerably. The interpreter has become rather convoluted and any simplification has to be a good thing. I've already implemented negative line deltas and the transformed while loop: https://bitbucket.org/markshannon/cpython-lnotab-signed I'm currently working on the block unwinding. So, Good idea? Bad idea? Should I write a PEP or is the bug tracker sufficient? Cheers, Mark.
Sounds good to me. No PEP needed, just a tracker item, tests, review etc... --Guido van Rossum (sent from Android phone) On Dec 9, 2012 2:24 PM, "Mark Shannon" <mark@hotpy.org> wrote:
Hi all,
The current CPython bytecode interpreter is rather more complex than it needs to be. A number of bytecodes could be eliminated and a few more simplified by moving the work involved in handling compound statements (loops, try-blocks, etc) from the interpreter to the compiler.
This simplest example of this is the while loop... while cond: body
This currently compiled as
start: if not cond goto end body goto start end:
but it could be compiled as
goto test: start: body if cond goto start
which eliminates one instruction per iteration.
A more complex example is a return in a try-finally block.
try: part1 if cond: return X part2 finally: part3
Currently, handling the return is complex and involves "pseudo exceptions", but if part3 were duplicated by the compiler, then the RETURN bytecode could just perform a simple return. The code above would be compiled thus...
PUSH_BLOCK try part1 if not X goto endif push X POP_BLOCK part3 <<< duplicated RETURN_VALUE endif: part2 POP_BLOCK part3 <<< duplicated
The changes I am proposing are:
Allow negative line deltas in the lnotab array (bytecode deltas would remain non-negative) Remove the SETUP_LOOP, BREAK and CONTINUE bytecodes Simplify the RETURN bytecode Eliminate "pseudo exceptions" from the interpreter Simplify (or perhaps eliminate) SETUP_TRY, END_FINALLY, END_WITH. Reverse the sense of the FOR_ITER bytecode (ie. jump on not-exhausted)
The net effect of these changes would be: Reduced code size and reduced code complexity. A small (1-5%)? increase in speed, due the simplification of the bytecodes and a very small change in the number of bytecodes executed. A small change in the static size of the bytecodes (-2% to +2%)?
Although this is a quite intrusive change, I think it is worthwhile as it simplifies ceval.c considerably. The interpreter has become rather convoluted and any simplification has to be a good thing.
I've already implemented negative line deltas and the transformed while loop: https://bitbucket.org/**markshannon/cpython-lnotab-**signed<https://bitbucket.org/markshannon/cpython-lnotab-signed> I'm currently working on the block unwinding.
So, Good idea? Bad idea? Should I write a PEP or is the bug tracker sufficient?
Cheers, Mark.
______________________________**_________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/**mailman/listinfo/python-dev<http://mail.python.org/mailman/listinfo/python-dev> Unsubscribe: http://mail.python.org/**mailman/options/python-dev/** guido%40python.org<http://mail.python.org/mailman/options/python-dev/guido%40python.org>
Interesting idea, main challenge is to keep stepping through the code with pdb sane, and being clear on what differences in behaviour will be visible to the runtime execution hooks. -- Sent from my phone, thus the relative brevity :)
On 2012-12-09 22:22, Mark Shannon wrote:
Hi all,
The current CPython bytecode interpreter is rather more complex than it needs to be. A number of bytecodes could be eliminated and a few more simplified by moving the work involved in handling compound statements (loops, try-blocks, etc) from the interpreter to the compiler.
This simplest example of this is the while loop... while cond: body
This currently compiled as
start: if not cond goto end body goto start end:
but it could be compiled as
goto test: start: body if cond goto start
which eliminates one instruction per iteration.
A more complex example is a return in a try-finally block.
try: part1 if cond: return X part2 finally: part3
Currently, handling the return is complex and involves "pseudo exceptions", but if part3 were duplicated by the compiler, then the RETURN bytecode could just perform a simple return. The code above would be compiled thus...
PUSH_BLOCK try part1 if not X goto endif push X POP_BLOCK part3 <<< duplicated RETURN_VALUE endif: part2 POP_BLOCK part3 <<< duplicated
The changes I am proposing are:
[snip] Is it necessary to duplicate part3? Is it possible to put it into a subroutine if it's long? (And I do mean a simple cheap subroutine.)
Do know my registervm project? I try to emit better bytecode. See the implementation on http://hg.python.org/sandbox/registervm See for example the documentation: hg.python.org/sandbox/registervm/file/tip/REGISTERVM.txt <http://hg.python.org/sandbox/registervm> I implemented other optimisations. See also WPython project which uses also more efficient bytecode. http://code.google.com/p/wpython/ Victor Le 9 déc. 2012 23:23, "Mark Shannon" <mark@hotpy.org> a écrit :
Hi all,
The current CPython bytecode interpreter is rather more complex than it needs to be. A number of bytecodes could be eliminated and a few more simplified by moving the work involved in handling compound statements (loops, try-blocks, etc) from the interpreter to the compiler.
This simplest example of this is the while loop... while cond: body
This currently compiled as
start: if not cond goto end body goto start end:
but it could be compiled as
goto test: start: body if cond goto start
which eliminates one instruction per iteration.
A more complex example is a return in a try-finally block.
try: part1 if cond: return X part2 finally: part3
Currently, handling the return is complex and involves "pseudo exceptions", but if part3 were duplicated by the compiler, then the RETURN bytecode could just perform a simple return. The code above would be compiled thus...
PUSH_BLOCK try part1 if not X goto endif push X POP_BLOCK part3 <<< duplicated RETURN_VALUE endif: part2 POP_BLOCK part3 <<< duplicated
The changes I am proposing are:
Allow negative line deltas in the lnotab array (bytecode deltas would remain non-negative) Remove the SETUP_LOOP, BREAK and CONTINUE bytecodes Simplify the RETURN bytecode Eliminate "pseudo exceptions" from the interpreter Simplify (or perhaps eliminate) SETUP_TRY, END_FINALLY, END_WITH. Reverse the sense of the FOR_ITER bytecode (ie. jump on not-exhausted)
The net effect of these changes would be: Reduced code size and reduced code complexity. A small (1-5%)? increase in speed, due the simplification of the bytecodes and a very small change in the number of bytecodes executed. A small change in the static size of the bytecodes (-2% to +2%)?
Although this is a quite intrusive change, I think it is worthwhile as it simplifies ceval.c considerably. The interpreter has become rather convoluted and any simplification has to be a good thing.
I've already implemented negative line deltas and the transformed while loop: https://bitbucket.org/**markshannon/cpython-lnotab-**signed<https://bitbucket.org/markshannon/cpython-lnotab-signed> I'm currently working on the block unwinding.
So, Good idea? Bad idea? Should I write a PEP or is the bug tracker sufficient?
Cheers, Mark.
______________________________**_________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/**mailman/listinfo/python-dev<http://mail.python.org/mailman/listinfo/python-dev> Unsubscribe: http://mail.python.org/**mailman/options/python-dev/** victor.stinner%40gmail.com<http://mail.python.org/mailman/options/python-dev/victor.stinner%40gmail.com>
On 12/9/2012 5:22 PM, Mark Shannon wrote:
The current CPython bytecode interpreter is rather more complex than it needs to be. A number of bytecodes could be eliminated and a few more simplified by moving the work involved in handling compound statements (loops, try-blocks, etc) from the interpreter to the compiler.
As with all suggestions to optimize the bytecode generation, I'd like to re-iterate the need for a way to disable all optimization, for the sake of reasoning about the program. For example, debugging, coverage measurement, etc. This idea was misunderstood and defeated in http://bugs.python.org/issue2506, but I strongly believe it is important. --Ned.
+1 On Dec 11, 2012 8:47 AM, "Ned Batchelder" <ned@nedbatchelder.com> wrote:
On 12/9/2012 5:22 PM, Mark Shannon wrote:
The current CPython bytecode interpreter is rather more complex than it needs to be. A number of bytecodes could be eliminated and a few more simplified by moving the work involved in handling compound statements (loops, try-blocks, etc) from the interpreter to the compiler.
As with all suggestions to optimize the bytecode generation, I'd like to re-iterate the need for a way to disable all optimization, for the sake of reasoning about the program. For example, debugging, coverage measurement, etc. This idea was misunderstood and defeated in http://bugs.python.org/* *issue2506 <http://bugs.python.org/issue2506>, but I strongly believe it is important.
--Ned. ______________________________**_________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/**mailman/listinfo/python-dev<http://mail.python.org/mailman/listinfo/python-dev> Unsubscribe: http://mail.python.org/**mailman/options/python-dev/** guido%40python.org<http://mail.python.org/mailman/options/python-dev/guido%40python.org>
participants (6)
-
Guido van Rossum
-
Mark Shannon
-
MRAB
-
Ned Batchelder
-
Nick Coghlan
-
Victor Stinner