Register based interpreter

Hi, Has there been any discussion or effort on moving from a stack based interpreter to a more register based one? (my limited search in the archives did not yield me any results) regards, -V-

On Fri, Feb 20, 2009 at 10:07 AM, Venkatraman S <venkat83@gmail.com> wrote:
It's possible PyPy will investigate that, and the Python implementation on Parrot (which was stagnant last I checked) would obviously be register-based, but as for CPython, I don't think it's going to happen (at least not any time soon). The bytecode is stack-oriented through-and-through; it'd be quite a major change to become register-based. Cheers, Chris -- Follow the path of the Iguana... http://rebertia.com

On Feb 20, 2009, at 3:15 PM, Chris Rebert wrote:
Antonio Cuni made some experiments on PyPy about this, If you ask at the pypy-dev mailing list or on irc (#pypy on freenode.net) he or others can explain what happened. If I remember correctly there weren't any significant improvements in performance as dispatch and memory copies is not the problem on python, the bytecodes are very complex. []'s -- Leonardo Santagada santagada at gmail.com

Leonardo Santagada <santagada@...> writes:
If bytecode dispatch were not a problem, I wonder how enabling computed gotos on the py3k branch could yield up to a 15% overall speedup on pybench :-) The biggest complication I can think of with a register-based VM is that you have to decref objects as soon as they aren't used anymore, which means you have to track the actual lifetime of registers (while it's done automatically with a stack-based design). I don't know how much it could slow down an implementation (perhaps not at all, if a clever implementation is devised...). Regards Antoine.

On Fri, Feb 20, 2009 at 11:48 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
FYI, some relevant reading on converting a stack-based JVM to use register-based bytecode: http://www.usenix.org/events/vee05/full_papers/p153-yunhe.pdf Collin Winter

Antoine Pitrou wrote:
If bytecode dispatch were not a problem, I wonder how enabling computed gotos on the py3k branch could yield up to a 15% overall speedup on pybench :-)
Perhaps there is room for a hybrid approach where you take sequences of instructions such as LOAD_LOCAL x LOAD_LOCAL y BINARY_ADD STORE_GLOBAL z and turn them into 3-operand macroinstructions TRINARY_ADD LOCAL(x), LOCAL(y), GLOBAL(z) This would actually do all the same pushing and popping as before (at least conceptually), but it would reduce the number of bytecodes being executed, and therefore the number of unpredictable branches, from 4 to 1. So it would be kind of like a register-based instruction set, except that there aren't any registers, so there wouldn't be any problem with managing refcounts. (BTW, calling them "bytecodes" might become something of a misnomer -- they're going to be more like "longcodes".:-) -- Greg

Greg Ewing <greg.ewing@...> writes:
I was thinking of something like that, although in a simpler form of 2-operand macroinstructions: BINARY_ADD 5, 8 where 5 and 8 are indices into a "super array" which would be equal to: [local variables] + [code object constants] (fast random access to the array would imply copying the constants table each time a new frame is created for the given code object, but it would hopefully remain quite cheap) We can keep the current instruction format and address up to 256 local variables and constants by splitting the 16-bit operand in two bytes. The result would be stored on top of stack. (if we want to access the top of the stack using the macroinstructions, we could reserve 255 as a special index value for popping the top of the stack...) Regards Antoine.

On Sat, Feb 21, 2009 at 1:18 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Looks like there was no attempt on that front.
I hope to take this as 'fun' project and try it out. Not sure how far i will reach. -V- http://twitter.com/venkat83

On Mon, Feb 23, 2009 at 2:44 AM, Venkatraman S <venkat83@gmail.com> wrote:
I think this would be interesting to attempt. Lua switched to a stack-based VM to a register-based VM for Lua 5.0. http://www.tecgraf.puc-rio.br/~lhf/ftp/doc/sblp2005.pdf includes some benchmark numbers. Collin Winter

Chris Rebert wrote:
Also, I have a hard time believing it would make much difference. Python bytecodes are mostly quite large units of functionality, so the time taken manipulating the stack is not likely to be a significant component. This is different from something like Java where you have bytecodes that directly manipulate low-level things such as integers. -- Greg

On Fri, Feb 20, 2009 at 10:07 AM, Venkatraman S <venkat83@gmail.com> wrote:
It's possible PyPy will investigate that, and the Python implementation on Parrot (which was stagnant last I checked) would obviously be register-based, but as for CPython, I don't think it's going to happen (at least not any time soon). The bytecode is stack-oriented through-and-through; it'd be quite a major change to become register-based. Cheers, Chris -- Follow the path of the Iguana... http://rebertia.com

On Feb 20, 2009, at 3:15 PM, Chris Rebert wrote:
Antonio Cuni made some experiments on PyPy about this, If you ask at the pypy-dev mailing list or on irc (#pypy on freenode.net) he or others can explain what happened. If I remember correctly there weren't any significant improvements in performance as dispatch and memory copies is not the problem on python, the bytecodes are very complex. []'s -- Leonardo Santagada santagada at gmail.com

Leonardo Santagada <santagada@...> writes:
If bytecode dispatch were not a problem, I wonder how enabling computed gotos on the py3k branch could yield up to a 15% overall speedup on pybench :-) The biggest complication I can think of with a register-based VM is that you have to decref objects as soon as they aren't used anymore, which means you have to track the actual lifetime of registers (while it's done automatically with a stack-based design). I don't know how much it could slow down an implementation (perhaps not at all, if a clever implementation is devised...). Regards Antoine.

On Fri, Feb 20, 2009 at 11:48 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
FYI, some relevant reading on converting a stack-based JVM to use register-based bytecode: http://www.usenix.org/events/vee05/full_papers/p153-yunhe.pdf Collin Winter

Antoine Pitrou wrote:
If bytecode dispatch were not a problem, I wonder how enabling computed gotos on the py3k branch could yield up to a 15% overall speedup on pybench :-)
Perhaps there is room for a hybrid approach where you take sequences of instructions such as LOAD_LOCAL x LOAD_LOCAL y BINARY_ADD STORE_GLOBAL z and turn them into 3-operand macroinstructions TRINARY_ADD LOCAL(x), LOCAL(y), GLOBAL(z) This would actually do all the same pushing and popping as before (at least conceptually), but it would reduce the number of bytecodes being executed, and therefore the number of unpredictable branches, from 4 to 1. So it would be kind of like a register-based instruction set, except that there aren't any registers, so there wouldn't be any problem with managing refcounts. (BTW, calling them "bytecodes" might become something of a misnomer -- they're going to be more like "longcodes".:-) -- Greg

Greg Ewing <greg.ewing@...> writes:
I was thinking of something like that, although in a simpler form of 2-operand macroinstructions: BINARY_ADD 5, 8 where 5 and 8 are indices into a "super array" which would be equal to: [local variables] + [code object constants] (fast random access to the array would imply copying the constants table each time a new frame is created for the given code object, but it would hopefully remain quite cheap) We can keep the current instruction format and address up to 256 local variables and constants by splitting the 16-bit operand in two bytes. The result would be stored on top of stack. (if we want to access the top of the stack using the macroinstructions, we could reserve 255 as a special index value for popping the top of the stack...) Regards Antoine.

On Sat, Feb 21, 2009 at 1:18 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Looks like there was no attempt on that front.
I hope to take this as 'fun' project and try it out. Not sure how far i will reach. -V- http://twitter.com/venkat83

On Mon, Feb 23, 2009 at 2:44 AM, Venkatraman S <venkat83@gmail.com> wrote:
I think this would be interesting to attempt. Lua switched to a stack-based VM to a register-based VM for Lua 5.0. http://www.tecgraf.puc-rio.br/~lhf/ftp/doc/sblp2005.pdf includes some benchmark numbers. Collin Winter

Chris Rebert wrote:
Also, I have a hard time believing it would make much difference. Python bytecodes are mostly quite large units of functionality, so the time taken manipulating the stack is not likely to be a significant component. This is different from something like Java where you have bytecodes that directly manipulate low-level things such as integers. -- Greg
participants (8)
-
Antoine Pitrou
-
Brett Cannon
-
Chris Rebert
-
Collin Winter
-
Greg Ewing
-
Leonardo Santagada
-
Terry Reedy
-
Venkatraman S