METH_FASTCALL passing arguments on the stack doesn't necessarily mean it will be slow. On x86 there are calling conventions that read all the arguments from the stack, even though the rest of the machine is register based. Python could also look at ABI calling conventions for inspiration, like x86-64, where arguments up to a fixed number are passed in registers and the rest are passed on the stack.
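
To make the x86-64 comparison concrete, here is a rough sketch of how the System V x86-64 convention splits integer/pointer arguments: the first six go in registers and the remainder spill to the stack. The function name `assign_args` is made up for illustration, and the model ignores floating-point arguments, structs, and the real stack layout.

```python
# Toy model of the System V x86-64 calling convention for integer/pointer
# arguments only. `assign_args` is a hypothetical helper, not a real API.
SYSV_INT_ARG_REGS = ("rdi", "rsi", "rdx", "rcx", "r8", "r9")


def assign_args(args):
    """Return (register assignments, arguments left for the stack)."""
    args = list(args)
    # zip() stops at the shorter input, so at most six arguments get a register.
    in_registers = dict(zip(SYSV_INT_ARG_REGS, args))
    # Everything past the sixth argument is passed on the stack.
    on_stack = args[len(SYSV_INT_ARG_REGS):]
    return in_registers, on_stack


if __name__ == "__main__":
    regs, stack = assign_args(range(1, 9))
    print(regs)
    print(stack)
```

A register-based Python VM could borrow the same shape: a fixed number of arguments land directly in the callee's registers and only the overflow goes through memory.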

One thing I am wondering is whether Python would want to use a global set of registers and a global data stack, or continue to have a new data stack (and now registers) per call. If Python switched to a global stack and global registers, we might be able to eliminate a lot of instructions that just shuffle data from the caller's stack to the callee's stack.
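
As a very rough illustration of the shuffling I mean, here is a toy model (not how CPython is implemented) that counts the copies needed to start a call. With a private frame per call, each argument is copied from the caller's stack into the callee's storage; with one shared stack, the callee's frame can simply be a window over the values the caller already pushed. Both function names are invented for this sketch.

```python
# Toy model only: these helpers are hypothetical and do not mirror CPython.

def call_with_private_frame(caller_stack, nargs):
    """Per-call storage: pop the arguments and copy them into a new frame."""
    args = caller_stack[-nargs:]
    del caller_stack[-nargs:]
    callee_locals = list(args)      # one copy per argument
    return callee_locals, nargs     # (new frame, number of copies made)


def call_with_shared_stack(stack, nargs):
    """Shared storage: the callee's frame starts where the arguments sit."""
    base = len(stack) - nargs       # index of the first argument
    return base, 0                  # (frame base, number of copies made)


if __name__ == "__main__":
    private = [10, 20, 30]
    print(call_with_private_frame(private, 2), private)
    shared = [10, 20, 30]
    base, copies = call_with_shared_stack(shared, 2)
    print(shared[base:], copies)
```

The trade-off the sketch hides is that a shared stack makes things like generators and frame introspection harder, since a frame's values no longer live in their own object.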

On Tue, Feb 26, 2019 at 4:55 PM Victor Stinner <> wrote:
Hum, I reread my old REGISTERVM.txt that I wrote a few years ago.

A little bit more context. In my "registervm" fork I also tried to
implement further optimizations like moving invariants out of the
loop. Some optimizations could change the Python semantics, like
removing "duplicated" LOAD_GLOBAL even though the global might be modified
in the middle. I wanted to experiment with such optimizations. Maybe it was
a bad idea to convert stack-based bytecode to register-based bytecode
and experiment these optimizations at the same time.


On Tue, Feb 26, 2019 at 22:42, Victor Stinner <> wrote:
> No, I wasn't aware of this project. My starting point was:
> Yunhe Shi, David Gregg, Andrew Beatty, M. Anton Ertl, 2005
> See also my email to python-dev that I sent in 2012:
> Ah, my main issue with my implementation was that I started without
> taking care of clearing registers when the stack-based bytecode
> implicitly cleared a reference (decref), like "POP_TOP" operation.
> I added "CLEAR_REG" late in the development and it caused me troubles,
> and the "correct" register-based bytecode was less efficient than
> bytecode without CLEAR_REG. But my optimizer was very limited, too
> limited.
> Another implementation issue that I had was understanding some
> "implicit usage" of the stack, like try/except, which does black magic,
> whereas I wanted to make everything explicit for registers. I'm
> talking about things like "POP_BLOCK" and "SETUP_EXCEPT". In my
> implementation, I kept support for stack-based bytecode, and so I had
> some inefficient code and some corner cases.
> My approach was to convert stack-based bytecode to register-based
> bytecode on the fly. Having both in the same code allowed me to run
> some benchmarks. Maybe it wasn't the best approach, but I didn't feel
> able to write a real compiler (AST => bytecode).
> Victor
> On Tue, Feb 26, 2019 at 21:58, Neil Schemenauer <> wrote:
> >
> > On 2019-02-26, Victor Stinner wrote:
> > > I made an attempt once and it was faster:
> > >
> >
> > Interesting.  I don't think I have seen that before.  Were you aware
> > of "Rattlesnake" before you started on that?  It seems your approach
> > is similar.  Probably not because I don't think it is easy to find.
> > I uploaded a tarfile I had on my PC to my web site:
> >
> >
> >
> > It seems his name doesn't appear in the readme or source but I think
> > Rattlesnake was Skip Montanaro's project.  I suppose my idea of
> > unifying the local variables and the registers could have come from
> > Rattlesnake.  Very little new in the world. ;-P
> >
> > Cheers,
> >
> >   Neil
> --
> Night gathers, and now my watch begins. It shall not end until my death.
