[Eric S. Raymond]
... (and, incidentally, argued that the bytecode should emulate a stack rather than a register machine because the cost/speed disparities that justify register architectures in hardware don't exist in a software VM).
Don't get too married to that! My bet is that if anyone had time for it, we'd switch the Python VM to a register model today; Skip Montanaro's Rattlesnake project was aiming at that, but fizzled out for lack of time. The per-opcode fetch-decode-dispatch overhead is very high in software too, so a register VM can win simply by cutting the number of opcodes needed to accomplish a given bit of useful work. Indeed, eliding SET_LINENO opcodes is the primary reason Python -O runs faster, yet all it saves is one trip around the eval loop per source-code line (the *body* of SET_LINENO is just a test, branch, and store -- it's trivial compared to the overhead of getting to it). Variants of Forth-like threading are alternatives to both.
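
To make the opcode-count argument concrete, here is a rough sketch of two toy interpreters running the statement a = b + c. Everything in it is made up for illustration: the opcode names, the tuple encodings, and the helper functions run_stack and run_register are assumptions, not CPython's real bytecode or Rattlesnake's design. It only counts trips around the dispatch loop; it does not measure time.

```python
# Toy interpreters for illustration only. The opcode names and tuple
# encodings are hypothetical and do not match CPython's real bytecode.

def run_stack(program, env):
    """Run a stack-machine program; return the number of dispatches."""
    stack = []
    dispatches = 0
    for op, arg in program:
        dispatches += 1  # one trip around the eval loop
        if op == "LOAD":
            stack.append(env[arg])
        elif op == "STORE":
            env[arg] = stack.pop()
        elif op == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return dispatches


def run_register(program, env):
    """Run a three-address program; names in env stand in for registers."""
    dispatches = 0
    for op, dst, src1, src2 in program:
        dispatches += 1  # one trip around the eval loop
        if op == "ADD":
            env[dst] = env[src1] + env[src2]
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return dispatches


# a = b + c, encoded for each machine
STACK_PROGRAM = [("LOAD", "b"), ("LOAD", "c"), ("ADD", None), ("STORE", "a")]
REGISTER_PROGRAM = [("ADD", "a", "b", "c")]

stack_env = {"b": 2, "c": 3}
register_env = {"b": 2, "c": 3}

stack_trips = run_stack(STACK_PROGRAM, stack_env)
register_trips = run_register(REGISTER_PROGRAM, register_env)

print("stack dispatches:   ", stack_trips)
print("register dispatches:", register_trips)
```

Both machines leave the same value in a, but the stack version pays the dispatch overhead four times against once for the register version. If getting to an opcode costs more than executing its body, as with SET_LINENO, that difference is where a register model could plausibly win.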