Python and the need for speed
python at lucidity.plus.com
Tue Apr 18 18:33:04 EDT 2017
On 18/04/17 11:30, bartc wrote:
> On 18/04/2017 10:32, Erik wrote:
>> improvements over the original huge switch() to dispatch the bytecodes
>> to the correct handler appear to have made this type of optimization
>> less effective.
> What did they do to it, and on which version?
It's the computed 'goto' stuff that I'm referring to. At the time, I was
looking at 2.7.9 (which just has the big switch) and 3.5.0 (which has
the computed gotos). I can't be any more specific than that without
looking at commit histories - you can do that if you want ;).
I am running on Linux using GCC, so I am using the computed-goto version
(I hadn't realised at the time that it was optional - so perhaps my
changes _would_ still have shown an improvement on builds that still use
the big switch ... I'll have to look again when I have time).
The opcodes have changed from bytecode to wordcode since then, so my
changes would need looking at again anyway.
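
For anyone who wants to see the wordcode layout for themselves, here is a minimal sketch. It assumes CPython 3.6 or later, where every instruction takes two bytes (an opcode byte and an argument byte). The helper name wordcode_units is made up for this example, and on newer releases the listing may also include inline cache entries.

```python
import dis

def wordcode_units(func):
    """Split a function's raw bytecode into (opcode name, argument) pairs.

    Assumes CPython 3.6+, where each instruction is a two-byte unit.
    """
    raw = func.__code__.co_code
    return [(dis.opname[raw[i]], raw[i + 1]) for i in range(0, len(raw), 2)]

def add(a, b):
    return a + b

# The exact opcode names printed depend on the CPython version in use.
for name, arg in wordcode_units(add):
    print(name, arg)
```

On 2.7 or 3.5 the same loop would not line up, because instructions there are one or three bytes long.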
> (I couldn't get anywhere with CPython because:
It obviously builds on Windows (isn't there an MSVC project?), but FWIW,
on Linux things do get a bit hairy when you want to introduce new
opcodes, for example. IIRC, the build process itself uses a version of
Python that it builds to generate some "frozen" .py files for the final
build, and that can cause things to get out of step when the
build-process executable doesn't understand the new opcodes. I forget
the details, but it's something like that. I think the answer was to
keep cleaning the build area and building from scratch each time the
opcode map changes.
Perhaps the Windows build process requires that things are stable and
doesn't support this type of development at all. You should take this
part of the discussion to python-dev if you want to be able to build and
experiment with it on Windows.
> If that had worked, then further optimisations are possible, such as
> doing a pre-pass combining common operations, that would not be
> worthwhile using 'official' byte-codes.)
That is effectively what my experiments were doing (sort of), but
without introducing an assembler layer. I'm not convinced that layer is
needed: given the right hints, a half-decent C compiler these days will
produce pretty good code.
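
To make the "pre-pass combining common operations" idea concrete, here is a rough sketch on a toy stack machine. It is not my actual experiment and not CPython's instruction set: the opcode names PUSH, ADD and ADD_CONST and the functions fuse() and run() are all invented for illustration, and the toy has no jumps, so there are no jump targets to fix up after fusing.

```python
# Toy stack machine. The opcode names are hypothetical, not CPython's.

def fuse(program):
    """Pre-pass: replace each PUSH immediately followed by ADD
    with a single ADD_CONST instruction."""
    fused = []
    i = 0
    while i < len(program):
        op, arg = program[i]
        if op == "PUSH" and i + 1 < len(program) and program[i + 1][0] == "ADD":
            fused.append(("ADD_CONST", arg))
            i += 2  # consumed two instructions, emitted one
        else:
            fused.append((op, arg))
            i += 1
    return fused

def run(program):
    """Execute a program and return the value left on top of the stack."""
    stack = []
    for op, arg in program:
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            right = stack.pop()
            stack.append(stack.pop() + right)
        elif op == "ADD_CONST":
            stack.append(stack.pop() + arg)
        else:
            raise ValueError("unknown opcode: %r" % (op,))
    return stack.pop()

program = [("PUSH", 1), ("PUSH", 2), ("ADD", None), ("PUSH", 3), ("ADD", None)]
fused = fuse(program)
print(len(program), len(fused), run(program), run(fused))
```

The fused program gives the same result with fewer trips through the dispatch loop, which is the whole point. Whether that pays off in a real interpreter depends on how cheap dispatch already is.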