Python and the need for speed

Erik python at lucidity.plus.com
Tue Apr 18 18:33:04 EDT 2017


On 18/04/17 11:30, bartc wrote:
> On 18/04/2017 10:32, Erik wrote:
>
>> the
>> improvements over the original huge switch() to dispatch the bytecodes
>> to the correct handler appear to have made this type of optimization
>> less effective.
>
> What did they do to it, and on which version?

It's the computed 'goto' stuff that I'm referring to. At the time, I was 
looking at 2.7.9 (which just has the big switch) and 3.5.0 (which has 
the computed gotos). I can't be any more specific than that without 
looking at commit histories - you can do that if you want ;).

I am running on Linux using GCC, so I am using the computed-goto version 
(I hadn't realised at the time that it was optional - so perhaps my 
changes _would_ still have shown an improvement on builds that still use 
the big switch ... I'll have to look again when I have time).
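
A rough way to check what a given build was configured with is to ask
sysconfig. This is only a sketch: it assumes a POSIX build where the
pyconfig.h defines HAVE_COMPUTED_GOTOS and USE_COMPUTED_GOTOS are visible
to sysconfig. On other platforms (Windows, for instance) both will
probably come back as None, which tells you nothing either way.

```python
import sysconfig


def computed_goto_flags():
    """Return the computed-goto related configure flags, if recorded.

    A value of None means the flag was not recorded for this build,
    not that computed gotos are definitely disabled.
    """
    names = ("HAVE_COMPUTED_GOTOS", "USE_COMPUTED_GOTOS")
    return {name: sysconfig.get_config_var(name) for name in names}


flags = computed_goto_flags()
for name, value in flags.items():
    print(name, "=", value)
```

As far as I understand it, HAVE_COMPUTED_GOTOS says whether the compiler
supports the feature and USE_COMPUTED_GOTOS records an explicit configure
choice, so treat the output as a hint rather than proof.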

The opcodes have changed from bytecode to wordcode since then, so my 
changes would need looking at again anyway.
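
To see the wordcode layout for yourself, something like the following
should do. It assumes CPython 3.6 or later, where every instruction
occupies two bytes (an opcode byte followed by an argument byte); the
function "add" is just a throwaway example.

```python
import dis


def add(a, b):
    return a + b


raw = add.__code__.co_code

# With wordcode, every instruction starts on an even byte offset.
offsets = [ins.offset for ins in dis.get_instructions(add)]

# Decode the raw bytes as (opcode name, argument byte) pairs.
units = [(dis.opname[raw[i]], raw[i + 1]) for i in range(0, len(raw), 2)]

for name, arg in units:
    print(name, arg)
```

The exact opcode names printed vary between Python versions, so don't
rely on them; the point is only that the code is a sequence of
fixed-size two-byte units.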

> (I couldn't get anywhere with CPython because:

[snip]

It obviously builds on Windows (isn't there an MSVC project?), but FWIW, 
on Linux things do get a bit hairy when you want to introduce new 
opcodes. IIRC, the build process uses a version of Python that it has 
just built to generate some "frozen" .py files for the final build, and 
that can get out of step: the build-process executable doesn't 
understand the new opcodes. I forget the details, but it's something 
like that. I think the answer was to "clean" the build area and build 
from scratch each time the opcode map changes.

Perhaps the Windows build process requires that things are stable and 
doesn't support this type of development at all. You should take this 
part of the discussion to python-dev if you want to be able to build and 
experiment with it on Windows.

> If that had worked, then further optimisations are possible, such as
> doing a pre-pass combining common operations, that would not be
> worthwhile using 'official' byte-codes.)

That is effectively what my experiments were doing (sort of), but 
without introducing an assembler layer. I'm not convinced about the 
assembler part - given the right hints, a half-decent C compiler these 
days will produce pretty good code.
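
For anyone following along, the "pre-pass combining common operations"
idea can be sketched with a toy stack machine. To be clear, this is not
my patch and these are not CPython opcodes: PUSH, ADD and ADD_CONST are
made-up names, and the toy has no jumps, so the pass doesn't have to
worry about a jump target landing in the middle of a fused pair.

```python
def fuse(program):
    """Pre-pass: replace each (PUSH n, ADD) pair with one ADD_CONST n."""
    out = []
    i = 0
    while i < len(program):
        op, arg = program[i]
        next_is_add = i + 1 < len(program) and program[i + 1][0] == "ADD"
        if op == "PUSH" and next_is_add:
            out.append(("ADD_CONST", arg))
            i += 2  # consumed both instructions
        else:
            out.append((op, arg))
            i += 1
    return out


def run(program):
    """Execute the program; return (result, number of dispatches)."""
    stack = []
    dispatched = 0
    for op, arg in program:
        dispatched += 1
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            rhs = stack.pop()
            stack.append(stack.pop() + rhs)
        elif op == "ADD_CONST":
            stack.append(stack.pop() + arg)
        else:
            raise ValueError("unknown opcode: %r" % (op,))
    return stack.pop(), dispatched


program = [("PUSH", 1), ("PUSH", 2), ("ADD", None),
           ("PUSH", 3), ("ADD", None)]
fused = fuse(program)

print(run(program))  # (6, 5)
print(run(fused))    # (6, 3)
```

Same result, fewer trips round the dispatch loop. Whether that turns
into a measurable win in a real interpreter is a separate question.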

E.

