[Python-Dev] Micro-optimizations by adding special-case bytecodes?
Erik
python at lucidity.plus.com
Wed May 24 16:14:18 EDT 2017
Hi Ben,
On 24/05/17 19:07, Ben Hoyt wrote:
> I'm not proposing to do this yet, as I'd need to benchmark to see how
> much of a gain (if any) it would amount to, but I'm just wondering if
> there's any previous work on this kind of thing. Or, if not, any other
> thoughts before I try it?
This is exactly what I looked into just over a year ago. As Stephane
suggests, I did this by adding new opcodes that the peephole optimizer
generated and the interpreter loop understood (the compiler itself did
not need to know anything about the new opcodes, which made things much
easier).
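To make the division of labour concrete: the compiler emits perfectly
ordinary opcodes, and the optimizer just pattern-matches over the
finished bytecode. You can see the raw material it works with using the
dis module (plain Python, nothing here depends on my patches; exact
offsets and opargs vary by version):

    import dis

    def answer():
        return 42

    dis.dis(answer)
    # On 3.6 this prints something like:
    #   0 LOAD_CONST    1 (42)
    #   2 RETURN_VALUE

Sequences like that trailing LOAD_CONST/RETURN_VALUE pair are what the
peephole code can recognise and rewrite into one of the new opcodes.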
Adding new opcodes like this at the time wasn't straightforward because
of issues with the build process (see this thread:
https://mail.python.org/pipermail/python-dev/2015-December/142600.html -
it started out as a question about the bytecode format but ended up with
some very useful information on the build process).
Note that since that thread, a couple of things have changed - the
bytecode is now wordcode so some of my original questions aren't
relevant, and some of the things I had a problem with in the build
system are now auto-generated with a new 'make' target. So it _should_
be easier now than it was then.
In terms of the results I got once I had things building and running, I
didn't manage to find any particular magic bullets that gave a
significant enough speedup. Perhaps I just didn't pick the right opcode
sequences or the right test cases, though what I was trying to do worked
well mechanically - for example, collapsing branches-to-RETURN into a
single RETURN: a LOAD_CONST/RETURN_VALUE pair became RETURN_CONST, and
then if the target of an unconditional branch was a RETURN_CONST op, the
branch op itself could be replaced by that RETURN_CONST.
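As a sketch of the shape of that rewrite (the RETURN_CONST opcode number
and the fold below are hypothetical - the real change lives in C inside
the peephole optimizer; this is just Python over 3.6-style wordcode,
where every instruction is two bytes):

    import dis

    RETURN_CONST = 200                     # hypothetical, unused opcode number
    LOAD_CONST = dis.opmap['LOAD_CONST']
    RETURN_VALUE = dis.opmap['RETURN_VALUE']
    NOP = dis.opmap['NOP']

    def fold_return_const(co_code):
        # Rewrite LOAD_CONST x / RETURN_VALUE into RETURN_CONST x.
        # Sketch only: the result isn't runnable on a stock interpreter,
        # it just shows where the peephole pass does its work.
        out = bytearray(co_code)
        i = 0
        while i + 3 < len(out):
            if out[i] == LOAD_CONST and out[i + 2] == RETURN_VALUE:
                out[i] = RETURN_CONST      # oparg (the const index) stays put
                out[i + 2] = NOP           # pad so later jump targets keep
                out[i + 3] = 0             # their existing offsets
            i += 2
        return bytes(out)

    def f():
        return 42

    # roughly: a two-byte "RETURN_CONST" followed by NOP padding
    print(fold_return_const(f.__code__.co_code).hex())

The branch-to-RETURN_CONST redirection mentioned above is then just
another pattern over the same byte string.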
I figured that one thing every function or method needs to do is return,
so I tried to make that more efficient. I only had two weeks to spend on
it though ...
I was trying to do that by avoiding extra trips around the interpreter
loop, as that was historically something that gave speedups. However,
with the new computed-goto version of the interpreter I came to the
conclusion that it's not as important as it used to be. I was building
with gcc, though, and what I *didn't* do was disable the computed-goto
code (it's controlled by a #define) to see whether my changes improved
performance on platforms that can't use it.
I identified some other opcode sequences that might be worth looking at
further.
I didn't (and still don't) have enough bandwidth to *drive* something
like this through, but if you want to do that I'd be more than happy to
be kept in the loop on what you're doing, and I can possibly find time
to write some code too.
Regards, E.