[Python-Dev] Python 3 optimizations continued...
Stefan Behnel
stefan_ml at behnel.de
Fri Sep 2 11:13:19 CEST 2011
stefan brunthaler, 02.09.2011 06:37:
> as promised, I created a publicly available preview of an
> implementation with my optimizations, which is available at the
> following location:
> https://bitbucket.org/py3_pio/preview/wiki/Home
>
> I followed Nick's advice and added some background and an
> overview/introduction at the wiki page the link points to. I am
> positive that spending 10 minutes reading it will provide you with
> valuable information regarding what's happening.
It does, thanks.
A couple of remarks:
1) The SFC optimisation is purely based on static code analysis, right? I
assume it takes loops into account (and just multiplies the scores for
inner loops)? Is that what you mean by "nesting level"? Obviously, static
analysis can sometimes be misleading, e.g. when there's a rare special
case with lots of loops that needs to adapt the input data in some way,
but in general, I'd expect this heuristic to hit the important cases,
especially for well-structured code with short functions.
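Just to check that I understand the heuristic, here is a toy version of
the kind of static scoring I would imagine: count references to each
name and weight references inside loops by the nesting level. The
AST-based approach and the weight of 10 are purely my own guesses for
illustration, not taken from your implementation:

import ast

def score_locals(source, loop_weight=10):
    # Count references to each name, weighting a reference inside N
    # nested loops by loop_weight ** N.  Higher score = better candidate
    # for caching.
    scores = {}

    def visit(node, depth):
        if isinstance(node, ast.Name):
            scores[node.id] = scores.get(node.id, 0) + loop_weight ** depth
        for child in ast.iter_child_nodes(node):
            visit(child, depth + isinstance(node, (ast.For, ast.While)))

    visit(ast.parse(source), 0)
    return scores

print(score_locals("""
total = 0
for row in data:
    for cell in row:
        total += cell
"""))
# -> {'total': 101, 'row': 110, 'data': 10, 'cell': 200}

Something along these lines would favour the hot inner-loop variables,
though, as noted, it would equally over-score a loop-heavy special case
that rarely runs.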
2) The RC elimination is tricky to get right and thus somewhat dangerous,
but it sounds worthwhile and should work particularly well on a
stack-based bytecode interpreter like CPython.
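For clarity, the situation I assume this targets: a value that is pushed
and then consumed right away by the next instruction, where the incref
on the push and the decref on the pop cancel out. Purely as an
illustration of the data flow (the instruction set and the overly
conservative "next instruction only" rule are made up; the real analysis
obviously lives in C inside the interpreter):

# (opcode, slots popped, slots pushed) for a made-up toy instruction set
CODE = [
    ("LOAD_FAST", 0, 1),    # push a local
    ("LOAD_CONST", 0, 1),   # push a constant
    ("BINARY_ADD", 2, 1),   # pop two operands, push the sum
    ("STORE_FAST", 1, 0),   # pop the result into a local
]

def elidable_refcount_pairs(code):
    # Track which instruction produced each stack slot; when a value is
    # consumed directly by the following instruction, the incref/decref
    # pair around it is redundant and could be elided.
    stack, pairs = [], []
    for i, (_, pops, pushes) in enumerate(code):
        for _ in range(pops):
            producer = stack.pop()
            if producer == i - 1:
                pairs.append((producer, i))
        stack.extend([i] * pushes)
    return pairs

print(elidable_refcount_pairs(CODE))   # -> [(1, 2), (2, 3)]

The tricky part you mention is presumably everything this toy ignores:
jump targets, instructions that can fail halfway through, and values
that escape into containers or frames.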
3) Inline caching also sounds worthwhile, although I wonder how large the
savings will be here. You'd save a couple of indirect jumps at the C-API
level, sure, but apart from that, my guess is that the benefit would
depend strongly on the type of instruction. Certain (repeated) calls to
C-implemented functions would likely benefit quite a bit, for example,
which would be a nice optimisation by itself, e.g. for builtins. I would
expect the same to apply to iterators; even a couple of percent faster
iteration can make a great deal of difference, and a substantial set of
iterators are implemented in C, e.g. itertools, range, zip and friends.
(I've sketched the kind of per-site cache I mean further below.)
I'm not so sure about arithmetic operations. In Cython, we (currently?)
do not optimistically replace these with more specific code (unless we
know the types at compile time), because it complicates the generated C
code, and indirect jumps aren't so slow that the benefit would matter
much. The savings are *much* higher when data can be unboxed, so much so
that the slight improvement from optimistic type guesses is totally
dwarfed in Cython. I would expect the return on investment to be better
when the types are actually known at runtime, as in your case.
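Here's the sketch I referred to above, written in Python purely as an
illustration of the shape I have in mind: a per-instruction cache slot
that remembers the iterator's concrete type and its resolved __next__
slot, and guards on the type each time. In C, a cache hit would be a
type check plus a call through the cached function pointer, which is
where the saved indirect jumps would come from.

class ForIterSite:
    # One cache per FOR_ITER-style instruction: remember the iterator's
    # concrete type and its __next__ slot, guard on the type on each use.
    __slots__ = ("cached_type", "cached_next")

    def __init__(self):
        self.cached_type = None
        self.cached_next = None

    def next(self, iterator):
        tp = type(iterator)
        if tp is not self.cached_type:      # guard failed: (re)fill the cache
            self.cached_type = tp
            self.cached_next = tp.__next__  # resolved once, reused afterwards
        return self.cached_next(iterator)   # fast path: no per-call lookup

site = ForIterSite()
it = iter(range(3))
try:
    while True:
        print(site.next(it))                # 0, 1, 2, then StopIteration
except StopIteration:
    pass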
4) Regarding inlined object references, I would expect that it's much
more worthwhile to speed up LOAD_GLOBAL and LOAD_NAME than LOAD_CONST. I
guess this would best be helped by watching the module dict and the
builtins dict internally and invalidating the interpreter state after
changes (e.g. by providing a change counter in those dicts and checking
it in the instructions that access them), and otherwise keeping the
objects cached. Simply watching the dedicated instructions that change
that state isn't enough, as Python allows code to change these dicts
directly through their dict interface.
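In Python pseudo-code (all names made up, and the real thing would have
to live in the dict implementation in C so that *every* mutation path
bumps the counter, including update(), pop() and friends), the scheme I
mean looks roughly like this:

class VersionedDict(dict):
    # Toy namespace dict that bumps a version counter on the mutations it
    # sees.  A real implementation must cover all mutating code paths.
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version = 0

    def __setitem__(self, key, value):
        self.version += 1
        super().__setitem__(key, value)

    def __delitem__(self, key):
        self.version += 1
        super().__delitem__(key)

class GlobalLoadCache:
    # Per-instruction cache for one global name, guarded by the version.
    def __init__(self, name):
        self.name = name
        self.cached_version = -1
        self.cached_value = None

    def load(self, namespace):
        if namespace.version != self.cached_version:  # any write invalidates
            self.cached_value = namespace[self.name]  # slow path: real lookup
            self.cached_version = namespace.version
        return self.cached_value                      # fast path: no lookup

ns = VersionedDict(x=1)
load_x = GlobalLoadCache("x")
print(load_x.load(ns))   # 1  (miss: looks the name up, records the version)
print(load_x.load(ns))   # 1  (hit: version unchanged, lookup skipped)
ns["x"] = 2              # a write through the plain dict interface
print(load_x.load(ns))   # 2  (the counter guard caught the change)

That is exactly the point about the dict interface: the guard sits on
the dict itself, so even writes that never go through STORE_GLOBAL get
caught.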
All in all, your list does sound like an interesting set of changes that
are both understandable and worthwhile.
Stefan