[Python-Dev] Speeding up CPython 5-10%

Damien George damien.p.george at gmail.com
Fri Jan 29 07:38:53 EST 2016


Hi Yury,

> An off-topic: have you ever tried hg.python.org/benchmarks
> or compare MicroPython vs CPython?  I'm curious if MicroPython
> is faster -- in that case we'll try to copy some optimization
> ideas.

I've tried a small number of those benchmarks, but not in any rigorous
way, and not enough to compare properly with CPython.  Maybe one day I
(or someone) will get to it and report results :)

One thing that makes MP fast is pointer tagging: small integers are
stuffed directly into the object pointer.  Thus arithmetic on integers
below 2**30 in magnitude (on a 32-bit arch) needs no heap allocation.
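
To make the small-integer trick concrete, here is a rough sketch in
Python that models a 32-bit machine word as a plain int.  It is an
illustration only, not MicroPython's actual code: the helper names are
made up, and it assumes the tag is the lowest bit (1 means "small
integer", 0 means "aligned heap pointer"), which leaves 31 bits for the
signed value.

```python
WORD_BITS = 32                      # assumed word size (32-bit arch)
WORD_MASK = (1 << WORD_BITS) - 1
SMALL_INT_MIN = -(1 << 30)          # 31 signed bits remain after the tag bit
SMALL_INT_MAX = (1 << 30) - 1


def fits_small_int(value):
    """True if value can live inside a tagged word."""
    return SMALL_INT_MIN <= value <= SMALL_INT_MAX


def box_small_int(value):
    """Pack value into a word: shift left by one and set the tag bit."""
    if not fits_small_int(value):
        raise OverflowError("value needs a heap-allocated integer")
    return ((value << 1) | 1) & WORD_MASK


def is_small_int(word):
    """Aligned heap pointers have a 0 low bit, so a 1 marks a small int."""
    return (word & 1) == 1


def unbox_small_int(word):
    """Undo box_small_int using a sign-preserving right shift."""
    if word & (1 << (WORD_BITS - 1)):   # reinterpret the word as signed
        word -= 1 << WORD_BITS
    return word >> 1


def add_small_ints(a_word, b_word):
    """Add two tagged words; return None when the result would need the heap."""
    total = unbox_small_int(a_word) + unbox_small_int(b_word)
    return box_small_int(total) if fits_small_int(total) else None
```

In this model, add_small_ints(box_small_int(2), box_small_int(3)) gives
a tagged word again without creating any object, and a None result
marks the overflow case where a real implementation would fall back to
a heap-allocated integer.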

> Do you use opcode dictionary caching only for LOAD_GLOBAL-like
> opcodes?  Do you have an equivalent of LOAD_FAST, or you use
> dicts to store local variables?

The opcodes that have dict caching are:

LOAD_NAME
LOAD_GLOBAL
LOAD_ATTR
STORE_ATTR
LOAD_METHOD (not implemented yet in mainline repo)

For local variables we use LOAD_FAST and STORE_FAST (and DELETE_FAST).
Actually, there are 16 dedicated opcodes for loading from positions
0-15, and 16 for storing to these positions.  E.g.:

LOAD_FAST_0
LOAD_FAST_1
...

Mostly this is done to save RAM, since LOAD_FAST_0 is 1 byte.
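
A minimal sketch of how such an encoding could work, in Python.  The
opcode numbers and the two-byte fallback form are hypothetical and
chosen only for illustration; they are not MicroPython's real values.

```python
LOAD_FAST_0 = 0xB0   # hypothetical: first of 16 consecutive one-byte opcodes
LOAD_FAST_N = 0x1A   # hypothetical: general form, followed by an index byte


def encode_load_fast(index):
    """Emit the shortest encoding for loading local variable `index`."""
    if 0 <= index < 16:
        # The index is folded into the opcode itself: 1 byte total.
        return bytes([LOAD_FAST_0 + index])
    if 16 <= index < 256:
        # General form: opcode byte plus one operand byte.
        return bytes([LOAD_FAST_N, index])
    raise ValueError("local index out of range for this sketch")


def decode_load_fast(code, pc):
    """Return (local_index, next_pc) for the load instruction at pc."""
    op = code[pc]
    if LOAD_FAST_0 <= op < LOAD_FAST_0 + 16:
        return op - LOAD_FAST_0, pc + 1
    if op == LOAD_FAST_N:
        return code[pc + 1], pc + 2
    raise ValueError("not a LOAD_FAST instruction")
```

Since most functions have only a handful of locals, nearly every load
takes the one-byte path here, which is where the RAM saving comes from.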

> If we change the opcode size, it will probably affect libraries
> that compose or modify code objects.  Modules like "dis" will
> also need to be updated.  And that's probably just a tip of the
> iceberg.
>
> We can still implement your approach if we add a separate
> private 'unsigned char' array to each code object, so that
> LOAD_GLOBAL can store the key offsets.  It should be a bit
> faster than my current patch, since it has one less level
> of indirection.  But this way we loose the ability to
> optimize LOAD_METHOD, simply because it requires more memory
> for its cache.  In any case, I'll experiment!

The problem with that approach (having a separate array for
offset_guess) is this: how do you know where to look in that array for
a given LOAD_GLOBAL opcode?  The second LOAD_GLOBAL in your bytecode
should look at the second entry in the array, but how does it know?
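
The difference can be sketched in Python as follows.  Everything here
is an assumption made for illustration: the opcode number, the
instruction layouts, the list of (key, value) pairs standing in for a
dict's internal table, and the slot_for_pc mapping, which is just one
conceivable answer to the question and not something either patch is
known to use.

```python
LOAD_GLOBAL = 0x74  # hypothetical opcode number, for illustration only


def probe(table, name, guess):
    """Return (value, slot), trying the guessed slot before a linear scan."""
    if guess < len(table) and table[guess][0] == name:
        return table[guess][1], guess          # cache hit: no search needed
    for slot, (key, value) in enumerate(table):
        if key == name:
            return value, slot
    raise NameError(name)


def load_global_inline(code, pc, names, table):
    """Layout [LOAD_GLOBAL, name_index, offset_guess]: the hint is at pc + 2."""
    value, slot = probe(table, names[code[pc + 1]], code[pc + 2])
    if slot < 256:
        code[pc + 2] = slot                    # remember the slot in the bytecode
    return value, pc + 3


def load_global_side_array(code, pc, names, table, cache, slot_for_pc):
    """Layout [LOAD_GLOBAL, name_index]: the hint lives in a separate array."""
    index = slot_for_pc[pc]                    # extra lookup the inline layout avoids
    value, slot = probe(table, names[code[pc + 1]], cache[index])
    if slot < 256:
        cache[index] = slot
    return value, pc + 2
```

With the inline layout the instruction pointer alone locates the hint.
With the side array, something like slot_for_pc has to exist (and be
stored and consulted) before the second LOAD_GLOBAL can find the second
entry.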

I'd love to experiment with implementing my original caching idea in
CPython, but no time!

Cheers,
Damien.

