[Python-Dev] Bytecode analysis
damien morton
dmorton@bitfurnace.com
Wed, 26 Feb 2003 00:59:11 -0500
I implemented LOAD_FAST_n, STORE_FAST_n, LOAD_CONST_n for n < 16
Getting a small 2% improvement in speed
Going from about 21800 PyStones to 22300 PyStones; very hard to get
consistent readings on the PyStones - anyone got any tips on how to get
more consistent results under windows?
Getting a small 3% reduction in .pyc filesizes
os.path 24,929 unmodified
os.path 24,149 with modifications
I sort of cheated on the switch statement to avoid the use of a goto.
opcode = NEXTOP();
if (HAS_ARG(opcode))
oparg = NEXTARG();
...
switch (opcode) {
...
case LOAD_FAST_14:
case LOAD_FAST_15:
oparg = opcode - LOAD_FAST_0;
case LOAD_FAST:
x = GETLOCAL(oparg);
if (x != NULL) {
Py_INCREF(x);
...
I also altered the opcode.h file to use an enum for the opcodes instead
of all those #defines. Much easier to re-arrange things that way. I have
a feeling that most of the speedup (such that it is) comes from that
re-arrangment, which packs the opcodes into a contiguous numeric space.
I suspect that sorting the opcodes by frequency of access might also
have some positive effect. Also, organising the opcodes and the switch
statement so that frequently co-occuring opcodes are adjacent to each
other might also have some positive effect.
> -----Original Message-----
> From: guido@python.org [mailto:guido@python.org]
> Sent: Tuesday, 25 February 2003 20:25
> To: damien morton
> Cc: python-dev@python.org
> Subject: Re: [Python-Dev] Bytecode analysis
>
>
> > As you say, LOAD_FAST is a very frequently occuring
> instruction, both
> > statically and dynamically. Reducing it from a 3 byte
> instruction to a
> > 1 byte instruction in 97% of (static) cases should be an
> overall good.
> >
> > Most of the opcodes I proposed could be added without disturbing
> > locality of reference.
> >
> > e.g.
> >
> > switch (op = *p++) {
> > ...
> > case LOAD_FAST:
> > index = (*p++) + (*p++)<<8
> > goto LOAD_FAST_MAIN;
> > break;
> > case LOAD_FAST_0:
> > case LOAD_FAST_1:
> > case LOAD_FAST_15:
> > index = op - LOAD_FAST_0
> > LOAD_FAST_MAIN:
> > ...
> > break;
> >
> >
> > }
>
> Good idea. Can you benchmark this?
>
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>
>