
I implemented LOAD_FAST_n, STORE_FAST_n and LOAD_CONST_n for n < 16.

Speed: a small 2% improvement, going from about 21,800 PyStones to 22,300 PyStones. It is very hard to get consistent PyStone readings, though. Does anyone have tips on getting more consistent results under Windows?

Size: a small 3% reduction in .pyc file sizes:

    os.path  24,929  (unmodified)
    os.path  24,149  (with modifications)

I sort of cheated on the switch statement to avoid using a goto: the short forms simply fall through into the general case.

    opcode = NEXTOP();
    if (HAS_ARG(opcode))
        oparg = NEXTARG();
    ...
    switch (opcode) {
    ...
    case LOAD_FAST_14:
    case LOAD_FAST_15:
        oparg = opcode - LOAD_FAST_0;
        /* fall through */
    case LOAD_FAST:
        x = GETLOCAL(oparg);
        if (x != NULL) {
            Py_INCREF(x);
    ...

I also altered opcode.h to use an enum for the opcodes instead of all those #defines, which makes it much easier to rearrange things. I have a feeling that most of the speedup (such as it is) comes from that rearrangement, which packs the opcodes into a contiguous numeric space. I suspect that sorting the opcodes by frequency of access might also help, as might organising the opcodes and the switch statement so that frequently co-occurring opcodes are adjacent to each other.
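A minimal, self-contained sketch of the fall-through trick, assuming a made-up toy instruction set (four short LOAD_FAST_n forms, ADD and RETURN_VALUE, plain longs as values) rather than CPython's real opcode.h and ceval.c. The enum keeps the numbering contiguous, so opcode - LOAD_FAST_0 recovers the local's index, and the short forms drop into the general LOAD_FAST case without a goto:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy opcode set, NOT CPython's opcode.h. The enum keeps
   the numbering contiguous, so LOAD_FAST_n == LOAD_FAST_0 + n. */
enum opcode {
    LOAD_FAST_0, LOAD_FAST_1, LOAD_FAST_2, LOAD_FAST_3,
    ADD,                        /* pop two values, push their sum */
    RETURN_VALUE,               /* return the top of the stack */
    HAVE_ARGUMENT,              /* opcodes >= this take a 2-byte argument */
    LOAD_FAST = HAVE_ARGUMENT   /* general form: index in the argument */
};

#define HAS_ARG(op) ((op) >= HAVE_ARGUMENT)

/* Runs a tiny stack program over an array of locals (hypothetical). */
long run(const unsigned char *code, const long *locals, size_t nlocals)
{
    long stack[16];             /* no overflow check in this sketch */
    int sp = 0;
    const unsigned char *p = code;

    for (;;) {
        int opcode = *p++;
        int oparg = 0;
        if (HAS_ARG(opcode)) {
            oparg = p[0] | (p[1] << 8);   /* low byte first */
            p += 2;
        }
        switch (opcode) {
        case LOAD_FAST_0:
        case LOAD_FAST_1:
        case LOAD_FAST_2:
        case LOAD_FAST_3:
            oparg = opcode - LOAD_FAST_0; /* index is encoded in the opcode */
            /* fall through */
        case LOAD_FAST:
            assert((size_t)oparg < nlocals);
            stack[sp++] = locals[oparg];
            break;
        case ADD:
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case RETURN_VALUE:
            return stack[sp - 1];
        default:
            assert(!"unknown opcode");
            return -1;
        }
    }
}
```

Placing the argument-taking opcodes at or above HAVE_ARGUMENT mirrors the HAS_ARG convention in CPython's opcode.h, so the decode step needs only one comparison; the run() function and the opcode values themselves are illustrative only.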
-----Original Message-----
From: guido@python.org [mailto:guido@python.org]
Sent: Tuesday, 25 February 2003 20:25
To: damien morton
Cc: python-dev@python.org
Subject: Re: [Python-Dev] Bytecode analysis
> As you say, LOAD_FAST is a very frequently occurring instruction, both
> statically and dynamically. Reducing it from a 3-byte instruction to a
> 1-byte instruction in 97% of (static) cases should be an overall win.
>
> Most of the opcodes I proposed could be added without disturbing
> locality of reference, e.g.:
>
>     switch (op = *p++) {
>     ...
>     case LOAD_FAST:
>         /* 2-byte argument, low byte first, read in two sequenced steps */
>         index = p[0] | (p[1] << 8);
>         p += 2;
>         goto LOAD_FAST_MAIN;
>     case LOAD_FAST_0:
>     case LOAD_FAST_1:
>     ...
>     case LOAD_FAST_15:
>         index = op - LOAD_FAST_0;
>     LOAD_FAST_MAIN:
>         ...
>         break;
>     }
Good idea. Can you benchmark this?
--Guido van Rossum (home page: http://www.python.org/~guido/)
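The goto-based layout in the quoted proposal can be sketched as compilable C under similar toy assumptions: hypothetical opcode values, only LOAD_FAST_0..LOAD_FAST_3 as short forms, longs instead of object pointers, and no reference counting. The 2-byte argument is read as p[0] | (p[1] << 8) followed by p += 2, because (*p++) + (*p++)<<8 modifies p twice without a sequence point (undefined behaviour) and, since + binds tighter than <<, would shift the sum rather than just the high byte:

```c
#include <assert.h>

/* Hypothetical numbering for this sketch only. */
enum {
    LOAD_FAST_0, LOAD_FAST_1, LOAD_FAST_2, LOAD_FAST_3,
    LOAD_FAST,      /* general form, followed by a 2-byte index */
    BINARY_ADD,     /* pop two values, push their sum */
    RETURN_VALUE    /* return the top of the stack */
};

long eval(const unsigned char *p, const long *locals, int nlocals)
{
    long stack[32];   /* no overflow check in this sketch */
    int sp = 0;
    int op, index;

    for (;;) {
        switch (op = *p++) {
        case LOAD_FAST:
            index = p[0] | (p[1] << 8);   /* low byte first */
            p += 2;
            goto LOAD_FAST_MAIN;          /* no break needed after a goto */
        case LOAD_FAST_0:
        case LOAD_FAST_1:
        case LOAD_FAST_2:
        case LOAD_FAST_3:
            index = op - LOAD_FAST_0;     /* index is encoded in the opcode */
        LOAD_FAST_MAIN:
            assert(index >= 0 && index < nlocals);
            stack[sp++] = locals[index];
            break;
        case BINARY_ADD:
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case RETURN_VALUE:
            return stack[sp - 1];
        default:
            assert(!"unknown opcode");
            return -1;
        }
    }
}
```

Only the 3-byte LOAD_FAST path takes the extra jump; the common 1-byte forms reach the shared body directly.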