[Python-Dev] Re: new bytecode results

Damien Morton newsgroups1@bitfurnace.com
Sun, 2 Mar 2003 20:55:57 -0500


I optimised the layout of the Python opcodes using a simulated annealing
process that scored adjacent opcodes according to their frequency of
co-occurrence.

This raised my PyStone benchmark from 22100 to 22700, for a 3% gain.
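
As a rough illustration of the layout search described above (not the
actual tool used for these results), here is a minimal sketch in C. It
assumes a tiny six-opcode set, a made-up pair-frequency table
(SAMPLE_FREQ), and a quantised acceptance rule that takes a worse layout
with probability 2^-(loss/T) so that no math library is needed. The names
NOPS, SAMPLE_FREQ, layout_score, accept_move and anneal_layout are all
hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NOPS 6   /* hypothetical: a tiny opcode set, for illustration only */

/* Made-up counts: SAMPLE_FREQ[a][b] = how often opcode b follows opcode a. */
static const long SAMPLE_FREQ[NOPS][NOPS] = {
    {  0, 90,  5,  0, 10,  0 },
    { 20,  0, 70,  5,  0,  0 },
    {  0, 10,  0, 60,  0,  5 },
    {  5,  0, 15,  0, 40,  0 },
    {  0,  5,  0, 10,  0, 80 },
    { 30,  0,  0,  5, 10,  0 },
};

/* Score a layout: total pair frequency (both directions) of neighbours. */
static long layout_score(const int layout[NOPS], const long freq[NOPS][NOPS])
{
    long score = 0;
    for (int i = 0; i + 1 < NOPS; i++)
        score += freq[layout[i]][layout[i + 1]]
               + freq[layout[i + 1]][layout[i]];
    return score;
}

/* Always accept improvements; accept a loss with probability 2^-(loss/t). */
static int accept_move(long loss, long t)
{
    if (loss <= 0)
        return 1;
    long k = loss / t;
    if (k >= 15)              /* RAND_MAX only guarantees 15 random bits */
        return 0;
    return (rand() & ((1 << (int)k) - 1)) == 0;
}

/* Anneal by swapping two slots per step; leaves the best layout seen in
 * `layout` and returns its score. Reseeds the global rand() state. */
static long anneal_layout(int layout[NOPS], const long freq[NOPS][NOPS],
                          int steps, long t_start, unsigned seed)
{
    assert(steps > 0 && t_start > 0);
    srand(seed);

    int best[NOPS];
    memcpy(best, layout, sizeof best);
    long cur = layout_score(layout, freq);
    long best_score = cur;

    for (int step = 0; step < steps; step++) {
        /* Linear cooling from roughly t_start down to 1. */
        long t = 1 + (t_start - 1) * (steps - 1 - step) / steps;
        int i = rand() % NOPS, j = rand() % NOPS;
        if (i == j)
            continue;
        int tmp = layout[i]; layout[i] = layout[j]; layout[j] = tmp;
        long cand = layout_score(layout, freq);
        if (accept_move(cur - cand, t)) {
            cur = cand;
            if (cur > best_score) {
                best_score = cur;
                memcpy(best, layout, sizeof best);
            }
        } else {
            /* Undo the swap. */
            tmp = layout[i]; layout[i] = layout[j]; layout[j] = tmp;
        }
    }
    memcpy(layout, best, sizeof best);
    return best_score;
}
```

Starting from the identity layout {0, 1, 2, 3, 4, 5}, a call such as
anneal_layout(layout, SAMPLE_FREQ, 2000, 64, 1u) returns a score that is
never lower than the starting score, because the best layout seen is kept.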

I've been using Skip's DXP server to gather statistics, but there isn't
much data there. I should be able to achieve better results if more people
contributed stats to his server; more information about it can be found
here:
http://manatee.mojam.com/~skip/python/

The process of laying out the opcodes and switch cases has largely been
automated, and generating new layouts is relatively painless and quick.
Please do contribute stats for 2.3a2 to Skip's DXP server.

I also implemented a LOAD_FASTER opcode, with the argument encoded into the
opcode.

This raised my PyStone benchmark from 22700 to 23150, for a total gain of
about 5% over the original 22100.

The main switch loop looks like this now:

if (opcode >= LOAD_FASTER) {
  load_fast(opcode - LOAD_FASTER);
  ...
  goto fast_next_opcode;
  }
switch(opcode) {
  case LOAD_ATTR:
    oparg = NEXTARG();
    w = GETITEM(names, oparg);
    ...
    break;
  ...
}
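
To make the shape of that loop concrete, here is a small self-contained
toy interpreter in C. It is only a sketch under stated assumptions: the
opcode numbers (OP_STOP, OP_ADD, OP_STORE_FAST, OP_LOAD_FASTER = 0xF0),
the one-byte argument for OP_STORE_FAST, and the run() function are all
invented for illustration and are not CPython's. It shows the two ideas
from the loop above: opcodes at or above OP_LOAD_FASTER carry the local
slot in the opcode itself, and every other case fetches its own argument.

```c
#include <assert.h>

/* Hypothetical opcode numbering for this sketch only; not CPython's. */
enum {
    OP_STOP = 0,
    OP_ADD = 1,            /* pop two values, push their sum */
    OP_STORE_FAST = 2,     /* one argument byte: local slot to store into */
    OP_LOAD_FASTER = 0xF0  /* 0xF0 + n pushes local slot n; no argument */
};

#define NLOCALS 16
#define STACK_MAX 32

/* Run a code string; returns the top of stack at OP_STOP (0 if empty). */
static long run(const unsigned char *code, long locals[NLOCALS])
{
    long stack[STACK_MAX];
    int sp = 0;

    for (;;) {
        unsigned opcode = *code++;

        /* Fast path: the local index is encoded in the opcode itself. */
        if (opcode >= OP_LOAD_FASTER) {
            assert(sp < STACK_MAX);
            stack[sp++] = locals[opcode - OP_LOAD_FASTER];
            continue;
        }

        switch (opcode) {
        case OP_STORE_FAST: {
            /* This case reads its own argument byte. */
            unsigned oparg = *code++;
            assert(oparg < NLOCALS && sp > 0);
            locals[oparg] = stack[--sp];
            break;
        }
        case OP_ADD:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_STOP:
        default:
            /* Stop on OP_STOP or on any opcode this sketch does not know. */
            return sp > 0 ? stack[sp - 1] : 0;
        }
    }
}
```

With sixteen opcode values reserved above 0xF0, only local slots 0 to 15
can be loaded this way in the sketch; anything beyond that would still
need an ordinary load opcode with an explicit argument.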

Each opcode case now loads its own argument as necessary. The test for
HAVE_ARGUMENT is now implemented using an array of bytes. The test now
happens very infrequently, so any performance loss is negligible.

const char HASARG[] = {
  0 , /* STOP_CODE */
  1 , /* LOAD_ATTR */
  1 , /* CALL_FUNCTION */
  1 , /* STORE_FAST */
  0 , /* BINARY_ADD */
  0 , /* SLICE+0 */
  0 , /* SLICE+1 */
  0 , /* SLICE+2 */
...
};
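
One place a table like this could be consulted is code that walks over
bytecode without executing it. The sketch below is an assumption about
such a use, not taken from the patch: it numbers the opcodes in the order
of the excerpt above (leaving out the SLICE entries and everything after
them), assumes the standard two-byte argument, and uses the hypothetical
helpers instr_size() and count_instructions().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical numbering that follows the order of the excerpt above;
 * the real table continues past these entries. */
enum { STOP_CODE, LOAD_ATTR, CALL_FUNCTION, STORE_FAST, BINARY_ADD, NUM_OPS };

/* 1 if the opcode is followed by an argument, 0 otherwise. */
static const char HASARG[NUM_OPS] = {
    0, /* STOP_CODE */
    1, /* LOAD_ATTR */
    1, /* CALL_FUNCTION */
    1, /* STORE_FAST */
    0, /* BINARY_ADD */
};

/* Bytes occupied by one instruction: the opcode byte, plus two argument
 * bytes when the table says the opcode takes an argument. */
static size_t instr_size(unsigned char opcode)
{
    assert(opcode < NUM_OPS);
    return HASARG[opcode] ? 3 : 1;
}

/* Count instructions in a code string by stepping over their arguments. */
static size_t count_instructions(const unsigned char *code, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i += instr_size(code[i]))
        n++;
    return n;
}
```

A single array lookup replaces the old "opcode >= HAVE_ARGUMENT"
comparison, which stops being usable once the opcodes are reordered by
frequency instead of being grouped by whether they take an argument.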