
Separating out LOAD_FAST from the switch shows a nice effect. SET_LINENO is removed by -OO anyway, so there's really no use in optimizing this one.
I tried this and found about three percent speed increase on pystone, for what that's worth. This is with python -OO on Linux x86. Note that removing the (now redundant) case from the switch seemed to make a small difference too.

Alas, I have no time to play with optimizing the main loop in a more rigorous way... :-(

Here's the patch I came up with:

Index: ceval.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Python/ceval.c,v
retrieving revision 2.187
diff -c -r2.187 ceval.c
*** ceval.c	2000/07/25 12:56:38	2.187
--- ceval.c	2000/07/30 16:13:23
***************
*** 608,616 ****
  		f->f_lasti = INSTR_OFFSET();
  #endif
  
  		opcode = NEXTOP();
! 		if (HAS_ARG(opcode))
  			oparg = NEXTARG();
  #ifdef DYNAMIC_EXECUTION_PROFILE
  #ifdef DXPAIRS
  		dxpairs[lastopcode][opcode]++;
--- 608,631 ----
  		f->f_lasti = INSTR_OFFSET();
  #endif
  
+ 	get_opcode:
  		opcode = NEXTOP();
! 		if (HAS_ARG(opcode)) {
  			oparg = NEXTARG();
+ 			if (opcode == LOAD_FAST) {
+ 				x = GETLOCAL(oparg);
+ 				if (x != NULL) {
+ 					Py_INCREF(x);
+ 					PUSH(x);
+ 					goto get_opcode;
+ 				}
+ 				PyErr_SetObject(PyExc_UnboundLocalError,
+ 					   PyTuple_GetItem(co->co_varnames,
+ 							oparg));
+ 				goto on_error;
+ 			}
+ 		}
+ 
  #ifdef DYNAMIC_EXECUTION_PROFILE
  #ifdef DXPAIRS
  		dxpairs[lastopcode][opcode]++;
***************
*** 1282,1300 ****
  			}
  			Py_INCREF(x);
  			PUSH(x);
- 			break;
- 
- 		case LOAD_FAST:
- 			x = GETLOCAL(oparg);
- 			if (x == NULL) {
- 				PyErr_SetObject(PyExc_UnboundLocalError,
- 					   PyTuple_GetItem(co->co_varnames,
- 							oparg));
- 				break;
- 			}
- 			Py_INCREF(x);
- 			PUSH(x);
- 			if (x != NULL) continue;
  			break;
  
  		case STORE_FAST:
--- 1297,1302 ----
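
For anyone who wants to see the idea without the ceval.c context, here is a rough sketch of the same trick in a toy interpreter: the most frequent opcode is tested right after the argument fetch and handled before the switch, then control jumps straight back to the fetch. Everything in it (the opcode numbering, run(), STACK_MAX, the one-byte arguments) is made up for illustration and is not CPython's; it also leaves out the unbound-local check that the real patch keeps.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy opcodes, NOT CPython's numbering.
   Opcodes >= HAVE_ARGUMENT are followed by a one-byte argument. */
enum {
    OP_HALT = 0,
    OP_ADD = 1,
    HAVE_ARGUMENT = 2,
    OP_LOAD_FAST = 2,
    OP_STORE_FAST = 3
};

#define STACK_MAX 16

/* Run `code` against `locals`.
   Returns 0 on OP_HALT, -1 on an unknown opcode. */
int run(const unsigned char *code, long *locals)
{
    long stack[STACK_MAX];
    int sp = 0;        /* index of the next free stack slot */
    size_t pc = 0;     /* index of the next byte to fetch */

    for (;;) {
        unsigned char opcode;
        unsigned char oparg = 0;

    get_opcode:
        opcode = code[pc++];
        if (opcode >= HAVE_ARGUMENT) {
            oparg = code[pc++];
            if (opcode == OP_LOAD_FAST) {
                /* Fast path: push the local and fetch the next
                   opcode without going through the switch. */
                assert(sp < STACK_MAX);
                stack[sp++] = locals[oparg];
                goto get_opcode;
            }
        }

        /* Everything else still goes through the switch. */
        switch (opcode) {
        case OP_ADD:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_STORE_FAST:
            assert(sp >= 1);
            locals[oparg] = stack[--sp];
            break;
        case OP_HALT:
            return 0;
        default:
            return -1;
        }
    }
}
```

Calling run() on the byte sequence LOAD_FAST 0, LOAD_FAST 1, ADD, STORE_FAST 2, HALT with locals {2, 5, 0} should leave their sum in locals[2]. Whether the extra compare before the switch pays off depends on how often the special-cased opcode occurs and on how the compiler lays out the switch, so it needs measuring rather than assuming.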