While recently goofing around with the bytecode, I thought of doing something like this:

      case LOAD_CONST:
          x = GETITEM(consts, oparg);
          Py_INCREF(x);
    +     if (*next_instr == RETURN_VALUE) {
    +         retval = x;
    +         why = WHY_RETURN;
    +         goto fast_block_end;
    +     }
          PUSH(x);
          goto fast_next_opcode;

This would skip the stack and a trip through the loop without changing the parser or the bytecode, and with a minimal amount of added code or overhead. The result is the same because RETURN_VALUE would immediately pop the value that LOAD_CONST had just pushed.

This could (of course) be applied to other opcodes, too. Perhaps instead of littering the function with that block, a macro "PUSH_MAYBE_RET" could be added that would replace the final PUSH in the opcode's case block:

    #define PUSH_MAYBE_RET(x) \
        do { \
            if (*next_instr == RETURN_VALUE) { \
                retval = (x); \
                why = WHY_RETURN; \
                goto fast_block_end; \
            } \
            PUSH(x); \
        } while (0)

I'm not sure how much this would help speed, if at all.
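To make the idea easier to play with outside of ceval.c, here is a minimal sketch of the same peek-ahead trick in a toy stack interpreter. Everything in it is an assumption made for illustration: run_toy, the opcode numbering, the one-byte operand, and the int-valued stack are hypothetical and are not CPython's. It only shows that LOAD_CONST followed by RETURN_VALUE can return directly, and it counts trips through the dispatch loop so the saving is visible.

```c
#include <assert.h>

/* Hypothetical toy opcodes: the names mirror CPython's, the encoding is made up. */
enum { LOAD_CONST = 1, BINARY_ADD = 2, RETURN_VALUE = 3 };

#define STACK_MAX 16

/* Run `code` against `consts` and return the program's result.
   If `peek` is nonzero, LOAD_CONST looks at the next opcode and returns
   directly when it is RETURN_VALUE. `*dispatches` receives the number of
   trips through the dispatch loop. */
int run_toy(const unsigned char *code, const int *consts,
            int peek, int *dispatches)
{
    int stack[STACK_MAX];
    int sp = 0;
    const unsigned char *next_instr = code;

    *dispatches = 0;
    for (;;) {
        int opcode = *next_instr++;
        int x;

        ++*dispatches;
        switch (opcode) {
        case LOAD_CONST:
            x = consts[*next_instr++];      /* one-byte operand */
            if (peek && *next_instr == RETURN_VALUE)
                return x;                   /* skip the push, the pop, and one dispatch */
            assert(sp < STACK_MAX);
            stack[sp++] = x;
            break;
        case BINARY_ADD:
            assert(sp >= 2);
            x = stack[--sp];
            stack[sp - 1] += x;
            break;
        case RETURN_VALUE:
            assert(sp >= 1);
            return stack[--sp];
        default:
            assert(!"unknown opcode");
            return -1;
        }
    }
}
```

For the program LOAD_CONST 0; RETURN_VALUE the plain loop dispatches twice and the peeking version once, with the same result. A program whose LOAD_CONSTs are not followed by RETURN_VALUE dispatches the same number of times either way, paying only the extra comparison. Whether that comparison costs more than it saves in the real interpreter is exactly the open question above.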