One Python 2.1 idea

Neelakantan Krishnaswami neelk at alum.mit.edu
Wed Dec 27 20:46:25 EST 2000


On Wed, 27 Dec 2000 02:56:07 GMT, Tim Peters <tim.one at home.com> wrote:
>[Neelakantan Krishnaswami <neelk at alum.mit.edu>]
>>
>>   http://starship.python.net/crew/vlad/archive/threaded_code/
>>
>> Bringing that up to date with 2.0 would be an interesting experiment.
>
> Note that at the 1998 Python Conference, John Aycock reported on an extreme
> technique that can eliminate *all* eval-loop overhead, including instruction
> decoding and dispatch costs; look for "Converting Python Virtual Machine
> Code to C" at
> 
>     http://www.foretec.com/python/workshops/1998-11/proceedings.html
> 
> The results were inconclusive (detecting a pattern here <wink>?),
> but it wasn't encouraging that pystone only got a 10% speed boost.
> That did and does seem anomalously small, but so long as "+" has to
> do pages of analysis at runtime, the time to *find* the BINARY_ADD
> code really isn't important.  Threading techniques grew up in Forth,
> where the implementation of a typical opcode consists of just a
> handful of machine instructions.
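
The dispatch overhead Tim describes can be illustrated with a toy sketch (my own, in Python rather than the C a real VM would use): a switch-style loop decodes a numeric opcode on every iteration, while a "threaded" program pre-resolves each opcode to a function reference once, so the inner loop does no decoding at all. The two-opcode stack machine here is purely hypothetical.

```python
PUSH, ADD = 0, 1  # numeric opcodes for the switch-style loop

def run_switch(program):
    """Decode and dispatch each opcode on every iteration."""
    stack = []
    for op, arg in program:
        if op == PUSH:
            stack.append(arg)
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
    return stack[-1]

def op_push(stack, arg):
    stack.append(arg)

def op_add(stack, arg):
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)

def thread(program):
    """'Compile' opcodes to direct function references, once, up front."""
    table = {PUSH: op_push, ADD: op_add}
    return [(table[op], arg) for op, arg in program]

def run_threaded(threaded_program):
    stack = []
    for fn, arg in threaded_program:  # no opcode decoding here
        fn(stack, arg)
    return stack[-1]

prog = [(PUSH, 2), (PUSH, 3), (ADD, None)]
print(run_switch(prog), run_threaded(thread(prog)))  # both yield 5
```

In C the same trick is usually done with computed gotos, where each opcode's handler jumps directly to the next handler's address, which is what makes it pay off when handlers are only a few machine instructions long.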

I've only ever written a Scheme VM; every opcode there is tiny too,
except for function calls, which receive extreme optimization effort
(because everything is a function call in Scheme). Add some
representation finagling to make type-checking faster, and you've
pretty much exhausted all the easy parts of optimization.

But IMO it's still a good idea to go for the easy-but-modest gains,
since five or six of them compound to be just as potent as one
two-fold super-sexy optimization. Has any work been done on Swallow
since last year? The types-sig has been dead, but turning global name
accesses into offsets, the way locals are already handled, could be
another easy 10-15%.
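
The asymmetry is visible in the bytecode itself. A sketch using the `dis` module (opcode names below are from a modern CPython, but the split was the same in 2.0): locals compile to LOAD_FAST, an index into a per-frame array, while globals compile to LOAD_GLOBAL, a name lookup in a dictionary.

```python
import dis

LIMIT = 10  # module-level global

def uses_global():
    return LIMIT + 1      # compiles to LOAD_GLOBAL: a dict lookup by name

def uses_local():
    limit = 10
    return limit + 1      # compiles to LOAD_FAST: an array index

g_ops = [i.opname for i in dis.get_instructions(uses_global)]
l_ops = [i.opname for i in dis.get_instructions(uses_local)]
print('LOAD_GLOBAL' in g_ops, 'LOAD_FAST' in l_ops)  # True True
```

Turning globals into offsets would mean resolving those names to slots at compile time, the way LOAD_FAST already does for locals.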

> god-forbid-anyone-profile-the-code-and-find-out-where-it's-really-
> spending-time<wink>-ly y'rs  - tim

I just did. The answer is SET_LINENO. 

I expected Vladimir's patch to yield perhaps a 10% increase in speed
in pure Python code: a 25% speedup in instruction dispatch (based on
Anton Ertl's threading speed measurements) times 50% of total time
spent in instruction dispatch (based on the technical report on the
ZINC Caml interpreter) = 12.5% speedup, rounded down to 10% out of
general pessimism.

Before I tried it, though, I took a look at the disassembled
bytecodes, saw a bunch of SET_LINENOs, and out of curiosity tried
running Python with the -O option (which omits them). The result was a
15% increase in speed. Now I'm no longer in the mood to do any
benchmarking of his patch. :)
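
The look at the disassembly amounts to counting opcode frequencies in compiled code; here is a small sketch of that kind of static inspection. (Note for later readers: SET_LINENO was eventually removed from CPython entirely, with line numbers moved to a separate table, so it will not appear in modern output; the snippet being compiled is just a made-up example.)

```python
import dis
from collections import Counter

# Compile a small snippet and tally which opcodes its bytecode uses --
# the same kind of static look that reveals bookkeeping opcodes.
src = """
total = 0
for i in range(10):
    total = total + i
"""
code = compile(src, '<example>', 'exec')
counts = Counter(i.opname for i in dis.get_instructions(code))
for opname, n in counts.most_common(5):
    print(opname, n)
```

On a 2.0-era interpreter a tally like this would have shown one SET_LINENO per source line, which is exactly what -O stripped out.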


Neel


