[Python-Dev] Rattlesnake progress

Daniel Berlin dan@dberlin.org
Tue, 19 Feb 2002 11:37:22 -0500


On Tuesday, February 19, 2002, at 11:01  AM, Kevin Jacobs wrote:

> On Tue, 19 Feb 2002, Daniel Berlin wrote:
>> On Tuesday, February 19, 2002, at 09:51  AM, Neil Schemenauer wrote:
>>
>>> Daniel Berlin wrote:
>>>> When you get to optimizations, you want Advanced Compiler Design and
>>>> Implementation by Muchnick.
>>>
>>> Right now I'm not planning to do any optimizations (except perhaps
>>> limiting the number of registers used).
>>>
>> This is, of course, a tricky optimization to do.
>> Limiting registers used involves splitting live ranges at the right
>> places, etc.
>
> Why limit the number of registers at all?  So long as they fit in L1
> cache you are golden.

Err, what makes you think this?
The largest problem on architectures like x86 is the number of registers.
You end up with about 4 usable registers. (Before someone mentions it:
hardware register renaming only helps eliminate false dependencies
between instructions; it doesn't give the compiler more registers to
allocate.)
Performance quickly drops when you start spilling registers to the stack.

In fact, I've seen multiple SPEC regressions of 15% or more caused by a
single extra spilled register.
Why?
Because you have to save it and reload it multiple times.
Those extra stores and loads *kill* pipelines and instruction scheduling.

It's also *much* harder to model the cache hierarchy well enough to
guarantee the values fit in the L1 cache than it is to make sure they
stay in registers where needed in the first place.


Try taking a performance-critical loop that runs entirely in registers,
and change it to store its value to memory and load it back into a
register on every iteration.
See how much slower it gets.
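
A rough sketch of that experiment in C, in case anyone wants to try it.
The function names and the timing helper are made up for illustration,
and using "volatile" to force a store and a load of the accumulator on
every iteration is only an approximation of a real spill, not what a
register allocator would actually emit.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* The accumulator is an ordinary local, so an optimizing compiler is
   free to keep it in a register for the whole loop. */
uint64_t sum_in_register(const uint32_t *data, size_t n)
{
    uint64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += data[i];
    return acc;
}

/* Same loop, but the accumulator is volatile: the compiler must load it
   from memory and store it back on every iteration, which imitates a
   value that has been spilled to the stack. */
uint64_t sum_through_memory(const uint32_t *data, size_t n)
{
    volatile uint64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc = acc + data[i];
    return acc;
}

/* Hypothetical helper: runs one of the loops, stores its result, and
   returns the CPU time it took in seconds. */
double seconds_for(uint64_t (*fn)(const uint32_t *, size_t),
                   const uint32_t *data, size_t n, uint64_t *result)
{
    clock_t start = clock();
    *result = fn(data, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Build it with optimization turned on (something like cc -O2), call
seconds_for() on both functions with a large array, and compare. The
size of the gap depends on the compiler and the CPU, and the optimizer
may also vectorize the first loop, so treat the numbers as an
illustration rather than a measurement of spill cost alone.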


--Dan