[Python-Dev] Python 3000: Special type for object attributes & map keys

Neil Toronto ntoronto at cs.byu.edu
Wed Mar 19 19:54:49 CET 2008


Greg Ewing wrote:
> Neal Norwitz wrote:
>> Part of this is done, but very differently in that all strings used in
>> code objects are interned (stored in a dictionary
> 
> And since two interned strings can be compared
> by pointer identity, I don't see how this differs
> significantly from the "unique integer" idea.
> 
> If the integers were used to directly index an
> array instead of being used as dict keys, it
> might make a difference. The cost would be that
> every namespace would need to be as big as
> the number of names in existence, with most
> of them being extremely sparse.

And we already have a data structure for sparse arrays... it's called a 
"dict". :)

If every attribute name were guaranteed to be an interned string (not 
currently the case - attribute names can be any kind of object), it 
might be possible to add another dict specialization for interned string 
lookup. The wins would be that lookup could assume the string's hash is 
valid, and equality comparison could be done via pointer comparison. 
HOWEVER...

I think that the attribute cache patches that went into 2.6 and 3.0 
mostly take care of lookup speed issues. They both assume strings are 
interned already. A cache hit consists of calculating a cache index from 
the string hash, ensuring that the attribute name at that index is 
identical (via pointer comparison) to the attribute name to be looked 
up, and returning the associated value. Lookup with an attribute that's 
not a string or not interned is automatically a cache miss, and it 
happens very rarely.

Specializing dicts for interned strings would optimize the cache miss. 
(When I was making the 3.0 patch, I found that happened rarely on the 
benchmarks and regression tests. It was somewhere around 5%.) The cache 
miss isn't expensive just because of the dict lookup. The attribute has 
to be searched for in every type and super-type of the object. The 
occasional string equality test probably doesn't register.

I'd be happy to be shown to be wrong, though.

Neil


More information about the Python-Dev mailing list