Mutable objects which define __hash__ (was Re: Why are tuples immutable?)
Nick Coghlan
ncoghlan at iinet.net.au
Sat Dec 25 21:17:42 EST 2004
Bengt Richter wrote:
> I take it you are referring to regular python dictionaries,
Correct. So I'll skip over the sections talking about alternate lookup
strategies in my reply.
>>Anyway, what are the consequences of a type which has a 'variable hash'?
>>
>>The only real bad consequence is that the internal state of a dictionary can
>>break if someone alters the hash of one of its keys. The restriction to
>>'constant hashes' is aimed squarely at eliminating this class of programming
>>errors. It doesn't actually provide any performance gain.
>
> Why not? It means you only have to compute the hash for any given object once,
> and you can cache the hash value. That could be a biggie if you had megabyte trees
> of nested tuples as keys and reused them frequently for lookup.
Allowing mutation doesn't mean abandoning your caching - it just means you need
a way to tell the dict to update it's key cache (i.e. rehash())
> I think, for one thing, it would be more efficient to notice that some keys were guaranteed not
> to need rehashing... um, maybe by paritioning the key set into immutables and mutables ;-)
True, since otherwise rehash() would have to check every key, even the immutable
ones. However, that only affects the time required for a rehash(), rather than
general dictionary operation.
> For another, it's not just a matter of "moving" keys that hash "incorrectly". If keys that
> are a part of the state of the dict can mutate outside of dict methods, then keys can mutate
> to equal each other, and "rehashing" will give equal hashes for previously unequal and distinct keys,
> and you must choose which of the previous values is to be the value for the two keys that have now
> become one by equality (however __eq__, __cmp__, __coerce__ and inheritance may define equality).
Or resist the temptation to guess, and have rehash() raise an exception :)
> OTOH, if KeyError is frequent because of mutation, hashing may be next to useless overhead. IOW, DYFR ;-)
Indeed. I defined my FR based on Antoon's analogy of the sorted list - have the
dict behave like sorted list, so that if you screw the invariant, it's the
programmer's job to tell the dictionary to fix it.
>>>The problem is also that you really can't make new immutable objects
>
> I don't understand that one. What do you mean by "new immutable object"?
Bad terminology on Antoon's part, I think - I believe he meant "new immutable type".
I gave the Python 2.4's Decimal as an example of something which is
theoretically immutable, but doesn't actually enforce that immutability properly.
Overriding __setattr__ can give true immutability. It just makes the class a
serious pain to write :)
> That is so different that I don't know how it could be used to debug an app that
> uses a mutable key dict. I think Antoon was just asking for a key-mutation detector.
> I think that's reasonable goal. A deep copy of all keys defined would permit that,
> at least when dict methods had control. But you couldn't find the code that mutated
> the originals without something outside the dict to catch it in the act.
The identity keyed dict was based on Antoon's posted use case: he wanted to
classify mutable objects, and then check for membership in those groups.
You can use lists for this, but the lookup is slow. Conventional dictionaries
and sets give a fast lookup, but require that the items in use be hashable.
An identity dictionary (or set) solves the 'object classification' problem
perfectly.
> You just need to DYFR ;-)
I really do like this acronym :)
Cheers,
Nick.
--
Nick Coghlan | ncoghlan at email.com | Brisbane, Australia
---------------------------------------------------------------
http://boredomandlaziness.skystorm.net
More information about the Python-list
mailing list