[Python-3000] callable()

Nick Coghlan ncoghlan at gmail.com
Tue Jul 25 14:53:45 CEST 2006


Andrew Koenig wrote:
>> In both cases, __hash__ is not idempotent, and is thus an abomination.
> 
> Why do you say it's not idempotent?  The first time you call it, either it
> works or it doesn't.  If it doesn't work, then you shouldn't have called it
> in the first place.  If it does work, all subsequent calls will return the
> same result.
> 
>> Case
>> 1 is a perverse programmer -- well known to be capable of abominations.
> 
> What is perverse about case 1?  I'm not being disingenuous here; I really
> don't know.  I am assuming, of course, that the object in question never
> changes the value of its component once constructed.

I wouldn't call case 1 perverse, but I would call it buggy if x.partialhash() 
wasn't idempotent, or if it used the same hash cache as the full hash function.

E.g. there are no state consistency problems with the following:

     def __init__(self):
         self._fullhash = None
         self._partialhash = None

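     def partialhash(self, init_hash=None):
         if self._partialhash is None:
             # work it out and set it (_compute_partialhash is a
             # placeholder name for the actual calculation)
             self._partialhash = self._compute_partialhash(init_hash)
         return self._partialhash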

     def __hash__(self):
         if self._fullhash is None:
             selfpart = self.partialhash()
             self._fullhash = self.y.partialhash(selfpart)
         return self._fullhash

Alternatively, the __hash__ function could be written in a transactional 
style, backing out the call to the partial hash if the hash of the 
subcomponent failed:

     def __init__(self):
         self._fullhash = None

     def __hash__(self):
         if self._fullhash is None:
             selfpart = self.partialhash()
             try:
                 self._fullhash = self.y.partialhash(selfpart)
             except:
                 self._clearpartialhash()
                 raise
         return self._fullhash

Either way, if the __hash__ function can fail in a way that can leave the 
object in an inconsistent state, then that's a bug in the implementation of 
the __hash__ function.
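
As a rough, self-contained illustration of the transactional style, here is a sketch in which the subcomponent's partial hash can fail. The class names Part and Whole, the ready flag, and the use of the built-in hash() for the partial hashes are all assumptions made for the example, not anything from the original cases:

```python
class Part:
    """Hypothetical subcomponent whose partial hash can fail."""

    def __init__(self, value, ready=True):
        self.value = value
        self.ready = ready

    def partialhash(self, init_hash=None):
        if not self.ready:
            raise RuntimeError("component not ready")
        # Fold the caller's partial hash into this component's hash.
        return hash((init_hash, self.value))


class Whole:
    """Hypothetical container that hashes itself transactionally."""

    def __init__(self, x, y):
        self.x = x
        self.y = y
        self._partialhash = None
        self._fullhash = None

    def partialhash(self):
        if self._partialhash is None:
            self._partialhash = hash(self.x)
        return self._partialhash

    def _clearpartialhash(self):
        self._partialhash = None

    def __hash__(self):
        if self._fullhash is None:
            selfpart = self.partialhash()
            try:
                self._fullhash = self.y.partialhash(selfpart)
            except Exception:
                # Back out the cached partial hash so a failed call
                # leaves no trace on the object.
                self._clearpartialhash()
                raise
        return self._fullhash
```

A failed hash(Whole("a", Part("b", ready=False))) leaves both caches as None, so a later call starts from a clean slate, and once a call succeeds every subsequent call returns the same cached value.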

For case 2, the problem is the idea of using the hash of the entire CD as the 
__hash__ of the object that represents that CD in memory, and then making the 
retrieval of that data a side effect of attempting to hash the object. 
Touching an IO device or the network to compute the hash of an in-memory data 
structure sounds like an incredibly bad idea. If that information is an 
important enough part of the object's identity to be included in its hash, it 
needs to be retrieved before the object can be considered fully created, and 
it should NOT be done as a side effect of trying to hash the object. Instead, 
if the attribute is not set, the hash operation should simply fail with 
something like RuntimeError("necessary attribute not set"). Or you can be 
stricter, and make the attribute mandatory at object creation time.
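
A minimal sketch of that approach, assuming a hypothetical CompactDisc class whose content_digest attribute is filled in by an explicit load_digest() step (all of these names are invented for illustration), and assuming the digest is never changed once set:

```python
class CompactDisc:
    """Hypothetical in-memory representation of a CD."""

    def __init__(self, title, content_digest=None):
        self.title = title
        self.content_digest = content_digest

    def load_digest(self, read_disc):
        # The slow IO happens here, in an explicit step the caller
        # asks for, never as a side effect of hashing.
        self.content_digest = read_disc()

    def __eq__(self, other):
        if not isinstance(other, CompactDisc):
            return NotImplemented
        return (self.title, self.content_digest) == (
            other.title,
            other.content_digest,
        )

    def __hash__(self):
        if self.content_digest is None:
            raise RuntimeError("necessary attribute not set")
        # Cheap: only in-memory attributes are touched.
        return hash((self.title, self.content_digest))
```

Hashing a CompactDisc before load_digest() has been called raises the RuntimeError immediately. The stricter variant would drop the default for content_digest so the object cannot be created without it.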

All of which is a long-winded way of saying "calculation of an object hash 
should be both cheap and idempotent" :)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

