Using hash to see if object's attributes have changed

Steven D'Aprano steve at REMOVE-THIS-cybersource.com.au
Fri Dec 11 20:49:58 EST 2009


On Fri, 11 Dec 2009 10:03:06 -0800, Bryan wrote:

> When a user submits a request to update an object in my web app, I make
> the changes in the DB, along w/ who last updated it and when.  I only
> want to update the updated/updatedBy columns in the DB if the data has
> actually changed however.
> 
> I'm thinking of having the object in question be able to return a list
> of its values that constitute its state.  Then I can take a hash of that
> list as the object exists in the database before the request, 

Storing the entire object instead of the hash is not likely to be *that* 
much more expensive. We're probably talking about [handwaves] a few dozen 
bytes versus a few more dozen bytes -- trivial in the grand scheme of 
things.

So give the object an __eq__ method (plus __ne__ under Python 2, which 
does not derive one from the other), store a copy of the object, and do 
this:

if current_object != existing_object:
    update(...)
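A minimal sketch of that approach, assuming a hypothetical `Record` class whose database-relevant state is the attributes `a`, `b` and `c`. Python 3 derives `!=` from `__eq__` on its own; the explicit `__ne__` is only there so the same class behaves correctly under Python 2.

```python
import copy


class Record(object):
    """Hypothetical stand-in for the web app's model object."""

    def __init__(self, a, b, c):
        self.a, self.b, self.c = a, b, c

    def __eq__(self, other):
        # Equal when every DB-relevant attribute matches.
        if not isinstance(other, Record):
            return NotImplemented
        return (self.a, self.b, self.c) == (other.a, other.b, other.c)

    def __ne__(self, other):
        # Python 3 would derive this from __eq__; Python 2 does not.
        result = self.__eq__(other)
        if result is NotImplemented:
            return result
        return not result


existing_object = Record(1, 1, 1)
# Shallow copy taken before the request; use copy.deepcopy instead if
# any attribute holds a mutable value such as a list.
current_object = copy.copy(existing_object)
current_object.b = 'changed'

needs_update = current_object != existing_object
print(needs_update)  # True
```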



> and then
> on the object that the user has made changes to.  If they are not equal,
> the user has changed the object.

If all you care about is a flag that says whether the state has changed 
or not, why don't you add a flag "changed" to the object and update it as 
needed? 

if current_object.changed:
    update(...)
    current_object.changed = False


That would require all the attributes to be turned into properties, but 
that shouldn't be hard. Or use a datestamp instead of a flag:

if current_object.last_changed > database_last_changed:
    update(...)
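One way to do the property conversion, sketched with a hypothetical `tracked()` helper that builds one property per attribute. Each setter flags the object as changed only when the value really differs, and also records a `last_changed` timestamp for the datestamp variant. The helper, the `_a`/`_b`/`_c` storage names and the use of local `datetime.now()` are illustrative assumptions, not a library API.

```python
import datetime


def tracked(name):
    """Return a property that marks its owner as changed when `name`
    is assigned a value different from the current one."""
    private = '_' + name

    def getter(self):
        return getattr(self, private)

    def setter(self, value):
        if getattr(self, private) != value:
            setattr(self, private, value)
            self.changed = True
            self.last_changed = datetime.datetime.now()

    return property(getter, setter)


class Record(object):
    a = tracked('a')
    b = tracked('b')
    c = tracked('c')

    def __init__(self, a, b, c):
        # Write the private slots directly so loading from the DB
        # does not count as a change.
        self._a, self._b, self._c = a, b, c
        self.changed = False
        self.last_changed = None


current_object = Record(1, 1, 1)
current_object.a = 1            # same value: not flagged
print(current_object.changed)   # False
current_object.b = 'changed'
print(current_object.changed)   # True
```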

 
> I imagine it working something like this:
> 
> import hashlib
> 
> def getValues(obj):
>     return [obj.a, obj.b, obj.c]
> 
> foo = Obj()
> foo.a = foo.b = foo.c = 1
> stateBefore = hashlib.sha1(str(getValues(foo)).encode()).hexdigest()
> foo.b = 'changed'
> stateNow = hashlib.sha1(str(getValues(foo)).encode()).hexdigest()
> assert stateBefore != stateNow


You probably don't need a cryptographically strong hash. Just add a 
__hash__(self) method to your class:


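class MyObject(object):  # or whatever it is called
    def __hash__(self):
        # Hash only the attributes that make up the DB state.
        # Note: since Python 3.3, str hashes are salted per process, so
        # only compare hashes computed within the same process.
        t = (self.a, self.b, self.c)
        return hash(t)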

stateNow = hash(foo)



In fact, why bother with hashing it? Just store the tuple itself, or a 
serialized version of it, and compare that.
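A minimal sketch of storing the tuple itself, assuming each class declares which attributes make up its database state in a hypothetical `db_fields` tuple, so that other entries in `__dict__` (caches and the like) are ignored. JSON is used here as one possible serialization; it only works if every value is JSON-serializable.

```python
import json


class Record(object):
    # Hypothetical: the attributes that make up the row as the DB sees it.
    db_fields = ('a', 'b', 'c')

    def __init__(self, a, b, c):
        self.a, self.b, self.c = a, b, c
        self._cache = {}  # lives in __dict__ but is not DB state

    def db_state(self):
        """Return the DB-relevant values as a tuple.

        The tuple references the values rather than copying them, so a
        mutable value changed in place would also change the snapshot;
        the serialized form does not have that problem."""
        return tuple(getattr(self, name) for name in self.db_fields)


foo = Record(1, 1, 1)
state_before = foo.db_state()                 # store the tuple itself...
serialized_before = json.dumps(state_before)  # ...or a serialized copy

foo._cache['x'] = 42                          # irrelevant change
print(foo.db_state() == state_before)         # True

foo.b = 'changed'
print(foo.db_state() == state_before)                   # False
print(json.dumps(foo.db_state()) == serialized_before)  # False
```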


> I originally thought about running the hash on the __dict__ attribute,
> but there may be things in there that don't actually constitute the
> object's state as far as the database is concerned, so I thought it
> better to have each object be responsible for returning a list of values
> that constitute its state as far as the DB is concerned.
> 
> I would appreciate any insight into why this is a good/bad idea given
> your past experiences.


Call me paranoid if you like, but I fear collisions. Even 
cryptographically strong hashes aren't collision-free (mathematically, 
they can't be: they map an unlimited number of possible inputs onto a 
fixed number of digests). Even though the chance of a collision might 
only be one in a trillion-trillion-trillion, some user might be unlucky 
and stumble across such a collision, leading to a bug that might cause 
loss of data. As negligible as the risk is, why take that chance if there 
are ways of detecting changes that are just as good and probably faster?
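To make the risk concrete for the built-in hash() approach above: hash() is not a cryptographic hash, and in CPython it collides on trivially small inputs. This sketch relies on a CPython implementation detail (-1 is reserved as an error marker at the C level, so hash(-1) is -2); other Python implementations may behave differently. Comparing the state tuples directly cannot miss the change.

```python
# CPython detail: hash(-1) == hash(-2), so tuples that differ only in
# -1 versus -2 hash identically.
state_before = (1, -1, 1)
state_now = (1, -2, 1)      # attribute b really did change

hash_says_changed = hash(state_before) != hash(state_now)
tuple_says_changed = state_before != state_now

print(hash_says_changed)   # False: the change is missed
print(tuple_says_changed)  # True
```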

Hash functions have their uses, but I don't think that this is one of 
them.



-- 
Steven


