Using hash to see if object's attributes have changed
Steven D'Aprano
steve at REMOVE-THIS-cybersource.com.au
Fri Dec 11 20:49:58 EST 2009
On Fri, 11 Dec 2009 10:03:06 -0800, Bryan wrote:
> When a user submits a request to update an object in my web app, I make
> the changes in the DB, along w/ who last updated it and when. I only
> want to update the updated/updatedBy columns in the DB if the data has
> actually changed however.
>
> I'm thinking of having the object in question be able to return a list
> of its values that constitute its state. Then I can take a hash of that
> list as the object exists in the database before the request,
Storing the entire object instead of the hash is not likely to be *that*
much more expensive. We're probably talking about [handwaves] a few dozen
bytes versus a few more dozen bytes -- trivial in the grand scheme of
things.
So give the object __eq__ and __ne__ methods (Python 2 does not derive
!= from ==, so you need both), store a copy of the object, and do this:
    if current_object != existing_object:
        update(...)
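To make that concrete, here is a minimal sketch of what I have in mind.
The class name Record, the _state() helper and the attribute names are
made up for illustration; substitute whatever your object really looks
like.

```python
import copy


class Record(object):  # hypothetical stand-in for your object
    def __init__(self, a, b, c):
        self.a, self.b, self.c = a, b, c

    def _state(self):
        # Only the attributes the database cares about.
        return (self.a, self.b, self.c)

    def __eq__(self, other):
        if not isinstance(other, Record):
            return NotImplemented
        return self._state() == other._state()

    def __ne__(self, other):
        # Spelled out explicitly because Python 2 does not infer it.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result


existing_object = Record(1, 1, 1)
current_object = copy.copy(existing_object)  # snapshot taken before the request
unchanged = not (current_object != existing_object)

current_object.b = 'changed'                 # the user edits a field
changed = current_object != existing_object
```

A shallow copy is enough here only because the state is made of
immutable values; if an attribute is a list or dict, you would want
copy.deepcopy instead.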
> and then
> on the object that the user has made changes to. If they are not equal,
> the user has changed the object.
If all you care about is a flag that says whether the state has changed
or not, why don't you add a flag "changed" to the object and update it as
needed?
    if current_object.changed:
        update(...)
        current_object.changed = False
That would require all attributes be turned into properties, but that
shouldn't be hard. Or use a datestamp instead of a flag:
    if current_object.last_changed > database_last_changed:
        update(...)
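A rough sketch of the flag approach, assuming a small helper that builds
the properties for you. The names tracked(), Tracked and update() are
invented for this example, and update() just stands in for your real
database write.

```python
def tracked(name):
    """Build a property that sets obj.changed when the value really changes."""
    private = '_' + name

    def getter(self):
        return getattr(self, private)

    def setter(self, value):
        # Assigning an equal value does not count as a change.
        if getattr(self, private) != value:
            setattr(self, private, value)
            self.changed = True

    return property(getter, setter)


class Tracked(object):  # hypothetical class name
    a = tracked('a')
    b = tracked('b')

    def __init__(self, a, b):
        self._a = a  # bypass the properties during construction
        self._b = b
        self.changed = False


def update(obj):  # placeholder for the real database write
    print('writing %r, %r' % (obj.a, obj.b))


current_object = Tracked(1, 1)
current_object.a = 1              # same value, so the flag stays False
flag_after_same_value = current_object.changed

current_object.b = 'changed'      # different value, so the flag is set
flag_after_real_change = current_object.changed

if current_object.changed:
    update(current_object)
    current_object.changed = False
```

Because the setter compares old and new values, re-submitting a form
with identical data leaves the flag alone, which is the behaviour you
said you wanted for the updated/updatedBy columns.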
> I imagine it working something like this:
>
> def getValues(obj):
>     return [obj.a, obj.b, obj.c]
>
> foo = Obj()
> foo.a = foo.b = foo.c = 1
> stateBefore = hashlib.sha1(str(getValues(foo))).hexdigest()
> foo.b = 'changed'
> stateNow = hashlib.sha1(str(getValues(foo))).hexdigest()
> assert stateBefore != stateNow
You probably don't need a cryptographically strong hash. Just add a
__hash__(self) method to your class:
    class MyObject(object):  # or whatever it is called
        def __hash__(self):
            t = (self.a, self.b, self.c)
            return hash(t)
    stateNow = hash(foo)
In fact, why bother with hashing it? Just store the tuple itself, or a
serialized version of it, and compare that.
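Something along these lines, as a sketch only. The state() method name
is my invention, and pickle is just one possible serializer; use
whatever fits your database column.

```python
import pickle


class MyObject(object):  # or whatever it is called
    def __init__(self, a, b, c):
        self.a, self.b, self.c = a, b, c

    def state(self):
        # The values that make up the object's state as far as the DB cares.
        return (self.a, self.b, self.c)


foo = MyObject(1, 1, 1)

# Keep the tuple itself...
state_before = foo.state()
# ...or a serialized copy, if it has to survive outside this process.
stored = pickle.dumps(state_before)

foo.b = 'changed'

changed_vs_tuple = foo.state() != state_before
# Unpickle and compare values rather than comparing the raw bytes.
changed_vs_stored = foo.state() != pickle.loads(stored)
```

Comparing the unpickled tuple rather than the pickled bytes avoids
depending on the serializer producing identical output for equal values.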
> I originally thought about running the hash on the __dict__ attribute,
> but there may be things in there that don't actually constitute the
> object's state as far as the database is concerned, so I thought it
> better to have each object be responsible for returning a list of values
> that constitute its state as far as the DB is concerned.
>
> I would appreciate any insight into why this is a good/bad idea given
> your past experiences.
Call me paranoid if you like, but I fear collisions. Even
cryptographically strong hashes aren't collision-free (mathematically,
they can't be). Even though the chances of a collision might only be one
in a trillion-trillion-trillion, some user might be unlucky and stumble
across such a collision, leading to a bug that might cause loss of data.
As negligible as the risk is, why take that chance if there are ways of
detecting changes that are just as good and probably faster?
Hash functions have their uses, but I don't think that this is one of
them.
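If that sounds too theoretical, here is a small demonstration with the
built-in hash(). It relies on a CPython implementation detail (hash(-1)
and hash(-2) are both -2), so treat it as an illustration rather than
guaranteed behaviour on every Python. The state() method is again a name
I made up.

```python
class MyObject(object):  # or whatever it is called
    def __init__(self, a, b, c):
        self.a, self.b, self.c = a, b, c

    def state(self):
        return (self.a, self.b, self.c)

    def __hash__(self):
        return hash(self.state())


foo = MyObject(-1, 1, 1)
hash_before = hash(foo)
state_before = foo.state()

foo.a = -2  # a genuine change to the object

# On CPython the two hashes collide, so the change goes unnoticed...
hash_missed_it = hash(foo) == hash_before
# ...whereas comparing the tuples themselves catches it.
tuple_caught_it = foo.state() != state_before
```

A strong hash like SHA-1 makes such accidents astronomically less
likely, but comparing the values directly rules them out altogether.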
--
Steven