MD5 hash of object
Jeff McNeil
jeff at jmcneil.net
Mon Jun 8 23:20:26 EDT 2009
On Jun 8, 3:47 pm, Chris Rebert <c... at rebertia.com> wrote:
> On Mon, Jun 8, 2009 at 11:43 AM, lczancanella <lczancane... at gmail.com> wrote:
> > Hi,
>
> > In hashlib, the hash methods take only strings as parameters. I want to
> > know how I can digest an object to get an MD5 hash of it.
>
> Hashes are only defined to operate on bytestrings. Since Python is a
> high-level language and doesn't permit you to view the internal binary
> representation of objects, you're going to have to properly convert
> the object to a bytestring first, a process called "serialization".
> The `pickle` and `json` serialization modules are included in the
> standard library. These modules can convert objects to bytestrings and
> back again.
> Once you've done the bytestring conversion, just run the hash method
> on the bytestring.
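A minimal sketch of the serialize-then-hash approach, assuming the object is picklable. The helper name `md5_of_object` is hypothetical, and pinning the pickle protocol (2 here, chosen arbitrarily) is an assumption meant to avoid depending on the interpreter's default; pickle makes no documented promise of byte-identical output across Python versions.

```python
import hashlib
import pickle

def md5_of_object(obj, protocol=2):
    """Hypothetical helper: pickle obj to bytes, then MD5 those bytes."""
    # Serialization step: turn the object into a bytestring.
    data = pickle.dumps(obj, protocol=protocol)
    # Hashing step: hashlib works on bytes, not on arbitrary objects.
    return hashlib.md5(data).hexdigest()

digest = md5_of_object([1, 2.5, "three", (4, 5)])
print(digest)
```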
>
> Be careful when serializing dictionaries and sets though, because they
> are arbitrarily ordered, so two dictionaries containing the same items
> and which compare equal may have a different internal ordering, thus
> different serializations, and thus different hashes.
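One way to sidestep the dictionary-ordering problem is to serialize to a canonical form first. A minimal sketch, assuming the data is JSON-serializable with string keys; `canonical_md5` is a hypothetical helper name. `json.dumps(..., sort_keys=True)` writes keys in sorted order at every nesting level, so two equal dicts built in different orders produce the same text.

```python
import hashlib
import json

def canonical_md5(obj):
    """Hypothetical helper: MD5 of a JSON encoding with dict keys sorted."""
    # sort_keys=True sorts keys in nested dicts too; separators=(",", ":")
    # drops optional whitespace so the text has a single fixed layout.
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(text.encode("utf-8")).hexdigest()

a = {"x": 1, "y": [1, 2], "z": {"b": 2, "a": 1}}
b = {"z": {"a": 1, "b": 2}, "y": [1, 2], "x": 1}
print(a == b)                                # True
print(canonical_md5(a) == canonical_md5(b))  # True
```

JSON cannot encode sets, so a set would first have to be converted to something ordered, for example a sorted list, assuming its elements are mutually comparable.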
>
> Cheers,
> Chris
> --
> http://blog.rebertia.com
I'd think that using the hash of the pickled representation of an
object might be problematic, no? The pickle protocol handles object
graphs in a way that allows it to preserve references back to
identical objects. Consider the following (contrived) example:
import pickle
from hashlib import md5

class Value(object):
    def __init__(self, v):
        self._v = v

class P1(object):
    def __init__(self, name):
        self.name = Value(name)
        self.other_name = self.name      # both attributes refer to one Value

class P2(object):
    def __init__(self, name):
        self.name = Value(name)
        self.other_name = Value(name)    # two distinct but equal Values

h1 = md5(pickle.dumps(P1('sabres'))).hexdigest()
h2 = md5(pickle.dumps(P2('sabres'))).hexdigest()
print(h1 == h2)  # prints False
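Since P1 and P2 are different classes, and pickle records each instance's class by name, their pickles would differ even without the aliasing. A minimal sketch that isolates the shared-reference effect uses a single class; the class `Pair` and its `share` flag are hypothetical, introduced only for this illustration.

```python
import pickle
from hashlib import md5

class Value(object):
    def __init__(self, v):
        self._v = v

class Pair(object):
    """Hypothetical class: other_name either aliases name or is an equal copy."""
    def __init__(self, name, share):
        self.name = Value(name)
        self.other_name = self.name if share else Value(name)

shared = pickle.dumps(Pair('sabres', share=True))
copied = pickle.dumps(Pair('sabres', share=False))

# Same class, same wrapped values; only the object graph differs.
# pickle emits a memo reference for the repeated Value in `shared`,
# but serializes the second Value in full in `copied`.
print(md5(shared).hexdigest() == md5(copied).hexdigest())  # False
```

Unpickling preserves the difference as well: the restored `shared` object still has one Value under both attributes, while the restored `copied` object has two.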
Just something to be aware of. Depending on what you're trying to
accomplish, it may make sense to define a method that generates a byte
string representation of your object's state and to return the hash of
that value.
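A minimal sketch of the method-based approach suggested above, assuming the state worth hashing is just the wrapped values. The method names `state_bytes` and `state_md5` are hypothetical, JSON is an arbitrary choice of encoding, and P2 subclasses P1 here only to avoid repeating the methods.

```python
import hashlib
import json

class Value(object):
    def __init__(self, v):
        self._v = v

class P1(object):
    def __init__(self, name):
        self.name = Value(name)
        self.other_name = self.name  # shared reference, as in the example

    def state_bytes(self):
        # Hypothetical method: encode only the logical state (the wrapped
        # values), not the object graph, so aliasing does not matter.
        state = {"name": self.name._v, "other_name": self.other_name._v}
        return json.dumps(state, sort_keys=True).encode("utf-8")

    def state_md5(self):
        # Hypothetical method: hash the canonical bytes from state_bytes().
        return hashlib.md5(self.state_bytes()).hexdigest()

class P2(P1):
    # Inherits state_bytes() and state_md5(); only construction differs.
    def __init__(self, name):
        self.name = Value(name)
        self.other_name = Value(name)  # two distinct but equal Values

print(P1('sabres').state_md5() == P2('sabres').state_md5())  # True
```

The trade-off is maintenance: the method decides what counts as the object's state, so it has to be updated whenever attributes that should affect the hash are added or changed.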
Thanks,
-Jeff
mcjeff.blogspot.com
More information about the Python-list mailing list