MD5 hash of object

Jeff McNeil jeff at jmcneil.net
Tue Jun 9 05:20:26 CEST 2009


On Jun 8, 3:47 pm, Chris Rebert <c... at rebertia.com> wrote:
> On Mon, Jun 8, 2009 at 11:43 AM, lczancanella <lczancane... at gmail.com> wrote:
> > Hi,
>
> > In hashlib the hash methods take only strings as parameters. I want to
> > know how I can digest an object to get an MD5 hash of it.
>
> Hashes are only defined to operate on bytestrings. Since Python is a
> high-level language and doesn't permit you to view the internal binary
> representation of objects, you're going to have to properly convert
> the object to a bytestring first, a process called "serialization".
> The `pickle` and `json` serialization modules are included in the
> standard library. These modules can convert objects to bytestrings and
> back again.
> Once you've done the bytestring conversion, just run the hash method
> on the bytestring.
>
> Be careful when serializing dictionaries and sets, though: their
> iteration order isn't determined by their contents alone, so two
> dictionaries that contain the same items and compare equal may have a
> different internal ordering, thus different serializations, and thus
> different hashes.
>
> Cheers,
> Chris
> --
> http://blog.rebertia.com
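A minimal sketch of the serialize-then-hash approach from the quoted reply, assuming Python 3 (where hashlib only accepts bytes) and an object that JSON can represent. `json_md5` is a hypothetical helper name; passing `sort_keys=True` is one way to get a canonical key order so that equal dicts hash alike.

```python
import json
import pickle
from hashlib import md5

def json_md5(obj):
    """Hypothetical helper: MD5 of a canonical JSON encoding of obj."""
    # sort_keys fixes dict key order; compact separators avoid whitespace variation
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return md5(text.encode("utf-8")).hexdigest()

a = {"team": "sabres", "year": 2009}
b = {"year": 2009, "team": "sabres"}  # same items, different insertion order

print(a == b)                          # True
print(json.dumps(a) == json.dumps(b))  # False: default output follows insertion order
print(json_md5(a) == json_md5(b))      # True

# pickle.dumps already returns bytes and can be hashed directly,
# but it is just as sensitive to dict insertion order.
print(md5(pickle.dumps(a)).hexdigest() == md5(pickle.dumps(b)).hexdigest())  # False
```

Sets need extra care: `json.dumps` raises `TypeError` on a set, so convert one to a sorted list before serializing it.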

I'd think that hashing the pickled representation of an object might
be problematic, no? The pickle protocol preserves shared references
within an object graph: an object reachable through two attributes is
written once and referenced thereafter, so two objects with equivalent
state can still pickle to different bytes. Consider the following
(contrived) example:

import pickle
from hashlib import md5

class Value(object):
    def __init__(self, v):
        self._v = v

# P1 stores the same Value instance under both attributes.
class P1(object):
    def __init__(self, name):
        self.name = Value(name)
        self.other_name = self.name

# P2 stores two distinct (but equivalent) Value instances.
class P2(object):
    def __init__(self, name):
        self.name = Value(name)
        self.other_name = Value(name)

h1 = md5(pickle.dumps(P1('sabres'))).hexdigest()
h2 = md5(pickle.dumps(P2('sabres'))).hexdigest()

print(h1 == h2)
# prints: False
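In the example above P1 and P2 are different classes, so their pickles also differ in the recorded class name. A sketch that isolates the shared-reference effect, assuming Python 3 and a hypothetical `Person` class with a `share` flag, keeps the class fixed and varies only whether both attributes point at one `Value`:

```python
import pickle
from hashlib import md5

class Value(object):
    def __init__(self, v):
        self._v = v

class Person(object):
    # Hypothetical class: 'share' decides whether both attributes refer to
    # one Value instance or to two equivalent ones.
    def __init__(self, name, share):
        self.name = Value(name)
        self.other_name = self.name if share else Value(name)

shared = pickle.dumps(Person('sabres', share=True))
separate = pickle.dumps(Person('sabres', share=False))

# Same class and equivalent state, yet the bytes (and so the hashes) differ:
# the shared instance is written once and then referenced via pickle's memo.
print(md5(shared).hexdigest() == md5(separate).hexdigest())  # False

# Unpickling preserves the distinction.
s = pickle.loads(shared)
p = pickle.loads(separate)
print(s.name is s.other_name)  # True
print(p.name is p.other_name)  # False
```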

Just something to be aware of. Depending on what you're trying to
accomplish, it may make sense to define a method that builds a byte
string representation of your object's state and returns the hash of
that value.
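One way that suggestion might look, as a sketch assuming Python 3; `Team`, `state_bytes` and `state_md5` are hypothetical names chosen for illustration. The method encodes only the state you consider part of the object's identity, in a canonical order, so neither reference sharing nor set ordering leaks into the hash:

```python
import json
from hashlib import md5

class Team(object):
    # Hypothetical class used only to illustrate the idea.
    def __init__(self, name, players):
        self.name = name
        self.players = set(players)

    def state_bytes(self):
        """Canonical byte encoding of the identity-defining state."""
        # Sets have no stable order, so sort them; sort_keys fixes dict key order.
        state = {"name": self.name, "players": sorted(self.players)}
        return json.dumps(state, sort_keys=True, separators=(",", ":")).encode("utf-8")

    def state_md5(self):
        return md5(self.state_bytes()).hexdigest()

t1 = Team("sabres", ["alice", "bob"])
t2 = Team("sabres", ["bob", "alice"])
print(t1.state_md5() == t2.state_md5())  # True
```

The trade-off is that you decide explicitly which attributes matter, rather than hashing whatever pickle happens to emit.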

Thanks,

-Jeff
mcjeff.blogspot.com



More information about the Python-list mailing list