hashkey/digest for a complex object
geremy condra
debatem1 at gmail.com
Wed Oct 6 15:28:30 EDT 2010
On Wed, Oct 6, 2010 at 11:58 AM, kj <no.email at please.post> wrote:
>
>
> The short version of this question is: where can I find the algorithm
> used by the tuple class's __hash__ method?
>From Objects/tuple.c, line 315 in Python3.2:
static long
tuplehash(PyTupleObject *v)
{
register long x, y;
register Py_ssize_t len = Py_SIZE(v);
register PyObject **p;
long mult = 1000003L;
x = 0x345678L;
p = v->ob_item;
while (--len >= 0) {
y = PyObject_Hash(*p++);
if (y == -1)
return -1;
x = (x ^ y) * mult;
/* the cast might truncate len; that doesn't change hash stability */
mult += (long)(82520L + len + len);
}
x += 97531L;
if (x == -1)
x = -2;
return x;
}
> Now, for the long version of this question, I'm working with some
> complext Python objects that I want to be able to compare for
> equality easily.
>
> These objects are non-mutable once they are created, so I would
> like to use a two-step comparison for equality, based on the
> assumption that I can compute (either at creation time, or as needed
> and memoized) a hashkey/digest for each object. The test for
> equality of two of these objects would first compare their hashkeys.
> If they are different, the two objects are declared different; if
> they match, then a more stringent test for equality is performed.
>
> So the problem is to come up with a reasonable hashkey for each of
> these objects. The objects have two significant attributes, and
> two of these objects should be regarded as equal if these attributes
> are "the same" in both. The first attribute is a simple dictionary
> whose keys are integers and values are strings. The second attribute
> is more complicated. It is a tree structure, represented as a
> dictionary of dictionaries of dictionaries... until we get to the
> leaf elements, which are frozensets of strings. The keys at every
> level of this structure are strings. E.g. a simple example of such
> an attribute would look like:
>
> {'A': {'a': set(['1', '2', '3']),
> 'b': set(['4', '5'])},
> 'B': set(['6', '7', '8'])}
>
> I'm looking for a good algorithm for computing a hash key for
> something like this? (Basically I'm looking for a good way to
> combine hashkeys.)
Sounds like you're trying to hash mutable data structures, which is a
no-no, but assuming you've got that part all figured out then the easy
way out would be to print the whole thing and take the hash of the
resulting string. A (possibly) better way would be to do a Schwartzian
transform[1] and freeze everything. Other approaches may be better,
but those are the first two out of my head.
Geremy Condra
[1] http://en.wikipedia.org/wiki/Schwartzian_transform
More information about the Python-list
mailing list