[Tutor] hash value input

Rich Lovely roadierich at googlemail.com
Fri Jan 29 19:20:39 CET 2010


On 29 January 2010 16:54, Kent Johnson <kent37 at tds.net> wrote:
> On Fri, Jan 29, 2010 at 8:03 AM, spir <denis.spir at free.fr> wrote:
>
>> I recently discovered that Lua uses the data's address (read: id) as input to the hash func. This allows Lua tables (a kind of more versatile associative array) to use _anything_ as key, since the id is guaranteed not to change, per definition.
>> [Aside this consideration, hashing addresses ensures a constant type as input (so allows a specific efficient hash method) and the simplest possible one.]
>
> This sounds like a bad idea to me. You generally want key lookup to be
> based on value, not identity. For example, if I say
>
> d = dict()
> d['key'] = value
>
> and later
> print d['key']
>
> I want this to print 'value' regardless of whether I use the same
> instance of the string 'key' in both cases.
>
> Maybe Lua has some mechanism for ensuring that this can't happen?
>
> In general, for a hash table to work as expected, two keys that are
> equal in value should have the same hash value. Here is an explanation
> of why (for Java, but it pretty much applies to any hash table
> implementation):
> http://www.javaworld.com/javaqa/2002-06/01-qa-0621-hashtable.html
>
> Kent
> _______________________________________________
> Tutor maillist  -  Tutor at python.org
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor
>

I've played with this a little.  The following class was quite handy for this:

class BrokenHash(object):
    def __init__(self, hashval):
        self.hashval = hashval
    def __hash__(self):
        return self.hashval

It basically implements the python hashable interface, but in a manner
that is "broken" - it allows you control over what the object points
to.

>From this, we can tell that chaning the value of an object's hash
changes its "bucket":

>>> b = BrokenHash(1)
>>> d = {b:1}
>>> b.hashval=2
>>> d[b]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: <__main__.BrokenHash object at 0x657d70>

But it's more than just the objects hash:

>>> b2 = BrokenHash(hash("foo"))
>>> hash(b2) == hash("foo")
True
>>> d2 = {b2: "bar", "foo": "baz"}
>>> print d2
{<__main__.BrokenHash object at 0x657e10>: 'bar', 'foo': 'baz'}

(If they both fell into the same bucket, d2['foo'] would overwrite d2[b2]

It appears to be some combination of identity and hash.  It is not,
however, dependant on type:

>>> b3 = BrokenHash(hash("foo")) #same hash as b2
>>> d2[b3]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: <__main__.BrokenHash object at 0x657e90>

Don't know if that's helped at all.

-- 
Rich "Roadie Rich" Lovely

Just because you CAN do something, doesn't necessarily mean you SHOULD.
In fact, more often than not, you probably SHOULDN'T.  Especially if I
suggested it.


More information about the Tutor mailing list