set using alternative hash function?

Diez B. Roggisch deets at nospam.web.de
Thu Oct 15 09:02:03 EDT 2009


Austin Bingham wrote:

> On Thu, Oct 15, 2009 at 2:23 PM, Diez B. Roggisch <deets at nospam.web.de>
> wrote:
>> Austin Bingham wrote:
>> This is a POV, but to to me, the set just deals with a very minimal
>> protocol - hash-value & equality. Whatever you feed it, it has to cope
>> with that. It strikes *me* as odd to ask for something else.
> 
> But I'm not really asking for anything that changes the paradigm of
> how things currently work. All I need is the ability to say something
> like this:
> 
> s = set(hash_func = lambda x: hash(x.name))
> 
> If set could be de-hardcoded from using hash(), nothing else about how
> it works would have to change. Or am I wrong on that? I see that you
> mention equality in the protocol, but I don't know why a set would
> need that if a) hash(x) == hash(y) --> x == y and b) hash function
> must return a 32 bit value (which I think is the case)...so maybe I
> misunderstand something.

You do. Hashes can collide, and then you need equality. Sets are *based* on
equality actually, the hash is just one optimization. Other optimizations
build on order (usually, the STL requires you to define < on objects for
that. Which is enough because a == b <=> a < b && b < a). But eventually,
it all boils down to equality. You could implement a set without hash, by
imposing linear behavior on insert and lookup - but none based purely on a
hash.

> 
> I wonder...does the requirement of using hash() have to do with the
> low-level implementation of set? That might explain the state of
> things.
> 
>> The ideal solution would be to have several hash/eq-methods on your
>> object, and somehow make the surroundings decide which to use. But there
>> is no OO-pragma for that.
> 
> But why force objects to know about all of the wacky contexts in which
> they might be used? Let objects be objects and hash-functions be
> hash-functions. If someone wants a set where uniqueness is defined by
> the number of occurrences of 'a' in an object's name, the object
> should never have to know that...it should just have a name.

Because these functions need intimate knowledge of the objects to actually
work. Sure, in python it's easy to define them outside, and just reach into
the object as you wish. But aside this implementation detail, a hash (and
equality of course) always are based upon the object in question. So I
think it's natural to have it defined right there (which __hash__ and
__eq__ show that this seems to be accepted in general).

And if there were something that would decide on context which of several
implementations to use, you'd have less to worry. As things are, there
isn't such thing (I don't even have the slightest idea what could work),
you are as well off with defining two functions.


Diez



More information about the Python-list mailing list