insert unique data in a list
Steven D'Aprano
steve at REMOVE-THIS-cybersource.com.au
Mon Dec 14 16:53:38 EST 2009
On Mon, 14 Dec 2009 17:13:24 +0000, mattia wrote:
> Il Sun, 13 Dec 2009 21:17:28 -0800, knifenomad ha scritto:
>
>> On 12월14일, 오후12시42분, Steven D'Aprano
>> <ste... at REMOVE.THIS.cybersource.com.au> wrote:
>>> On Sun, 13 Dec 2009 17:19:17 -0800, knifenomad wrote:
>>> > this makes the set type hashable.
>>>
>>> > class Set(set):
>>> > __hash__ = lambda self: id(self)
>>>
>>> That's a *seriously* broken hash function.
>>>
>>> >>> key = "voila"
>>> >>> d = { Set(key): 1 }
>>> >>> d
>>>
>>> {Set(['i', 'a', 'l', 'o', 'v']): 1}>>> d[ Set(key) ]
>>>
>>> Traceback (most recent call last):
>>> File "<stdin>", line 1, in <module>
>>> KeyError: Set(['i', 'a', 'l', 'o', 'v'])
>>>
>>> --
>>> Steven
>>
>> of course it is broken as long as it uses it's instance id. i added
>> this to notify that unhashable can become hashable implementing
>> __hash__ inside the class. which probably set to None by default.
>
> Ok, nice example, but I believe that using id() as the hash function can
> lead to unexpected collisions.
No, you have that backwards. Using id() as the hash function means that
equal keys will hash unequal -- rather than unexpected collisions, it
leads to unexpected failure-to-collide-when-it-should-collide.
And it isn't a "nice example", it is a terrible example.
Firstly, the example fails to behave correctly. It simply doesn't work as
advertised.
Secondly, and much worse, it encourages people to do something dangerous
without thinking about the consequences. If it is so easy to hash mutable
objects, why don't built-in lists and sets don't have a __hash__ method?
Do you think that the Python developers merely forgot?
No, there is good reason why mutable items shouldn't be used as keys in
dicts or in sets, and this example simply papers over the reasons why and
gives the impression that using mutable objects as keys is easy and safe
when it is neither.
Using mutable objects as keys or set elements leads to surprising,
unexpected behaviour and hard-to-find bugs. Consider the following set
with lists as elements:
L = [1, 2]
s = Set() # set that allows mutable elements
s.add(L)
s.add([1, 2, 3])
So far so good. But what should happen now?
L.append(3)
The set now has two equal elements, which breaks the set invariant that
it has no duplicates.
Putting the problem of duplicates aside, there is another problem:
L = [1, 2]
s = Set([L])
L.append(3)
There are two possibilities: either the hash function of L changes when
the object changes, or it doesn't. Suppose it changes. Then the hash of L
after the append will be different from the hash of L before the append,
and so membership testing (L in s) will fail.
Okay, suppose we somehow arrange matters so that the hash of the object
doesn't change as the object mutates. This will solve the problem above:
we can mutate L as often as we like, and L in s will continue to work
correctly. But now the hash of an object doesn't just depend on it's
value, but on its history. That means that two objects which are
identical can hash differently, and we've already seen this is a problem.
There is one final approach which could work: we give the object a
constant hash function, so that all objects of that type hash
identically. This means that the performance of searches and lookups in
sets and dicts will fall to that of lists. There is no point in paying
all the extra overhead of a dict to get behaviour as slow, or slower,
than a list.
In other words, no matter what you do, using mutable objects as keys or
set elements leads to serious problems that need to be dealt with. It
simply isn't true that all you need to do to make mutable objects usable
in dicts or sets is to add a hash function. That part is trivial.
--
Steven
More information about the Python-list
mailing list