<div class="gmail_quote">On Fri, Jan 13, 2012 at 10:13 AM, Mark Dickinson <span dir="ltr"><<a href="mailto:dickinsm@gmail.com">dickinsm@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im">On Fri, Jan 13, 2012 at 5:43 PM, Guido van Rossum <<a href="mailto:guido@python.org">guido@python.org</a>> wrote:<br>
>> How pathological do you consider the set<br>
>><br>
>> {1 << n for n in range(2000)}<br>
>><br>
>> to be? What about the set:<br>
>><br>
>> ieee754_powers_of_two = {2.0**n for n in range(-1074, 1024)}<br>
>><br>
>> ? The > 2000 elements of the latter set have only 61 distinct hash<br>
>> values on 64-bit machine, so there will be over 2000 total collisions<br>
>> involved in creating this set (though admittedly only around 30<br>
>> collisions per hash value).<br>
><br>
> Hm... So how does the collision counting work for this case?<br>
<br>
</div>Ah, my bad. It looks like the ieee754_powers_of_two is safe---IIUC,<br>
it's the number of collisions involved in a single key-set operation<br>
that's limited. So a dictionary with keys {1<<n for n in range(2000)}<br>
is fine, but a dictionary with keys {1<<(61*n) for n in range(2000)}<br>
is not:<br>
<br>
>>> {1<<(n*61):True for n in range(2000)}<br>
Traceback (most recent call last):<br>
File "<stdin>", line 1, in <module><br>
File "<stdin>", line 1, in <dictcomp><br>
KeyError: 'too many hash collisions'<br>
[67961 refs]<br>
<br>
I'd still not consider this particularly pathological, though.</blockquote><div><br>Really? Even though you came up with specifically to prove me wrong? <br></div></div><br>-- <br>--Guido van Rossum (<a href="http://python.org/~guido">python.org/~guido</a>)<br>