<div class="gmail_quote">On Fri, Jan 13, 2012 at 10:13 AM, Mark Dickinson <span dir="ltr">&lt;<a href="mailto:dickinsm@gmail.com">dickinsm@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<div class="im">On Fri, Jan 13, 2012 at 5:43 PM, Guido van Rossum &lt;<a href="mailto:guido@python.org">guido@python.org</a>&gt; wrote:<br>

&gt;&gt; How pathological do you consider the set<br>

&gt;&gt;<br>

&gt;&gt;   {1 &lt;&lt; n for n in range(2000)}<br>

&gt;&gt;<br>

&gt;&gt; to be?  What about the set:<br>

&gt;&gt;<br>

&gt;&gt;   ieee754_powers_of_two = {2.0**n for n in range(-1074, 1024)}<br>

&gt;&gt;<br>

&gt;&gt; ?  The &gt; 2000 elements of the latter set have only 61 distinct hash<br>

&gt;&gt; values on 64-bit machine, so there will be over 2000 total collisions<br>

&gt;&gt; involved in creating this set (though admittedly only around 30<br>

&gt;&gt; collisions per hash value).<br>

&gt;<br>

&gt; Hm... So how does the collision counting work for this case?<br>

<br>

</div>Ah, my bad.  It looks like the ieee754_powers_of_two is safe---IIUC,<br>

it&#39;s the number of collisions involved in a single key-set operation<br>

that&#39;s limited.  So a dictionary with keys {1&lt;&lt;n for n in range(2000)}<br>

is fine, but a dictionary with keys  {1&lt;&lt;(61*n) for n in range(2000)}<br>

is not:<br>

<br>

&gt;&gt;&gt; {1&lt;&lt;(n*61):True for n in range(2000)}<br>

Traceback (most recent call last):<br>

  File &quot;&lt;stdin&gt;&quot;, line 1, in &lt;module&gt;<br>

  File &quot;&lt;stdin&gt;&quot;, line 1, in &lt;dictcomp&gt;<br>

KeyError: &#39;too many hash collisions&#39;<br>

[67961 refs]<br>

<br>

I&#39;d still not consider this particularly pathological, though.</blockquote><div><br>Really? Even though you came up with specifically to prove me wrong? <br></div></div><br>-- <br>--Guido van Rossum (<a href="http://python.org/~guido">python.org/~guido</a>)<br>