Writing huge Sets() to disk

Tim Peters tim.peters at gmail.com
Mon Jan 10 21:23:35 EST 2005


[Istvan Albert] 
> #- I think that you need to first understand how dictionaries work. 
> #- The time needed to insert a key is independent of 
> #- the number of values in the dictionary. 

[Batista, Facundo]
> Are you sure? 
> 
> I think that is true while the hashes don't collide. If you have collisions,
> time starts to depend of element quantity. But I'm not sure
>
> Tim sure can enlighten us. 

I can only point the way to enlightenment -- you have to contemplate
your own navel (or Guido's, but he's kind of shy that way).

What Istvan Albert said is close enough in context.  The *expected*
(mean) number of collisions in a Python dict probe is less than 1,
regardless of dict size.  That assumes the collision distribution is
no worse than random.  It's possible that all dict keys hash to the
same table slot, and then each insertion is O(len(dict)).  It's
possible to contrive such cases even if the hash function is a good
one.  But it's not going to happen by accident (and, when it does
<wink>, open a bug report -- we'll improve the key type's hash
function then).



More information about the Python-list mailing list