
[Jeremy Hylton]
I just too a minute too look at this. I downloaded the python-attack file from your Web site. I loading all the strings and then inserted them into a dictionary. I also generated a list of 10,000 random strings and inserted them into a dictionary.
[Scott Crosby]
Ok. It should have taken almost a minute instead of .08 seconds in the attack version. My file is broken. I'll be constructing a new one later this evening. If you test perl with the perl files, you'll see what should have occured in this case.
Note that the 10,000 strings in the file map to 400 distinct 32-bit hash codes under Python's hash. It's not enough to provoke worst-case behavior in Python just to collide on the low-order bits: all 32 bits contribute to the probe sequence (just colliding on the initial bucket slot doesn't have much effect). As is, it's effectively creating 400 collision chains, ranging in length from 7 to 252, with a mean length of 25 and a median of 16.