On Thu, 29 May 2003 18:31:56 -0400, "Tim Peters" <tim@zope.com> writes:
[Jeremy Hylton]
I just too a minute too look at this. I downloaded the python-attack file from your Web site. I loading all the strings and then inserted them into a dictionary. I also generated a list of 10,000 random strings and inserted them into a dictionary.
[Scott Crosby]
Ok. It should have taken almost a minute instead of .08 seconds in the attack version. My file is broken. I'll be constructing a new one later this evening. If you test perl with the perl files, you'll see what should have occured in this case.
Note that the 10,000 strings in the file map to 400 distinct 32-bit hash codes under Python's hash. It's not enough to provoke worst-case behavior
Yes. Jeremey has made me aware of this. I appear to have made a mistake when inserting python's hash code into my program that finds generators. The fact that I find so many collisions, but I don't have everything colliding indicates what the problem is.
in Python just to collide on the low-order bits: all 32 bits contribute to the probe sequence (just colliding on the initial bucket slot doesn't have much effect).
Correct. My program, as per the paper, generates full 32 bit hash collisions. It doesn't generate bucket collisions.
As is, it's effectively creating 400 collision chains, ranging in length from 7 to 252, with a mean length of 25 and a median of 16.
Yes. I don't know python so I had no test program to verify that my attack file was correct. Its not. Now that I have one, I'll quash the bug and release a new file later this evening. Scott