[Python-Dev] Algoritmic Complexity Attack on Python

Scott A Crosby scrosby@cs.rice.edu
29 May 2003 18:29:45 -0500


On Thu, 29 May 2003 18:31:56 -0400, "Tim Peters" <tim@zope.com> writes:

> [Jeremy Hylton]
> >> I just too a minute too look at this.  I downloaded the python-attack
> >> file from your Web site.  I loading all the strings and then inserted
> >> them into a dictionary.  I also generated a list of 10,000 random
> >> strings and inserted them into a dictionary.
> 
> [Scott Crosby]
> > Ok. It should have taken almost a minute instead of .08 seconds in the
> > attack version.  My file is broken. I'll be constructing a new one
> > later this evening.  If you test perl with the perl files, you'll see
> > what should have occured in this case.
> 
> Note that the 10,000 strings in the file map to 400 distinct 32-bit hash
> codes under Python's hash.  It's not enough to provoke worst-case behavior

Yes. Jeremey has made me aware of this. I appear to have made a
mistake when inserting python's hash code into my program that finds
generators. The fact that I find so many collisions, but I don't have
everything colliding indicates what the problem is.

> in Python just to collide on the low-order bits:  all 32 bits contribute to
> the probe sequence (just colliding on the initial bucket slot doesn't have
> much effect).

Correct. My program, as per the paper, generates full 32 bit hash
collisions. It doesn't generate bucket collisions.

> As is, it's effectively creating 400 collision chains, ranging in
> length from 7 to 252, with a mean length of 25 and a median of 16.

Yes. I don't know python so I had no test program to verify that my
attack file was correct. Its not. Now that I have one, I'll quash the
bug and release a new file later this evening.

Scott