Raymond Hettinger wrote:
FWIW, Uncle Timmy considers the non-randomized hashes to be a virtue. It is believed that they give us better-than-random results for commonly encountered datasets. A change to randomized hashes would have a negative performance impact on those cases.
Tim Peters's analysis applies mainly to ints, which would be unchanged.
A change to the hash function for strings would make no difference to the performance of the dict, as the ordering of the hash values is already quite different from the ordering of the strings for any string of more than 3 characters.
Also, randomizing the hash wreaks havoc on doctests, book examples not matching actual dict reprs, and on efforts by users to optimize the insertion order into dicts with frequent lookups.
The docs clearly state that the ordering of iteration over dicts is arbitrary. Perhaps changing it once in a while might be a good thing :)
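For doctests and book examples, one way to avoid depending on iteration order is to sort the items before printing them. The following is only a minimal sketch; `stable_repr` is a hypothetical helper, not something from the standard library, and it assumes the keys are mutually comparable.

```python
def stable_repr(mapping):
    """Return a dict-like repr whose item order does not depend on hashing.

    Hypothetical helper for illustration; assumes the keys can be sorted.
    """
    items = sorted(mapping.items())  # order by key, not by hash-table layout
    body = ", ".join(f"{key!r}: {value!r}" for key, value in items)
    return "{" + body + "}"


if __name__ == "__main__":
    print(stable_repr({"b": 2, "a": 1}))
```

An example written this way prints the same text regardless of how the string hash is computed.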
On Dec 28, 2011, at 5:28 PM, Michael Foord wrote:
A paper (well, presentation) has been published highlighting security problems with the hashing algorithm (exploiting collisions) in many programming languages, Python included:
Although it's a security issue, I'm posting it here because it is now public and seems important.
The issue they report can cause (for example) handling an HTTP POST to consume horrible amounts of CPU. For Python the figures they quoted:
reasonable-sized attack strings only for 32 bits
Plone has max. POST size of 1 MB
7 minutes of CPU usage for a 1 MB request
~20 kbits/s → keep one Core Duo core busy
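To see why colliding keys are so expensive, here is a rough sketch that counts equality comparisons while filling a dict. It does not reproduce the attack from the presentation; it forces collisions with a hypothetical `Key` class whose `__hash__` returns a constant, which is an assumption made purely for illustration.

```python
class Key:
    """Hypothetical key type whose hash we control (illustration only)."""

    eq_calls = 0  # shared counter of __eq__ invocations

    def __init__(self, value, collide):
        self.value = value
        # Either every key gets the same hash, or each gets a distinct one.
        self._hash = 0 if collide else value

    def __hash__(self):
        return self._hash

    def __eq__(self, other):
        Key.eq_calls += 1
        return isinstance(other, Key) and self.value == other.value


def count_comparisons(n, collide):
    """Insert n distinct keys into a dict and return how often __eq__ ran."""
    Key.eq_calls = 0
    table = {}
    for i in range(n):
        table[Key(i, collide)] = None
    return Key.eq_calls


if __name__ == "__main__":
    for n in (100, 200, 400):
        print(n, count_comparisons(n, False), count_comparisons(n, True))
```

With colliding hashes each insert has to be compared against the keys already stored, so doubling the number of keys roughly quadruples the work. That quadratic growth is what lets a modest POST body tie up a core.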
This was apparently reported to the security list, but hasn't been responded to beyond an acknowledgement on November 24th (the original report didn't make it onto the security list because it was held in a moderation queue).
The same vulnerability was reported against various languages and web frameworks, and is already fixed in some of them.
Their recommended fix is to randomize the hash function.
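As a toy illustration of what "randomize the hash function" means, the sketch below mixes a per-process seed into a simple multiplicative string hash. This is not CPython's implementation and is not claimed to withstand a determined attacker; `seeded_hash`, `PROCESS_SEED` and the constants are assumptions chosen for the example.

```python
import secrets

MASK = (1 << 64) - 1  # keep the running value within 64 bits

# Hypothetical per-process seed, chosen once at startup.
PROCESS_SEED = secrets.randbits(64)


def seeded_hash(text, seed=PROCESS_SEED):
    """Toy string hash whose result depends on a secret seed."""
    h = seed & MASK
    for byte in text.encode("utf-8"):
        h = ((h * 1000003) ^ byte) & MASK
    return h


if __name__ == "__main__":
    # Same seed: stable within a process. Different seed: different value.
    print(seeded_hash("spam", seed=1) == seeded_hash("spam", seed=1))
    print(seeded_hash("spam", seed=1) == seeded_hash("spam", seed=2))
```

Within one process the hash stays consistent, so dict lookups still work. An attacker who does not know the seed has a harder time precomputing a set of strings that all land in the same bucket.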
All the best,
May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing http://www.sqlite.org/different.html
Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/raymond.hettinger%40gmail....