[issue13703] Hash collision security issue

Sun Jan 8 12:47:18 CET 2012

Marc-Andre Lemburg <mal at egenix.com> added the comment:

Christian Heimes wrote:
> Marc-Andre:
> Have you profiled your suggestion? I'm interested in the speed implications. My gut feeling is that your idea could be slower, since you have added more instructions to a tight loop, that is execute on every lookup, insert, update and deletion of a dict key. The hash modification could have a smaller impact, since the hash is cached. I'm merely speculating here until we have some numbers to compare.

I haven't done any profiling on this yet, but will run some
tests.

The lookup functions in the dict implementation are optimized
to make the first non-collision case fast. The patch doesn't touch this
loop. The only change is in the collision case, where an increment
and comparison is added (and then only after the comparison which
is the real cost factor in the loop). I did add a printf() to
see how often this case occurs - it's a surprisingly rare case,
which suggests that Tim, Christian and all the others that have
invested considerable time into the implementation have done
a really good job here.

BTW: I noticed that a rather obvious optimization appears to be
missing from the Python dict initialization code: when passing in
a list of (key, value) pairs, the implementation doesn't make
use of the available length information and still starts with an
empty (small) dict table and then iterates over the pairs, increasing
the table size as necessary. It would be better to start with a
table that is presized to O(len(data)). The dict implementation
already provides such a function, but it's not being used
in the case dict(pair_list). Anyway, just an aside.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue13703>
_______________________________________