[Python-Dev] Hash collision security issue (now public)
Victor Stinner
victor.stinner at haypocalc.com
Sun Jan 1 18:32:51 CET 2012
Le 01/01/2012 04:29, Paul McMillan a écrit :
> This is incorrect. Once an attacker has guessed the random seed, any
> operation which reveals the ordering of hashed objects can be used to
> verify the answer. JSON responses would be ideal. In fact, an attacker
> can do a brute-force attack of the random seed offline. Once they have
> the seed, generating collisions is a fast process.
If we want to protect a website against this attack for example, we must
suppose that the attacker can inject arbitrary data and can get
(indirectly) the result of hash(str) (e.g. with the representation of a
dict in a traceback, with a JSON output, etc.).
> The goal isn't perfection, but we need to do better than a simple
> salt.
I disagree. I don't want to break backward compatibility and have a
hash() function different for each process, if the change is not an
effective protection against the "hash vulnerability".
It's really hard to write a good (secure) hash function: see for example
the recent NIST competition (started in 2008, will end this year). Even
good security researcher are unable to write a strong and fast hash
function. It's easy to add a weakness in the function if you don't have
a good background in cryptography. The NIST competition gives 4 years to
analyze new hash functions. We should not rush to add a quick "hack" if
it doesn't solve correctly the problem (protect against a collision
attack and preimage attack).
http://en.wikipedia.org/wiki/NIST_hash_function_competition
http://en.wikipedia.org/wiki/Collision_attack
Runtime performance does matter, I'm not completly sure that changing
Python is the best place to add a countermeasure against a
vulnerability. I don't want to slow down numpy for a web vulnerability.
Because there are different use cases, a better compromise is maybe to
add a runtime option to use a secure hash function, and keep the unsafe
but fast hash function by default.
> I propose we modify the string hash function like this:
>
> https://gist.github.com/0a91e52efa74f61858b5
Always allocate 2**21 bytes just to workaround one specific kind of
attack is not acceptable. I suppose that the maximum acceptable is 4096
bytes (or better 256 bytes).
Crytographic hash functions don't need random data, why would Python
need 2 MB (!) for its hash function?
Victor
More information about the Python-Dev
mailing list