On Sat, Dec 31, 2011 at 10:57 PM, Paul McMillan email@example.com wrote:
Still, it would represent an effort for the attacker of a much greater magnitude than the current attack. It's all a trade-off -- at some point it'll just be easier for the attacker to use some other vulnerability.
the class of vulnerable servers would be greatly reduced.
I agree that doing anything is better than doing nothing. If we use the earlier suggestion and prepend everything with a fixed-length seed, we need quite a bit of entropy (and so a fairly long string) to make a lasting difference.
Ah, but the effect of that long string is summarized in a single (32- or 64-bit) integer.
I'm not sure I understand this. What's the worry about "a different
hash functions"? (It may be clear that I do not have a deep mathematical understanding of hash functions.)
This was mostly in reference to earlier suggestions of switching to cityhash, or using btrees, or other more invasive changes. Python 2.X is pretty stable and making large changes like that to the codebase can have unpredictable effects.
We know that the current hash function works well (except for this specific problem), so it seems like the best fix will be as minimal a modification as possible, to avoid introducing bugs.
I forget -- what do we do on systems without urandom()? (E.g. Windows?)
Windows has CryptGenRandom which is approximately equivalent.
Let's worry about timing attacks another time okay?
Agreed. As long as there isn't a gaping hole, I'm fine with that.
Hm. I'm not sure I like the idea of extra arithmetic for every character being hashed.
From a performance standpoint, this may still be better than adding 8 or 10 characters to every single hash operation, since most hashes are over short strings.
But how about precomputing the intermediate value (x)? The hash is (mostly) doing x = f(x, c) for each c in the input.
It is important that this function touches every
character - if it only interacts with a subset of them, an attacker can fix that subset and vary the rest.
I sort of see your point, but I still think that if we could add as little per-character overhead as possible it would be best -- sometimes people *do* hash very long strings.
But I like the idea of a bigger random seed from which we deterministically pick some part.
Yeah. This makes it much harder to attack, since it very solidly places the attacker outside the realm of "just brute force the key".
How about just initializing x to some subsequence of the seed determined by e.g. the length of the hashed
plus a few characters from it?
We did consider this, and if performance is absolutely the prime directive, this (or a variant) may be the best option. Unfortunately, the collision generator doesn't necessarily vary the length of the string. Additionally, if we don't vary based on all the letters in the string, an attacker can fix the characters that we do use and generate colliding strings around them.
Still, much more work for the attacker.
Another option to consider would be to apply this change to some but not all of the rounds. If we added the seed lookup xor operation for only the first and last 5 values of x, we would still retain much of the benefit without adding much computational overhead for very long strings.
I like that.
We could also consider a less computationally expensive operation than the modulo for calculating the lookup index, like simply truncating to the correct number of bits.
Sure. Thanks for thinking about all the details here!!