Uniquely identifying each & every html template

Chris Angelico rosuav at gmail.com
Thu Jan 24 01:39:16 CET 2013

On Thu, Jan 24, 2013 at 11:09 AM, Dave Angel <d at davea.name> wrote:
> I certainly can't disagree that it's easy to produce a very long hash that
> isn't at all secure.  But I would disagree that longer hashes
> *automatically* reduce chances of collision.

Sure. But by and large, longer hashes give you a better chance at
avoiding collisions.

Caveat: I am not a cryptography expert. My statements are based on my
own flawed understanding of what's going on. I use the stuff but I
don't invent it.

> Wikipedia - http://en.wikipedia.org/wiki/Cryptographic_hash_function
> seems to say that there are four requirements.
> it is easy to compute the hash value for any given message
> it is infeasible to generate a message that has a given hash
> it is infeasible to modify a message without changing the hash
> it is infeasible to find two different messages with the same hash
> Seems to me a small hash wouldn't be able to meet the last 3 conditions.

True, but the definition of "small" is tricky. Of course the one-byte
hash I proposed isn't going to be difficult to break, since you can
just brute-force a bunch of message changes until you find one that
has the right hash.
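
A rough sketch of that brute force, in case it helps. The one-byte
hash here is my own stand-in (the first byte of SHA-256), and the
messages and the `forge` helper are made up for illustration:

```python
import hashlib
from itertools import count

def one_byte_hash(data: bytes) -> int:
    # Hypothetical one-byte hash: just the first byte of SHA-256.
    return hashlib.sha256(data).digest()[0]

def forge(original: bytes, tampered: bytes) -> bytes:
    """Append a counter to `tampered` until its hash matches `original`'s."""
    target = one_byte_hash(original)
    for n in count():
        candidate = tampered + b" " + str(n).encode()
        if one_byte_hash(candidate) == target:
            return candidate

original = b"Pay Alice $10"
forged = forge(original, b"Pay Mallory $1000")
print(forged, one_byte_hash(forged) == one_byte_hash(original))
```

With only 256 possible hashes, you'd expect a match after a few
hundred attempts on average.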

But it's more about the avalanche effect - that any given message has
equal probability of having any of the possible hashes. Make a random
change, get another random hash. So for a perfect one-byte hash, you
have exactly one chance in 256 of getting any particular hash.
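
That "random change, random hash" behaviour (usually called the
avalanche effect) is easy to see for yourself. A minimal sketch,
assuming SHA-256 as the well-behaved hash; the sample message and the
`bit_difference` helper are mine:

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    # Count the bit positions at which two equal-length byte strings differ.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

message = bytearray(b"The quick brown fox jumps over the lazy dog")
before = hashlib.sha256(message).digest()

message[0] ^= 0x01  # flip a single bit of the input
after = hashlib.sha256(message).digest()

changed = bit_difference(before, after)
print(f"{changed} of {len(before) * 8} output bits changed")
```

For a good hash you'd expect roughly half of the output bits to flip,
with no obvious relationship to which input bit you touched.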

By comparison, a simple/naive hash that just XORs together all the
byte values fails these checks. Even if you take the message 64 bytes
at a time (thus producing a 512-bit hash), you'll still be insecure,
because it's easy to predict what hash you'll get after making a
particular change.
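
Here's a sketch of what I mean by "easy to predict". The `xor_hash`
function is my own toy version of the naive scheme described above,
and the message is invented:

```python
BLOCK = 64  # 64 bytes per block gives a 512-bit result

def xor_hash(message: bytes) -> bytes:
    # Naive hash: XOR the message together, one 64-byte block at a time.
    digest = bytearray(BLOCK)
    for i, byte in enumerate(message):
        digest[i % BLOCK] ^= byte
    return bytes(digest)

message = bytearray(b"transfer 100 dollars to account 12345 " * 5)
before = xor_hash(message)

# Change one byte and predict the new hash without recomputing it:
# only one output byte moves, and it moves by old_byte XOR new_byte.
position, new_value = 70, ord("X")
predicted = bytearray(before)
predicted[position % BLOCK] ^= message[position] ^ new_value
message[position] = new_value

print(bytes(predicted) == xor_hash(message))  # True
```

Since you know exactly how each change moves the hash, you can also
pick a second change that moves it straight back, and that's a
collision with no brute-forcing at all.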

This property of the hash doesn't change as worldwide computing power
improves. A hashing function might go from being "military-grade
security" to being "home-grade security" to being "two-foot fence
around your property", while still being impossible to predict without
brute-forcing. But when an algorithm is found that generates
collisions faster than brute force would, it effectively shrinks the
hash. MD5 is 128-bit, so finding a collision by brute force should
take about 2^64 attempts, but (if I understand the Wikipedia note
correctly) a known attack does it in about 2^20.96. For collisions,
that puts MD5 roughly on a par with a perfect 42-bit hash. (And on
today's hardware, that's not good enough.)
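
To get a feel for how hash size relates to collision effort, here's a
small experiment. It assumes a truncated SHA-256 is a fair stand-in
for a "perfect" small hash; `truncated_hash` and `find_collision` are
names I made up. By the birthday paradox, a brute-force collision on
an n-bit hash should turn up after something on the order of 2^(n/2)
inputs:

```python
import hashlib
from itertools import count

def truncated_hash(data: bytes, bits: int) -> int:
    # Stand-in for a "perfect" small hash: the top `bits` bits of SHA-256.
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - bits)

def find_collision(bits: int):
    """Hash distinct inputs until two share a hash; return them and the count."""
    seen = {}
    for n in count():
        data = str(n).encode()
        h = truncated_hash(data, bits)
        if h in seen:
            return seen[h], data, n + 1
        seen[h] = data

for bits in (8, 16, 24):
    first, second, tries = find_collision(bits)
    print(f"{bits}-bit hash: collision after {tries} inputs")
```

The exact counts depend on the inputs, but they should land in the
neighbourhood of 2^4, 2^8 and 2^12 rather than 2^8, 2^16 and 2^24.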


