Generate unique ID for URL

Johannes Bauer dfnsonfsduifb at gmx.de
Wed Nov 14 12:29:05 CET 2012


On 14.11.2012 02:39, Roy Smith wrote:

> The next step is to reduce the number of bits you are encoding.  You 
> said in another post that "1 collision in 10 million hashes would be 
> tolerable".  So you need:
> 
>>>> math.log(10*1000*1000, 2)
> 23.25349666421154
> 
> 24 bits worth of key. 

Nope :-)

> Base64 encoded, that's only 4 characters.  
> Actually, I probably just proved that I don't really understand how 
> probabilities work, so maybe what you really need is 32 or 48 or 64 
> bits.

:-))

When doing these calculations, it's important to keep the birthday
paradox in mind (this is kind of counter-intuitive): The chance of a
collission raises tremendously when we're looking for *any* arbitrary
two hashes colliding within a certain namespace. The probability you've
calculated is the pre-image probability (which you also again need to
multiply with a factor of two, because when trying to collide one given
hash, in the mean case you'll only have to search *half* the namespace
before finding a collision).

There are three things you need to know before you can give an estimate:
1. The permissible probability of a collision (1e-7 in this case)
2. The hash size
3. The worst-case number of elements in the namespace

You neglected 3 completely -- but knowing this is really important. This
becomes obvious when looking at the extreme cases: Let's say you have a
hash of arbitrary size, but only hash one element. The chance of a
collision is *always* zero. Or look at a hash of size 2^n. Then put 2^n
+ 1 elements in the namespace. The chance of a collision is *always* one.

Doing the calculations (formulas can be found on wikipedia on the site
of the birthday phaenomenon), you can come up with these following
bitlenghts of the hash with a 1e-7 probability of collision in respect
to the worst-case number of elements

10k elements: 49 bit
100k elements: 56 bit
1e6 elements: 63 bit
100e6 elements: 76 bit
1e9 elements: 83 bit
1e12 elements: 102 bit

Best regards,
Johannes

-- 
>> Wo hattest Du das Beben nochmal GENAU vorhergesagt?
> Zumindest nicht öffentlich!
Ah, der neueste und bis heute genialste Streich unsere großen
Kosmologen: Die Geheim-Vorhersage.
 - Karl Kaos über Rüdiger Thomas in dsa <hidbv3$om2$1 at speranza.aioe.org>


More information about the Python-list mailing list