[Mailman-Developers] [Bug 985149] Add List-Post value to permalink hash input
Jeff Breidenbach
jeff at jab.org
Wed Apr 25 06:09:47 CEST 2012
I apologize, the simulation code had a flaw. I'm embarrassed that I
didn't immediately recognize this immediately from intuition. We
could get even more accurate results by computing actual SHA-1 of
actual message-ids, but I'm not sure it is worth the effort. Here is a
revised program and the results from one run. For the record, I'm
interested in billion+ message counts, for which we'd need a few more
hash characters. But not that many.
message count 10000000, hash length 4, collisions 99.992930%
message count 10000000, hash length 5, collisions 25.755240%
message count 10000000, hash length 6, collisions 0.928050%
message count 10000000, hash length 7, collisions 0.029860%
message count 10000000, hash length 8, collisions 0.000800%
message count 10000000, hash length 9, collisions 0.000060%
message count 10000000, hash length 10, collisions 0.000000%
===
#!/usr/bin/python
import random
def compute(message_count, hashlength):
database = {}
for i in range(message_count):
n = random.randint(0, pow(2, 5 * hashlength))
if n in database:
database[n] += 1
else:
database[n] = 1
collisions = 0
for i in database:
if database[i] > 1:
collisions += database[i]
p1 = (100.0 * collisions) / float(message_count)
print("message count %d, hash length %d, collisions %f%%" %
(message_count, hashlength, p1))
for i in range(4, 11):
compute(10000000, i)
More information about the Mailman-Developers
mailing list