Re: [Mailman-Developers] [Bug 985149] Add List-Post value to permalink hash input

I apologize, the simulation code had a flaw. I'm embarrassed that I didn't immediately recognize this immediately from intuition. We could get even more accurate results by computing actual SHA-1 of actual message-ids, but I'm not sure it is worth the effort. Here is a revised program and the results from one run. For the record, I'm interested in billion+ message counts, for which we'd need a few more hash characters. But not that many.
message count 10000000, hash length 4, collisions 99.992930% message count 10000000, hash length 5, collisions 25.755240% message count 10000000, hash length 6, collisions 0.928050% message count 10000000, hash length 7, collisions 0.029860% message count 10000000, hash length 8, collisions 0.000800% message count 10000000, hash length 9, collisions 0.000060% message count 10000000, hash length 10, collisions 0.000000%
===
#!/usr/bin/python import random def compute(message_count, hashlength): database = {} for i in range(message_count): n = random.randint(0, pow(2, 5 * hashlength)) if n in database: database[n] += 1 else: database[n] = 1 collisions = 0 for i in database: if database[i] > 1: collisions += database[i] p1 = (100.0 * collisions) / float(message_count) print("message count %d, hash length %d, collisions %f%%" % (message_count, hashlength, p1))
for i in range(4, 11): compute(10000000, i)
participants (1)
-
Jeff Breidenbach