[Mailman-Developers] [Bug 985149] Add List-Post value to permalink hash input

Jeff Breidenbach jeff at jab.org
Wed Apr 25 06:09:47 CEST 2012


I apologize; the simulation code had a flaw.  I'm embarrassed that I
didn't recognize this immediately from intuition.  We could get even
more accurate results by computing the actual SHA-1 of actual
message-ids, but I'm not sure it is worth the effort. Here is a
revised program and the results from one run. For the record, I'm
interested in message counts of a billion or more, for which we'd
need a few more hash characters. But not that many.

message count 10000000, hash length 4, collisions 99.992930%
message count 10000000, hash length 5, collisions 25.755240%
message count 10000000, hash length 6, collisions 0.928050%
message count 10000000, hash length 7, collisions 0.029860%
message count 10000000, hash length 8, collisions 0.000800%
message count 10000000, hash length 9, collisions 0.000060%
message count 10000000, hash length 10, collisions 0.000000%
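As a cross-check on these numbers (this is not part of the original program), the expected collision rate can also be computed in closed form. Assuming each message receives an independent, uniformly random hash among N = 2**(5 * hash length) values, a given message shares its hash with no other message with probability (1 - 1/N)**(n - 1), so the expected fraction of colliding messages is 1 - (1 - 1/N)**(n - 1), which is approximately 1 - exp(-n/N). The sketch below evaluates that formula; the name expected_collision_pct is illustrative only. For 10,000,000 messages it gives about 99.993% at 4 characters, 25.77% at 5 and 0.927% at 6, in line with the simulated run above.

```python
import math


def expected_collision_pct(message_count, hash_length, bits_per_char=5):
    """Expected percentage of messages whose hash is shared with another message.

    Assumes every message gets an independent, uniformly random hash out of
    N = 2**(bits_per_char * hash_length) possible values (5 bits per
    character corresponds to a base32-style alphabet).
    """
    space = 2 ** (bits_per_char * hash_length)
    # log of P(a given message avoids all other n - 1 messages);
    # log1p stays accurate even when 1/N is tiny.
    log_p_alone = (message_count - 1) * math.log1p(-1.0 / space)
    # 1 - exp(log_p_alone), computed with expm1 to avoid cancellation.
    return 100.0 * -math.expm1(log_p_alone)


if __name__ == "__main__":
    # Ten million messages (as in the run above) and one billion.
    for n in (10000000, 1000000000):
        for h in range(4, 11):
            print("message count %d, hash length %d, expected collisions %f%%"
                  % (n, h, expected_collision_pct(n, h)))
```

Because the formula costs nothing to evaluate, it can be used to pick a hash length for billion-message archives without running a ten-million-iteration simulation per data point.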

===

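#!/usr/bin/env python3
import random


def compute(message_count, hashlength):
    # Each hash character carries 5 bits (a base32 alphabet), so a hash
    # of `hashlength` characters has 2**(5 * hashlength) possible values.
    space = 2 ** (5 * hashlength)
    database = {}
    for _ in range(message_count):
        # randrange(space) draws from exactly `space` values;
        # randint(0, space) would include both endpoints (space + 1 values).
        n = random.randrange(space)
        database[n] = database.get(n, 0) + 1
    # Count every message whose hash value is shared with at least one other.
    collisions = 0
    for count in database.values():
        if count > 1:
            collisions += count
    p1 = (100.0 * collisions) / message_count
    print("message count %d, hash length %d, collisions %f%%" %
          (message_count, hashlength, p1))


for i in range(4, 11):
    compute(10000000, i)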
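The post notes that hashing actual message-ids with SHA-1 would be more faithful than drawing random integers. A minimal sketch of that variant follows. It assumes, hypothetically, that a permalink hash is the base32 encoding of the SHA-1 digest of the Message-ID, truncated to the desired number of characters (base32 carries 5 bits per character, matching the 5 * hashlength bits in the program above). The message-ids are synthetic placeholders, and permalink_hash and sha1_collision_pct are illustrative names, not Mailman APIs. Since the bug under discussion proposes also feeding the List-Post value into the hash, that could be modeled by appending it to the message-id string before hashing.

```python
import base64
import hashlib


def permalink_hash(message_id, length):
    """Hypothetical permalink hash: base32(SHA-1(message_id)), truncated.

    A SHA-1 digest is 20 bytes, which base32-encodes to 32 characters
    (no padding), so any length up to 32 is available.
    """
    digest = hashlib.sha1(message_id.encode("utf-8")).digest()
    return base64.b32encode(digest).decode("ascii")[:length]


def sha1_collision_pct(message_count, length):
    """Percentage of synthetic messages whose truncated hash is shared."""
    counts = {}
    for i in range(message_count):
        # Made-up message-ids; a real test would read them from an archive.
        message_id = "<%d.test@example.org>" % i
        h = permalink_hash(message_id, length)
        counts[h] = counts.get(h, 0) + 1
    colliding = sum(c for c in counts.values() if c > 1)
    return 100.0 * colliding / message_count


if __name__ == "__main__":
    for h in range(4, 11):
        print("message count %d, hash length %d, collisions %f%%"
              % (100000, h, sha1_collision_pct(100000, h)))
```

If SHA-1 output behaves like uniform random bits, these percentages should track the random-integer simulation for the same message count, which is why the extra effort may not change the conclusion.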

