[Mailman-Developers] [Bug 985149] Add List-Post value to permalink hash input
Jeff Breidenbach
jeff at jab.org
Fri Apr 20 22:19:44 CEST 2012
A couple quick practical notes:
1) Terri is exactly right. The reason for including list identity as
part of the hash calculation is for cross-posted messages. An
archiving service shows context. Here's the message AND the thread it
fits into, AND information about the list it travelled over AND the
ability to search that list further. Archives need to know the list to
provide context.
2) mail-archive.com uses List-Post and not List-Id in the
calculation because every list, RFC 2369 compliant or not, has a
concept of a posting address. It is a natural idea, easy to think of and
understand. Hence all mail-archive.com archives are keyed off of
posting address. It would be technically possible (but an architectural
pain) for mail-archive.com to calculate using List-Id. We'd probably
not bother and instead store whatever was calculated by mailman and
placed in the Archived-At: header. Okay, I'll admit my prejudice. I've
always found List-Id annoying, and wish that it didn't exist.
3) As long as things are changing, I want to mention that these URLs
feel too long. SHA-1 is a 160 bit hash consuming 32 URL characters
when base32-encoded. I think trimming to a 64 bit (13 character) hash
is plenty. According to Wikipedia's birthday attack collision table,
with the shorter hash we'd have roughly even odds of seeing our first
collision after archiving about 5 billion messages. That's 50X the
current corpus size of public archival services like Gmane. And it
isn't like an occasional hash collision is a big deal or a security
problem. http://en.wikipedia.org/wiki/Birthday_attack
3b) For that matter, a sequence number would also do the trick, but I
can understand that this is much more dangerous; it is easy for a
sequence number to get reset and cause all hell to break loose.
4) I'm really not that picky. Our archival service could deal with all
sorts of URLs, including the ones Terri was trying to avoid, such as
http://example.com/archiver/listname.example.com/$hash
In fact, we've found that lots of small, per-list databases have speed
and reliability advantages over big global databases. But I also like
short URLs. Bottom line, please don't let these comments delay or
derail forward progress.
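
To make the arithmetic in point 3 concrete, here is a rough Python
sketch. It assumes the permalink is the base32 encoding of SHA-1 over
the Message-ID followed by the List-Post value; the exact concatenation
and the function names are illustrative guesses, not what Mailman or
mail-archive.com actually run.

```python
import base64
import hashlib
import math


def permalink_hash(message_id: str, list_post: str, bits: int = 160) -> str:
    """Hypothetical permalink hash: base32 SHA-1, truncated to about `bits` bits."""
    digest = hashlib.sha1()
    # Assumption: Message-ID first, then the posting address, with no separator.
    digest.update(message_id.encode("utf-8"))
    digest.update(list_post.encode("utf-8"))
    # A 20-byte SHA-1 digest encodes to exactly 32 base32 characters, no padding.
    encoded = base64.b32encode(digest.digest()).decode("ascii")
    # Each base32 character carries 5 bits, so keep ceil(bits / 5) characters.
    # For bits=64 that is 13 characters (strictly 65 bits of hash).
    return encoded[: math.ceil(bits / 5)]


def messages_for_even_collision_odds(bits: int) -> float:
    """Birthday approximation: n is about sqrt(2 * ln(2) * 2**bits) for 50% odds."""
    return math.sqrt(2 * math.log(2) * 2.0 ** bits)


if __name__ == "__main__":
    full = permalink_hash("<msg@example.com>", "mailto:list@example.com")
    short = permalink_hash("<msg@example.com>", "mailto:list@example.com", bits=64)
    print(len(full), len(short))
    print(f"{messages_for_even_collision_odds(64):.2e}")
```

The same message cross-posted to a second list gets a different hash
because the posting address is part of the input, which is the point of
item 1.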
-Jeff