
On Dec 27, 2014, at 03:57 PM, Stephen J. Turnbull wrote:
> By the way, I would say to adopt modern IETF practice here and drop the "X-" (in practice collisions are rare while the annoyance of fixing platforms to use the standardized name is frequent), and include the algorithm in the name. E.g., Message-ID-MD5 or Hashed-Message-ID-MD5. Or we could use the List-* namespace.
> We should do this while we still can. :-) If you want I can try to write an RFC to make it official.
I like the idea of putting this information in a List-* header, and I'll take you up on the RFC offer. Are you thinking about trying to push this through the IETF to make it official?
The spec currently lives on the wiki:
http://wiki.list.org/display/DEV/Stable+URLs
MM3 and HK should both be implementing this now, and I think mail-archive.com does too.
If we change the header name, I'd want to keep X-Message-ID-Hash for the MM3 final release, but deprecate it. I.e., MM3 would write both headers.
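A minimal sketch of that transition period, assuming Python's stdlib email package. The new header name was still undecided at this point, so "List-Message-ID" below is only a placeholder, and set_hash_headers is a hypothetical helper, not Mailman's actual API. The hash value is passed in already computed.

```python
from email.message import EmailMessage

LEGACY_HEADER = "X-Message-ID-Hash"  # deprecated, but still written during the transition
NEW_HEADER = "List-Message-ID"       # hypothetical placeholder: the List-* name is undecided


def set_hash_headers(msg: EmailMessage, message_id_hash: str,
                     new_header: str = NEW_HEADER) -> None:
    """Write the same hash value under both the legacy and the new header name."""
    for name in (LEGACY_HEADER, new_header):
        # Remove any existing copies first (del is a no-op if the header is absent),
        # so each header appears exactly once with the server-computed value.
        del msg[name]
        msg[name] = message_id_hash
```

Deleting before setting matters because assigning to a Message header appends rather than replaces; without the del, a second pass would leave duplicate headers.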
As for what the List-* header would be, well, if you wanted to include the algorithm name, to be completely accurate it would have to be something like List-Base32-Encoded-SHA1-Hash-Of-The-Message-ID. Yuck ;)
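The computation behind that mouthful of a name is short. A minimal sketch, assuming the Message-ID's surrounding whitespace and angle brackets are stripped before hashing and the ID is encoded as UTF-8; both are assumptions here, so check the wiki spec for the normative details. message_id_hash is a hypothetical helper name.

```python
import base64
import hashlib


def message_id_hash(message_id: str) -> str:
    """Return the Base32-encoded SHA1 hash of a Message-ID header value."""
    mid = message_id.strip()
    # Assumption: the angle brackets are delimiters, not part of the ID itself.
    if mid.startswith("<") and mid.endswith(">"):
        mid = mid[1:-1]
    digest = hashlib.sha1(mid.encode("utf-8")).digest()  # 20 bytes = 160 bits
    # 160 bits / 5 bits per Base32 character = exactly 32 characters, no "=" padding.
    return base64.b32encode(digest).decode("ascii")
```

Because the Base32 alphabet is only A-Z and 2-7, the result is safe to drop straight into a URL path without escaping.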
The value of this header serves both to uniquely identify the message in a more regular format and as the final path component in the Archived-At (RFC 5064) header. So the following names come to mind:
    List-Message-ID
    List-Archive-ID
    List-Archived-At-ID
Suggestions welcome.
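To make the Archived-At relationship concrete, here is a sketch that appends the hash to a per-list archive URL. The base URL and the archived_at helper are hypothetical; each archiver (HyperKitty, mail-archive.com, etc.) chooses its own URL layout. RFC 5064 writes the header value as a URI wrapped in angle brackets.

```python
def archived_at(archive_base: str, message_id_hash: str) -> str:
    """Build an Archived-At value whose final path component is the hash.

    archive_base is a hypothetical per-list archive URL, e.g.
    "https://archive.example.org/list/dev".
    """
    # Normalize the trailing slash so exactly one "/" separates base and hash.
    return f"<{archive_base.rstrip('/')}/{message_id_hash}>"
```

Since the hash depends only on the Message-ID, anyone holding the message and knowing the list's archive base can reconstruct the permalink without asking the archiver.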
> The only reason I can think of is that you want to check that the permalink isn't already occupied (that's the only thing HyperKitty proper knows that can't be computed the same way in the IArchiver as in HyperKitty proper, AFAICS).
Right. However, when this was discussed several years ago, the mail-archive.com guys did some extensive data analysis on their vast collection of email. You'd have to go spelunking in the -developers archives for details, but I recall that the collision rate was so small as to be effectively negligible, even more so if you ignore spam. And if the X-Message-ID-Hash collides, then the Message-ID will collide, and it's likely that any archiver would drop the message anyway.
> My own preference is for a permalink that can be computed from the originator header data (author, recipients, date, message ID, subject) by anyone with access to the message, and that means you need the archive server to be able to deal gracefully with collisions. (In practice message IDs are not perfect UUIDs, although they're very close, and some messages don't have them or have different ones assigned by mediating hosts at arrival at multiple recipients.)
Right, we hashed (pun intended :) all this out years ago. We can ignore collisions, and we can do the entire calculation on the server side, using the Message-ID as the sole input. I think the only issue worth reopening is the name of the header.
Cheers, -Barry