
Barry Warsaw writes:
> I like the idea of putting this information in a List-* header, and I'll take you up on the RFC offer.
OK.
> Are you thinking about trying to push this through the IETF to make it official?
Yes. It will depend on how much resistance I get, but having it already implemented and used in Mailman will certainly help. On the other hand, there may be resistance on the basis that RFC 5064 already does everything that is "really" needed.
> The spec currently lives on the wiki:
Yes, I'm a little bit familiar with that spec. :-)
> If we change the header name, I'd want to keep X-Message-ID-Hash for the MM3 final release, but deprecate it. I.e. MM3 would write both headers.
I'll ask some of the IETF guys what they think about that. But if you put it in a public release, you're screwing the same kind of people Tanstaafl was talking about. Beta testers (and I mean beta testers, i.e., people who have put the code in production even though it's not considered a public release) have signed up for this kind of annoyance. Random ancient Debian sysadmins haven't.
Of course we don't want to abuse our beta testers if we can avoid it, but I think if we don't want to maintain dual headers indefinitely, the public release is the time to get rid of the X- version.
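A minimal Python sketch of the transition discussed above, in which MM3 writes both headers during a deprecation period. It assumes the hash is the Base32-encoded SHA1 of the Message-ID with surrounding angle brackets stripped; List-Message-ID is only a placeholder because the new name was still undecided, and the helper names are hypothetical.

```python
from base64 import b32encode
from email.message import Message
from hashlib import sha1

OLD_HEADER = "X-Message-ID-Hash"
NEW_HEADER = "List-Message-ID"  # hypothetical; the final List-* name is undecided


def message_id_hash(message_id: str) -> str:
    """Base32-encoded SHA1 of the Message-ID (angle-bracket stripping is assumed)."""
    value = message_id.strip()
    if value.startswith("<") and value.endswith(">"):
        value = value[1:-1]
    return b32encode(sha1(value.encode("utf-8")).digest()).decode("ascii")


def add_hash_headers(msg: Message) -> None:
    """Write the deprecated X- header and the new List-* header with the same value."""
    message_id = msg.get("Message-ID")
    if message_id is None:
        return  # nothing to hash
    value = message_id_hash(message_id)
    for name in (OLD_HEADER, NEW_HEADER):
        del msg[name]  # no-op if absent; avoids duplicates when reprocessing
        msg[name] = value


msg = Message()
msg["Message-ID"] = "<abc@example.com>"
add_hash_headers(msg)
print(msg[OLD_HEADER] == msg[NEW_HEADER])  # True
```

Dropping the X- header at the public release would then amount to removing one entry from the tuple of header names.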
> As for what the List-* header would be, well, if you wanted to include the algorithm name, to be completely accurate it would have to be something like List-Base32-Encoded-SHA1-Hash-Of-The-Message-ID. Yuck ;)
We'd have to think somewhat carefully about how strong a hash we want to use if we don't specify the algorithm in the field name. I'm not particularly concerned with how many bytes the header takes up. Future users can just deal with the implied Base32 vs. Base85 or whatever. However, if somebody thinks they need a stronger hash than we chose, we'll have interoperability problems for people who receive the message off-list.
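The joke header name quoted above spells out the recipe. The Python sketch below follows it literally (again assuming angle brackets are stripped from the Message-ID before hashing) and swaps in SHA-256 to illustrate the interoperability concern: a recipient who recomputes the value off-list with a different algorithm than the list used gets a different string, and a header name that omits the algorithm gives no hint which one was intended. The function names are hypothetical.

```python
from base64 import b32encode
from hashlib import sha1, sha256


def normalize(message_id: str) -> bytes:
    """Strip whitespace and surrounding angle brackets (assumed normalization)."""
    value = message_id.strip()
    if value.startswith("<") and value.endswith(">"):
        value = value[1:-1]
    return value.encode("utf-8")


def b32_sha1(message_id: str) -> str:
    """List-Base32-Encoded-SHA1-Hash-Of-The-Message-ID, taken literally."""
    return b32encode(sha1(normalize(message_id)).digest()).decode("ascii")


def b32_sha256(message_id: str) -> str:
    """The same recipe with a stronger hash swapped in."""
    return b32encode(sha256(normalize(message_id)).digest()).decode("ascii")


mid = "<abc@example.com>"
print(len(b32_sha1(mid)))    # 32: a 20-byte digest needs no Base32 padding
print(len(b32_sha256(mid)))  # 56: a 32-byte digest is padded with "===="
print(b32_sha1(mid) == b32_sha256(mid))  # False
```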
> The value of this header serves both to uniquely identify the message in a more regular format and as the final path component in the Archived-At (RFC 5064) header. So the following names come to mind:
>
> List-Message-ID
> List-Archive-ID
> List-Archived-At-ID
>
> Suggestions welcome.
The last two are too easily confused with Archived-At.
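The quoted proposal uses the hash as the final path component of the RFC 5064 Archived-At header. A Python sketch of how an archiver might build that value from the Message-ID alone follows; the base URL and function names are hypothetical, and the Base32/SHA1 recipe with angle-bracket stripping is an assumption carried over from the header name discussed earlier.

```python
from base64 import b32encode
from hashlib import sha1

# Hypothetical archive location; real archivers choose their own URL layout.
ARCHIVE_BASE = "https://lists.example.org/archives/mylist/"


def message_id_hash(message_id: str) -> str:
    """Base32-encoded SHA1 of the Message-ID (angle-bracket stripping is assumed)."""
    value = message_id.strip()
    if value.startswith("<") and value.endswith(">"):
        value = value[1:-1]
    return b32encode(sha1(value.encode("utf-8")).digest()).decode("ascii")


def archived_at(message_id: str, base: str = ARCHIVE_BASE) -> str:
    """Return an Archived-At value: a URI in angle brackets, as RFC 5064 specifies."""
    return "<" + base.rstrip("/") + "/" + message_id_hash(message_id) + ">"


print(archived_at("<abc@example.com>"))
```

Because the URL depends only on the Message-ID, the list server can add the header without consulting the archive at all.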
> Right. However, when this was discussed several years ago, the mail-archive.com guys did some extensive data analysis on their vast collection of email. You'd have to go spelunking in the -developers archives for details, but I recall that the collision rate was so small as to be effectively negligible,
Yes. The problem is that there are people out there with MUAs that provide bogus Message-IDs (Kyle Jones's VM used to do that), and for those people all messages after the first get dropped.
Note that if the server does indeed ignore the possibility of collisions on Message-ID, then there is no need (AFAICS) for the "thin" IArchiver to communicate with the archiver proper. I don't see how it hurts to provide for the possibility of an archiver that does check content.
> Right, we hashed (pun intended :) all this out years ago. We can ignore collisions, and we can do the entire calculation on the server side, using Message-ID as the sole input. I think the only issue that's worth reopening is the name of the header.
Well, that's true for *us*. The folks at the IETF don't have a habit of leaving well enough alone, though. ;-)
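The collision trade-off in this exchange can be made concrete with a toy, in-memory archive keyed by the Message-ID hash (same assumed Base32/SHA1 recipe; the ToyArchive class and its check_content switch are hypothetical). With content checking off, a second message from an MUA that reuses a bogus Message-ID is silently dropped; with it on, the archiver notices that the content differs and keeps both.

```python
from base64 import b32encode
from dataclasses import dataclass, field
from hashlib import sha1


def message_id_hash(message_id: str) -> str:
    """Base32-encoded SHA1 of the Message-ID (angle-bracket stripping is assumed)."""
    value = message_id.strip()
    if value.startswith("<") and value.endswith(">"):
        value = value[1:-1]
    return b32encode(sha1(value.encode("utf-8")).digest()).decode("ascii")


@dataclass
class ToyArchive:
    """Hypothetical in-memory archive keyed by the Message-ID hash."""
    check_content: bool = False
    store: dict[str, list[bytes]] = field(default_factory=dict)

    def add(self, message_id: str, raw: bytes) -> bool:
        """Return True if the message was stored, False if it was dropped."""
        existing = self.store.setdefault(message_id_hash(message_id), [])
        if not existing:
            existing.append(raw)
            return True
        if not self.check_content:
            return False  # trust "same Message-ID means same message"
        if raw in existing:
            return False  # a genuine duplicate, e.g. a crossposted copy
        existing.append(raw)  # same Message-ID, different content: keep it
        return True


bogus = "<fixed@broken-mua>"
naive, careful = ToyArchive(), ToyArchive(check_content=True)
for body in (b"first", b"second"):
    naive.add(bogus, body)
    careful.add(bogus, body)
print(len(naive.store[message_id_hash(bogus)]))    # 1
print(len(careful.store[message_id_hash(bogus)]))  # 2
```

Even in the content-checking case both messages still share one key, so an Archived-At URL built from the hash alone could not tell them apart.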