
On Dec 29, 2014, at 10:13 AM, Stephen J. Turnbull wrote:
If we change the header name, I'd want to keep X-Message-ID-Hash for the MM3 final release, but deprecate it. I.e. MM3 would write both headers.
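A rough sketch of what "write both headers" could look like during the transition. The new header name used here (List-Message-ID) is only a placeholder, since no name has been agreed, and the function name and keep_deprecated flag are hypothetical, not existing MM3 API.

```python
from email.message import Message

OLD_HEADER = "X-Message-ID-Hash"
# Placeholder only: the replacement name is still under discussion.
NEW_HEADER = "List-Message-ID"


def add_hash_headers(msg, hash_value, keep_deprecated=True):
    """Set the new header and, optionally, the deprecated X- header."""
    # Deleting a header that is absent is a no-op for email.message.Message,
    # so this avoids duplicates without raising on fresh messages.
    del msg[NEW_HEADER]
    del msg[OLD_HEADER]
    msg[NEW_HEADER] = hash_value
    if keep_deprecated:
        msg[OLD_HEADER] = hash_value
    return msg
```

Dropping the X- version later would then be a matter of flipping keep_deprecated to False.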
I'll ask some of the IETF guys what they think about that. But if you put it in a public release, you're screwing the same kind of people Tanstaafl was talking about. Beta testers (and I mean beta testers, ie, people who have put the code in production even though it's not considered a public release) have signed up for this kind of annoyance. Random ancient Debian sysadmins haven't.
Of course we don't want to abuse our beta testers if we can avoid it, but I think if we don't want to maintain dual headers indefinitely, the public release is the time to get rid of the X- version.
I'd be willing to drop it if we can get agreement on the new header, and get buy-in from at least HK (abompard) and the mail-archive.com folks. AFAIK, they are the only two "clients" of the header atm. I'm not sure if the Jeffs are still reading this list, so I've CC'd them directly.
Jeffs: we are considering changing the X-Message-ID-Hash header name, at least dropping the X- prefix and possibly renaming the header.
As for what the List-* header would be, well, if you wanted to include the algorithm name, to be completely accurate it would have to be something like List-Base32-Encoded-SHA1-Hash-Of-The-Message-ID. Yuck ;)
We'd have to think somewhat carefully about how strong a hash we want to use if we don't specify algorithm in the field name. I'm not particularly concerned with how many bytes the header takes up. Future users can just deal with the implied BASE32 vs. BASE85 or whatever. However, if somebody thinks they need a stronger hash than we chose, we'll have interoperability problems for people who receive the message off-list.
Base 32 is a good trade-off between compactness and readability.
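For concreteness, a minimal sketch of the calculation being discussed: the Base32-encoded SHA-1 hash of the Message-ID. Stripping surrounding whitespace and angle brackets before hashing is an assumption of this sketch, as is the function name; check it against the actual implementation before relying on it.

```python
import base64
import hashlib


def message_id_hash(message_id):
    """Return the Base32-encoded SHA-1 digest of a Message-ID value."""
    # Assumption: whitespace and the enclosing <...> are not hashed.
    mid = message_id.strip()
    if mid.startswith("<") and mid.endswith(">"):
        mid = mid[1:-1]
    digest = hashlib.sha1(mid.encode("utf-8")).digest()
    # A SHA-1 digest is 20 bytes, which Base32 encodes as exactly
    # 32 characters with no '=' padding.
    return base64.b32encode(digest).decode("ascii")
```

The result is always 32 characters drawn from A-Z and 2-7, which is what makes it usable as a URL path component.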
The value of this header serves both to uniquely identify the message in a more regular format and to act as the final path component in the Archived-At (RFC 5064) header. So the following names come to mind:
    List-Message-ID
    List-Archive-ID
    List-Archived-At-ID
suggestions welcome.
The last two are too easily confused with Archived-At.
Suggestions welcome. :)
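As an illustration of the "final path component" role mentioned above, here is a small sketch that builds an Archived-At value from an archive base URL and the hash. The function name and the example URL are hypothetical; RFC 5064 only requires that the header carry a URI in angle brackets.

```python
def archived_at_value(archive_base_url, message_id_hash):
    """Build an Archived-At header value ending in the Message-ID hash."""
    # Normalise the base so exactly one slash separates it from the hash.
    base = archive_base_url.rstrip("/")
    return "<{}/{}>".format(base, message_id_hash)
```

For example, archived_at_value("https://archive.example.org/mylist/", h) puts the hash h directly after the list's base path, so an archiver can compute the permalink without talking to the list server.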
Right. However, when this was discussed several years ago, the mail-archive.com guys did some extensive data analysis on their vast collection of email. You'd have to go spelunking in the -developers archives for details, but I recall that the collision rate was so small as to be effectively negligible.
Yes. The problem is that there are people out there with MUAs that provide bogus Message-IDs (Kyle Jones's VM used to do that), and for those people all messages after the first get dropped.
As you know, I have limited tolerance for broken MUAs. Gosh, do people still use VM? :)
Note that if the server does indeed ignore the possibility of collisions on Message-ID, then there is no need (AFAICS) for the "thin" IArchiver to communicate with the archiver proper.
Right. MM3 does not currently reject messages with duplicate Message-IDs, but I think it should. I had a branch in flight that implemented this, but it caused some failures I wasn't able to resolve, and the branch bitrotted.
I don't see how it hurts to provide for the possibility of an archiver that does check content.
Right, we hashed (pun intended :) all this out years ago. We can ignore collisions, and we can do the entire calculation on the server side, using Message-ID as the sole input. I think the only issue that's worth reopening is the name of the header.
Well, that's true for *us*. The folks at the IETF don't have a habit of leaving well enough alone, though. ;-)
Right, so let's do what *we* think is right, right now, and let the committee take 10 years to define a standard. ;)
Cheers, -Barry