[Mailman-Developers] X-Message-ID-Hash header (was Re: Python 3)

Dec. 29, 2014


      On Dec 29, 2014, at 10:13 AM, Stephen J. Turnbull wrote:
...
...
If we change the header name, I'd want to keep X-Message-ID-Hash
for the MM3 final release, but deprecate it.  I.e. MM3 would write
both headers.
I'll ask some of the IETF guys what they think about that.  But if you
put it in a public release, you're screwing the same kind of people
Tanstaafl was talking about.  Beta testers (and I mean beta testers,
ie, people who have put the code in production even though it's not
considered a public release) have signed up for this kind of
annoyance.  Random ancient Debian sysadmins haven't.
Of course we don't want to abuse our beta testers if we can avoid it,
but I think if we don't want to maintain dual headers indefinitely,
the public release is the time to get rid of the X- version.
I'd be willing to drop it if we can get agreement on the new header, and get
buy-in from at least HK (abompard) and the mail-archive.com folks.  AFAIK,
they are the only two "clients" of the header atm.  I'm not sure if the Jeffs
are still reading this list, so I've CC'd them directly.
Jeffs: we are considering changing the X-Message-ID-Hash header name, at least
dropping the X- prefix and possibly renaming the header.
...
...
As for what the List-* header would be, well, if you wanted to
include the algorithm name, to be completely accurate it would have
to be something like
List-Base32-Encoded-SHA1-Hash-Of-The-Message-ID.  Yuck ;)
We'd have to think somewhat carefully about how strong a hash we want to use
if we don't specify algorithm in the field name.  I'm not particularly
concerned with how many bytes the header takes up.  Future users can just
deal with the implied BASE32 vs. BASE85 or whatever.  However, if somebody
thinks they need a stronger hash than we chose, we'll have interoperability
problems for people who receive the message off-list.
Base 32 is a good trade-off between compactness and readability.
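
For readers who want to see the calculation being discussed, here is a minimal
sketch of a Base32-encoded SHA-1 hash of the Message-ID. The function name
message_id_hash is made up for illustration, and stripping the surrounding
angle brackets before hashing is an assumption, not something stated in this
thread.

```python
import base64
import hashlib


def message_id_hash(message_id: str) -> str:
    """Return the Base32-encoded SHA-1 digest of a Message-ID.

    Assumption: surrounding whitespace and one pair of angle brackets
    are removed before hashing.
    """
    mid = message_id.strip()
    if mid.startswith("<") and mid.endswith(">"):
        mid = mid[1:-1]
    digest = hashlib.sha1(mid.encode("utf-8")).digest()  # 20 bytes
    return base64.b32encode(digest).decode("ascii")


if __name__ == "__main__":
    # A made-up Message-ID, purely for demonstration.
    print(message_id_hash("<example-1234@mail.example.org>"))
```

A SHA-1 digest is 160 bits, so the Base32 form is always 32 characters with no
padding, against 40 characters for the usual hex form. The alphabet is limited
to A-Z and 2-7, which is the readability side of the trade-off.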
...
...
The value of this header both serves to uniquely identify the
message in a more regular format, and to serve as the final path
component in the Archived-At (RFC 5064) header.  So the following
names come to mind:
List-Message-ID
List-Archive-ID
List-Archived-At-ID
suggestions welcome.
The last two are too easily confused with Archived-At.
Suggestions welcome. :)
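
To illustrate the "final path component" role mentioned above, here is a rough
sketch of building an Archived-At (RFC 5064) value from the hash. The base URL
and the function names are hypothetical, and the bracket stripping is an
assumption. RFC 5064 does require the URI to be enclosed in angle brackets.

```python
import base64
import hashlib


def message_id_hash(message_id: str) -> str:
    """Base32-encoded SHA-1 of the Message-ID (brackets stripped; assumed)."""
    mid = message_id.strip()
    if mid.startswith("<") and mid.endswith(">"):
        mid = mid[1:-1]
    return base64.b32encode(hashlib.sha1(mid.encode("utf-8")).digest()).decode("ascii")


def archived_at_value(base_url: str, message_id: str) -> str:
    """Return an Archived-At field value whose last path segment is the hash."""
    return "<{}/{}>".format(base_url.rstrip("/"), message_id_hash(message_id))


if __name__ == "__main__":
    # Hypothetical archive location and Message-ID, for demonstration only.
    value = archived_at_value("https://archive.example.org/list", "<abc@example.com>")
    print("Archived-At:", value)
```

Because the only input is the Message-ID, anyone holding a copy of the message
can derive the same URL without asking the archiver.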
...
...
Right.  However, when this was discussed several years ago, the
mail-archive.com guys did some extensive data analysis on their
vast collection of email.  You'd have to go spelunking in the
-developers archives for details, but I recall that the collision
rate was so small as to be effectively negligible,
Yes.  The problem is that there are people out there with MUAs that
provide bogus Message-IDs (Kyle Jones's VM used to do that), and for
those people all messages after the first get dropped.
As you know, I have limited tolerance for broken MUAs.  Gosh, do people still
use VM? :)
...
Note that if the server does indeed ignore the possibility of
collisions on Message-ID, then there is no need (AFAICS) for the
"thin" IArchiver to communicate with the archiver proper.
Right.  MM3 does not currently reject messages with duplicate Message-IDs, but I
think it should.  I had a branch in flight that implemented this, but it
caused some failures I wasn't able to resolve, and the branch bitrotted.
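
As a sketch of what rejecting duplicates could look like (this is not the
bitrotted branch, and not MM3 code), the check below keeps an in-memory set of
Message-IDs already seen. The class and exception names are hypothetical. A
real server would need persistent storage and a policy for what "reject"
means.

```python
class DuplicateMessageID(Exception):
    """Raised when a Message-ID has been seen before (hypothetical name)."""


class SeenMessageIDs:
    """Track Message-IDs in memory and refuse repeats."""

    def __init__(self):
        self._seen = set()

    def check(self, message_id: str) -> None:
        """Record message_id, raising DuplicateMessageID if already recorded."""
        key = message_id.strip()
        if key in self._seen:
            raise DuplicateMessageID(key)
        self._seen.add(key)


if __name__ == "__main__":
    seen = SeenMessageIDs()
    seen.check("<abc@example.com>")       # first copy is accepted
    try:
        seen.check("<abc@example.com>")   # second copy is refused
    except DuplicateMessageID as exc:
        print("rejected duplicate:", exc)
```

This also shows the cost raised earlier in the thread: an MUA that reuses a
bogus Message-ID would have every message after the first refused.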
...
I don't see how it hurts to provide for the possibility of an archiver that
does check content.
...
Right, we hashed (pun intended :) all this out years ago.  We can
ignore collisions, and we can do the entire calculation on the
server side, using Message-ID as the sole input.  I think the only
issue that's worth reopening is the name of the header.
Well, that's true for *us*.  The folks at the IETF don't have a habit
of leaving well enough alone, though. ;-)
Right, so let's do what *we* think is right, right now, and let the committee
take 10 years to define a standard. ;)
Cheers,
-Barry

Barry Warsaw