[Mailman-Developers] Python 3

Sun Dec 28 23:51:08 CET 2014

On Dec 27, 2014, at 03:57 PM, Stephen J. Turnbull wrote:

>By the way, I would say to adopt modern IETF practice here and drop
>the "X-" (in practice collisions are rare while the annoyance of
>fixing platforms to use the standardized name is frequent), and
>include the algorithm in the name.  Eg, Message-ID-MD5 or
>Hashed-Message-ID-MD5.  Or we could use the List-* namespace.
>
>We should do this while we still can. :-)  If you want I can try to
>write an RFC to make it official.

I like the idea of putting this information in a List-* header, and I'll take
you up on the RFC offer.  Are you thinking about trying to push this through
the IETF to make it official?

The spec currently lives on the wiki:

http://wiki.list.org/display/DEV/Stable+URLs

MM3 and HK (HyperKitty) should both be implementing this now, and I think
mail-archive.com does too.

If we change the header name, I'd want to keep X-Message-ID-Hash for the MM3
final release, but deprecate it.  I.e. MM3 would write both headers.
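Here is a minimal sketch of that transition using Python's standard email package: during the deprecation period the same value is written under both the legacy X-Message-ID-Hash name and a new List-* name. "List-Message-ID" is only a placeholder because no name has been settled on. Stripping the angle brackets and encoding the ID as UTF-8 before hashing are assumptions here, not quoted from the spec.

```python
import hashlib
from base64 import b32encode
from email.message import EmailMessage
from typing import Optional

LEGACY_HEADER = "X-Message-ID-Hash"  # kept for the MM3 final release, but deprecated
NEW_HEADER = "List-Message-ID"       # placeholder: the List-* name is still undecided


def add_hash_headers(msg: EmailMessage) -> Optional[str]:
    """Write the Message-ID hash under both the legacy and the new header name."""
    message_id = msg.get("Message-ID")
    if message_id is None:
        return None  # nothing to hash
    bare = str(message_id).strip()
    if bare.startswith("<") and bare.endswith(">"):
        bare = bare[1:-1]  # assumption: brackets are not part of the hashed ID
    value = b32encode(hashlib.sha1(bare.encode("utf-8")).digest()).decode("ascii")
    for name in (LEGACY_HEADER, NEW_HEADER):
        del msg[name]  # remove any copy added upstream; no error if absent
        msg[name] = value
    return value
```

Deleting before setting keeps each header to a single occurrence even if the message already passed through another hop that added one.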

As for what the List-* header would be, well, if you wanted to include the
algorithm name, to be completely accurate it would have to be something like
List-Base32-Encoded-SHA1-Hash-Of-The-Message-ID.  Yuck ;)
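The long-winded name does spell out the whole computation. A minimal Python sketch of it follows, assuming the angle brackets around the Message-ID are stripped and the bare ID is encoded as UTF-8 before hashing (the wiki spec is the authority on the exact normalization):

```python
import hashlib
from base64 import b32encode


def message_id_hash(message_id: str) -> str:
    """Return the Base32-encoded SHA1 digest of a Message-ID.

    Assumption: surrounding whitespace and angle brackets are not part of
    the identifier and are stripped before the UTF-8 bytes are hashed.
    """
    bare = message_id.strip()
    if bare.startswith("<") and bare.endswith(">"):
        bare = bare[1:-1]
    digest = hashlib.sha1(bare.encode("utf-8")).digest()  # 20 bytes
    # 20 bytes = 160 bits = exactly 32 Base32 characters, so no '=' padding.
    return b32encode(digest).decode("ascii")


h = message_id_hash("<20141228.abc@example.org>")
print(len(h))  # 32
```

Because the Base32 alphabet is just A-Z and 2-7, the result is safe to drop straight into a URL path.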

The value of this header serves both to identify the message uniquely in a
more regular format and as the final path component in the Archived-At
(RFC 5064) header.  So the following names come to mind:

List-Message-ID
List-Archive-ID
List-Archived-At-ID

Suggestions welcome.
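To illustrate the dual role, here is a sketch that builds an RFC 5064 Archived-At value (a URI in angle brackets) with the hash as its last path component, and then recovers the hash from that value. The archive base URL and the hash string are made-up placeholders; real archivers lay out their URLs in their own ways.

```python
from urllib.parse import urlsplit


def archived_at(archive_base: str, msg_hash: str) -> str:
    """Return an Archived-At value whose final path component is the hash."""
    # RFC 5064 wraps the archive URI in angle brackets.
    return f"<{archive_base.rstrip('/')}/{msg_hash}>"


def hash_from_archived_at(value: str) -> str:
    """Extract the final path component (the message hash) from an Archived-At value."""
    uri = value.strip().lstrip("<").rstrip(">")
    return urlsplit(uri).path.rstrip("/").rsplit("/", 1)[-1]


# Hypothetical archive location and an obviously illustrative hash value.
header = archived_at("https://archive.example.org/dev/", "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567")
print(header)  # <https://archive.example.org/dev/ABCDEFGHIJKLMNOPQRSTUVWXYZ234567>
```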

>The only reason I can think of is that you want to check that the permalink
>isn't already occupied (that's the only thing HyperKitty proper knows that
>can't be computed the same way in the IArchiver as in HyperKitty proper,
>AFAICS).

Right.  However, when this was discussed several years ago, the
mail-archive.com guys did some extensive data analysis on their vast
collection of email.  You'd have to go spelunking in the -developers archives
for details, but I recall that the collision rate was so small as to be
effectively negligible, even more so if you ignore spam.  And if the
X-Message-ID-Hash does collide, then in practice it's because the Message-IDs
themselves collide, and it's likely that any archiver would drop the message
anyway.
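That reasoning can be illustrated with a minimal sketch of an archiver that keys stored messages by the hash and simply drops a second message whose hash is already taken. The class and method names are hypothetical, not the API of HyperKitty or mail-archive.com, and the hash helper assumes the angle brackets are stripped and the ID is UTF-8 encoded before hashing.

```python
import hashlib
from base64 import b32encode


def _message_id_hash(message_id: str) -> str:
    # Assumption: strip angle brackets, SHA1 the UTF-8 bytes, Base32-encode.
    bare = message_id.strip()
    if bare.startswith("<") and bare.endswith(">"):
        bare = bare[1:-1]
    return b32encode(hashlib.sha1(bare.encode("utf-8")).digest()).decode("ascii")


class HashKeyedArchive:
    """Hypothetical in-memory archive keyed by the Message-ID hash."""

    def __init__(self) -> None:
        self._messages = {}  # maps hash -> raw message text

    def add(self, message_id: str, raw: str) -> bool:
        """Store a message; return False (and drop it) if its hash is taken."""
        key = _message_id_hash(message_id)
        if key in self._messages:
            # A SHA1 match on distinct IDs is vanishingly unlikely, so this
            # almost always means a duplicate Message-ID, e.g. a re-delivery.
            return False
        self._messages[key] = raw
        return True

    def __len__(self) -> int:
        return len(self._messages)
```

No collision-resolution scheme is needed: the "collision" branch doubles as duplicate suppression.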

>My own preference is for a permalink that can be computed from the
>originator header data (author, recipients, date, message ID, subject)
>by anyone with access to the message, and that means you need the
>archive server to be able to deal gracefully with collisions.  (In
>practice message IDs are not perfect UUIDs, although they're very
>close, and some messages don't have them or have different ones
>assigned by mediating hosts at arrival at multiple recipients.)

Right, we hashed (pun intended :) all this out years ago.  We can ignore
collisions, and we can do the entire calculation on the server side, using
Message-ID as the sole input.  I think the only issue that's worth reopening
is the name of the header.

Cheers,
-Barry