Re: [Mailman-Developers] Python 3

Dec. 28, 2014


      On Dec 27, 2014, at 03:57 PM, Stephen J. Turnbull wrote:
...
> By the way, I would say to adopt modern IETF practice here and drop
> the "X-" (in practice collisions are rare while the annoyance of
> fixing platforms to use the standardized name is frequent), and
> include the algorithm in the name.  Eg, Message-ID-MD5 or
> Hashed-Message-ID-MD5.  Or we could use the List-* namespace.
> We should do this while we still can. :-)  If you want I can try to
> write an RFC to make it official.

I like the idea of putting this information in a List-* header, and I'll take
you up on the RFC offer.  Are you thinking about trying to push this through
the IETF to make it official?
The spec currently lives on the wiki:
http://wiki.list.org/display/DEV/Stable+URLs
MM3 and HK should both be implementing this now, and I think mail-archive.com
does too.
If we change the header name, I'd want to keep X-Message-ID-Hash for the MM3
final release, but deprecate it.  I.e. MM3 would write both headers.
As for what the List-* header would be, well, if you wanted to include the
algorithm name, to be completely accurate it would have to be something like
List-Base32-Encoded-SHA1-Hash-Of-The-Message-ID.  Yuck ;)
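To make that long name concrete, here is a minimal Python sketch of the calculation it describes: SHA-1 over the Message-ID, then Base32. The function name message_id_hash is made up for illustration, and stripping the surrounding angle brackets before hashing is an assumption on my part; the wiki spec is the authority on the exact input.

```python
import base64
import hashlib


def message_id_hash(message_id: str) -> str:
    """Return the Base32-encoded SHA-1 digest of a Message-ID value.

    Hypothetical helper for illustration, not Mailman's actual API.
    """
    value = message_id.strip()
    # Assumption: the angle brackets are not part of the hashed value.
    if value.startswith("<") and value.endswith(">"):
        value = value[1:-1]
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    # A 20-byte SHA-1 digest encodes to exactly 32 Base32 characters,
    # so the result never carries "=" padding.
    return base64.b32encode(digest).decode("ascii")
```

Since Message-ID is the only input, anyone holding the message can recompute the same 32-character value.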
The value of this header serves both to uniquely identify the message in a
more regular format and as the final path component in the Archived-At
(RFC 5064) header.  So the following names come to mind:
List-Message-ID
List-Archive-ID
List-Archived-At-ID
Suggestions welcome.
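As a rough sketch of how a transition release might write both headers and reuse the value in Archived-At, something like the following could work. Everything named here is an assumption for illustration: List-Archive-ID is just one of the candidates above and nothing has been decided, ARCHIVE_BASE is a placeholder URL, and add_archive_headers is not an existing Mailman function.

```python
import base64
import hashlib
from email.message import Message

# Placeholder archive location; a real deployment would configure this.
ARCHIVE_BASE = "http://lists.example.com/archives/"
# One of the candidate names from this thread, not a settled choice.
NEW_HEADER = "List-Archive-ID"


def add_archive_headers(msg: Message) -> str:
    """Set the old and new hash headers plus Archived-At on msg."""
    message_id = msg["Message-ID"]
    if message_id is None:
        raise ValueError("message has no Message-ID header")
    value = message_id.strip()
    # Assumption: angle brackets are stripped before hashing.
    if value.startswith("<") and value.endswith(">"):
        value = value[1:-1]
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    id_hash = base64.b32encode(digest).decode("ascii")

    # Write both the deprecated name and the new one during the transition.
    for name in ("X-Message-ID-Hash", NEW_HEADER):
        del msg[name]  # no error if the header is absent
        msg[name] = id_hash

    # RFC 5064 puts the URI in angle brackets; the hash is the last path part.
    del msg["Archived-At"]
    msg["Archived-At"] = "<" + ARCHIVE_BASE + id_hash + ">"
    return id_hash
```

Deleting each header before setting it keeps the function idempotent, since assigning to a Message key appends rather than replaces.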
...
> The only reason I can think of is that you want to check that the permalink
> isn't already occupied (that's the only thing HyperKitty proper knows that
> can't be computed the same way in the IArchiver as in HyperKitty proper
> AFAICS)

Right.  However, when this was discussed several years ago, the
mail-archive.com guys did some extensive data analysis on their vast
collection of email.  You'd have to go spelunking in the -developers archives
for details, but I recall that the collision rate was so small as to be
effectively negligible, even more so if you ignore spam.  And if the
X-Message-ID-Hash collides, then the Message-ID will collide, and it's likely
that any archiver would drop the message anyway.
...
> My own preference is for a permalink that can be computed from the
> originator header data (author, recipients, date, message ID, subject)
> by anyone with access to the message, and that means you need the
> archive server to be able to deal gracefully with collisions.  (In
> practice message IDs are not perfect UUIDs, although they're very
> close, and some messages don't have them or have different ones
> assigned by mediating hosts at arrival at multiple recipients.)

Right, we hashed (pun intended :) all this out years ago.  We can ignore
collisions, and we can do the entire calculation on the server side, using
Message-ID as the sole input.  I think the only issue that's worth reopening
is the name of the header.
Cheers,
-Barry

Barry Warsaw