Re: [Mailman-Developers] Improving the archives

Or Re: [Mailman-Developers 10417] Improving the archives
I would like to interject and highlight some use cases for stable
and predictable IDs. For us, "message IDs" are used directly, both by
people and by ignorant programs. Our mailing lists serve as a permanent
and concise record of our discussions, decisions, and operations, and
we find it invaluable to be able to refer to individual messages in a
simple and memorable way: "message 1210 in the calibration list", say.
Other people can then easily jot that info down or find the message
directly. Some message IDs even become shorthands for a particular
topic or decision. We have also added Trac InterWiki templates
pointing into our mail archives (as listname:number), which encourages
desirable cross-referencing (PRs, wiki pages, and SVN change logs can
refer to mail messages, just as wiki pages could always refer to
changesets and PRs, etc., etc.). But Trac InterWiki templates can only
interpolate $1, $2, ... arguments into strings, and could not possibly
calculate anything based on the _content_ of the messages.
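As a rough illustration of why positional interpolation is all such a link gets, here is a minimal Python sketch that fills $1, $2, ... placeholders from a colon-separated link. The template URL, the host lists.example.org, and the function name expand_interwiki are hypothetical, and this is not Trac's own implementation; the point is only that the archive URL has to be computable from the link text, with no access to the message itself.

```python
import re

# Hypothetical archive URL template with positional placeholders, in the
# spirit of an InterWiki entry; the host and path are made up.
TEMPLATE = "https://lists.example.org/pipermail/$1/all/$2.html"


def expand_interwiki(template, link):
    """Fill $1, $2, ... in `template` with the colon-separated parts of `link`.

    Pure string substitution: nothing here can inspect a message's content,
    so the target URL must follow from the link text alone.
    """
    args = link.split(":")

    def substitute(match):
        index = int(match.group(1)) - 1
        # Leave placeholders that have no matching argument untouched.
        return args[index] if 0 <= index < len(args) else match.group(0)

    return re.sub(r"\$(\d+)", substitute, template)


print(expand_interwiki(TEMPLATE, "calibration:1210"))
# https://lists.example.org/pipermail/calibration/all/1210.html
```

A hashed or content-derived ID could not be produced this way, because the link author would have to know the hash in advance.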
Globally unique IDs, hashed IDs, etc., are very appealing from
various CS-y and techie points of view, but they are simply not memorable
to humans or knowable by dumb external programs. I think as much
effort, or more, should go into delivering a straightforwardly usable
naming scheme as goes into making an arbitrary message recoverable
from anywhere. In short, "friendly URLs" should be a primary
requirement, not an optional afterthought for careless geeks like me
to get wrong later....
Long ago we added an extremely simple ID handoff between MM 2.1.8
and pipermail, and though imperfect it has served us well. We
hijacked the .post_id member in Mailman (otherwise essentially
unused, and mysteriously a floating-point number): CookHeaders stuffed
it into an X-Mailman-Sequence-ID header line, and AfterDelivery
incremented it. In turn, pipermail uses the header to feed a sequence
ID into make_article, and the message is squirreled away as
$mailinglist/all/%d.html. There are a few other minor matters (e.g.,
post_id was added to Decorators, a couple of templates were changed,
and we lost having 'ls' sort chronologically [did we have to add .last
and .prev to the HyperDatabase classes?]), but it really was a minor
bit of work. And for stability, as long as the archive files aren't
lost, pipermail rebuilds should yield the same URLs even if junk
messages have been deleted. [Oh, we did also add a "never rotate"
policy to our archives, but that is finesseable.]
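The handoff just described can be modeled, very loosely, in a few lines of Python. This is a self-contained sketch, not Mailman or pipermail code: MailingList, cook_headers, after_delivery, archive_path, and rebuild are hypothetical stand-ins for the list object, the CookHeaders and AfterDelivery steps, and the archiver's file naming. It shows why URLs survive rebuilds: each path is derived from the message's own stored header, so deleting one message does not renumber the others.

```python
from email.message import EmailMessage

SEQ_HEADER = "X-Mailman-Sequence-ID"


class MailingList:
    """Hypothetical stand-in for a list object; only the counter matters here."""

    def __init__(self, name, post_id):
        self.name = name
        self.post_id = post_id  # next sequence number; may be stored as a float


def cook_headers(mlist, msg):
    # Analogue of the CookHeaders step: stamp the current counter on the message.
    del msg[SEQ_HEADER]  # discard any incoming copy of the header first
    msg[SEQ_HEADER] = str(int(mlist.post_id))


def after_delivery(mlist):
    # Analogue of the AfterDelivery step: advance the counter after delivery.
    mlist.post_id = int(mlist.post_id) + 1


def archive_path(mlist, msg):
    # Archiver side: the file name depends only on the stored header.
    return "%s/all/%d.html" % (mlist.name, int(msg[SEQ_HEADER]))


def rebuild(mlist, stored_messages):
    # A rebuild re-derives every path from each message's own header.
    return [archive_path(mlist, m) for m in stored_messages]


mlist = MailingList("calibration", post_id=1210.0)
archive = []
for subject in ("flat fields", "junk", "dark frames"):
    msg = EmailMessage()
    msg["Subject"] = subject
    cook_headers(mlist, msg)
    after_delivery(mlist)
    archive.append(msg)

del archive[1]  # an admin deletes the junk message
print(rebuild(mlist, archive))
# ['calibration/all/1210.html', 'calibration/all/1212.html']
```

Deleting the junk message leaves a gap at 1211 rather than shifting 1212 down, which is what keeps already-published references valid.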
As an aside on other discussions, can you get away without using
Message-ID or Date? I.e., aren't those just more of those tokens that
were standardized before the Internet got tricky enough to
invalidate the standards? Mailing lists serialize incoming messages,
and so can generate their own unique and trustworthy IDs. UUIDs
would work, but if you can trust yourself to generate them,
consecutive integers provide minimal, order-preserving, perfect
hashing, too!
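To make the "consecutive integers" point concrete, here is a minimal sketch, assuming a single process serializes delivery. SequenceAllocator is a hypothetical name, and the threads merely simulate concurrent arrivals. Because one lock guards the counter, every message gets a distinct integer (a perfect hash of its position in the serialized stream), and sorting by ID reproduces that order, which random UUIDs such as those from uuid.uuid4 do not.

```python
import itertools
import threading


class SequenceAllocator:
    """Hand out consecutive integer IDs in the order messages are serialized.

    Hypothetical sketch: the lock makes this object the single serialization
    point, so no two messages can ever receive the same ID.
    """

    def __init__(self, start=1):
        self._counter = itertools.count(start)
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            return next(self._counter)


allocator = SequenceAllocator()
ids = []
ids_lock = threading.Lock()


def deliver(n):
    # Simulate one source of incoming posts asking for n IDs.
    for _ in range(n):
        msg_id = allocator.next_id()
        with ids_lock:
            ids.append(msg_id)


threads = [threading.Thread(target=deliver, args=(100,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# 800 posts, IDs 1..800 with no gaps and no collisions.
print(sorted(ids) == list(range(1, 801)))  # True
```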
Anyhow, we have found that people will enthusiastically refer by
name to individual messages within mail archives if they can.
- craig

Craig Loomis writes:
Friendly URLs *are* a primary requirement. The point is that to make them *reliable* as well, either a globally unique ID is needed, or individual site admins must suffer through hard-to-document constraints on what they can do with their archives. Note that the system you describe based on the post_id member demonstrates the value of a unique ID.
"Sufficient reliability" is not a tough requirement for an individual admin to achieve, as you have demonstrated. It's much more exacting for the Mailman developers, who need to satisfy both sites with different needs *and* archivers with different features.
> As an aside on other discussions, can you get away without using
> Message-ID or Date?
No. Not all recipients of the messages get them through the list. Once again, Mailman developers have to consider that situation, while in your situation you may not need to worry about it.
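One way to see the force of this answer is a small sketch, assuming a recipient who is CC'd directly and is also subscribed: only the copy that passed through the list carries a list-assigned header such as X-Mailman-Sequence-ID, while the author-generated Message-ID is on both copies, so a reply (via In-Reply-To) can only be matched to its parent reliably through Message-ID. The clone and parent_of helpers are hypothetical; the email calls are from Python's standard library.

```python
import email
from email import policy
from email.message import EmailMessage
from email.utils import make_msgid

SEQ_HEADER = "X-Mailman-Sequence-ID"  # list-assigned, as in the setup above


def clone(msg):
    # Round-trip through bytes to get an independent copy of the message.
    return email.message_from_bytes(msg.as_bytes(), policy=policy.default)


msgid = make_msgid(domain="example.org")
original = EmailMessage()
original["Subject"] = "calibration plan"
original["Message-ID"] = msgid

direct_copy = clone(original)   # delivered straight to a CC'd recipient
list_copy = clone(original)     # delivered via the list
list_copy[SEQ_HEADER] = "1210"  # only the list copy gets a sequence number

reply = EmailMessage()
reply["Subject"] = "Re: calibration plan"
reply["In-Reply-To"] = msgid


def parent_of(reply, mailbox):
    """Find the message a reply answers, using only author-generated headers."""
    wanted = str(reply["In-Reply-To"]).strip()
    return next((m for m in mailbox if str(m["Message-ID"]).strip() == wanted), None)


for name, msg in (("direct", direct_copy), ("list", list_copy)):
    print(name, msg[SEQ_HEADER], parent_of(reply, [msg]) is msg)
# direct None True
# list 1210 True
```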

participants (3)
- Craig Loomis
- Jeff Breidenbach
- Stephen J. Turnbull