New subject: Improving the archives

Oct. 30, 2007

Re: [Mailman-Developers 10417] Improving the archives

I would like to interject and highlight some use cases for stable
and predictable IDs. For us, "message IDs" are directly used both by
people and ignorant programs. Our mailing lists serve as a permanent
and concise record of our discussions, decisions, and operations, and
we find it invaluable to be able to refer to individual messages in a
simple and memorable way: "message 1210 in the calibration list", say.
Other people can then easily jot that info down or directly find the
message. Some message IDs even become shorthand for a particular
topic or decision. We have also added Trac InterWiki templates
pointing into our mail archives (as listname:number), which encourages
desirable cross-referencing (PRs, wiki pages, and SVN change logs can
refer to mail messages, just as wiki pages could always refer to
changesets and PRs, etc.). But Trac InterWiki templates can only
interpolate $1, $2, ... arguments into strings, and cannot possibly
calculate anything based on the _content_ of the messages.

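
To make that constraint concrete, here is a rough Python sketch of what
such a template can do: pure positional substitution and nothing more.
The map entry, host name, and the expand() helper are all made up for
illustration; this is not Trac's code.

```python
import re

# Hypothetical InterWiki map: prefix -> URL template. The host name and
# path layout are illustrative assumptions, not real archive URLs.
INTERWIKI = {
    "calibration": "https://lists.example.org/calibration/all/$1.html",
}


def expand(link):
    """Expand 'prefix:arg1:arg2' by substituting $1, $2, ... into the template."""
    prefix, _, rest = link.partition(":")
    args = rest.split(":")
    template = INTERWIKI[prefix]

    def substitute(match):
        # $1 is the first argument, $2 the second, and so on.
        return args[int(match.group(1)) - 1]

    return re.sub(r"\$(\d+)", substitute, template)


print(expand("calibration:1210"))
# prints https://lists.example.org/calibration/all/1210.html
```

Nothing in there can look at a message, so an ID that has to be
computed from the message (a hash, say) is out of reach for this kind
of tool.
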
Globally unique IDs, hashed IDs, etc., are very appealing from
various CS-y and techie points of view, but are simply not memorable
to humans or knowable by dumb external programs. I think as much, or
more, effort should be put into delivering a straightforwardly usable
naming scheme as goes into making an arbitrary message recoverable
from anywhere. Basically, "friendly URLs" should be a primary
requirement, not an optional afterthought for careless geeks like me
to get wrong later...

We long ago added an extremely simple ID handoff between MM 2.1.8
and pipermail, and though imperfect it has served us well. Basically,
we hijacked the .post_id member in Mailman (otherwise basically
unused, and mysteriously a floating point number); CookHeaders stuffs
it into an X-Mailman-Sequence-ID header line, and AfterDelivery
increments it. In turn, pipermail uses the header to feed a sequence
ID into make_article, and the message is squirreled away as
$mailinglist/all/%d.html. There were a few other minor matters (e.g.
post_id was added to Decorators, a couple of templates were changed,
we lost having 'ls' sort chronologically [did we have to add .last
and .prev to the HyperDatabase classes?]), but it really was a minor
bit of work. And for stability, as long as the archive files aren't
lost, pipermail rebuilds should yield the same URLs even if junk
messages have been deleted. [Oh, we did also add a "never rotate"
policy to our archives, but that can be finessed.]

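
For anyone who wants the shape of that handoff without reading our
patches, here is a minimal sketch in present-day Python. It is not the
actual MM 2.1.8 or pipermail code: ListState, stamp_sequence_id,
after_delivery, and archive_path are hypothetical names, and only the
header name, the post_id counter, and the all/%d.html layout come from
the description above.

```python
from email.message import EmailMessage

SEQUENCE_HEADER = "X-Mailman-Sequence-ID"


class ListState:
    """Hypothetical stand-in for the list object; only the fields used here."""

    def __init__(self, name, post_id=1.0):
        self.name = name
        self.post_id = post_id  # a float, mirroring the original member


def stamp_sequence_id(mlist, msg):
    # Roughly the CookHeaders step: record the current counter in the header.
    del msg[SEQUENCE_HEADER]  # drop any sender-supplied value; no-op if absent
    msg[SEQUENCE_HEADER] = str(int(mlist.post_id))


def after_delivery(mlist):
    # Roughly the AfterDelivery step: advance the counter for the next post.
    mlist.post_id += 1


def archive_path(mlist, msg):
    # Roughly the archiver step: the file name comes from the header alone,
    # never from the message's position in the archive.
    return "%s/all/%d.html" % (mlist.name, int(msg[SEQUENCE_HEADER]))


mlist = ListState("calibration")
delivered = []
for subject in ["first", "junk", "third"]:
    msg = EmailMessage()
    msg["Subject"] = subject
    stamp_sequence_id(mlist, msg)
    after_delivery(mlist)
    delivered.append(msg)

# Simulate a rebuild after the junk message has been deleted.
kept = [m for m in delivered if m["Subject"] != "junk"]
paths = [archive_path(mlist, m) for m in kept]
print(paths)
# prints ['calibration/all/1.html', 'calibration/all/3.html']
```

Because the number travels inside the stored message, deleting the junk
post leaves a gap at 2 but does not renumber anything after it.
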
As an aside on other discussions, can you get away without using
Message-ID or Date? I.e., aren't those just more of those tokens which
were standardized back before the Internet got tricky enough to
invalidate the standards? Mailing lists serialize incoming messages,
and so can generate their own unique and trustworthy IDs. "UUIDs"
would work, but if you can trust yourself to generate them,
consecutive integers provide minimal, order-preserving, perfect
hashing, too!

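
A toy illustration of that last claim, assuming a single serializing
process hands out the numbers. SerialAllocator is a made-up class for
this sketch, not anything in Mailman.

```python
import threading
import uuid


class SerialAllocator:
    """Hypothetical counter: one ID per message, handed out under a lock."""

    def __init__(self, start=1):
        self._next = start
        self._lock = threading.Lock()

    def allocate(self):
        with self._lock:
            value = self._next
            self._next += 1
            return value


allocator = SerialAllocator()
serial_ids = [allocator.allocate() for _ in range(5)]
random_ids = [uuid.uuid4().hex for _ in range(5)]

# Minimal and perfect: five messages use exactly the integers 1..5.
assert serial_ids == list(range(1, 6))
# Order-preserving: sorting by ID reproduces arrival order.
assert sorted(serial_ids) == serial_ids

print(serial_ids)     # short enough to say out loud
print(random_ids[0])  # unique, but 32 hex digits nobody will remember
```

Both kinds of ID are unique; only the integers also tell you the order
of arrival and fit on a sticky note.
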
Anyhow, we have found that people will enthusiastically refer by
name to individual messages within mail archives if they can.

craig

Craig Loomis
