-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Jul 24, 2007, at 1:11 PM, Terri Oda wrote:
On 24-Jul-07, at 12:31 PM, Jeff Breidenbach wrote:
So we just specify a header to put it in, and subscribers will be able to use it, per definition of a canonical URL.
It is the archive server's job to decide what is the "canonical" URL for a message. There's a good chance these archival URLs will be served by an HTTP redirect. So let's not use the word canonical. :)
Someone already pointed out that the message ID is a bit long for a URL, so I'm guessing we're going to want some sort of shorter sequence number for messages for linking purposes.
Yes, definitely. What do you think of the base32 examples I have on
the wiki page?
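
For readers without the wiki page at hand, here is a minimal sketch of what a base32 short ID could look like. The wiki examples are not reproduced in this message, so the details are assumptions: hashing the Message-ID with SHA-1, the truncation length, and the helper name `short_id` are all hypothetical, not the scheme the wiki actually proposes.

```python
import base64
import hashlib


def short_id(message_id: str, length: int = 13) -> str:
    """Hypothetical helper: derive a short, URL-safe ID from a Message-ID.

    Assumption: SHA-1 over the raw Message-ID string, base32-encoded,
    then truncated to `length` characters.
    """
    digest = hashlib.sha1(message_id.encode("utf-8")).digest()
    # A SHA-1 digest is 20 bytes, so its base32 form is exactly
    # 32 characters with no '=' padding.
    encoded = base64.b32encode(digest).decode("ascii")
    return encoded[:length]


if __name__ == "__main__":
    # The Message-ID below is made up for illustration.
    print(short_id("<20070724131100.12345@example.com>"))
```

Base32 uses only A-Z and 2-7, so the result is case-insensitive and safe to paste into a URL. Truncating shortens the link at the cost of a higher collision chance.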
Regardless of whether we *need* to generate our own unique ID, I'm leaning towards the thought that we're going to *want* to generate our own for usability reasons. In a perfect world, I think we'd have a sequence number so I could visit http://example.com/mailman/archives/listname/204.html and know that 205.html would be the next message to that list, but any short unique ID would do if sequence numbers are too much of a pain.
It seems silly to generate nice short links but then use message-id. If we can generate nice short links, we might as well use 'em throughout, unless you really think the default use of the archive will be to search it by message-id (which I sincerely doubt, from my user experiences).
We'd want sequence numbers in the URLs if we think people will
hand-edit them, say in a browser location bar. I'm not sure that's a
common enough use case.
Pipermail currently uses sequence numbers, but there are big problems
with that. First, the mbox'ing algorithm wasn't always correct, so
while sequence numbers were accurate when generating the HTML
archives on the fly, they broke horribly when you tried to regenerate
them from an mbox file. That's also why we have tools like cleanarch,
which tries to unbreak earlier mboxing bugs with crufty heuristics.
This /might/ be solved by ditching mboxes for maildir or some other
canonical raw archiving format (not a bad idea in its own right), but
manual surgery on the raw archives could still break it. Sometimes
site admins just /have/ to remove messages, disrupting the sequencing.
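
The renumbering problem can be illustrated with a small sketch. It is not Pipermail's code: the function names, the `.html` naming, and the hash-based alternative are assumptions made for illustration. It compares URLs assigned by position in the raw archive with URLs derived from the Message-ID, before and after an admin removes one message.

```python
import base64
import hashlib


def sequence_urls(message_ids):
    """Assign 1.html, 2.html, ... by position in the raw archive."""
    return {mid: f"{n}.html" for n, mid in enumerate(message_ids, start=1)}


def hashed_urls(message_ids):
    """Assign a URL derived only from each Message-ID (assumed scheme)."""
    urls = {}
    for mid in message_ids:
        digest = hashlib.sha1(mid.encode("utf-8")).digest()
        urls[mid] = base64.b32encode(digest).decode("ascii") + ".html"
    return urls


# Made-up Message-IDs standing in for a raw archive.
archive = ["<a@example.com>", "<b@example.com>", "<c@example.com>"]

# An admin removes the second message, then the archive is regenerated.
pruned = [mid for mid in archive if mid != "<b@example.com>"]

before_seq, after_seq = sequence_urls(archive), sequence_urls(pruned)
before_hash, after_hash = hashed_urls(archive), hashed_urls(pruned)

# The surviving message's sequence URL shifts; its hashed URL does not.
print(before_seq["<c@example.com>"], "->", after_seq["<c@example.com>"])
print(before_hash["<c@example.com>"] == after_hash["<c@example.com>"])
```

Every link to a message after the removed one goes stale under positional numbering. An ID computed from the message itself survives regeneration, but gives up the "next message is N+1" property.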