[Mailman-Developers] Improving the archives

Barry Warsaw barry at python.org
Wed Jul 25 15:06:32 CEST 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Jul 24, 2007, at 1:11 PM, Terri Oda wrote:

> On 24-Jul-07, at 12:31 PM, Jeff Breidenbach wrote:
>>> So we just specify a header to put it in, and subscribers will be
>>> able to use it, per definition of a canonical URL.
>> It is the archive server's job to decide what is the "canonical" URL
>> for a message. There's a good chance these archival URLs will be
>> served by an HTTP redirect. So let's not use the word canonical. :)
>
> Someone already pointed out that the message ID is a bit long for a
> URL, so I'm guessing we're going to want some sort of shorter
> sequence number for messages for linking purposes.

Yes, definitely.  What do you think of the base32 examples I have on  
the wiki page?
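
For anyone who hasn't looked at the wiki page, here is a minimal
sketch of the general idea, not the exact scheme written up there.  It
assumes the short id is the base32 encoding of a SHA-1 digest of the
Message-ID; the function name short_id and the choice to strip the
angle brackets are hypothetical.

```python
import base64
import hashlib


def short_id(message_id: str) -> str:
    """Return a base32-encoded SHA-1 digest of a Message-ID.

    Hypothetical scheme for illustration only.
    """
    # Drop surrounding whitespace and angle brackets so that
    # "<a@b>" and "a@b" map to the same id.
    bare = message_id.strip().strip("<>")
    digest = hashlib.sha1(bare.encode("utf-8")).digest()
    # A SHA-1 digest is 20 bytes, which base32 encodes as exactly
    # 32 characters with no "=" padding.
    return base64.b32encode(digest).decode("ascii")


if __name__ == "__main__":
    print(short_id("<example-id@example.com>"))
```

The result is always 32 characters drawn from A-Z and 2-7, so it is
safe in a URL path without escaping, though it is not something anyone
would type by hand.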

> Regardless of whether we *need* to generate our own unique ID, I'm
> leaning towards the thought that we're going to *want* to generate
> our own for usability reasons.  In a perfect world, I think we'd have
> a sequence number so I could visit
> http://example.com/mailman/archives/listname/204.html and know that
> 205.html would be the next message to that list, but any short unique
> id would do if sequence numbers are too much of a pain.
>
> It seems silly to generate nice short links but then use message-id.
> If we can generate nice short links, we might as well use 'em
> throughout, unless you really think the default use of the archive
> will be to search it by messageid (which I sincerely doubt, from my
> user experiences).

We'd want sequence numbers in the URLs if we think people will
hand-edit them, say in a browser location bar.  I'm not sure that's a
common enough use case.

Pipermail currently uses sequence numbers, but there are big problems
with that.  First, the mbox'ing algorithm wasn't always correct, so
while sequence numbers were accurate when generating the HTML
archives on the fly, they broke horribly when you tried to regenerate
them from an mbox file.  That's also why we have tools like cleanarch,
which tries to unbreak earlier mbox'ing bugs with crufty heuristics.
This /might/ be solved by ditching mboxes for maildir or some other
canonical raw archiving format (not a bad idea in its own right), but
manual surgery on the raw archives could still break it.  Sometimes
site admins just /have/ to remove messages, disrupting the sequencing.
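
To make the removal problem concrete, here is a minimal sketch.  It is
not Pipermail's actual code; the function names, the zero-padded file
names, and the truncated SHA-1 key are all made up for illustration.
It numbers messages by their position in the raw archive, then
regenerates after one message has been removed.

```python
import hashlib


def sequence_urls(message_ids):
    """Assign file names by position in the raw archive (hypothetical)."""
    return {mid: f"{n:06d}.html" for n, mid in enumerate(message_ids)}


def hashed_urls(message_ids):
    """Assign file names derived from the Message-ID alone (hypothetical)."""
    return {
        mid: hashlib.sha1(mid.encode("utf-8")).hexdigest()[:12] + ".html"
        for mid in message_ids
    }


original = ["<a@example.com>", "<b@example.com>", "<c@example.com>"]
# A site admin removes the second message from the raw archive.
pruned = [mid for mid in original if mid != "<b@example.com>"]

last = "<c@example.com>"
# Position-based names shift for every message after the removed one.
print(sequence_urls(original)[last], sequence_urls(pruned)[last])
# → 000002.html 000001.html
# Names derived from the Message-ID survive regeneration unchanged.
print(hashed_urls(original)[last] == hashed_urls(pruned)[last])
# → True
```

Any link that pointed at the last message before the regeneration now
points at nothing, or at a different message, whereas a key computed
from the message itself does not depend on what else is in the archive.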

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iQCVAwUBRqdK2XEjvBPtnXfVAQKfDQP/ToPZ3t7+uIyMrsThOr+PVQ7aKVT/BQ7F
OgKqFSDSma4ZofQOkPgr4ZFRT1yKRURWas7jI2zQ8ADPAOKCYh0Udgq6XjpOI8mI
7/pODazVkbwzT9Oo06pGwpzaONK4eZjt1y9IDb9VkniUcAyve5EQ+5+KaG3rbo4M
wsrCnHLkvSE=
=/z/f
-----END PGP SIGNATURE-----

