Re: [Mailman-Developers] Improving the archives

20 Jul 2007

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On Jul 20, 2007, at 9:31 AM, Stephen J. Turnbull wrote:
...
> Barry Warsaw writes:
> ...
>> Second, things can happen to a list
>> that might cause this sequence number to get corrupted.
>
> Add an X-Mailman-Sequence-Number header if not already present.
>
> That doesn't deal with your other comments, but as I point out
> elsewhere, if you don't use *any* Mailman-specific information in the
> global ID, you have no sane way to handle collisions except throw them
> away (or make the global ID refer to a collection resource, but that's
> kinda unintuitive).

I'd probably call it X-List-Sequence-Number and I'd have to ensure
that the archive copy had that header in it.  OTOH, if I'm going to go
to the trouble of adding this sequence number, why not just calculate a
(more likely) gid for the message myself?  If I did that, I could use
a tinyurl scheme and get much shorter urls.  The archiver would then
be obliged to use my X-List-GID header verbatim.

I've been pushing for calculating this using non-Mailman headers
because I'd /like/ for a client receiving the non-list copy to be
able to make the same calculation.  OTOH, maybe we can have it both
ways.

So, we calculate the sequence number and generate the following headers:
X-List-Sequence-Number: 801
X-List-Message-GID: RXTJ357KFOTJP3NFJA6KMO65X7VQOHJI
The latter is composed of purely author generated data, the former is
supplied by Mailman.
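
A minimal sketch of how a GID of this shape could be derived from author-generated data only.  The choice of inputs (the Message-ID and Date header values joined by a newline) and the helper name message_gid are assumptions made for illustration; the message above does not pin down the exact recipe.  What does follow from the example value is its length: a base32-encoded SHA-1 digest is exactly 32 characters.

```python
import base64
import hashlib


def message_gid(message_id: str, date: str) -> str:
    """Return a 32-character base32 digest of author-supplied headers.

    Hypothetical recipe: the inputs and their joining are assumptions,
    not something the list software is known to implement.
    """
    # Strip surrounding whitespace so header folding differences between
    # the list copy and the direct copy do not change the result.
    data = (message_id.strip() + "\n" + date.strip()).encode("utf-8")
    # SHA-1 yields 20 bytes; base32 encodes 5 bytes as 8 characters,
    # so the output is 32 characters with no '=' padding.
    return base64.b32encode(hashlib.sha1(data).digest()).decode("ascii")
```

Because nothing Mailman-specific goes into the digest, a client holding only the non-list copy could run the same function and arrive at the same GID.
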
Assuming we also had this header:
List-Archive: http://archive.example.com/gid/
then the following two urls would point to the exact same resource:
http://archive.example.com/gid/RXTJ357KFOTJP3NFJA6KMO65X7VQOHJI
http://archive.example.com/gid/RXTJ357KFOTJP3NFJA6KMO65X7VQOHJI/801
If however we subsequently got a collision, then these two urls would
address different resources.  E.g.:
X-List-Sequence-Number: 2112
X-List-Message-GID: RXTJ357KFOTJP3NFJA6KMO65X7VQOHJI
Now the two messages would still be addressable by their respective
urls:
http://archive.example.com/gid/RXTJ357KFOTJP3NFJA6KMO65X7VQOHJI/801
http://archive.example.com/gid/RXTJ357KFOTJP3NFJA6KMO65X7VQOHJI/2112
but
http://archive.example.com/gid/RXTJ357KFOTJP3NFJA6KMO65X7VQOHJI
would be a disambiguation page.  For a web u/i it would be an HTML
list containing relative links to '801' and '2112'.  A RESTful XML
document would contain the set of links to the subordinate pages.  A
client of the archive.example.com service would have to be prepared
to handle disambiguation pages if it used only the author generated
GID, but it would be guaranteed that the full url would lead directly
to one and only one email message.
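
The lookup rules described above can be sketched as a small in-memory resolver.  The Archive class, its method names, and the return convention (a message for a unique hit, a list of relative links for a disambiguation page) are all hypothetical; a real archiver would render HTML or XML rather than return Python objects.

```python
from typing import Dict, List, Union


class Archive:
    """Toy archive keyed by GID, then by list sequence number."""

    def __init__(self) -> None:
        # gid -> {sequence number -> message text}
        self._buckets: Dict[str, Dict[int, str]] = {}

    def add(self, gid: str, seq: int, message: str) -> None:
        self._buckets.setdefault(gid, {})[seq] = message

    def resolve(self, path: str) -> Union[str, List[str]]:
        """Resolve the part of the url that follows '/gid/'.

        'GID/SEQ' always names exactly one message.  A bare 'GID' names
        the message if the bucket holds one entry, otherwise it yields
        the relative links a disambiguation page would list.
        Unknown GIDs or sequence numbers raise KeyError.
        """
        gid, _, seq = path.strip("/").partition("/")
        bucket = self._buckets[gid]
        if seq:
            return bucket[int(seq)]
        if len(bucket) == 1:
            return next(iter(bucket.values()))
        return [str(n) for n in sorted(bucket)]
```

With one message stored under a GID, both url forms return that message; after a second message collides on the same GID, the bare form returns ['801', '2112'] while the full forms keep resolving to their individual messages.
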
Archives would have to recognize the X-List-Sequence-Number and honor
it whenever they regenerated their archives so that the urls would
remain stable.
Thinking about this more (and I've been up since about 3:30am so I'm
a little foggy right now ;), we may want to optimize for fewer dupes
rather than fewer collisions, or maybe it doesn't matter.  It would
be interesting to see how big the message-id buckets are when only
using the Message-ID header.
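
One way to take that measurement, sketched with Python's standard mailbox module.  The function names and the idea of running it over an existing mbox archive are assumptions for illustration; no results are implied here.

```python
import mailbox
from collections import Counter
from typing import Iterable, Iterator


def message_ids_from_mbox(path: str) -> Iterator[str]:
    """Yield the Message-ID of every message in an mbox file."""
    for msg in mailbox.mbox(path):
        # Messages lacking the header all fall into the '' bucket.
        yield str(msg.get("Message-ID", "")).strip()


def bucket_histogram(message_ids: Iterable[str]) -> Counter:
    """Map bucket size -> number of Message-IDs with that many messages."""
    per_id = Counter(message_ids)
    return Counter(per_id.values())
```

Calling bucket_histogram(message_ids_from_mbox(path)) on a list's mbox would show how many Message-IDs are unique (size 1) and how many are shared by two or more archived messages.
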

-Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
iQCVAwUBRqDBtHEjvBPtnXfVAQLOggQAhIjxlU2jPDb5K8Lfe3NThjgwKiPblqtm
UurUj+AZCffS1ewGDlV6y3GGRnHEzdVSIVvAiATEGTRVG8Zzbbev3GXs0EKYiEyL
FZreNcPqDAPL0KSGw73RdAiwZuszfQcMTsSwOx98zS9Kz0NtbntYQTuqQZwo7wAW
3KeGe2PkpaI=
=yhaZ
-----END PGP SIGNATURE-----