[Mailman-Developers] Improving the archives
Barry Warsaw
barry at python.org
Fri Jul 20 16:07:48 CEST 2007
On Jul 20, 2007, at 9:31 AM, Stephen J. Turnbull wrote:
> Barry Warsaw writes:
>
>> Second, things can happen to a list
>> that might cause this sequence number to get corrupted.
>
> Add an X-Mailman-Sequence-Number header if not already present.
>
> That doesn't deal with your other comments, but as I point out
> elsewhere, if you don't use *any* Mailman-specific information in the
> global ID, you have no sane way to handle collisions except throw them
> away (or make the global ID refer to a collection resource, but that's
> kinda unintuitive).
I'd probably call it X-List-Sequence-Number and I'd have to ensure
that the archive copy had that header in it. OTOH, if I'm going to go to
the trouble of adding this sequence number, why not just calculate a
(more likely) gid for the message myself? If I did that, I could use
a tinyurl scheme and get much shorter urls. The archiver would then
be obliged to use my X-List-GID header verbatim.
I've been pushing for calculating this using non-Mailman headers
because I'd /like/ for a client receiving the non-list copy to be
able to make the same calculation. OTOH, maybe we can have it both
ways.
So, we calculate the sequence number and generate the following headers:
X-List-Sequence-Number: 801
X-List-Message-GID: RXTJ357KFOTJP3NFJA6KMO65X7VQOHJI
The latter is composed purely of author-generated data; the former is
supplied by Mailman.
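
One way such a GID could be derived, as a rough sketch only: hash a couple
of author-supplied headers with SHA-1 and base32-encode the digest, which
happens to give a 32-character string like the one above. The choice of
Message-ID plus Date as inputs and the function name message_gid are
assumptions made for illustration, not a settled design.

```python
import base64
import hashlib


def message_gid(message_id, date):
    """Return a 32-character GID built only from author-generated headers.

    Assumption: the inputs are the Message-ID and Date header values.
    Which headers should really be hashed is still an open question.
    """
    digest = hashlib.sha1()
    digest.update(message_id.strip().encode("utf-8"))
    digest.update(date.strip().encode("utf-8"))
    # SHA-1 yields 20 bytes, which base32-encodes to exactly 32
    # characters with no '=' padding.
    return base64.b32encode(digest.digest()).decode("ascii")
```

Because nothing Mailman-specific goes into the hash, a client holding the
non-list copy could run the same calculation and arrive at the same GID.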
Assuming we also had this header:
List-Archive: http://archive.example.com/gid/
then the following two urls would point to the exact same resource:
http://archive.example.com/gid/RXTJ357KFOTJP3NFJA6KMO65X7VQOHJI
http://archive.example.com/gid/RXTJ357KFOTJP3NFJA6KMO65X7VQOHJI/801
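
Building the two urls from the headers is just string joining. A small
sketch, where archive_urls is a hypothetical helper name and the inputs
are the List-Archive, X-List-Message-GID and X-List-Sequence-Number
values shown above:

```python
def archive_urls(list_archive, gid, sequence_number):
    """Return (short_url, full_url) for one archived message.

    short_url uses only the author-generated GID; full_url also carries
    the list-supplied sequence number.
    """
    short_url = list_archive.rstrip("/") + "/" + gid
    full_url = short_url + "/" + str(sequence_number)
    return short_url, full_url
```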
If however we subsequently got a collision, then these two urls would
address different resources. E.g.:
X-List-Sequence-Number: 2112
X-List-Message-GID: RXTJ357KFOTJP3NFJA6KMO65X7VQOHJI
Now the two messages would still be addressable by their respective
urls:
http://archive.example.com/gid/RXTJ357KFOTJP3NFJA6KMO65X7VQOHJI/801
http://archive.example.com/gid/RXTJ357KFOTJP3NFJA6KMO65X7VQOHJI/2112
but
http://archive.example.com/gid/RXTJ357KFOTJP3NFJA6KMO65X7VQOHJI
would be a disambiguation page. For a web u/i it would be an HTML
list containing relative links to '801' and '2112'. A RESTful XML
document would contain the set of links to the subordinate pages. A
client of the archive.example.com service would have to be prepared
to handle disambiguation pages if it used only the author generated
GID, but it would be guaranteed that the full url would lead directly
to one and only one email message.
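
The lookup rules above could be sketched on the archive side roughly like
this. GidIndex and its methods are made-up names for illustration, and
the in-memory dict stands in for whatever storage a real archiver uses.

```python
class GidIndex:
    """Toy index mapping gid -> {sequence_number: message}."""

    def __init__(self):
        self._buckets = {}

    def add(self, gid, sequence_number, message):
        self._buckets.setdefault(gid, {})[sequence_number] = message

    def resolve(self, gid, sequence_number=None):
        """Return ('message', msg) or ('disambiguation', [relative links])."""
        bucket = self._buckets.get(gid)
        if not bucket:
            raise KeyError(gid)
        if sequence_number is not None:
            # The full url always names exactly one message.
            return ("message", bucket[sequence_number])
        if len(bucket) == 1:
            # No collision: the gid-only url is the same resource.
            return ("message", next(iter(bucket.values())))
        # Collision: hand back relative links such as '801' and '2112'.
        return ("disambiguation", [str(n) for n in sorted(bucket)])
```

A client that only knows the GID has to handle both return kinds; a
client that also has the sequence number never sees the second one.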
An archiver would have to recognize the X-List-Sequence-Number header
and honor it whenever it regenerated its archives so that the urls would
remain stable.
Thinking about this more (and I've been up since about 3:30am so I'm
a little foggy right now ;), we may want to optimize for fewer dupes
rather than fewer collisions, or maybe it doesn't matter. It would
be interesting to see how big the message-id buckets are when only
using the Message-ID header.
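
Measuring that would not take much. A sketch, assuming the Message-ID
values have already been pulled out of an existing archive (for example
by iterating over a mailbox.mbox and reading each message's Message-ID
header); bucket_size_histogram is a hypothetical name:

```python
from collections import Counter


def bucket_size_histogram(message_ids):
    """Map bucket size -> number of Message-IDs with that many messages.

    Missing or empty Message-ID values are skipped.
    """
    per_id = Counter(mid.strip() for mid in message_ids if mid)
    return Counter(per_id.values())
```

A histogram that is almost entirely size-1 buckets would suggest that
collisions on Message-ID alone are rare in practice.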
-Barry