Barry Warsaw writes:
- archive links that won't break if the archive is rebuilt
Yes, this is absolutely critical, in fact, I'd put it right at the
top of the list, even more so than a u/i overhaul. Stable urls, with
backward compatible redirecting links if at all possible, would be
+1. I've been wanting to do something about this, and have made proposals (not backed by code, mea maxima culpa) for a design. I would definitely be happy to help with this, but given time constraints, it would be nice if somebody else could take the lead.
Along with that, I would really like to come up with an algorithm for
calculating those urls without talking to the archiver.
Brad didn't like this when I suggested it before, but I didn't really understand why not. Anyway, FWIW:
I suggest adding an X-List-Received-ID header to all messages. I haven't really thought through whether the UUID in that field should be at least partly human-readable or not, but that doesn't matter for the basic idea. The on-disk directory format would be
for singletons (Message-ID is the author-supplied ID) and
for multiples. These would be created on-the-fly when they occur. They can be served as static pages. For almost all messages, the bare URL
should Just Work (ie, return a no-such-object result or a single message). Where it does not, you get an index of all pages with that message ID.
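To make the "calculate the URL without talking to the archiver" part concrete, here is a minimal sketch in Python. The function name `message_url`, the base URL, and the choice of percent-encoding the Message-ID into a single path segment are all assumptions made for illustration; they are not the layout proposed above.

```python
from urllib.parse import quote


def message_url(base_url, message_id):
    """Derive an archive URL from the Message-ID alone (hypothetical scheme)."""
    # Drop surrounding whitespace and the angle brackets of the header value.
    mid = message_id.strip().lstrip("<").rstrip(">")
    # Percent-encode everything, including "/" and "@", so the ID is always
    # exactly one path segment.
    return base_url.rstrip("/") + "/" + quote(mid, safe="")


print(message_url("http://lists.example.org/archive/", "<abc.123@mail.example.com>"))
```

Because the result depends only on the Message-ID and a fixed base, anyone holding a copy of the message (an MUA, a footer decorator, another archiver) could compute the same link.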
The main drawback to using Message-IDs that I can see is that broken MUAs may supply no Message-ID, or the same one repeatedly. In the former case, as a last resort Mailman can supply one, but that won't help people who get a personal copy and want to find the thread. However, I see no way to help them anyway, beyond a generic archive search engine.
In the latter case, you get lots of messages matching the Message-ID, and while most lists should have *zero* problems, a list that has any instances of this problem would have many. Again, I can't see a good way to deal with this other than a general search facility, as computing a digest of headers or content is hard to do reliably. Providing an index of matching posts seems like a reasonable approach, and one that can be efficiently implemented (eg, as static pages). Furthermore, the examples I've seen of both problems in the last few years have all been either spam or (in the case of duplicate Message-IDs) actual duplicates due to some mail system problem or itchy user fingers.
A minor drawback to my proposal is that if a message gets archived as a singleton for that Message-ID and then a duplicate arrives, previously created references in the archive will of course now return an index rather than the desired message. Ie, existing links silently change meaning, which is a form of data corruption. This can be dealt with in several ways; the easiest would be to provide an "if-you-got-here-by-clicking-a-ref-from-this-archive-you're-looking-for-me" link when creating the directory for multiple instances.
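The singleton-to-index transition can be sketched as follows. This is only an illustration under assumed names: one directory per Message-ID, one `<received-id>.msg` file per arrival, and a `FIRST` marker file recording which arrival old links were pointing at. None of these file names come from the proposal itself.

```python
from pathlib import Path
from urllib.parse import quote


def _dir_for(root, message_id):
    # Hypothetical layout: one directory per percent-encoded Message-ID.
    mid = message_id.strip().lstrip("<").rstrip(">")
    return Path(root) / quote(mid, safe="")


def archive_message(root, message_id, received_id, body):
    """Store one arrival; remember which arrival came first."""
    msg_dir = _dir_for(root, message_id)
    msg_dir.mkdir(parents=True, exist_ok=True)
    first_marker = msg_dir / "FIRST"
    if not first_marker.exists():
        first_marker.write_text(received_id)
    (msg_dir / (received_id + ".msg")).write_text(body)


def resolve(root, message_id):
    """Return ("missing" | "message" | "index", list of received IDs)."""
    msg_dir = _dir_for(root, message_id)
    if not msg_dir.is_dir():
        return ("missing", [])
    ids = sorted(p.stem for p in msg_dir.glob("*.msg"))
    if len(ids) == 1:
        return ("message", ids)
    # Several arrivals share this Message-ID: list the first arrival at the
    # top, since that is the one pre-existing references meant.
    first = (msg_dir / "FIRST").read_text().strip()
    return ("index", [first] + [i for i in ids if i != first])
```

A real archiver would render the "index" result as a static page whose first entry is the "you're looking for me" link.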
There's also a *very* minor benefit: repeat sends will be immediately recognizable without checking Message-ID.
Footnotes: By partly human-readable I mean containing list-id and date information. The idea would be to have the date come first, so that users would have a shot at identifying which of several messages is the most likely one, and so that the IDs could be searched by eye in an ordinary sorted index.
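One possible shape for such an ID, as a rough sketch: a UTC timestamp first, then the list-id, then a random token for uniqueness. The function name `make_received_id`, the field order beyond "date first", the separator, and the timestamp format are all assumptions for illustration.

```python
import uuid
from datetime import datetime, timezone


def make_received_id(list_id, received_at, token=None):
    """Build a date-first, partly human-readable ID (hypothetical format).

    `received_at` must be a timezone-aware datetime.
    """
    # A fixed-width UTC timestamp makes plain string sorting chronological.
    stamp = received_at.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    # The random token keeps two arrivals in the same second distinct.
    token = token or uuid.uuid4().hex
    return f"{stamp}.{list_id}.{token}"


when = datetime(2008, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
print(make_received_id("mailman-developers.python.org", when, token="abc"))
```

With the date leading, a sorted directory listing or index page is already in arrival order, which is what makes scanning by eye workable.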