There are three different parties coming to the table. One is the mail transfer agent of the sender, another is the list server, and the third is the archive server. Ideally, all three will be happy campers.
So we just specify a header to put it in, and subscribers will be able to use it, per definition of a canonical URL.
It is the archive server's job to decide what is the "canonical" URL for a message. There's a good chance these archival URLs will be served by an HTTP redirect. So let's not use the word canonical. :)
What complexity? Mailman just does
msg['X-List-Archive-Received-ID'] = Email.msgid()
Easy to introduce, harder to deal with. The archival server would now keep track of both the message-id and the x-list-archive-received-id. That's two namespaces that almost do the same thing. It's easier for the archive server to keep track of one name space than two, and - most importantly - conceptually simpler.
From the perspective of the assorted list servers, it's easier to
do nothing than to do something. So if they can get by with just message-id (which is already implemented) not have to add x-list-archive-received-id, that's a smoother implementation path. If we base on message-id, archival servers will be able to retroactively add support for all their stored messages, even those that are ten years old. And users holding an old message will be able to figure out that URL without doing any computational gymnastics.
Put another way, there's the possibility to reduce the archive servers' implementation to "search for this mesage-id" which is something really useful to have anyway, and therefore likely to get wider support.
In addition, Barry was talking about concocting a unique identifier from the Date field and Message-ID. I'm not a big fan of this idea, because the date field comes from the mail user agent and is often wildly corrupt; e;g; coming from 100 years in the future. Very painful if the archive is showing most recent message first. Therefore an archival server is very likely to determine message date from the most recent received header (generally from a trusted mail transfer agent) rather than the date field. From the archive server's perspective, the best thing to do with the date field is throw it away.
So for these reasons, I'd rather stick with message-id and risk some real world collisions, instead of introduce another identifier. If the list server receives a message with no message-id, by all means create one on the spot. To me, this feels like the sweet spot in terms of cost benefit. The main thing that bugs me is message-ids are long, which makes them awkward to embed in a URL in the footer of a message.