[John A. Martin]
"Harald" == Harald Meland
Harald> It is not clear to me that Mailman *is* an MTA. It is not
Harald> an SMTP server, and is not (necessarily) an SMTP client.
To have been precise perhaps I should have said something like "a mail agent must not muck with an existing Message-Id except as specified by the applicable standards". The Applicable Standards, to quote for example rfc2822, apply as follows:
This standard specifies a syntax for text messages that are sent between computer users, within the framework of "electronic mail" messages.
I agree that it is obvious that Mailman should strive to avoid sending non-RFC2822-compliant messages.
However, I would think that the issue at hand is not about message *syntax*, but rather about the *semantic* value of a message's Message-Id.
Now that that nit is off my chest :-), I'll be quick to agree that RFC 2822 surely does contain a fair bit of semantic specification as well; more on that below.
The applicable standards govern what goes on the wire and therefore what Mailman causes to be put on the wire through an MTA should be compliant.
Mailman is sort of between a rock and a hard place here, as it occupies a double role:
Mailman should be liberal in what it accepts -- which seems to imply that it should accept incoming messages even if they do not conform strictly to all aspects of RFC 2822.
As one example, Mailman shouldn't offhandedly reject an incoming message just because there is a slight address syntax error in the message's From: header.
At the same time, Mailman should be conservative in what it sends. Naively, this would mean that Mailman ought to ensure that any message it puts on the wire conforms with RFC 2822; however, that would then have to clash either with the "liberal in what you accept" idea, or with the "don't change the message" maxim.
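To make the "liberal in what it accepts" half concrete, here is a minimal sketch using Python's standard email package. The function name accept_liberally and the sample message are made up for illustration; this is not how Mailman actually handles incoming mail. It merely shows that a message with a slightly broken From: header can be parsed and kept as-is rather than rejected.

```python
from email import message_from_string
from email.utils import parseaddr


def accept_liberally(raw_text):
    """Parse an incoming message without rejecting it over From: syntax.

    Hypothetical helper, for illustration only.
    """
    # The default (compat32) policy does not raise on malformed headers.
    msg = message_from_string(raw_text)
    original_from = msg.get("From", "")
    # parseaddr() is best effort: it returns ('', '') if it cannot parse.
    _realname, address = parseaddr(original_from)
    return msg, original_from, address


raw = (
    "From: John Doe <jdoe@example.com\n"  # the closing '>' is missing
    "Message-ID: <1234@example.com>\n"
    "Subject: test\n"
    "\n"
    "Hello.\n"
)
msg, original_from, address = accept_liberally(raw)
```

The header is left exactly as it arrived, which honours "don't change the message"; whether that same header may then be put back on the wire is precisely the clash described above.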
Harald> However, even if Mailman isn't an MTA, it would be nice if
Harald> it *mostly* tries to follow the MTA rules.

Harald> (As a side note, I am unable to find *clear* references to
Harald> the effect of your statement in RFCs 2821 or 2822.)
Rfc2822 Section 3.6.4 (the first paragraph below is the same paragraph you quoted elsewhere)
[[ ... ]]
The "Message-ID:" field provides a unique message identifier that refers to a particular version of a particular message. The uniqueness of the message identifier is guaranteed by the host that generates it (see below). This message identifier is intended to be machine readable and not necessarily meaningful to humans. A message identifier pertains to exactly one instantiation of a particular message; subsequent revisions to the message each receive new message identifiers.
Note: There are many instances when messages are "changed", but those changes do not constitute a new instantiation of that message, and therefore the message would not get a new message identifier. For example, when messages are introduced into the transport system, they are often prepended with additional header fields such as trace fields (described in section 3.6.7) and resent fields (described in section 3.6.6). The addition of such header fields does not change the identity of the message and therefore the original "Message-ID:" field is retained. In all cases, it is the meaning that the sender of the message wishes to convey (i.e., whether this is the same message or a different message) that determines whether or not the "Message-ID:" field changes, not any particular syntactic difference that appears (or does not appear) in the message.
Rfc822 Section 4.6.1 (in its entirety):
This field contains a unique identifier (the local-part address unit) which refers to THIS version of THIS message. The uniqueness of the message identifier is guaranteed by the host which generates it. This identifier is intended to be machine readable and not necessarily meaningful to humans. A message identifier pertains to exactly one instantiation of a particular message; subsequent revisions to the message should each receive new message identifiers.
Rfc2822 in this case merely codifies long established practice interpreting rfc822. Rfc2822 Appendix A.3 may be helpful for the present discussion.
The part that (still) isn't clear to me, is whether Mailman's action of putting the message back on the wire can be said to be either 1) generation of a new message (personally, I wouldn't think so) or 2) a new instantiation of the message.
To test for compliance with the rfc2822 determination "whether this is the same message or a different message" one might stipulate that if the PGP signature verifies it is the same message, if the PGP signature does not verify it is a different message.
Now we're deeply into message semantics. :-)
I'd like to point out two things about your argument:
Firstly, the RFC does not merely distinguish between "the same message or a different message"; it also allows Message-ID: to be changed whenever there is a new instantiation of a (single) message.
Secondly, having to resort to (your) *interpretation* of the RFC, by using verification of PGP signatures for the test, is in my book a clear indication that the RFC is *not* crystal clear on this issue.
(One certainly can see by inspection what would break a signature without actually verifying the signature, right?)
That is my (rather shallow, I'm afraid) understanding of PGP email signatures, yes.
Harald> Um. Mailman lists have numerous configuration options for
Harald> changing messages (e.g. adding footers) before they are
Harald> sent to the list members, and it has had such options
Harald> since time immemorial.
Who reads the RFCs to say that footers cannot be added without changing the message?
The more interesting issue, I think, is where should the line be drawn; how much is Mailman allowed to change (various parts of) a message before it should be considered a new message?
And how does the Mailman modus operandi fit in with the RFC's "new instantiation" wording?
Harald> * To my mind it would not be obviously wrong to view
Harald> Mailman as the *generator* of messages, at the very
Harald> least in the cases where it is obvious that the
Harald> previous generator didn't do its job of guaranteeing
Harald> message-id uniqueness properly.
Why?
Given that there exist two (or more) distinct messages that share the same message-id, the uniqueness of this identifier (as prescribed by RFC 2822) is clearly not satisfied. Hence, if Mailman really wants to have the messages it puts on the wire conform with RFC 2822, it should take on the role of message generator, and issue distinct message-ids for such distinct messages.
The hard problem, of course, is to properly discover whether or not two messages are indeed distinct; they might differ slightly by e.g. an automatically added footer, or in some other minor, but programmatically hard to discover, fashion.
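As a rough sketch of what "take on the role of message generator" could look like, the following keeps the original Message-ID unless a *different* message has already been seen under the same id, in which case it issues a fresh one. The names content_digest and choose_message_id, the in-memory dict, and the domain lists.example.org are all assumptions for illustration; nothing here is taken from Mailman's code. Note that the comparison is a plain hash of the payload, so it is exactly the naive test described above: a message that differs only by an added footer would be treated as distinct.

```python
import hashlib
from email import message_from_string
from email.utils import make_msgid


def content_digest(msg):
    """Hash the payload only; headers are deliberately ignored."""
    payload = msg.get_payload()
    if isinstance(payload, list):  # multipart: concatenate the parts
        payload = "".join(part.as_string() for part in payload)
    return hashlib.sha256(payload.encode("utf-8", "replace")).hexdigest()


def choose_message_id(msg, seen, domain="lists.example.org"):
    """Return the Message-ID to send: the original one if it is still
    unique, otherwise a newly generated one.

    `seen` maps each Message-ID to the digest first seen under it.
    """
    original = (msg.get("Message-ID") or "").strip()
    digest = content_digest(msg)
    if original:
        known = seen.setdefault(original, digest)
        if known == digest:
            return original  # same id, same content: keep it
    # Missing id, or same id with different content: generate a new one.
    fresh = make_msgid(domain=domain)
    seen[fresh] = digest
    return fresh


seen = {}
first = message_from_string("Message-ID: <dup@example.com>\n\nfirst body\n")
second = message_from_string("Message-ID: <dup@example.com>\n\nsecond body\n")
id_first = choose_message_id(first, seen)
id_second = choose_message_id(second, seen)
```

Here id_first is the original id, while id_second is a newly generated one in the assumed list domain.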
ISTM the problem you are trying to solve is how to identify the archive image of the message.
Why not construct a URL containing a scrubbed Message-Id (as Brad Knowles has indicated) and a serial number (as I have indicated)?
Because, as Barry said, that would mean the "archive image identity" of the messages could change whenever the archive needs to be rebuilt (e.g. after a disk crash, the archives are gone, and there are no backups; then some kind list member comes forward with a partial archive constructed from the messages they've received from the list).
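For comparison, here is a sketch of an identifier that does survive an archive rebuild: it is derived from the Message-ID alone, with no serial number. This is a swapped-in alternative to the scheme proposed above, not something either of us has suggested in this thread, and the function name archive_url and the choice of SHA-1 are assumptions. It is of course only as good as the uniqueness of the Message-IDs themselves.

```python
import hashlib


def archive_url(base_url, message_id):
    """Build a per-message archive URL from the Message-ID alone.

    Hypothetical helper: the same Message-ID always maps to the same
    URL, regardless of the order in which an archive is rebuilt.
    """
    # Scrub surrounding whitespace and angle brackets before hashing.
    scrubbed = message_id.strip().strip("<>")
    token = hashlib.sha1(scrubbed.encode("utf-8")).hexdigest()
    return base_url.rstrip("/") + "/" + token


url_a = archive_url("http://www.host.com/list/archive/", "<1234@example.com>")
url_b = archive_url("http://www.host.com/list/archive", "1234@example.com")
```

Both calls yield the same URL, since the brackets and the trailing slash are normalised away.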
Such a URL should go into the "List-Archive" header field pointing to the specific message without doing violence to rfc2369 Section 3.6, right?
I don't think that's too far from the intention of that header, no. That section seems rather loosely worded, something I hope was done intentionally:
3.6. List-Archive
The List-Archive field describes how to access archives for the list.
Examples:
List-Archive: <mailto:archive@host.com?subject=index%20list>
List-Archive: <ftp://ftp.host.com/pub/list/archive/>
List-Archive: <http://www.host.com/list/archive/> (Web Archive)
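Adding such a field is an addition of a header rather than a change to the existing ones, so the Message-ID need not be touched. A minimal sketch with Python's standard email package follows; set_list_archive is a made-up name, and the URL is simply the one from the RFC's example.

```python
from email import message_from_string


def set_list_archive(msg, url):
    """Set a single List-Archive field, leaving other headers alone."""
    del msg["List-Archive"]  # no-op if the field is absent
    msg["List-Archive"] = "<%s>" % url
    return msg


msg = message_from_string(
    "Message-ID: <1234@example.com>\n"
    "Subject: test\n"
    "\n"
    "Hello.\n"
)
set_list_archive(msg, "http://www.host.com/list/archive/")
```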
-- Harald