"@" in mail **text** gets replaced in archives
"Roger" == Roger Bivand <Roger.Bivand@nhh.no> on Fri, 26 Sep 2003 14:26:58 +0200 (CEST) writes:
.....
Roger> The second thing, ...: my answer on r-help to a pixmap question
Roger> was:
Roger> ...
Roger> xx@col[1:20]
Roger> ...
Roger> which in the archives is rendered:
Roger> ...
Roger> xx at col[1:20]
Roger> ...
aaarg!
The mailman-builtin archiving engine (pipermail) is really not that great....
At the moment I have to stay with it. I can only switch back to keeping e-mail addresses unaltered, which caters to the spammers' address-collection robots...
Of course that's a bug in mailman/pipermail ===> forwarding to the developers.
Martin Maechler <maechler@stat.math.ethz.ch> http://stat.ethz.ch/~maechler/ Seminar fuer Statistik, ETH-Zentrum LEO C16 Leonhardstr. 27 ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND phone: x-41-1-632-3408 fax: ...-1228 <><
On Fri, 2003-09-26 at 12:15, Martin Maechler wrote:
The mailman-builtin archiving engine (pipermail) is really not that great....
It's better than nothing, but no one has ever made great claims about it. Unfortunately, no one has stepped up to improve it either although there have been several fits and starts. I continue to await volunteers!
-Barry
On Fri, 26 Sep 2003, Martin Maechler wrote:
"Roger" == Roger Bivand <Roger.Bivand@nhh.no> on Fri, 26 Sep 2003 14:26:58 +0200 (CEST) writes:
.....
Roger> The second thing, ...: my answer on r-help to a pixmap question
Roger> was:
Roger> ...
Roger> xx@col[1:20]
Roger> ...
Roger> which in the archives is rendered:
Roger> ...
Roger> xx at col[1:20]
Roger> ...
aaarg!
The mailman-builtin archiving engine (pipermail) is really not that great....
At the moment I have to stay with it. I can only switch back to keeping e-mail addresses unaltered, which caters to the spammers' address-collection robots...
Of course that's a bug in mailman/pipermail ===> forwarding to the developers.
I could see that they would try to convert all "@", but actually the hits should be "@" followed by a dot-separated address, at least one dot, shouldn't they? Regexp? Python can do that, I'm sure. Turning off protection would be a last resort, really.
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of Economics and Business Administration, Breiviksveien 40, N-5045 Bergen, Norway. voice: +47 55 95 93 55; fax +47 55 95 93 93 e-mail: Roger.Bivand@nhh.no
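Roger's suggestion above (treat "@" as part of an address only when it is followed by a dot-separated name with at least one dot) can be sketched in Python roughly as follows. This is only an illustration, not pipermail's actual code: the pattern, the name ADDRESS_RE, and the function obscure_addresses are all assumptions made for the example.

```python
import re

# Hypothetical pattern: an "@" only counts as part of an address when it is
# followed by a dot-separated host name containing at least one dot.
ADDRESS_RE = re.compile(
    r"(?P<local>[A-Za-z0-9._+-]+)@"
    r"(?P<domain>[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)"
)


def obscure_addresses(text):
    """Replace '@' with ' at ' only inside strings that look like addresses."""
    return ADDRESS_RE.sub(r"\g<local> at \g<domain>", text)


print(obscure_addresses("xx@col[1:20]"))         # xx@col[1:20]
print(obscure_addresses("Roger.Bivand@nhh.no"))  # Roger.Bivand at nhh.no
```

A heuristic like this would still rewrite code such as obj@slot.name, where the name after the "@" happens to contain a dot, so it narrows the problem rather than removing it.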
--- Barry Warsaw <barry@python.org> wrote:
On Fri, 2003-09-26 at 12:15, Martin Maechler wrote:
The mailman-builtin archiving engine (pipermail) is really not that great....
It's better than nothing, but no one has ever made great claims about it. Unfortunately, no one has stepped up to improve it either although there have been several fits and starts. I continue to await volunteers!
Dump pipermail completely and include (or point to) MHonARC, it does a great job and is easily integrated (UTF-8 support and all).
- Nadim
On Fri, 2003-09-26 at 13:22, Nadim Shaikli wrote:
Dump pipermail completely and include (or point to) MHonARC, it does a great job and is easily integrated (UTF-8 support and all).
In the tradition of Python's "batteries included" philosophy, I still want to bundle Pipermail. But I'm happy to make it more obvious, or easier, or whatever, to point people at MHonArc. I'm leery of including MHonArc in our distribution.
-Barry
On Fri, Sep 26, 2003 at 01:34:04PM -0400, Barry Warsaw wrote:
want to bundle Pipermail. But I'm happy to make it more obvious, or easier, or whatever, to point people at MHonArc. I'm leery of including
A simple step toward this for Mailman 2.2 would be to change the URL layout for the archives from host/pipermail/ to host/archives/ or host/mm-archives/ .
--amk
On Fri, 2003-09-26 at 16:44, amk@amk.ca wrote:
On Fri, Sep 26, 2003 at 01:34:04PM -0400, Barry Warsaw wrote:
want to bundle Pipermail. But I'm happy to make it more obvious, or easier, or whatever, to point people at MHonArc. I'm leery of including
A simple step toward this for Mailman 2.2 would be to change the URL layout for the archives from host/pipermail/ to host/archives/ or host/mm-archives/ .
Added to the wiki! -Barry
Dump pipermail completely and include (or point to) MHonARC, it does a great job and is easily integrated (UTF-8 support and all).
I agree. Earl Hood has been doing a marvelous job with MHonarc for years now. The guy is meticulous, keeps his software current and well documented, and understands mail issues as well as anyone. I understand Barry's hesitation to integrate a non-Python product but there is no denying that MHonarc is rock solid and it would be a shame to not look at it more closely as a replacement for pipermail. Two thumbs way up for MHonarc.
- k
On Fri, 2003-09-26 at 20:22, Kevin McCann wrote:
I agree. Earl Hood has been doing a marvelous job with MHonarc for years now. The guy is meticulous, keeps his software current and well documented, and understands mail issues as well as anyone. I understand Barry's hesitation to integrate a non-Python product but there is no denying that MHonarc is rock solid and it would be a shame to not look at it more closely as a replacement for pipermail. Two thumbs way up for MHonarc.
At least it's GPL'd so it would be /possible/.
-Barry
Hi,
Barry Warsaw wrote:
On Fri, 2003-09-26 at 20:22, Kevin McCann wrote:
I agree. Earl Hood has been doing a marvelous job with MHonarc for years now. The guy is meticulous, keeps his software current and well documented, and understands mail issues as well as anyone. I understand Barry's hesitation to integrate a non-Python product but there is no denying that MHonarc is rock solid and it would be a shame to not look at it more closely as a replacement for pipermail. Two thumbs way up for MHonarc.
At least it's GPL'd so it would be /possible/.
What is the status of i18n of MHonarc ?
At least, MHonArc-ed Japanese mail archives don't impress me much.
-- Tokio Kikuchi, tkikuchi@ is.kochi-u.ac.jp http://weather.is.kochi-u.ac.jp/
Tokio Kikuchi <tkikuchi@is.kochi-u.ac.jp> writes:
What is the status of i18n of MHonarc ?
At least, MHonArc-ed Japanese mail archives don't impress me much.
Seconded. I once wrote a patch to MHonarc to support UTF-8 on all pages, and it was eventually incorporated, but MHonarc's multiple-encoding techniques are much behind pipermail's.
Regards, Martin
At 8:22 PM -0400 2003/09/26, Kevin McCann wrote:
Dump pipermail completely and include (or point to) MHonARC, it does a great job and is easily integrated (UTF-8 support and all).
I agree. Earl Hood has been doing a marvelous job with MHonarc for years now. The guy is meticulous, keeps his software current and well documented, and understands mail issues as well as anyone.
This is seriously weird. Back before pipermail was integrated into mailman, my understanding is that the then-current version was the best available MLM archive tool, heads and shoulders above everything else -- especially mhonarc. Moreover, my understanding was that pipermail had only improved significantly beyond where it was at the time of initial integration.
Has mhonarc improved so much in the meantime that it has now overtaken the current version of pipermail?
-- Brad Knowles, <brad.knowles@skynet.be>
"They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin, Historical Review of Pennsylvania.
GCS/IT d+(-) s:+(++)>: a C++(+++)$ UMBSHI++++$ P+>++ L+ !E-(---) W+++(--) N+ !w--- O- M++ V PS++(+++) PE- Y+(++) PGP>+++ t+(+++) 5++(+++) X++(+++) R+(+++) tv+(+++) b+(++++) DI+(++++) D+(++) G+(++++) e++>++++ h--- r---(+++)* z(+++)
This is seriously weird. Back before pipermail was integrated into mailman, my understanding is that the then-current version was the best available MLM archive tool, heads and shoulders above everything else -- especially mhonarc. Moreover, my understanding was that pipermail had only improved significantly beyond where it was at the time of initial integration.
Has mhonarc improved so much in the meantime that it has now overtaken the current version of pipermail?
Hi Brad,
Try it for yourself. Maybe your needs are different. But if I want an archiver that will process huge mbox file from other systems - without crapping out, will decode encoded binary attachments, have flexible, customizable output, etc., I can't think of anything else I would rather use. If MHonarc is so bad, why do so many people go the extra lengths currently required in order to use it instead of pipermail?
As I say, it works for me. And very well. Maybe worth a second look? If you try it again let me know what you find.
- Kevin
On Sat, Sep 27, 2003 at 01:59:20PM +0200, Brad Knowles wrote:
Has mhonarc improved so much in the meantime that it has now overtaken the current version of pipermail?
Pipermail hasn't been developed for a while, though. When I originally started writing it, the leading archiver was Hypermail, and it hadn't been maintained in a while, so Pipermail could easily beat it just by being in a scripting language instead of C.
--amk
On Sat, 2003-09-27 at 03:18, Martin v. Löwis wrote:
At least, MHonArc-ed Japanese mail archives don't impress me much.
Seconded. I once wrote a patch to MHonarc to support UTF-8 on all pages, and it was eventually incorporated, but MHonarc's multiple-encoding techniques are much behind pipermail's.
That's great to know, and would definitely influence any final decision. Also, I just spewed the mailman-developers archive through mhonarc 2.6.8. One thing I noticed off the bat is that message files are given sequential numbers just like pipermail. That isn't what I want!
-Barry
On Sat, 2003-09-27 at 09:59, Kevin McCann wrote:
Try it for yourself. Maybe your needs are different. But if I want an archiver that will process huge mbox file from other systems - without crapping out, will decode encoded binary attachments, have flexible, customizable output, etc., I can't think of anything else I would rather use. If MHonarc is so bad, why do so many people go the extra lengths currently required in order to use it instead of pipermail?
As I say, it works for me. And very well. Maybe worth a second look? If you try it again let me know what you find.
Pipermail definitely has some distinct advantages, so I'm glad we include it. It has some problems too, most notably its memory consumption, and I think inefficient on-disk storage. The most often cited problem is that it cannot handle regenerating entire very-large archives because of memory usage.
I really really want to use something like message-ids to generate message file names. I want to be able to generate links to archived messages in the footers, but I think the best way to do that is to agree on a reproducible, independent algorithm for calculating them. Another approach would be to put even the public archives behind a cgi and have that implement a mapping between message-id derived links and the sequential file names (although that won't fix the regen problem).
It would also be nice if we could more easily customize its look and feel. Anything more and we're getting into arch-ng and zest territories.
-Barry
On Sun, 2003-09-28 at 08:30, Barry Warsaw wrote:
Also, I just spewed the mailman-developers archive through mhonarc 2.6.8. One thing I noticed off the bat is that message files are given sequential numbers just like pipermail. That isn't what I want!
MHonArc has a lengthy configuration file, not to mention the possibility of patches. What's exactly the result you would like to see?
-- Alessio Bragadini <alessio@albourne.com> APL Financial Services (Overseas) Ltd
[Barry Warsaw]
I really really want to use something like message-ids to generate message file names. I want to be able to generate links to archived messages in the footers, but I think the best way to do that is to agree on a reproducible, independent algorithm for calculating them.
What input parameters should such an algorithm have? I take it that you want to use just
- list name and
- message-id
and I think that would mostly work. However, the "mostly" part means that Mailman would need to implement some policy for dealing with posts that use an already-archived message-id. Some possible policies would be:
On message arrival, Mailman checks whether the archives already contains that message-id. If it does, the message is rejected, so the sender gets a chance to re-post the message with a fresh message-id.
Whenever Mailman receives a message whose message-id is already present in the archives, the original Message-Id: header is renamed to e.g. X-Original-Message-Id:, and Mailman generates a fresh (as in "not yet present in the archives") message-id before the message is either archived or sent to the list members.
Messages with duplicate message-ids are posted to the members as-is, but won't be archived.
-- Harald
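The second policy Harald lists (keep the original id under X-Original-Message-Id: and assign a fresh one) might look something like the sketch below, using only the standard email package. It is not Mailman code: the function name ensure_unique_message_id and the archived_ids set, which stands in for whatever lookup the archiver would really provide, are assumptions for illustration.

```python
from email.message import Message
from email.utils import make_msgid


def ensure_unique_message_id(msg, archived_ids):
    """Give msg a Message-ID that is not yet in archived_ids.

    archived_ids is a hypothetical in-memory set standing in for a real
    archive lookup. A duplicate id is preserved in X-Original-Message-Id.
    """
    original = msg.get("Message-ID")
    if original is not None and original not in archived_ids:
        archived_ids.add(original)
        return msg

    if original is not None:
        # Duplicate: keep the sender's id under a different header name.
        msg["X-Original-Message-Id"] = original
        del msg["Message-ID"]

    fresh = make_msgid()
    while fresh in archived_ids:  # extremely unlikely, but cheap to check
        fresh = make_msgid()
    msg["Message-ID"] = fresh
    archived_ids.add(fresh)
    return msg
```

Messages that arrive with no Message-ID at all simply get a fresh one here; whether that is the right behaviour is a separate policy question.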
On Sun, 2003-09-28 at 05:13, Harald Meland wrote:
[Barry Warsaw]
I really really want to use something like message-ids to generate message file names. I want to be able to generate links to archived messages in the footers, but I think the best way to do that is to agree on a reproducible, independent algorithm for calculating them.
What input parameters should such an algorithm have? I take it that you want to use just
- list name and
- message-id
Exactly.
- Whenever Mailman receives a message whose message-id is already present in the archives, the original Message-Id: header is renamed to e.g. X-Original-Message-Id:, and Mailman generates a fresh (as in "not yet present in the archives") message-id before the message is either archived or sent to the list members.
This is what I was thinking about. Alternatively we could rewrite all message-id headers when we accept the message. That would guarantee uniqueness but it would break the correlation of message-ids between list copies and direct copies. Is that bad? (note that we already do this for NNTP posted messages, and there has been some off-list discussion about that).
-Barry
On Sun, 2003-09-28 at 02:06, Alessio Bragadini wrote:
MHonArc has a lengthy configuration file, not to mention the possibility of patches. What's exactly the result you would like to see?
See my other followup. -Barry
At 1:37 AM -0400 2003/09/28, Barry Warsaw wrote:
I really really want to use something like message-ids to generate message file names.
IIRC, Earl talks about this in the FAQ. In short, for security reasons, you can't trust any of the information you are given anywhere in the message, unless you can scrub that information and guarantee that it is now safe. Otherwise, you could get a message-id like "<../.htaccess>" or some other equally nasty thing that could potentially cause other files to be over-written inappropriately.
Moreover, given that there are a lot of people out there with home networks using RFC 1918 private addressing, and this information is being used to help generate otherwise properly formatted message-ids, the probability of message-id collision increases significantly. This issue was recently brought to my attention because of my own RFC 1918 private networking here at home, and the information my MUA uses to generate message-ids.
Therefore, I think we might want to be a bit more careful in how we generate the file names.
I want to be able to generate links to archived messages in the footers, but I think the best way to do that is to agree on a reproducible, independent algorithm for calculating them.
One thing that MHonArc does for messages that are not assigned a message-id (to help detect and eliminate duplicates) is to calculate an MD5 hash of the message headers and use that as a substitute. We could do the same, or perhaps even use the MD5 hash instead of the message-id, and then store hash/message-id mappings in a database.
Another approach would be to put even the public archives behind a cgi and have that implement a mapping between message-id derived links and the sequential file names (although that won't fix the regen problem).
One problem that most OSes have is with too many files in a single directory -- go much over 1000 files in a directory and accessing anything in that directory starts taking significantly longer than it used to. If you use a sequential message numbering system, it's hard to break those up into smaller chunks of messages in a hashed directory scheme. With MD5 hashes, it would be a lot easier to convert the hash into a path name, just by adding slashes every so often in the hash value.
-- Brad Knowles, <brad.knowles@skynet.be>
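As a rough sketch of Brad's idea of hashing the identifier and adding slashes to the hash, combined with the list-name plus message-id inputs discussed earlier in the thread: the function name archive_path, the two-level layout, and the ".html" suffix are all assumptions chosen for illustration, not anything Mailman or MHonArc actually does.

```python
import hashlib


def archive_path(list_name, message_id, width=2, levels=2):
    """Map (list name, message-id) to a sharded relative path.

    Only the hex digest reaches the file system, so sender-supplied
    characters such as "../" never appear in the path.
    """
    key = f"{list_name}\n{message_id}".encode("utf-8")
    digest = hashlib.md5(key).hexdigest()  # 32 hex characters
    # Use the first few characters as directory names, the rest as the file.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    parts.append(digest[levels * width:])
    return "/".join(parts) + ".html"


print(archive_path("mailman-developers", "<some-id@example.org>"))
```

The same inputs always give the same path, which is what a reproducible link in a footer or header needs. MD5 is used only because it is the hash mentioned above; hashlib.sha256 could be swapped in if deliberate collisions are a concern.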
"They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin, Historical Review of Pennsylvania.
"baw" == Barry Warsaw "Re: [Mailman-Developers] "@" in mail **text** gets replaced inarchives" Sun, 28 Sep 2003 09:45:32 -0400
baw> On Sun, 2003-09-28 at 05:13, Harald Meland wrote:
>> [Barry Warsaw]
>>
>> > I really really want to use something like message-ids to
>> > generate message file names. I want to be able to generate
>> > links to archived messages in the footers,
In the header, please. All messages have headers, not all have footers. Footers are optional, no? This will go on the wire, right?
Also, it would be nice if the URL/link/filename, or some other token, were serialized in some way so that a subscriber could tell easily (like by looking at the full headers) whether he had actually received all list mail, and which messages he was missing, if any. Smartlist, IIRC, did that nicely in the header.
baw> we could rewrite all message-id headers when we accept the
baw> message. That would guarantee uniqueness but it would break
baw> the correlation of message-ids between list copies and direct
baw> copies. Is that bad?
Yes. The RFCs are clear that MTAs must not muck with Message-Id other than to create one if there is none.
What happened to the notion that archives are supposed to represent a true record of what was "on the wire"? Why should mailing list managers take liberties that are inappropriate to MTAs, other than adding to, but not altering or deleting, anything in either the header or the body? The standards define, although somewhat loosely, how "Resent-" header fields are to be employed, but "X-" header fields are not even mentioned in rfc2822.
There have been several suggestions recently that would break the notion that list archives be a true record of what was on the wire. For public mailing lists in particular, much would be lost if the list's own archives could be shown to differ from archives taken off the wire by others. We do not rewrite history and we must not be seen to be rewriting history by altering even esoteric details. I know of several instances where the availability of accurate public archives that were easily corroborated by other archives has simplified handling issues that might otherwise have been very hard to deal with. A simple public example is the email in a mailing list archive that for several years was cited by URL in an early infamous Nigerian spam/scam. The ability to point to that email in the context of a long established mailing list concerning civic affairs in Nigeria deflected many would-be actions against the organization running that list and those who hosted the mailing list service.
Mailman presumably serves many uses besides public mailing lists. Many lists may not need to maintain their archives as a public record. However, unless you wish to make Mailman unsuitable for those that need accurate archives I suggest that you make it possible for the site to option individual lists in a way that permits the archives to serve as a public record of what was on the wire. If you must munge the archives I suggest you arrange options such that the site administrator can coerce certain lists to have accurate archives while leaving the option open for other lists.
I also urge that the archive not break signatures on signed mail.
baw> (note that we already do this for NNTP posted messages, and
baw> there has been some off-list discussion about that).
I am not as conversant with either the standards or the practice for NNTP but IIRC news archives have always been supposed to represent what was on the wire, no?
jam
[Barry Warsaw]
On Sun, 2003-09-28 at 05:13, Harald Meland wrote:
- Whenever Mailman receives a message whose message-id is already present in the archives, the original Message-Id: header is renamed to e.g. X-Original-Message-Id:, and Mailman generates a fresh (as in "not yet present in the archives") message-id before the message is either archived or sent to the list members.
This is what I was thinking about. Alternatively we could rewrite all message-id headers when we accept the message. That would guarantee uniqueness but it would break the correlation of message-ids between list copies and direct copies. Is that bad?
I don't think the RFCs speak clearly of this either way; however, it would break things for people who use message-id-based techniques to correlate received duplicates.
On the other hand, such message-id-based techniques are IMHO workarounds for the ((still) very common) problem of people/programs not respecting the various (more-or-less standard) headers for directing where replies should go.
The less aggressive approach would surely be to only generate new Message-Id:s for messages that already exist in a list's archive.
(note that we already do this for NNTP posted messages, and there has been some off-list discussion about that).
The Message-Id: field is (very) much more significant in NNTP than it is in SMTP.
Harald
FWIW: We have Mailman archive recent email and then use MHonArc to build our 'permanent' archives. I'm happy as long as I can get a copy of the original email. I may put up an edited version of some messages but keep the original - just in case.
Gary
John A. Martin wrote: trim
I am not as conversant with either the standards or the practice for NNTP but IIRC news archives have always been supposed to represent what was on the wire, no?
jam
I agree.
[John A. Martin]
"baw" == Barry Warsaw "Re: [Mailman-Developers] "@" in mail **text** gets replaced inarchives" Sun, 28 Sep 2003 09:45:32 -0400
baw> On Sun, 2003-09-28 at 05:13, Harald Meland wrote:
>> [Barry Warsaw]
>>
>> > I really really want to use something like message-ids to
>> > generate message file names. I want to be able to generate
>> > links to archived messages in the footers,
In the header, please. All messages have headers, not all have footers. Footers are optional, no? This will go on the wire, right?
"Be able to" is not (necessarily) the same as "will always". I read Barry as saying "it should be _possible_ to configure Mailman to add the archive URL of the message to the (Mailman-added) footer of the message".
That being said, it would be nice if a standardized header could be used for this. However, to the best of my knowledge, presently there is no standard specifying a header for such a purpose.
baw> we could rewrite all message-id headers when we accept the
baw> message. That would guarantee uniqueness but it would break
baw> the correlation of message-ids between list copies and direct
baw> copies. Is that bad?
Yes. The RFCs are clear that MTAs must not muck with Message-Id other than to create one if there is none.
It is not clear to me that Mailman *is* an MTA. It is not an SMTP server, and is not (necessarily) an SMTP client.
However, even if Mailman isn't an MTA, it would be nice if it *mostly* tries to follow the MTA rules.
(As a side note, I am unable to find *clear* references to the effect of your statement in RFCs 2821 or 2822.)
What happened to the notion that archives are supposed to represent a true record of what was "on the wire"?
Um. Mailman lists have numerous configuration options for changing messages (e.g. adding footers) before they are sent to the list members, and it has had such options since time immemorial. As such, Mailman has to choose; should the messages be archived as they appeared on the wire _coming in_ or _going out_? To me, it is obvious that messages in the archives should reflect, as closely as possible, the messages members receive.
As to the particular issue of changing a message's message-id iff the incoming message-id is already present in the list's archives:
According to RFC 2822, section 3.6.4, it is up to the host that "generates" a message to assure that the message's message-id is unique:
The "Message-ID:" field provides a unique message identifier that refers to a particular version of a particular message. The uniqueness of the message identifier is guaranteed by the host that generates it (see below). This message identifier is intended to be machine readable and not necessarily meaningful to humans. A message identifier pertains to exactly one instantiation of a particular message; subsequent revisions to the message each receive new message identifiers.
To my mind it would not be obviously wrong to view Mailman as the *generator* of messages, at the very least in the cases where it is obvious that the previous generator didn't do its job of guaranteeing message-id uniqueness properly.
I also urge that the archive not break signatures on signed mail.
I certainly agree that this would be very nice, both for mail obtained from archives and mail received "live" through a Mailman list.
Harald
"Harald" == Harald Meland "Re: [Mailman-Developers] "@" in mail **text** gets replaced inarchives" Sun, 28 Sep 2003 22:09:09 +0200
>>>>>>> "baw" == Barry Warsaw "Re: [Mailman-Developers] "@" in
>>>>>>> mail **text** gets replaced inarchives" Sun, 28 Sep 2003
>>>>>>> 09:45:32 -0400
baw> we could rewrite all message-id headers when we accept the
baw> message. That would guarantee uniqueness but it would break
baw> the correlation of message-ids between list copies and direct
baw> copies. Is that bad?
>>
>> Yes. The RFCs are clear that MTAs must not muck with
>> Message-Id other that to create one if there is none.
Harald> It is not clear to me that Mailman *is* an MTA. It is not
Harald> an SMTP server, and is not (necessarily) an SMTP client.
To have been precise perhaps I should have said something like "a mail agent must not muck with an existing Message-Id except as specified by the applicable standards". The Applicable Standards, to quote for example rfc2822, apply as follows:
This standard specifies a syntax for text messages that are sent between computer users, within the framework of "electronic mail" messages.
The applicable standards govern what goes on the wire and therefore what Mailman causes to be put on the wire through a MTA should be compliant.
Harald> However, even if Mailman isn't an MTA, it would be nice if
Harald> it *mostly* tries to follow the MTA rules.
Harald> (As a side note, I am unable to find *clear* references to
Harald> the effect of your statement in RFCs 2821 or 2822.)
Rfc2822 Section 3.6.4 (the first paragraph below is the same paragraph you quoted elsewhere)
[[ ... ]]
The "Message-ID:" field provides a unique message identifier that refers to a particular version of a particular message. The uniqueness of the message identifier is guaranteed by the host that generates it (see below). This message identifier is intended to be machine readable and not necessarily meaningful to humans. A message identifier pertains to exactly one instantiation of a particular message; subsequent revisions to the message each receive new message identifiers.
Note: There are many instances when messages are "changed", but those changes do not constitute a new instantiation of that message, and therefore the message would not get a new message identifier. For example, when messages are introduced into the transport system, they are often prepended with additional header fields such as trace fields (described in section 3.6.7) and resent fields (described in section 3.6.6). The addition of such header fields does not change the identity of the message and therefore the original "Message-ID:" field is retained. In all cases, it is the meaning that the sender of the message wishes to convey (i.e., whether this is the same message or a different message) that determines whether or not the "Message-ID:" field changes, not any particular syntactic difference that appears (or does not appear) in the message.
Rfc822 Section 4.6.1 (in its entirety):
This field contains a unique identifier (the local-part
address unit) which refers to THIS version of THIS message.
The uniqueness of the message identifier is guaranteed by the
host which generates it. This identifier is intended to be
machine readable and not necessarily meaningful to humans. A
message identifier pertains to exactly one instantiation of a
particular message; subsequent revisions to the message should
each receive new message identifiers.
Rfc2822 in this case merely codifies long established practice interpreting rfc822. Rfc2822 Appendix A.3 may be helpful for the present discussion.
To test for compliance with the rfc2822 determination "whether this is the same message or a different message" one might stipulate that if the PGP signature verifies it is the same message, if the PGP signature does not verify it is a different message. As can be seen with the version of this here message that Mailman will send, a trailer can be added to the body without breaking the signature. The rule might be "if you break the signature, it is a new message and needs (at least) a new Message-Id" (One certainly can see by inspection what would break a signature without actually verifying the signature, right?)
Harald> Um. Mailman lists have numerous configuration options for
Harald> changing messages (e.g. adding footers) before they are
Harald> sent to the list members, and it has had such options
Harald> since time immemorial.
Who reads the RFCs to say that footers cannot be added without changing the message?
Other munging of the message is IMHO unsuitable for public archives and has not been obligatory AFAIK.
Harald> As such, Mailman has to choose; should the messages be
Harald> archived as they appeared on the wire _coming in_ or
Harald> _going out_? To me, it is obvious that messages in the
Harald> archives should reflect, as closely as possible, the
Harald> messages members receive.
Agreed, the archive should record what Mailman put on the wire. No choice required! At least as you put it. I suppose a case could be made for archiving the mail received "from the wire" but I haven't thought of a good reason why that would be better.
Harald> * To my mind it would not be obviously wrong to view
Harald> Mailman as the *generator* of messages, at the very
Harald> least in the cases where it is obvious that the
Harald> previous generator didn't do its job of guaranteeing
Harald> message-id uniqueness properly.
Why?
ISTM the problem you are trying to solve is how to identify the archive image of the message.
Why not construct a URL containing a scrubbed Message-Id (as Brad Knowles has indicated) and a serial number (as I have indicated)? Such a URL should go into the "List-Archive" header field pointing to the specific message without doing violence to rfc2369 Section 3.6, right?
jam
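A minimal sketch of the suggestion above, building a per-message URL from a scrubbed Message-Id plus a serial number and placing it in List-Archive, could look like this. Whether RFC 2369 really permits a per-message URL there is exactly the open question raised in the message; the function name add_archive_link, the URL layout, and the six-digit serial are assumptions made for the example.

```python
from email.message import Message
from urllib.parse import quote


def add_archive_link(msg, base_url, serial):
    """Set List-Archive to a URL built from a serial and the Message-ID.

    The URL layout is hypothetical. The message-id is percent-encoded with
    no safe characters, so "/" and "@" cannot act as path separators.
    """
    message_id = msg.get("Message-ID", "").strip().strip("<>")
    scrubbed = quote(message_id, safe="")
    url = f"{base_url.rstrip('/')}/{serial:06d}/{scrubbed}"
    del msg["List-Archive"]  # no error if the header is absent
    msg["List-Archive"] = f"<{url}>"
    return msg


msg = Message()
msg["Message-ID"] = "<../.htaccess>"
add_archive_link(msg, "http://lists.example.org/archives/mylist/", 42)
print(msg["List-Archive"])
```

Percent-encoding only protects the URL itself; anything on the server that turns the encoded id back into a file name would still need its own scrubbing, as Brad pointed out.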
[John A. Martin]
"Harald" == Harald Meland
Harald> It is not clear to me that Mailman *is* an MTA. It is not
Harald> an SMTP server, and is not (necessarily) an SMTP client.
To have been precise perhaps I should have said something like "a mail agent must not muck with an existing Message-Id except as specified by the applicable standards". The Applicable Standards, to quote for example rfc2822, apply as follows:
This standard specifies a syntax for text messages that are sent between computer users, within the framework of "electronic mail" messages.
I agree that it is obvious that Mailman should strive to avoid sending non-RFC2822-compliant messages.
However, I would think that the issue at hand is not about message *syntax*, but rather about the *semantic* value of a message's Message-Id.
Now that that nit is off my chest :-), I'll be quick to agree that RFC 2822 surely does contain a fair number of semantic specifications as well; more on that below.
The applicable standards govern what goes on the wire and therefore what Mailman causes to be put on the wire through a MTA should be compliant.
Mailman is sort of between a rock and a hard place here, as it occupies a double role:
Mailman should be liberal in what it accepts -- which seems to imply that it should accept incoming messages even if they do not conform strictly to all aspects of RFC 2822.
As one example, Mailman shouldn't offhandedly reject an incoming message just because there is a slight address syntax error in the message's From: header.
At the same time, Mailman should be conservative in what it sends. Naively, this would mean that Mailman ought to ensure that any message it puts on the wire conforms with RFC 2822; however, that would then have to either clash with the "liberal in what you accept" idea, or with the "don't change the message" maxim.
Harald> However, even if Mailman isn't an MTA, it would be nice if
Harald> it *mostly* tries to follow the MTA rules.
Harald> (As a side note, I am unable to find *clear* references to
Harald> the effect of your statement in RFCs 2821 or 2822.)
Rfc2822 Section 3.6.4 (the first paragraph below is the same paragraph you quoted elsewhere)
[[ ... ]]
The "Message-ID:" field provides a unique message identifier that refers to a particular version of a particular message. The uniqueness of the message identifier is guaranteed by the host that generates it (see below). This message identifier is intended to be machine readable and not necessarily meaningful to humans. A message identifier pertains to exactly one instantiation of a particular message; subsequent revisions to the message each receive new message identifiers.
Note: There are many instances when messages are "changed", but those changes do not constitute a new instantiation of that message, and therefore the message would not get a new message identifier. For example, when messages are introduced into the transport system, they are often prepended with additional header fields such as trace fields (described in section 3.6.7) and resent fields (described in section 3.6.6). The addition of such header fields does not change the identity of the message and therefore the original "Message-ID:" field is retained. In all cases, it is the meaning that the sender of the message wishes to convey (i.e., whether this is the same message or a different message) that determines whether or not the "Message-ID:" field changes, not any particular syntactic difference that appears (or does not appear) in the message.
Rfc822 Section 4.6.1 (in its entirety):
This field contains a unique identifier (the local-part address unit) which refers to THIS version of THIS message. The uniqueness of the message identifier is guaranteed by the host which generates it. This identifier is intended to be machine readable and not necessarily meaningful to humans. A message identifier pertains to exactly one instantiation of a particular message; subsequent revisions to the message should each receive new message identifiers.
Rfc2822 in this case merely codifies long established practice interpreting rfc822. Rfc2822 Appendix A.3 may be helpful for the present discussion.
The part that (still) isn't clear to me, is whether Mailman's action of putting the message back on the wire can be said to be either 1) generation of a new message (personally, I wouldn't think so) or 2) a new instantiation of the message.
To test for compliance with the rfc2822 determination "whether this is the same message or a different message" one might stipulate that if the PGP signature verifies it is the same message, if the PGP signature does not verify it is a different message.
Now we're deeply into message semantics. :-)
I'd like to point out two things about your argument:
Firstly, the RFC does not merely distinguish between "the same message or a different message"; it also allows Message-ID: to be changed whenever there is a new instantiation of a (single) message.
Secondly, having to resort to (your) *interpretation* of the RFC, by using verification of PGP signatures for the test, is in my book a clear indication that the RFC is *not* crystal clear on this issue.
(One certainly can see by inspection what would break a signature without actually verifying the signature, right?)
That is my (rather shallow, I'm afraid) understanding of PGP email signatures, yes.
Harald> Um. Mailman lists have numerous configuration options for
Harald> changing messages (e.g. adding footers) before they are
Harald> sent to the list members, and it has had such options
Harald> since time immemorial.
Who reads the RFCs to say that footers cannot be added without changing the message?
The more interesting issue, I think, is where should the line be drawn; how much is Mailman allowed to change (various parts of) a message before it should be considered a new message?
And how does the Mailman modus operandi fit in with the RFC's use of the words "new instantiation"?
Harald> * To my mind it would not be obviously wrong to view
Harald> Mailman as the *generator* of messages, at the very
Harald> least in the cases where it is obvious that the
Harald> previous generator didn't do its job of guaranteeing
Harald> message-id uniqueness properly.
Why?
Given that there exist two (or more) distinct messages that share the same message-id, the uniqueness of this identifier (as prescribed by RFC 2822) is clearly not satisfied. Hence, if Mailman really wants to have the messages it puts on the wire conform with RFC 2822, it should take on the role of message generator, and issue distinct message-ids for such distinct messages.
The hard problem, of course, is to properly discover whether or not two messages are indeed distinct; they might differ slightly by e.g. an automatically added footer, or in some other minor, but programmatically hard to discover, fashion.
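As a rough illustration of the idea (not anything Mailman actually does), the sketch below keeps a record of the first body seen for each Message-ID and issues a fresh identifier only when a later message reuses the ID with a different body. The class name `MessageIdRegistry`, the domain `lists.example.org`, and the whitespace-only normalization are all assumptions made for the example; it handles single-part messages only, and it does nothing about the hard case of an automatically added footer.

```python
import hashlib
from email import message_from_string
from email.utils import make_msgid


def body_digest(msg):
    """Hash a single-part body after trimming whitespace (crude normalization)."""
    text = msg.get_payload()
    normalized = "\n".join(line.rstrip() for line in text.strip().splitlines())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class MessageIdRegistry:
    """Hypothetical helper: remembers which body first used each Message-ID."""

    def __init__(self, domain="lists.example.org"):  # domain is an assumption
        self.domain = domain
        self.seen = {}  # Message-ID -> digest of the first body seen with it

    def outgoing_id(self, msg):
        msgid = msg["Message-ID"]
        if msgid is None:
            # No identifier at all: act as the generator.
            return make_msgid(domain=self.domain)
        digest = body_digest(msg)
        first = self.seen.setdefault(msgid, digest)
        if first == digest:
            return msgid  # same message (or first sighting): keep the ID
        # Same ID, different body: issue a distinct identifier.
        return make_msgid(domain=self.domain)
```

Two messages that differ only in trailing whitespace keep the original ID here; anything more substantial, including a footer, would be treated as a distinct message, which is exactly the weakness described above.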
ISTM the problem you are trying to solve is how to identify the archive image of the message.
Why not construct a URL containing a scrubbed Message-Id (as Brad Knowles has indicated) and a serial number (as I have indicated)?
Because, as Barry said, that would mean the "archive image identity" of the messages could change whenever the archive needs to be rebuilt (e.g. after a disk crash, the archives are gone, and there are no backups; then some kind list member comes forward with a partial archive constructed from the messages they've received from the list).
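The rebuild problem can be shown with a small sketch. It assumes a hypothetical URL scheme (base `http://lists.example.org/archive/`, a six-digit serial followed by the percent-encoded Message-Id); none of these names come from Mailman or pipermail.

```python
from urllib.parse import quote

ARCHIVE_BASE = "http://lists.example.org/archive/"  # hypothetical base URL


def scrub_message_id(message_id):
    """Drop surrounding whitespace and angle brackets, then percent-encode."""
    return quote(message_id.strip().strip("<>"), safe="")


def archive_url(message_id, serial):
    """Build a per-message URL from a serial number plus the scrubbed ID."""
    return f"{ARCHIVE_BASE}{serial:06d}-{scrub_message_id(message_id)}.html"


def assign_urls(message_ids):
    """Number messages in archive order, as a rebuild from scratch would."""
    return {mid: archive_url(mid, n) for n, mid in enumerate(message_ids, start=1)}


# Original archive versus one rebuilt from a partial copy missing <b@example.com>.
original = assign_urls(["<a@example.com>", "<b@example.com>", "<c@example.com>"])
rebuilt = assign_urls(["<a@example.com>", "<c@example.com>"])
```

In this sketch the URL for `<c@example.com>` carries serial 3 in the original archive and serial 2 after the rebuild, so any link published earlier would no longer resolve.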
Such a URL should go into the "List-Archive" header field pointing to the specific message without doing violence to rfc2369 Section 3.6, right?
I don't think that's too far from the intention of that header, no. That section seems rather loosely worded, something I hope was done intentionally:
3.6. List-Archive
The List-Archive field describes how to access archives for the list.
Examples:
List-Archive: <mailto:archive@host.com?subject=index%20list>
List-Archive: <ftp://ftp.host.com/pub/list/archive/>
List-Archive: <http://www.host.com/list/archive/> (Web Archive)
-- Harald
participants (13)
- Alessio Bragadini
- amk@amk.ca
- Barry Warsaw
- Brad Knowles
- Gary Frederick
- Harald Meland
- John A. Martin
- Kevin McCann
- Martin Maechler
- martin@v.loewis.de
- Nadim Shaikli
- Roger Bivand
- Tokio Kikuchi