
Hi,
I'm a Savane (gna.org/p/savane) developer, and we noticed that when we send mail with newlines in the \r\n format, Mailman converts them to \n\n.
I would like your opinion on what we should do to fix the problem. I had a look at RFC 822; its description of EOL (end of line) is not very clear, but it seems that the standard EOL convention is \r\n (CRLF).
The issue can be fixed if I make Savane convert all \r\n to \n before sending the notifications, but I am not sure this is standards-compliant. Still, I do think this is a bug in Mailman. What do you think we (Mailman and Savane) should do?
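A minimal Python sketch of the workaround proposed above, assuming the message text is handed to a local sendmail-compatible command rather than written to an SMTP connection. Savane itself is PHP, so this only illustrates the conversion; the function name `to_unix_newlines` is hypothetical, and treating a lone CR as a line ending is an assumption.

```python
import re


def to_unix_newlines(text: str) -> str:
    """Normalize CRLF and lone CR line endings to LF (the Unix native form).

    Hypothetical helper: meant for text piped to a local sendmail-compatible
    command, not for text written to an SMTP socket.
    """
    # CRLF is tried first, so each CRLF becomes one LF rather than two.
    return re.sub(r"\r\n|\r", "\n", text)


if __name__ == "__main__":
    body = "First line\r\nSecond line\r\n"
    print(repr(to_unix_newlines(body)))  # 'First line\nSecond line\n'
```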
I posted a detailed bug report with more information at http://sourceforge.net/tracker/index.php?func=detail&aid=1151439&group_id=103&atid=100103 and the Savane counterpart bug report is at https://gna.org/bugs/?func=detailitem&item_id=1980
-- Sylvain

Sylvain Beucler wrote:
RFC 822 is obsolete. The current standards are RFC 2821 SMTP and RFC 2822 Internet Message Format. Both these standards are very clear. For example RFC 2821 says:
<quote> 2.3.7 Lines
SMTP commands and, unless altered by a service extension, message data, are transmitted in "lines". Lines consist of zero or more data characters terminated by the sequence ASCII character "CR" (hex value 0D) followed immediately by ASCII character "LF" (hex value 0A). This termination sequence is denoted as <CRLF> in this document. Conforming implementations MUST NOT recognize or generate any other character or character sequence as a line terminator. Limits MAY be imposed on line lengths by servers (see section 4.5.3).
In addition, the appearance of "bare" "CR" or "LF" characters in text (i.e., either without the other) has a long history of causing problems in mail implementations and applications that use the mail system as a tool. SMTP client implementations MUST NOT transmit these characters except when they are intended as line terminators and then MUST, as indicated above, transmit them only as a <CRLF> sequence. </quote>
RFC 2822 says
<quote> 2.3 Body
The body of a message is simply lines of US-ASCII characters. The only two limitations on the body are as follows:
CR and LF MUST only occur together as CRLF; they MUST NOT appear independently in the body.
Lines of characters in the body MUST be limited to 998 characters, and SHOULD be limited to 78 characters, excluding the CRLF.
Note: As was stated earlier, there are other standards documents, specifically the MIME documents [RFC2045, RFC2046, RFC2048, RFC2049] that extend this standard to allow for different sorts of message bodies. Again, these mechanisms are beyond the scope of this document. </quote>
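The two body rules quoted above are mechanical enough to check in code. The following Python sketch (the helper name `body_problems` is hypothetical) assumes it is given the wire form of a body, where every line should end in CRLF, and reports bare CR or LF characters and lines over the MUST (998) and SHOULD (78) limits.

```python
from typing import List

HARD_LIMIT = 998  # RFC 2822: lines MUST be limited to this, excluding CRLF
SOFT_LIMIT = 78   # RFC 2822: lines SHOULD be limited to this, excluding CRLF


def body_problems(body: str) -> List[str]:
    """Check a body against the two RFC 2822 section 2.3 rules.

    Assumes `body` is the wire form, so every line should end in CRLF.
    """
    problems: List[str] = []
    # Rule 1: CR and LF may only appear together, as CRLF.
    for i, ch in enumerate(body):
        if ch == "\r" and body[i + 1:i + 2] != "\n":
            problems.append(f"bare CR at offset {i}")
        elif ch == "\n" and (i == 0 or body[i - 1] != "\r"):
            problems.append(f"bare LF at offset {i}")
    # Rule 2: line length, not counting the CRLF terminator itself.
    for lineno, line in enumerate(body.split("\r\n"), start=1):
        if len(line) > HARD_LIMIT:
            problems.append(f"line {lineno}: {len(line)} chars (MUST be <= {HARD_LIMIT})")
        elif len(line) > SOFT_LIMIT:
            problems.append(f"line {lineno}: {len(line)} chars (SHOULD be <= {SOFT_LIMIT})")
    return problems
```

Run on a body that mixes conventions, for example `body_problems("Hello\r\n\r\nWorld\n")`, it flags the final bare LF.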
I don't see any issues with mail delivered from Mailman having any "doubling" of line terminators resulting in extra blank lines.
How does Savane inject its messages into the internet mail transport system? If it is sending via SMTP, then it MUST send line terminators as <CRLF> (or \r\n). If this results in doubling, then the receiving SMTP server is non-compliant. If it is sending messages via some non-SMTP intermediary, then it needs to format the messages as expected by the intermediary.
Does Savane perhaps pipe outgoing messages directly to the Mailman wrapper in the same way that an incoming MTA might do? If so, this is _not_ an SMTP transfer and end of line sequences should be those of the platform, i.e. just \n for Unix.
-- 
Mark Sapiro <msapiro@value.net>, San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan

On 3/19/2005 9:59, "Mark Sapiro" <msapiro@value.net> wrote:
Note that RFC 2821 is talking about how messages are transmitted over the wire as part of the SMTP transaction. An MTA will usually present the message to mailboxes, scripts, etc., in the native form expected on the platform (for Unix, newline = ASCII linefeed). Mailman is very likely operating on that expectation (in particular, Mailman is not one end of an SMTP conversation, so the SMTP RFC doesn't apply).
I believe things will work much better if Savane presents Unix-like data to the Mailman scripts, not SMTP data.
The same is likely true of at least some other entities that Savane talks to.
--John

On Sat, 2005-03-19 at 19:27, John W. Baxter wrote:
Exactly. Mailman isn't going to convert line endings. The email package is the component that parses messages and generates the flattened text from the message objects. If anything was converting newlines it would be the email package.
But the email package tries very hard to preserve whatever line endings it finds in the messages it parses. John is right that typically the MTA will present the message to mail programs using native line endings. The on-the-wire representation really has no bearing on the issue.
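A small sketch of the parsing half of that claim, run against the `email` package of a current Python 3 (not necessarily the version Mailman used in 2005): a body written with CRLF terminators comes out of the parser with those CRLFs intact, and nothing turns them into a doubled LF. Flattening the message back to text is a separate step that this sketch does not exercise.

```python
import email

# A message whose body uses CRLF line terminators, as in Savane's case.
raw = "Subject: test\r\n\r\nfirst line\r\nsecond line\r\n"
msg = email.message_from_string(raw)

# The parser strips the terminator from the header value...
print(repr(msg["Subject"]))      # 'test'
# ...but keeps the body's CRLF sequences exactly as they arrived.
print(repr(msg.get_payload()))   # 'first line\r\nsecond line\r\n'
```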
-Barry

On Sat, Mar 19, 2005 at 09:59:46AM -0800, Mark Sapiro wrote:
Thanks for the pointer. So we should use CRLF everywhere.
Savane uses the PHP "mail()" function, which executes the local sendmail-compatible command.
At both GNU Savannah (savannah.gnu.org) and Gna! (gna.org), we use the Exim version packaged by Debian. In both cases, the mail we receive via mailing lists (and in the Mailman archives) has its newlines doubled. The mail we receive at our personal addresses is correct (no doubling). As far as I'm concerned, line doubling _does_ look ugly and unprofessional ;)
As a result, I was under the impression that Mailman is the one that doubles the newlines. Now maybe Exim as an SMTP server (not as a sendmail-compatible command) is doubling lines. But are you completely sure Mailman (or a Python library used by Mailman) is not converting \r\n to \n\n?
-- Sylvain

On Sun, 2005-03-20 at 02:43, Sylvain Beucler wrote:
I could be wrong of course, but I can't think of anything in either Python or Mailman that would do that conversion. Another simple experiment is to write a little Python mail program that just parses stdin, then writes the flattened text out to a file. See if that has the newline problem.
You're basically going to have to work your way up your tool chain to see where the transformation is happening.
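One way to write the experiment Barry suggests, as a Python 3 sketch: read a raw message on stdin, parse it with the `email` package, flatten it with `BytesGenerator`, write the result to a file, and print line-ending counts for both sides. The output file name `flattened.eml` and the helper names are hypothetical, and the exact counts may vary with the Python version and email policy in use; what matters is whether blank lines appear between lines that were adjacent in the input.

```python
import io
import sys
from email import message_from_bytes
from email.generator import BytesGenerator


def roundtrip(raw: bytes) -> bytes:
    """Parse raw message bytes, then flatten the parsed message back to bytes."""
    msg = message_from_bytes(raw)
    out = io.BytesIO()
    BytesGenerator(out).flatten(msg)
    return out.getvalue()


def line_ending_report(label: str, data: bytes) -> str:
    """Summarize the line endings found in `data`."""
    crlf = data.count(b"\r\n")
    bare_lf = data.count(b"\n") - crlf   # LFs not preceded by CR
    doubled = data.count(b"\n\n")        # a rough indicator of blank lines
    return f"{label}: {crlf} CRLF, {bare_lf} bare LF, {doubled} LF LF pairs"


def main() -> None:
    raw = sys.stdin.buffer.read()
    flattened = roundtrip(raw)
    # Hypothetical output file name; inspect it by hand as well.
    with open("flattened.eml", "wb") as fp:
        fp.write(flattened)
    print(line_ending_report("input ", raw))
    print(line_ending_report("output", flattened))


if __name__ == "__main__":
    main()
```

Usage: `python3 roundtrip.py < message.eml`, feeding it a message saved exactly as Savane produces it.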
-Barry

Today I got mail from Sylvain Beucler:
Thanks for the pointer. So we should use CRLF everywhere.
On the network, yes. Not locally through the sendmail interface.
Savane uses the PHP "mail()" function, which executes the local sendmail-compatible command.
PHP mail() is IMHO quite buggy. On Windows it connects to an SMTP server and sends whatever the user provided directly, so \r\n is mandatory. (But IIRC PHP internally adds some header lines ending with \n.) On Unix, PHP uses sendmail (it can NOT be configured to use SMTP :-( and thus can NOT confirm to the PHP script that the local MTA has received and accepted the message), so the message should not contain CRLF. Only the Windows behaviour seems to be documented in the PHP docs.
My guess: when Exim receives \r\n at its sendmail interface, it kindly cleans it up and converts it to \n, and when sending over SMTP it converts it back to \r\n.
When Mailman receives \r\n, it trusts the user and treats it as a line with a \r before the line ending \n. When Mailman delivers the mail over SMTP, it writes the line ending \n as \r\n, ending up with SMTP lines ending in \r\r\n. A receiving MTA might consider that a malformed (Mac) line ending followed by a well-formed one, and be kind enough to make the first one well-formed too, causing two line feeds...
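Mads's guess can be reproduced in a few lines of Python. The converter below is hypothetical, written only to model a step that turns every LF into CRLF without checking for a preceding CR; `str.splitlines()`, which accepts a lone CR as a line break, stands in for the "kind" receiving MTA.

```python
def naive_to_crlf(text: str) -> str:
    """Hypothetical converter that assumes every line ends in a bare LF."""
    return text.replace("\n", "\r\n")


# A body that still carries CRLF terminators from the sender.
body = "first\r\nsecond\r\n"
wire = naive_to_crlf(body)
print(repr(wire))            # 'first\r\r\nsecond\r\r\n'

# A lenient reader that also treats a lone CR as a (Mac) line ending
# sees an empty line after every real one: the doubling in question.
print(wire.splitlines())     # ['first', '', 'second', '']
```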
I suggest that you try converting all \r\n to \n before calling mail(), that you avoid mail() and deliver through SMTP instead, and finally that you fix mail() ;-)
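As an illustration of the second suggestion (delivering through SMTP instead of the local sendmail interface), here is a Python sketch; Savane would need the PHP equivalent. The host `localhost`, port 25, the example addresses, and the function names are assumptions. Flattening with `email.policy.SMTP` gives CRLF line endings throughout, and `smtplib` reports a refused message to the caller, which, per Mads, PHP's mail() on Unix cannot do.

```python
import re
import smtplib
from email.message import EmailMessage
from email.policy import SMTP


def build_notification(sender: str, to: str, subject: str, text: str) -> EmailMessage:
    """Build a plain-text notification (all arguments are caller-supplied)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = to
    msg["Subject"] = subject
    msg.set_content(text)
    return msg


def send(msg: EmailMessage, host: str = "localhost", port: int = 25) -> dict:
    """Deliver over SMTP; host and port are assumptions about the local setup.

    send_message() raises an exception if the server refuses the sender, the
    data, or every recipient; recipients refused individually are returned.
    """
    with smtplib.SMTP(host, port) as server:
        return server.send_message(msg)


if __name__ == "__main__":
    note = build_notification("savane@example.org", "list@example.org",
                              "Test", "line1\r\nline2\n")
    # email.policy.SMTP flattens with CRLF as the line separator.
    wire = note.as_bytes(policy=SMTP)
    print("bare LF found:", re.search(rb"(?<!\r)\n", wire) is not None)
    print("CR CR LF found:", b"\r\r\n" in wire)
```

The demo only builds and inspects the message; call `send(note)` once an SMTP server is known to be listening.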
Regards, Mads Kiilerich

On Sun, 2005-03-20 at 08:29, Mads Kiilerich wrote:
When Mailman delivers the mail over SMTP it writes the line ending \n as \r\n, ending up with SMTP lines ending with \r\r\n.
Actually, that bit would be done by Python's smtplib module, which must ensure that the data it sends over the wire is RFC compliant.
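CPython's `smtplib` contains a module-level helper, `quotedata()`, that applies the SMTP line-ending rule: CRLF is left alone, and a bare LF or a lone CR becomes CRLF (a leading "." is also doubled for the DATA phase). It is not described in the library documentation, so treat this sketch as an illustration of the rule on current Python 3 rather than an API to depend on; it shows that a CRLF already present is not turned into CR CR LF.

```python
import smtplib

# CRLF that is already present is left untouched: no CR CR LF doubling.
print(repr(smtplib.quotedata("first\r\nsecond\r\n")))   # 'first\r\nsecond\r\n'

# A bare LF and a lone CR are both turned into CRLF.
print(repr(smtplib.quotedata("a\nb\rc")))               # 'a\r\nb\r\nc'

# A line starting with "." is dot-stuffed for the SMTP DATA phase.
print(repr(smtplib.quotedata(".hidden\n")))             # '..hidden\r\n'
```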
-Barry

On 3/19/2005 23:43, "Sylvain Beucler" <beuc@beuc.net> wrote:
Ever since Philip Hazel tried to accommodate some broken email producers by messing with incoming line endings, there have been problems with products broken in different ways. As I recall, Philip has made various tune-ups over many recent Exim versions.
You didn't say what version of Exim is running in your Debian installation...there are backports of newish Exim versions available which would likely fix your problem.
Note that switching from the Exim 3.x series to Exim 4.x involves a significant reconfiguration of Exim... it's likely best done by running debconfig and answering the Exim configuration questions as if you were starting from scratch. This is not something to tackle on your production server the day before a vacation.
--John

participants (5)
- Barry Warsaw
- John W. Baxter
- Mads Kiilerich
- Mark Sapiro
- Sylvain Beucler