[Spambayes] Email package and the CRLF pair

Paul Moore lists at morpheus.demon.co.uk
Fri Apr 18 21:57:01 EDT 2003

"Meyer, Tony" <T.A.Meyer at massey.ac.nz> writes:

> I'm hoping that there are still some Python experts hanging about here
> and they've just been quiet for a bit.

Not an expert, but I'll do my best...

> As you may have noticed, we ran into a bit of trouble recently with the
> imap filter because messages had only \n (CR) and not \r\n (CRLF).
> Message bodies don't really matter here; what matters are the headers.

\n is LF not CR. You nearly had me confused with that...

> As far as I can see*, when the email package** returns a message via
> str() or as_string(), it only puts a CR after each header.

You mean LF, but with that change, you are right. Here's a simple
example:

    Python 2.2.2 (#37, Oct 14 2002, 17:02:34) [MSC 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import email
    >>> from email.MIMEText import MIMEText
    >>> msg = MIMEText("Test")
    >>> msg["Subject"] = "A Test"
    >>> msg.as_string()
    'Content-Type: text/plain; charset="us-ascii"\nMIME-Version: 1.0\nContent-Transfer-Encoding: 7bit\nSubject: A Test\n\nTest\n'

> But RFC2822 says that:
>   "Header fields are lines composed of a field name, followed by a colon
>    (":"), followed by a field body, and terminated by CRLF." [Section
> 2.2]

Interesting. Does it say anything about line terminators in the body?
It probably should, as email is a pure-text medium, so you should be
considering line termination for the whole message, not just the
headers.

For example, is the following valid? (Ignoring issues of required
headers.)

    Subject: A Test[CR][LF]
    From: Me <my/email at my.computer>[CR][LF]
    Now we start to get nasty[LF]
    Let's mix things up completely[CR][LF]
    And a Mac variation, just for fun[CR]
    So how does this look?[CR][LF]

My betting is that RFC822 doesn't disallow it, which likely means that
the RFC is, to some extent, broken...
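
To make the mixture above concrete, here is a minimal sketch that counts
each kind of terminator in such a string. The helper name
count_line_endings and the placeholder address are invented for
illustration; the code only classifies line endings and makes no claim
about what any RFC allows.

```python
import re

# Try CRLF first so a "\r\n" pair is never counted as a CR plus an LF.
_EOL = re.compile(r'\r\n|\n|\r')

def count_line_endings(text):
    """Return a dict counting CRLF, bare LF and bare CR terminators."""
    names = {"\r\n": "CRLF", "\n": "LF", "\r": "CR"}
    counts = {"CRLF": 0, "LF": 0, "CR": 0}
    for match in _EOL.finditer(text):
        counts[names[match.group()]] += 1
    return counts

# The sample message from above, with a placeholder address.
sample = (
    "Subject: A Test\r\n"
    "From: Me <me@example.invalid>\r\n"
    "Now we start to get nasty\n"
    "Let's mix things up completely\r\n"
    "And a Mac variation, just for fun\r"
    "So how does this look?\r\n"
)

print(count_line_endings(sample))
```

Anything other than a non-zero CRLF count with zeros elsewhere means
the string mixes conventions.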

> Is this therefore a bug in the email package that I should report?
> Advice from someone that knows more than me (is Barry there?) about
> these things would be appreciated.

My instinct is to say that "it depends how you look at it". While the
RFC mandates CRLF, "usual practice" seems to be that the platform-
specific newline character sequence is used internally, and often when
messages are stored in files as well. It's only when transmitting data
across the network that standardising on CRLF is important.

I'd imagine that most network transport code converts \n to CRLF when
sending data, so that the "internal" format doesn't matter in
practice. Look at smtplib.py in the standard library:

    def quotedata(data):
        """Quote data for email.

        Double leading '.', and change Unix newline '\\n',
        or Mac '\\r' into Internet CRLF end-of-line.
        """
        return re.sub(r'(?m)^\.', '..',
            re.sub(r'(?:\r\n|\n|\r(?!\n))', CRLF, data))

Interestingly, I can't see anything equivalent in imaplib.py. So maybe
it's best argued as a bug in imaplib, rather than in the email
package. (If the IMAP protocol mandates CRLF, then imaplib should
ensure that rather than making client code - which generally uses \n
internally - care about it).

OK, I looked a bit further. The only imaplib method which deals with
messages in string format is append(), so I'd argue that that method
should convert CR, CRLF, and LF sequences into the canonical CRLF that
the IMAP protocol needs. If nothing else, it's a case of the old "be
lenient in what you accept and strict in what you deliver" mantra.
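
Until something like that exists in the library, the same idea can be
sketched as a thin wrapper in client code. The name append_with_crlf is
hypothetical, and the sketch assumes conn is an imaplib.IMAP4-style
object and that the message is passed as bytes (as Python 3's imaplib
expects). I believe more recent imaplib releases already normalise line
endings inside append(), so check your version before relying on this.

```python
import re

def append_with_crlf(conn, mailbox, flags, date_time, message):
    """Convert CR, LF and CRLF in `message` to CRLF, then append it.

    `conn` is assumed to expose append(mailbox, flags, date_time, message)
    like imaplib.IMAP4 does; `message` is a bytes object.
    """
    # CRLF is tried first, so existing pairs are left as they are.
    canonical = re.sub(rb'\r\n|\n|\r', b'\r\n', message)
    return conn.append(mailbox, flags, date_time, canonical)
```

Client code can then keep \n internally and call the wrapper wherever
it would have called conn.append() directly.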

For a workaround in client code, if you want to force CRLF in the
message string, do

    import re

    def force_CRLF(data):
        """Make sure data uses CRLF for line termination.

        Nicked the regex from smtplib.quotedata.
        """
        return re.sub(r'(?:\r\n|\n|\r(?!\n))', "\r\n", data)

    # Now, convert to canonical line endings
    msg_str = force_CRLF(msg_str)

This is a bit more rigorous than a simple str.replace("\n", "\r\n"),
which would turn an existing CRLF into CR CR LF and leave a lone CR
untouched.
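
A quick side-by-side on a made-up string, as a sketch. force_crlf here
just repeats the function above (lower-cased name is my choice) so the
snippet runs on its own, and the mixed sample is invented.

```python
import re

def force_crlf(data):
    """Same regex as force_CRLF above: CRLF, bare LF or bare CR -> CRLF."""
    return re.sub(r'(?:\r\n|\n|\r(?!\n))', "\r\n", data)

# One CRLF, one bare LF and one bare CR.
mixed = "one\r\ntwo\nthree\rfour"

naive = mixed.replace("\n", "\r\n")
fixed = force_crlf(mixed)

print(repr(naive))   # 'one\r\r\ntwo\r\nthree\rfour'
print(repr(fixed))   # 'one\r\ntwo\r\nthree\r\nfour'

# Running the regex version a second time changes nothing.
print(force_crlf(fixed) == fixed)   # True
```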

Hope this helps,
This signature intentionally left blank
