Part of RFC 822 ignored by email module
Carl Banks
pavlovevidence at gmail.com
Thu Jan 20 12:23:11 EST 2011
On Jan 20, 7:08 am, Bob Kline <bkl... at rksystems.com> wrote:
> I just noticed that the following passage in RFC 822:
>
> The process of moving from this folded multiple-line
> representation of a header field to its single line represen-
> tation is called "unfolding". Unfolding is accomplished by
> regarding CRLF immediately followed by a LWSP-char as
> equivalent to the LWSP-char.
>
> is not being honored by the email module. The following two invocations
> of message_from_string() should return the same value, but that's not
> what happens:
>
> >>> import email
> >>> email.message_from_string("Subject: blah").get('SUBJECT')
> 'blah'
> >>> email.message_from_string("Subject:\n blah").get('SUBJECT')
> ' blah'
>
> Note the space in front of the second value returned, but missing from
> the first. Can someone convince me that this is not a bug?
The module's behavior is correct, according to my reading of RFC 822.
(I doubt it's changed, so I didn't bother to look up what the latest
RFC on that subject is.)
The RFC says that in a folded line the whitespace on the following
line is considered a part of the line. Relevant quote (section
3.1.1):
    Each header field can be viewed as a single, logical line of
    ASCII characters, comprising a field-name and a field-body.
    For convenience, the field-body portion of this conceptual
    entity can be split into a multiple-line representation; this
    is called "folding". The general rule is that wherever there
    may be linear-white-space (NOT simply LWSP-chars), a CRLF
    immediately followed by AT LEAST one LWSP-char may instead be
    inserted. Thus, the single line

        To: "Joe & J. Harvey" <ddd @Org>, JJV @ BBN

    can be represented as:

        To: "Joe & J. Harvey" <ddd @ Org>,
                JJV@BBN

    and

        To: "Joe & J. Harvey"
                        <ddd@ Org>, JJV
         @BBN

    and

        To: "Joe &
         J. Harvey" <ddd @ Org>, JJV @ BBN

    The process of moving from this folded multiple-line
    representation of a header field to its single line represen-
    tation is called "unfolding". Unfolding is accomplished by
    regarding CRLF immediately followed by a LWSP-char as
    equivalent to the LWSP-char.
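
To make the unfolding rule concrete, here is a minimal sketch of it in
Python. The function name unfold() is my own and is not part of the
email module; I also assume that a bare "\n" should be treated like
CRLF, since that is what the example in the original post uses. It
drops each line break that is followed by a space or tab and keeps
the whitespace itself:

```python
import re

def unfold(raw_header):
    """Unfold one header field as described in RFC 822, section 3.1.1.

    Hypothetical helper, not part of the email module. A line break
    followed by a LWSP-char (space or tab) is removed; the LWSP-char
    stays. Bare LF is accepted as well as CRLF (an assumption).
    """
    return re.sub(r"\r?\n(?=[ \t])", "", raw_header)

folded = 'To: "Joe & J. Harvey" <ddd @ Org>,\r\n        JJV@BBN'
print(repr(unfold(folded)))
```

Only the line break goes away: the eight spaces that began the
continuation line are still there in the unfolded string.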
Carl Banks