Part of RFC 822 ignored by email module
Carl Banks
pavlovevidence at gmail.com
Fri Jan 21 00:48:29 EST 2011
On Jan 20, 9:55 am, Bob Kline <bkl... at rksystems.com> wrote:
> On 1/20/2011 12:23 PM, Carl Banks wrote:
>
> > On Jan 20, 7:08 am, Bob Kline<bkl... at rksystems.com> wrote:
> >> I just noticed that the following passage in RFC 822:
>
> >> The process of moving from this folded multiple-line
> >> representation of a header field to its single line represen-
> >> tation is called "unfolding". Unfolding is accomplished by
> >> regarding CRLF immediately followed by a LWSP-char as
> >> equivalent to the LWSP-char.
>
> >> is not being honored by the email module. The following two invocations
> >> of message_from_string() should return the same value, but that's not
> >> what happens:
>
> >> >>> import email
> >> >>> email.message_from_string("Subject: blah").get('SUBJECT')
> >> 'blah'
> >> >>> email.message_from_string("Subject:\n blah").get('SUBJECT')
> >> ' blah'
>
> >> Note the space in front of the second value returned, but missing from
> >> the first. Can someone convince me that this is not a bug?
> > That's correct, according to my reading of RFC 822 (I doubt it's
> > changed so I didn't bother to look up what the latest RFC on that
> > subject is.)
>
> > The RFC says that in a folded line the whitespace on the following
> > line is considered a part of the line.
>
> Thanks for responding. I think your interpretation of the RFC is the
> same as mine. What I'm saying is that by not returning the same value
> in the two cases above the module is not "regarding CRLF immediately
> followed by a LWSP-char as equivalent to the LWSP-char."
That makes sense. The space after \n is part of the reconstructed
subject, and the email module should have treated it the same as if the
line hadn't been folded. I agree that it's a bug. The unfolding
needs to be moved earlier in the parse process.
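
To illustrate what "unfold first, then parse" would look like, here is a
rough sketch. The names unfold() and header_value() are made up for this
example and are not part of the email module, and accepting a bare \n as
well as CRLF is my own assumption to match the examples above. It is
only meant to show the order of operations, not how the module should
actually be patched.

```python
import re

# A line break (CRLF, or a bare LF as in the examples above) that is
# immediately followed by a space or tab, i.e. an RFC 822 LWSP-char.
_FOLD = re.compile(r"\r?\n(?=[ \t])")

def unfold(raw_header):
    """Drop each line break that is immediately followed by a LWSP-char."""
    return _FOLD.sub("", raw_header)

def header_value(raw_header):
    """Unfold first, then split off the field name and strip leading whitespace."""
    name, _, value = unfold(raw_header).partition(":")
    return value.lstrip(" \t")

print(repr(header_value("Subject: blah")))
print(repr(header_value("Subject:\n blah")))
```

Because the unfolding happens before the value is split off and
stripped, the folded and unfolded forms go through exactly the same code
path and come out with the same value.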
Carl Banks