Part of RFC 822 ignored by email module

Fri Jan 21 00:48:29 EST 2011

On Jan 20, 9:55 am, Bob Kline <bkl... at rksystems.com> wrote:
> On 1/20/2011 12:23 PM, Carl Banks wrote:
>
> > On Jan 20, 7:08 am, Bob Kline<bkl... at rksystems.com>  wrote:
> >> I just noticed that the following passage in RFC 822:
>
> >>           The process of moving  from  this  folded   multiple-line
> >>           representation  of a header field to its single line represen-
> >>           tation is called "unfolding".  Unfolding  is  accomplished  by
> >>           regarding   CRLF   immediately  followed  by  a  LWSP-char  as
> >>           equivalent to the LWSP-char.
>
> >> is not being honored by the email module.  The following two invocations
> >> of message_from_string() should return the same value, but that's not
> >> what happens:
>
> >>   >>>  import email
> >>   >>>  email.message_from_string("Subject: blah").get('SUBJECT')
> >> 'blah'
> >>   >>>  email.message_from_string("Subject:\n blah").get('SUBJECT')
> >> ' blah'
>
> >> Note the space in front of the second value returned, but missing from
> >> the first.  Can someone convince me that this is not a bug?
> > That's correct, according to my reading of RFC 822 (I doubt it's
> > changed so I didn't bother to look up what the latest RFC on that
> > subject is.)
>
> > The RFC says that in a folded line the whitespace on the following
> > line is considered a part of the line.
>
> Thanks for responding.  I think your interpretation of the RFC is the
> same as mine.  What I'm saying is that by not returning the same value
> in the two cases above the module is not "regarding CRLF immediately
> followed by a LWSP-char as equivalent to the LWSP-char."

That makes sense.  The space after \n is part of the reconstructed
subject, and the email module should have treated it the same as if the
line hadn't been folded.  I agree that it's a bug.  The unfolding
needs to be moved earlier in the parse process.
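
As a rough sketch of "unfold first, then parse" (this is not the email
module's actual code; unfold() and parse_header() are hypothetical names,
and stripping leading whitespace from the value is an assumption chosen
to match the 'blah' result in the first example above):

```python
import re

# A line break (CRLF, or a bare LF as in the example above) immediately
# followed by a space or tab. The lookahead leaves the LWSP-char in place,
# so the break is "regarded as equivalent to the LWSP-char".
_FOLD = re.compile(r"\r?\n(?=[ \t])")


def unfold(raw_header):
    """Return the single-line representation of a folded header field."""
    return _FOLD.sub("", raw_header)


def parse_header(raw_header):
    """Hypothetical helper: unfold first, then split into (name, value)."""
    name, _, value = unfold(raw_header).partition(":")
    # Assumption: leading whitespace after the colon is not part of the value.
    return name, value.lstrip(" \t")


print(parse_header("Subject: blah"))
print(parse_header("Subject:\n blah"))
```

Because the unfolding happens before the name/value split, the folded
and unfolded forms go through the same code path and yield the same value.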

Carl Banks