[issue1974] email.MIMEText.MIMEText.as_string incorrectly folding long subject header

Wed Jun 25 21:42:06 CEST 2008

Ori Avtalion <ori at avtalion.name> added the comment:

I think there's been a little misinterpretation of the standard in the
comments above.

It's important to note that RFC 2822 basically defines folding as
"adding a CRLF before an existing whitespace in the original message". 

See http://tools.ietf.org/html/rfc2822#section-2.2.3

It does *not* allow prepending folded lines with extra characters that
were not in the original message such as '\t' or ' '.

This is exactly what _encode_chunks does in header.py:
    joiner = NL + self._continuation_ws

(Note that the email package docs and the Header docstring use the word
'prepend', which reflects the error in the code.)
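
For illustration, here is a minimal sketch of folding as RFC 2822 section
2.2.3 describes it: a CRLF is inserted before whitespace that is already
present, and nothing else is added or removed. The names fold, unfold and
maxlen are hypothetical and not part of the email package; this is not the
package's algorithm, just one way the rule could be implemented.

```python
import re

CRLF = "\r\n"


def fold(value, maxlen=78):
    """Fold by inserting CRLF before whitespace already present in value.

    Hypothetical helper, not part of the email package. No characters are
    added other than CRLF, so unfolding restores the input exactly.
    """
    lines = []
    start = 0        # index where the current output line begins
    last_ws = None   # most recent whitespace index usable as a fold point
    for i, ch in enumerate(value):
        if ch in " \t" and i > start:
            last_ws = i
        if i - start + 1 > maxlen and last_ws is not None:
            # Break *before* the existing whitespace; it becomes the
            # leading character of the continuation line.
            lines.append(value[start:last_ws])
            start = last_ws
            last_ws = None
    lines.append(value[start:])
    return CRLF.join(lines)


def unfold(folded):
    """Remove every CRLF that is immediately followed by a space or tab."""
    return re.sub(r"\r\n(?=[ \t])", "", folded)
```

Because only CRLFs are inserted, unfold(fold(value)) gives back the original
value whether the separators are spaces, tabs, or runs of several spaces. A
token longer than maxlen with no whitespace in it is simply left unfolded in
this sketch.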

With a correct implementation, why would I want a choice of which type
of character to line-break on when folding?
The whole notion of controlling the value of continuation_ws seems wrong.

However, changing the default continuation_ws to ' ', as the patch
suggests, will output syntactically correct headers in the majority of
cases (due to other bugs that remove trailing whitespace and merge
consecutive whitespace into one character).

All in all, I agree with the change of the default continuation_ws due
to its lucky side-effects, but as Barry hinted, the algorithm needs some
serious work to really output valid headers.

Some examples of the good and bad behaviors:

>>> from email.Header import Header
>>> l = ['<%d at dom.ain>' % i for i in range(8)]

>>> # this turns out fine
>>> Header(' '.join(l), continuation_ws=' ').encode()
'<0 at dom.ain> <1 at dom.ain> <2 at dom.ain> <3 at dom.ain> <4 at dom.ain> <5 at dom.ain>\n <6 at dom.ain> <7 at dom.ain>'

# This does not fold even though it should
>>> Header('\t'.join(l), continuation_ws=' ').encode()
'<0 at dom.ain>\t<1 at dom.ain>\t<2 at dom.ain>\t<3 at dom.ain>\t<4 at dom.ain>\t<5 at dom.ain>\t<6 at dom.ain>\t<7 at dom.ain>'

# And here each 4-character whitespace run is collapsed into a single space
>>> Header('    '.join(l), continuation_ws=' ').encode()
'<0 at dom.ain> <1 at dom.ain> <2 at dom.ain> <3 at dom.ain> <4 at dom.ain> <5 at dom.ain>\n <6 at dom.ain> <7 at dom.ain>'

----------
nosy: +salty-horse

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue1974>
_______________________________________