[Email-SIG] email.header.decode_header eats my spaces
Barry Warsaw
barry at python.org
Thu Mar 29 06:24:42 CEST 2007
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On Mar 28, 2007, at 8:13 PM, Tokio Kikuchi wrote:
> Well, it looks to me that RFC2047 prohibits this at least in header
> text. An example for comment text in section 8 states:
>
> (=?ISO-8859-1?Q?a?= b) (a b)
>
> Within a 'comment', white space MUST appear between an
> 'encoded-word' and surrounding text. [Section 5,
> paragraph (2)]. However, white space is not needed between
> the initial "(" that begins the 'comment', and the
> 'encoded-word'.
>
> The word MUST means there is no way to omit the space between an
> encoded-word and the surrounding ASCII text. The '(' before the
> encoded-word appears to violate this, but it is a higher-level syntax
> token.
>
> The current email.header violates this example because we have no
> class that recognizes comments in a structured header.
Thanks Tokio, I agree with all of this. I think you're right in
identifying that the problem here is that we don't really have any
way to understand the semantics of a particular header's body.
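
To make the RFC example concrete, here is a minimal sketch that feeds it
through decode_header and joins the pieces without adding separators. It
assumes a current Python 3 email package, which behaves differently from
the 2007 code under discussion; decode_to_text is a hypothetical helper
name, not part of the email package. On the Python 3 versions I know of,
the joined result should match the displayed form "(a b)" from the RFC.

```python
from email.header import decode_header


def decode_to_text(raw):
    """Join the pieces from decode_header() without adding separators.

    Hypothetical helper for illustration; not part of the email package.
    """
    pieces = []
    for part, charset in decode_header(raw):
        if isinstance(part, bytes):
            # Unencoded runs come back with charset None; treat them as ASCII.
            part = part.decode(charset or "ascii", errors="replace")
        pieces.append(part)
    return "".join(pieces)


# The comment example from RFC 2047, section 8.
rfc_example = "(=?ISO-8859-1?Q?a?= b)"

# Show the raw (value, charset) pairs, then the joined text.
print(decode_header(rfc_example))
print(decode_to_text(rfc_example))
```

Note that nothing here knows that the parentheses delimit a comment; the
pieces are treated as plain text, which is exactly the missing semantics
described above.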
> The current behavior is correct if '(' is in a *text field, in which
> case the example does not apply. The problem in the email.header
> module is that it cannot distinguish between structured and
> unstructured (text only) headers. The Header class might have a
> member function like 'add_comment', IMHO.
I think we might want to try to address this in a more general and
extensible way, so that we can support future semantically meaningful
headers.
>> >>> h = Header()
>> >>> h.append('hello', 'us-ascii')
>> >>> h.append('world', 'us-ascii')
>> >>> print h
>> hello world
>> >>> print unicode(h)
>> helloworld
>> I think we're nearly correct here. The unicode version is what
>> I'd expect, but the string version is not. I think in both cases
>> we should print 'helloworld'.
>
> No. The email.header module is not a word processor. Because RFC2047
> deals with 'word's, we should treat these parts as 'word's for
> consistency. The unicode() function should be fixed. If these words
> are to be concatenated without a space, it should be done outside
> the header module.
Right, but these parts aren't being encoded, and yet we've still
stuck a space between the parts that didn't exist there before. I'd
feel better about it if we encoded these chunks too.
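
For anyone who wants to check this on their own install, here is a small
sketch comparing two separate appends against a single append of the
already-joined string. It is written for Python 3 (str() instead of the
unicode() call quoted above), and the names separate and joined are just
for illustration. Whether the first case prints a space depends on the
version of the email package, so no output is claimed for it here.

```python
from email.header import Header

# Two separate appends: the Header object decides how to join the chunks.
separate = Header()
separate.append("hello", "us-ascii")
separate.append("world", "us-ascii")

# One append of an already-joined string: there is no chunk boundary,
# so the module has nowhere to insert a space.
joined = Header()
joined.append("hello" + "world", "us-ascii")

# repr() makes any inserted space easy to spot.
print(repr(str(separate)))
print(repr(str(joined)))
```

The second form is the "do it outside the header module" approach: the
caller concatenates first and hands over a single word.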
- -Barry
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Darwin)
iQCVAwUBRgs/k3EjvBPtnXfVAQLv3gQAl3598ge8qge7epkdqqjBq4F+478374z6
DuvfcBWeBGNZ/b4PEesPbtOwUKprz9mp988N1aoiMWiBa3p5OMQvhIl6q0w1d7Tj
Gm2aCxrXa2JRfkFsj+VygDalK8aYT0XcDxh+56vCjfwhTvKHz1MmkAEwWLbJ6Cp/
GxGfW4l6a6g=
=7akO
-----END PGP SIGNATURE-----