[Mailman-Users] Garbled headers - was: gmail marks mailman confirmation mail as spam...
Mark Sapiro
mark at msapiro.net
Mon Jun 15 19:17:09 CEST 2009
I am trying to move this thread to email-sig at python.org, since the
underlying issue is in the email package. Further, as of Mailman
2.1.12 we no longer install a Mailman-specific version of the email
package, so it really has to be addressed in the email package.
Stephen J. Turnbull wrote:
> Mark Sapiro writes:
>
> > I think there is a minor bug in decode_header() in that it won't
> > recognize a RFC 2047 encoded word in a comment if the encoded word is
> > not separated by whitespace from the ")" that terminates the comment.
> > However, this is the only place where an encoded word need not be
> > followed by whitespace or the end of the header.
>
> Indeed that's a bug. I gather that you're saying that this bug is not
> the cause of the OP's problem, though?
Correct.
> > The Subject: header above is non-compliant in two respects. It is too
> > long. [...] However, decode_header will accept it anyway and do
> > the right thing.
>
> As it should, according to the Postel Principle. Anyway, IIRC the
> length limit is a SHOULD NOT, not a MUST NOT, right?
The RFC (8|28|53)22 line-length limits are MUST be <= 998 and SHOULD be <= 78 characters, excluding the CRLF. RFC
2047 seems to want to impose stricter limits on encoded words, but
unfortunately does not use the defined terms MUST and SHOULD. Section 2
says in part:
   An 'encoded-word' may not be more than 75 characters long, including
   'charset', 'encoding', 'encoded-text', and delimiters. If it is
   desirable to encode more text than will fit in an 'encoded-word' of
   75 characters, multiple 'encoded-word's (separated by CRLF SPACE) may
   be used.

   While there is no limit to the length of a multiple-line header
   field, each line of a header field that contains one or more
   'encoded-word's is limited to 76 characters.
so it is not clear whether these are 'recommendations' or
'requirements'. In any case, email.header.decode_header() is not
enforcing any limits so we are being generous in what we accept in this
respect.
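To make the limits above concrete, here is a minimal sketch of a checker that reports which of them a raw header exceeds. The function name length_problems and the simplified encoded-word pattern are hypothetical; nothing like this exists in the email package. It assumes lines are measured without the CRLF, and it measures the whole physical line, header name included, for the 76-character rule.

```python
import re

# Hypothetical checker, not part of the email package; decode_header()
# enforces none of these limits.
ENCODED_WORD = re.compile(r'=\?[^?\s]*\?[qQbB]\?[^?\s]*\?=')

def length_problems(raw_header):
    """List the length limits that a raw, possibly folded, header exceeds."""
    problems = []
    # Check each physical line separately (folding inserts CRLF before WSP).
    for lineno, line in enumerate(re.split(r'\r?\n', raw_header), 1):
        words = ENCODED_WORD.findall(line)
        for word in words:
            if len(word) > 75:
                problems.append(
                    f'line {lineno}: encoded-word of {len(word)} chars (RFC 2047: 75)')
        if words and len(line) > 76:
            problems.append(
                f'line {lineno}: {len(line)} chars with encoded-word (RFC 2047: 76)')
        if len(line) > 998:
            problems.append(f'line {lineno}: {len(line)} chars (MUST be <= 998)')
        elif len(line) > 78:
            problems.append(f'line {lineno}: {len(line)} chars (SHOULD be <= 78)')
    return problems

long_subject = 'Subject: =?utf-8?q?' + 'a' * 70 + '?='
print(length_problems(long_subject))  # three problems reported
print(length_problems('Subject: =?utf-8?q?caf=C3=A9?='))  # []
```

A lenient decoder would at most log such findings rather than reject the header, which matches the "generous in what we accept" stance above.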
> > The real problem is that item (1) in section 5 of the RFC says in part:
> >
> > Ordinary ASCII text and 'encoded-word's may appear together in the
> > same header field. However, an 'encoded-word' that appears in a
> > header field defined as '*text' MUST be separated from any adjacent
> > 'encoded-word' or 'text' by 'linear-white-space'.
> >
> > The header above does not comply with this.
>
> Agreed, but I think that by default[1] email should try to parse this
> header as the user intended it. It's not like encoded-words are that
> easy to confuse with intended text; it's unlikely that changing
> 'linear-white-space' above to 'linear-white-space or specials' would
> harm anyone.
I fully agree. There is a regexp (ecre) in email/header.py that ends
with the lookahead assertion "(?=[ \t]|$)". Even in a "strict mode", I
think the lookahead needs to accept ")" as well as space and tab, but by
default it should simply be removed.
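The effect of the ecre lookahead can be seen with a simplified pattern. This is a sketch modeled on, not copied from, email/header.py; the names STRICT, STRICT_PAREN, LENIENT and which_match, and the sample headers, are hypothetical. STRICT mirrors the current "(?=[ \t]|$)" lookahead, STRICT_PAREN also accepts ")", and LENIENT drops the lookahead entirely.

```python
import re

# Simplified encoded-word pattern: =?charset?q-or-b?encoded-text?=
_EW = r'=\?(?P<charset>[^?]*?)\?(?P<encoding>[qb])\?(?P<encoded>.*?)\?='
_FLAGS = re.IGNORECASE | re.MULTILINE

# Current behaviour: word must be followed by space, tab or end of line.
STRICT = re.compile(_EW + r'(?=[ \t]|$)', _FLAGS)
# Possible strict mode: also accept the ")" that closes a comment.
STRICT_PAREN = re.compile(_EW + r'(?=[ \t)]|$)', _FLAGS)
# Possible default: no lookahead at all.
LENIENT = re.compile(_EW, _FLAGS)

def which_match(header):
    """Return (strict, strict_paren, lenient) search results for header."""
    return tuple(bool(rx.search(header))
                 for rx in (STRICT, STRICT_PAREN, LENIENT))

for header in ['Subject: =?utf-8?q?caf=C3=A9?= menu',        # (True, True, True)
               'From: mark@example.com (=?utf-8?q?Mark?=)',  # (False, True, True)
               'Subject: Re:=?utf-8?q?caf=C3=A9?=menu']:     # (False, False, True)
    print(which_match(header), header)
```

Only the lookahead-free pattern recognizes the word glued to ordinary text, which is the non-compliant case from section 5; accepting ")" alone fixes the comment case but not that one.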
> > This is a problem with the MUA (mail client) that encoded the Subject:
> > header in the first place.
>
> Agreed, but I think following the Postel Principle here is likely to
> do less harm than adhering strictly to the RFC.
I agree here too, and note that some MUAs (all three I tried, including
mutt and Thunderbird) decode the original header as intended.
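A lenient decode along the lines those MUAs appear to follow might look like the sketch below. It is not email.header.decode_header(); the names lenient_decode and _decode_word are hypothetical, and it assumes that any text of the form =?charset?Q-or-B?text?= should be decoded wherever it appears. Whitespace between two adjacent encoded words is dropped, as RFC 2047 section 6.2 requires; all other text is left alone.

```python
import base64
import binascii
import re

# No lookahead: a word glued to ")" or to plain text is still recognized.
ENCODED_WORD = re.compile(
    r'=\?(?P<charset>[^?]*)\?(?P<encoding>[qQbB])\?(?P<text>[^?]*)\?=')
# Linear whitespace between two adjacent encoded words is not displayed.
BETWEEN_WORDS = re.compile(r'(\?=)\s+(?==\?)')

def _decode_word(match):
    """Decode one encoded word; leave it untouched if it cannot be decoded."""
    text = match.group('text').encode('ascii', errors='replace')
    try:
        if match.group('encoding') in 'qQ':
            raw = binascii.a2b_qp(text, header=True)  # "_" -> space, "=XX" -> byte
        else:
            raw = base64.b64decode(text)
        return raw.decode(match.group('charset'), errors='replace')
    except (binascii.Error, LookupError):
        return match.group(0)  # bad base64 or unknown charset

def lenient_decode(header):
    """Decode every encoded word in header, ignoring the whitespace rule."""
    header = BETWEEN_WORDS.sub(r'\1', header)
    return ENCODED_WORD.sub(_decode_word, header)

print(lenient_decode('Subject: Re:=?utf-8?q?caf=C3=A9?=menu'))
print(lenient_decode('From: mark@example.com (=?utf-8?q?Mark_S?=)'))
```

Returning the undecoded word on failure keeps the sketch in the spirit of the Postel Principle: a header that cannot be decoded is shown raw instead of being rejected.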
> That said, I'm not in a position to contribute code, and this is a
> pretty invasive change, so the user is unlikely to see a version of
> Mailman that handles this any time soon. They are likely to have more
> luck switching clients.
>
> Footnotes:
> [1] I.e., there should be an option to be strict.
>
>
--
Mark Sapiro <mark at msapiro.net> The highway is for gamblers,
San Francisco Bay Area, California better use your sense - B. Dylan