[Mailman-Users] Garbled headers - was: gmail marks mailman confirmation mail as spam...

Mark Sapiro mark at msapiro.net
Mon Jun 15 04:36:24 CEST 2009


Mark Sapiro wrote:
> Kārlis Repsons wrote:
>> On Sunday 14 June 2009 17:12:22 you wrote:
> 
>>> Kārlis forwarded an email to me off list. Its salient feature is the
>>> subject header
>>>
>>> Subject: skatamies, cik ilgi google =?utf-8?q?m=C4=93=C4=A3ina?=
>>> =?utf-8?q?_nog=C4=81d=C4=81t?=, kad nevar...
>>>
>>> which is wrapped here but was all one line in the original. I have
>>> verified that there is a problem in the underlying Python email package
>>> with headers containing multiple RFC 2047 encoded words whether or not
>>> they are separated by non-encoded text.
> 
> 
> Actually, the problem is not multiple encoded words. It is the fact that
> Python's email.header.decode_header() function doesn't recognize an RFC
> 2047 encoded word as such if the trailing "?=" is not followed by
> whitespace or the end of the string - here it is followed by a ",".
> 
> I think this is a bug in decode_header(), but I won't have time to look
> further at this until tomorrow.


I think there is a minor bug in decode_header() in that it won't
recognize an RFC 2047 encoded word in a comment if the encoded word is
not separated by whitespace from the ")" that terminates the comment.
However, this is the only place where an encoded word need not be
followed by whitespace or the end of the header.

The Subject: header above is non-compliant in two respects. First, it is
too long. RFC 2047 section 2 says in part:

   While there is no limit to the length of a multiple-line header
   field, each line of a header field that contains one or more
   'encoded-word's is limited to 76 characters.
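
The length rule can be checked mechanically. The following is only a
rough sketch, not anything from Mailman or the email package: the
function name and the loose encoded-word pattern are my own, and it
assumes the raw header text is available as a Python string.

```python
import re

# Loose pattern for an RFC 2047 encoded word: =?charset?encoding?text?=
# (an assumption for illustration; it does not validate the charset name).
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?=")
MAX_ENCODED_LINE = 76  # limit from RFC 2047 section 2, quoted above


def overlong_encoded_lines(raw_header):
    """Return physical lines that hold an encoded word and exceed 76 chars."""
    return [
        line
        for line in raw_header.splitlines()
        if ENCODED_WORD.search(line) and len(line) > MAX_ENCODED_LINE
    ]


# The Subject: header as it arrived, all on one line.
one_line = ("Subject: skatamies, cik ilgi google =?utf-8?q?m=C4=93=C4=A3ina?= "
            "=?utf-8?q?_nog=C4=81d=C4=81t?=, kad nevar...")
# The same header folded onto a continuation line.
folded = ("Subject: skatamies, cik ilgi google =?utf-8?q?m=C4=93=C4=A3ina?=\n"
          "  =?utf-8?q?_nog=C4=81d=C4=81t,?= kad nevar...")

print(overlong_encoded_lines(one_line))  # the single overlong line
print(overlong_encoded_lines(folded))    # nothing to report
```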

However, decode_header will accept it anyway and do the right thing. The
real problem is the second one: item (1) in section 5 of the RFC says in part:

    Ordinary ASCII text and 'encoded-word's may appear together in the
    same header field.  However, an 'encoded-word' that appears in a
    header field defined as '*text' MUST be separated from any adjacent
    'encoded-word' or 'text' by 'linear-white-space'.
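
To make the separation rule concrete, here is a minimal sketch of a
checker for an unstructured ('*text') header value. The function name
and the pattern are hypothetical and written just for this message. It
only looks at the character on each side of every encoded word, so it
is an approximation of the rule, not a full RFC 2047 parser.

```python
import re

# Loose pattern for an RFC 2047 encoded word (illustrative only).
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?=")


def unseparated_encoded_words(value):
    """Return encoded words that touch adjacent text without whitespace."""
    bad = []
    for match in ENCODED_WORD.finditer(value):
        # Treat the start and end of the value as if they were whitespace.
        before = value[match.start() - 1] if match.start() > 0 else " "
        after = value[match.end()] if match.end() < len(value) else " "
        if not (before.isspace() and after.isspace()):
            bad.append(match.group())
    return bad


received = ("skatamies, cik ilgi google =?utf-8?q?m=C4=93=C4=A3ina?= "
            "=?utf-8?q?_nog=C4=81d=C4=81t?=, kad nevar...")
print(unseparated_encoded_words(received))
```

For the value above this reports the second encoded word, the one
followed directly by the ",".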

The header above does not comply with this. Instead of being

Subject: skatamies, cik ilgi google =?utf-8?q?m=C4=93=C4=A3ina?=
=?utf-8?q?_nog=C4=81d=C4=81t?=, kad nevar...

(all on one line), it should be

Subject: skatamies, cik ilgi google =?utf-8?q?m=C4=93=C4=A3ina?=
  =?utf-8?q?_nog=C4=81d=C4=81t,?= kad nevar...

I.e., it should be folded so no line is longer than 76 characters, but
more importantly for this problem, the "," near the end should be part of
the encoded word rather than following the "?=" with no intervening
whitespace.
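
For illustration, the corrected form decodes cleanly. This sketch
assumes Python 3, where str() of a Header object gives the decoded
text; decode_subject is just a helper name made up for the example.
What decode_header() does with the original non-compliant form depends
on the Python version, so that case is deliberately left out.

```python
from email.header import decode_header, make_header

# The corrected, folded header value from above (field name omitted).
corrected = ("skatamies, cik ilgi google =?utf-8?q?m=C4=93=C4=A3ina?=\n"
             "  =?utf-8?q?_nog=C4=81d=C4=81t,?= kad nevar...")


def decode_subject(value):
    """Decode RFC 2047 encoded words in a header value to a str."""
    return str(make_header(decode_header(value)))


decoded = decode_subject(corrected)
print(decoded)
```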

This is a problem with the MUA (mail client) that encoded the Subject:
header in the first place.
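
As a sketch of how a sender written in Python could avoid the problem,
email.header.Header can do the encoding and folding itself. This
assumes Python 3 and uses the truncated subject text from above as
sample input. The exact encoded output (Q versus base64, and where the
folds fall) is up to the library, so it is not shown here.

```python
from email.header import Header, decode_header, make_header

# Sample text: the subject above, including its truncated "..." ending.
text = "skatamies, cik ilgi google mēģina nogādāt, kad nevar..."

# Let the library pick the encoding and fold at 76 characters.
encoded = Header(text, charset="utf-8", header_name="Subject",
                 maxlinelen=76).encode()

for line in ("Subject: " + encoded).splitlines():
    print(len(line), line)

# Decoding the result gives back the original text.
round_trip = str(make_header(decode_header(encoded)))
print(round_trip == text)
```

Because the whole value goes into encoded words, the "," ends up inside
one of them and nothing follows a "?=" except whitespace or the end of
the header.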

-- 
Mark Sapiro <mark at msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan


