[ mailman-Bugs-1605144 ] mailman corrupts RFC2047-encoded headers

SourceForge.net noreply at sourceforge.net
Tue Apr 10 02:27:39 CEST 2007


Bugs item #1605144, was opened at 2006-11-29 21:28
Message generated for change (Comment added) made by chrissamuel
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100103&aid=1605144&group_id=103

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: mail delivery
Group: 2.1 (stable)
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: David Woodhouse (dwmw2)
Assigned to: Nobody/Anonymous (nobody)
Summary: mailman corrupts RFC2047-encoded headers

Initial Comment:
Given an input like this:

Subject: =?UTF-8?Q?[MTD]=20NAND:=20CAF=C3=89=20NAND=20driver=20cleanup,=20fix=20ECC=20on=20reading=20empty=20flash?=

Mailman appears to emit mail like this:

Subject: =?UTF-8?Q?[MTD]=20NAND:=20CAF=C3=89=20NAND=20driver=20cleanup,
        =20fix=20ECC=20on=20reading=20empty=20flash?=

The input was RFC2047-compliant. The output isn't.
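A minimal Python sketch can make the difference concrete. It unfolds a header value, flags any token that starts with "=?" but is not a complete encoded-word, and flags encoded-words longer than RFC 2047's 75-character limit. The regex is a simplified form of the RFC 2047 grammar, and unfold/check_encoded_words are hypothetical helpers, not part of Mailman or the email module.

```python
import re

# Simplified RFC 2047 encoded-word: charset and encoded-text may not contain
# "?" or whitespace; the encoding is Q or B (case-insensitive).
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[QqBb]\?[^?\s]+\?=")

def unfold(value: str) -> str:
    """Undo header folding: drop each line break that precedes whitespace."""
    return re.sub(r"\r?\n(?=[ \t])", "", value)

def check_encoded_words(value: str) -> list[str]:
    """Return a list of problems found in the encoded-words of a header value."""
    problems = []
    for token in unfold(value).split():
        if not token.startswith("=?"):
            continue  # plain text, or the tail of a broken encoded-word
        if not ENCODED_WORD.fullmatch(token):
            problems.append(f"malformed encoded-word: {token!r}")
        elif len(token) > 75:
            problems.append(f"encoded-word longer than 75 chars ({len(token)})")
    return problems
```

Run on the input above, it reports only the length problem that comes up later in this thread; on Mailman's output it reports an encoded-word cut in two by the fold.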

----------------------------------------------------------------------

Comment By: Chris Samuel (chrissamuel)
Date: 2007-04-10 10:27

Message:
Logged In: YES 
user_id=1581966
Originator: NO

You can disable header wrapping in the Python email module (if I am looking
at the correct Python docs) according to this page:

http://docs.python.org/lib/module-email.generator.html

It implies that if maxheaderlen is set to 0 in every call to Generator, you
shouldn't get this wrapping behaviour, though I don't know which Python
version introduced this.
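A minimal sketch of that suggestion, written against Python 3's email package (not the Python 2 version Mailman 2.1 ran on), with the message parsed under the default compat32 policy. RAW and regenerate are hypothetical names for illustration.

```python
import io
from email import message_from_string
from email.generator import Generator

RAW = (
    "Subject: =?UTF-8?Q?[MTD]=20NAND:=20CAF=C3=89=20NAND=20driver=20cleanup,"
    "=20fix=20ECC=20on=20reading=20empty=20flash?=\n"
    "\n"
    "body\n"
)

def regenerate(raw: str, maxheaderlen: int = 0) -> str:
    """Parse a message and write it back out, as a list manager would."""
    msg = message_from_string(raw)  # default compat32 policy
    buf = io.StringIO()
    # maxheaderlen=0 asks the generator not to wrap header lines at all.
    Generator(buf, mangle_from_=False, maxheaderlen=maxheaderlen).flatten(msg)
    return buf.getvalue()
```

With maxheaderlen=0 the Subject line comes back exactly as it went in. Whether a non-zero limit reproduces the precise split reported here depends on the Python version, so the sketch only shows how to switch wrapping off.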

I believe this may also be the cause of Mailman breaking my PGP/MIME
messages: diffing the saved original against the copy that comes back
shows that the only differences are in long MIME headers and in the
reformatted headers of the attached message/rfc822 email.

I am not sure if this is related to 815297, but it sure looks like it.

Caveat: I am not a Python programmer, just a Postmaster.

----------------------------------------------------------------------

Comment By: Tokio Kikuchi (tkikuchi)
Date: 2006-11-30 23:46

Message:
Logged In: YES 
user_id=67709
Originator: NO

This derives from the Python email module's behavior of trying to keep each
header line within 78 characters. Mailman first parses the message, then does
things like adding a subject prefix or a message body header/footer, and then
regenerates an RFC 2822 message. The email module treats your subject as a
two-part structure separated by the comma and folds it there with a CRLF. I
am not certain, but the current version of the email module doesn't
distinguish between the structured and unstructured headers defined in
sections 2.2.1 and 2.2.2 of RFC 2822. Anyway, it is safer to keep header
lines within 78 characters.

FYI, the email module generates your Subject header like this:
Subject:
=?utf-8?q?=5BMTD=5D_NAND=3A_CAF=C3=89_NAND_driver_cleanup=2C_fix?=
 =?utf-8?q?_ECC_on_reading_empty_flash?=
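As a check that this form carries the same text as the original Subject, the sketch below decodes both with decode_header and make_header from Python 3's email.header module. The two constants are copied from this thread (the fold is written as a newline plus a space); decoded is a hypothetical helper name.

```python
from email.header import decode_header, make_header

ORIGINAL = ("=?UTF-8?Q?[MTD]=20NAND:=20CAF=C3=89=20NAND=20driver=20cleanup,"
            "=20fix=20ECC=20on=20reading=20empty=20flash?=")
REGENERATED = ("=?utf-8?q?=5BMTD=5D_NAND=3A_CAF=C3=89_NAND_driver_cleanup=2C_fix?=\n"
               " =?utf-8?q?_ECC_on_reading_empty_flash?=")

def decoded(value: str) -> str:
    """Decode every RFC 2047 encoded-word in a header value to Unicode text."""
    # Whitespace between two adjacent encoded-words is dropped by the decoder,
    # so the folded form yields one continuous string.
    return str(make_header(decode_header(value)))
```

Both values should decode to "[MTD] NAND: CAFÉ NAND driver cleanup, fix ECC on reading empty flash".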


----------------------------------------------------------------------

Comment By: David Woodhouse (dwmw2)
Date: 2006-11-30 01:00

Message:
Logged In: YES 
user_id=81423
Originator: YES

Your Thunderbird also refuses to send this:

    To: Some people : ;
    Bcc: foo at bar.org, baz at turnip.com

Thunderbird isn't necessarily the best test of what's valid :)

The pertinent question is: why is mailman munging this _anyway_? Why can't
it just pass the header through as it was originally sent? If I put line
breaks in and lined things up sensibly like a SpamAssassin report does, why
should that be mangled by mailman?



----------------------------------------------------------------------

Comment By: Harald Hoyer (Red Hat) (saturn_de)
Date: 2006-11-30 00:46

Message:
Logged In: YES 
user_id=5540
Originator: NO

Hmm, my Thunderbird encodes "," as =2C

----------------------------------------------------------------------

Comment By: David Woodhouse (dwmw2)
Date: 2006-11-30 00:15

Message:
Logged In: YES 
user_id=81423
Originator: YES

That's only for the charset (UTF-8) and the encoding (Q). The comma
appears in the encoded-text, and should be fine, since this is a Subject:
header and hence comes under paragraph (1) of §5.

  encoded-word = "=?" charset "?" encoding "?" encoded-text "?="

   charset = token    ; see section 3

   encoding = token   ; see section 4

   token = 1*<Any CHAR except SPACE, CTLs, and especials>

   especials = "(" / ")" / "<" / ">" / "@" / "," / ";" / ":" / "
               <"> / "/" / "[" / "]" / "?" / "." / "="

   encoded-text = 1*<Any printable ASCII character other than "?"
                     or SPACE>
                  ; (but see "Use of encoded-words in message
                  ; headers", section 5)
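The distinction being argued here (the especials exclusion applies to charset and encoding, not to encoded-text) can be sketched as a small parser following the grammar quoted above. The stray quote at the end of the first especials line is read here as a backslash entry, as in RFC 2045 tspecials; that is an assumption. The section 5 context rules are not checked, and all function names are hypothetical.

```python
# especials from the RFC 2047 grammar quoted above; the garbled entry at the
# line break is ASSUMED to be a backslash, as in RFC 2045 tspecials.
ESPECIALS = set('()<>@,;:\\"/[]?.=')

def is_token(s: str) -> bool:
    """token = 1*<Any CHAR except SPACE, CTLs, and especials>"""
    return bool(s) and all(32 < ord(c) < 127 and c not in ESPECIALS for c in s)

def is_encoded_text(s: str) -> bool:
    """encoded-text = 1*<Any printable ASCII character other than "?" or SPACE>"""
    return bool(s) and all(32 < ord(c) < 127 and c != "?" for c in s)

def parse_encoded_word(word: str):
    """Return (charset, encoding, encoded_text), or None if malformed."""
    if len(word) < 4 or not (word.startswith("=?") and word.endswith("?=")):
        return None
    parts = word[2:-2].split("?")
    if len(parts) != 3:
        return None
    charset, encoding, text = parts
    if is_token(charset) and is_token(encoding) and is_encoded_text(text):
        return charset, encoding, text
    return None
```

Under this grammar the original Subject parses, comma included, while a comma inside the charset would be rejected.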



----------------------------------------------------------------------

Comment By: Harald Hoyer (Red Hat) (saturn_de)
Date: 2006-11-29 23:21

Message:
Logged In: YES 
user_id=5540
Originator: NO

Hmm, I don't think "," is allowed unencoded...

   token = 1*<Any CHAR except SPACE, CTLs, and especials>

   especials = "(" / ")" / "<" / ">" / "@" / "," / ";" / ":" / "
               <"> / "/" / "[" / "]" / "?" / "." / "="

----------------------------------------------------------------------

Comment By: David Woodhouse (dwmw2)
Date: 2006-11-29 22:53

Message:
Logged In: YES 
user_id=81423
Originator: YES

Hm, good point; thanks. I've fixed the script which generates mail for
each commit to the Linux kernel git tree, and it should no longer generate
encoded-words longer than 75 characters.

I still see this input...

Subject:
=?UTF-8?Q?[MTD]_NAND:_CAF=C3=89_NAND_driver_cleanup,_fix_ECC_on_reading?=
        =?UTF-8?Q?_empty_flash?=

and this output...

Subject: =?UTF-8?Q?[MTD]_NAND:_CAF=C3=89_NAND_driver_cleanup,
        _fix_ECC_on_reading?= =?UTF-8?Q?_empty_flash?=

The comma is allowed, and doesn't have to be '=2C', does it? See §4.2 (3)
and §5 (1).

----------------------------------------------------------------------

Comment By: Harald Hoyer (Red Hat) (saturn_de)
Date: 2006-11-29 21:44

Message:
Logged In: YES 
user_id=5540
Originator: NO

http://www.ietf.org/rfc/rfc2047.txt

   An 'encoded-word' may not be more than 75 characters long, including
   'charset', 'encoding', 'encoded-text', and delimiters.  If it is
   desirable to encode more text than will fit in an 'encoded-word' of
   75 characters, multiple 'encoded-word's (separated by CRLF SPACE) may
   be used.
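A minimal sketch of following that rule when generating a Subject: Q-encode UTF-8 text into encoded-words of at most 75 characters, never splitting one character's bytes across two words, and join them with CRLF SPACE. Encoding every byte other than letters, digits and "!*+-/" (so "," becomes =2C) is a conservative choice made here, not something RFC 2047 requires for a Subject. encode_subject and q_char are hypothetical names.

```python
MAX_WORD = 75  # RFC 2047 limit per encoded-word, delimiters included
PREFIX, SUFFIX = "=?UTF-8?Q?", "?="

# Bytes written literally; everything else becomes =XX, which is always legal.
SAFE = set(b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!*+-/")

def q_char(ch: str) -> str:
    """Q-encode one character (one to four UTF-8 bytes)."""
    out = []
    for b in ch.encode("utf-8"):
        if b == 0x20:
            out.append("_")        # SPACE may be written as "_" in Q encoding
        elif b in SAFE:
            out.append(chr(b))
        else:
            out.append("=%02X" % b)
    return "".join(out)

def encode_subject(text: str) -> str:
    """Split text into UTF-8 Q encoded-words, each at most 75 characters."""
    room = MAX_WORD - len(PREFIX) - len(SUFFIX)
    words, current = [], ""
    for ch in text:
        piece = q_char(ch)
        # Start a new word instead of splitting a character's bytes.
        if current and len(current) + len(piece) > room:
            words.append(PREFIX + current + SUFFIX)
            current = ""
        current += piece
    if current:
        words.append(PREFIX + current + SUFFIX)
    return "\r\n ".join(words)  # multiple encoded-words separated by CRLF SPACE
```

Spaces are encoded inside the words, so a decoder that drops the whitespace between adjacent encoded-words still recovers the original text.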



----------------------------------------------------------------------



More information about the Mailman-coders mailing list