[ mailman-Bugs-1605144 ] mailman corrupts RFC2047-encoded headers
Bugs item #1605144, was opened at 2006-11-29 21:28
Message generated for change (Comment added) made by chrissamuel
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=100103&aid=1605144&group_id=103

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: mail delivery
Group: 2.1 (stable)
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: David Woodhouse (dwmw2)
Assigned to: Nobody/Anonymous (nobody)
Summary: mailman corrupts RFC2047-encoded headers

Initial Comment:
Given an input like this:

    Subject: =?UTF-8?Q?[MTD]=20NAND:=20CAF=C3=89=20NAND=20driver=20cleanup,=20fix=20ECC=20on=20reading=20empty=20flash?=

Mailman appears to emit mail like this:

    Subject: =?UTF-8?Q?[MTD]=20NAND:=20CAF=C3=89=20NAND=20driver=20cleanup, =20fix=20ECC=20on=20reading=20empty=20flash?=

The input was RFC2047-compliant. The output isn't.

----------------------------------------------------------------------

Comment By: Chris Samuel (chrissamuel)
Date: 2007-04-10 10:27

Message:
Logged In: YES
user_id=1581966
Originator: NO

You can disable header wrapping in the email.generator module (if I am looking at the correct Python docs) according to this page:

http://docs.python.org/lib/module-email.generator.html

It implies that if maxheaderlen is set to 0 in every call to Generator, you shouldn't get this wrapping behaviour, though I don't know when this option appeared in Python.

I believe this may also be the cause of Mailman breaking my PGP/MIME messages: diffing the saved original against the version that comes back shows that the only differences are in long MIME headers and in the reformatted headers of the attached message/rfc822 email.

I am not sure if this is related to 815297, but it sure looks like it.

Caveat: I am not a Python programmer, just a Postmaster.
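A minimal sketch of the workaround suggested above, assuming Python 3's standard email package rather than the Python version Mailman 2.1 ran on. The helper name flatten_without_wrapping is hypothetical and is not a Mailman function. It parses a message and regenerates it with maxheaderlen=0, which asks Generator not to wrap header lines. Whether the default setting reproduces the fold at the comma depends on the Python version, so the sketch makes no claim about that.

```python
import io
from email import message_from_string
from email.generator import Generator

# Header taken from the initial comment of this report.
SUBJECT_LINE = (
    "Subject: =?UTF-8?Q?[MTD]=20NAND:=20CAF=C3=89=20NAND=20driver=20cleanup,"
    "=20fix=20ECC=20on=20reading=20empty=20flash?="
)


def flatten_without_wrapping(raw_message: str) -> str:
    """Parse a message and regenerate it with header wrapping disabled.

    Hypothetical helper for illustration; not part of Mailman.
    """
    msg = message_from_string(raw_message)
    out = io.StringIO()
    # maxheaderlen=0 asks the generator not to wrap long header lines.
    Generator(out, mangle_from_=False, maxheaderlen=0).flatten(msg)
    return out.getvalue()


regenerated = flatten_without_wrapping(SUBJECT_LINE + "\n\nbody\n")
```

With wrapping disabled, the Subject line should come back exactly as it went in.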
----------------------------------------------------------------------

Comment By: Tokio Kikuchi (tkikuchi)
Date: 2006-11-30 23:46

Message:
Logged In: YES
user_id=67709
Originator: NO

This is derived from the behavior of the Python email module, which tries to keep each header line within 78 characters. Mailman parses the message first, does something like adding a subject prefix or a message body header/footer, then regenerates the RFC-2822 message. The email module thinks your subject has a two-part structure separated by a comma, and splits it with CRLF. I am not very sure, but the current version of email doesn't distinguish the structured and unstructured headers defined in 2.2.1 and 2.2.2 of RFC-2822. Anyway, it is safer to keep header lines within 78 characters.

FYI, the email module generates your subject header like this:

    Subject: =?utf-8?q?=5BMTD=5D_NAND=3A_CAF=C3=89_NAND_driver_cleanup=2C_fix?= =?utf-8?q?_ECC_on_reading_empty_flash?=

----------------------------------------------------------------------

Comment By: David Woodhouse (dwmw2)
Date: 2006-11-30 01:00

Message:
Logged In: YES
user_id=81423
Originator: YES

Your thunderbird also refuses to send this:

    To: Some people : ;
    Bcc: foo@bar.org, baz@turnip.com

Thunderbird isn't necessarily the best test of what's valid :)

The pertinent question is: why is mailman munging this _anyway_? Why can't it just pass the header through as it was originally sent? If I put line breaks in and lined things up sensibly like a SpamAssassin report does, why should that be mangled by mailman?

----------------------------------------------------------------------

Comment By: Harald Hoyer (Red Hat) (saturn_de)
Date: 2006-11-30 00:46

Message:
Logged In: YES
user_id=5540
Originator: NO

Hmm, my thunderbird encodes "," with =2C

----------------------------------------------------------------------

Comment By: David Woodhouse (dwmw2)
Date: 2006-11-30 00:15

Message:
Logged In: YES
user_id=81423
Originator: YES

That's only for the charset (UTF-8) and the encoding (Q).
The comma appears in the encoded-text, and should be fine (since this is a Subject: header and hence comes under paragraph (1) of §5).

    encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
    charset = token    ; see section 3
    encoding = token   ; see section 4
    token = 1*<Any CHAR except SPACE, CTLs, and especials>
    especials = "(" / ")" / "<" / ">" / "@" / "," / ";" / ":" / "
                <"> / "/" / "[" / "]" / "?" / "." / "="
    encoded-text = 1*<Any printable ASCII character other than "?" or SPACE>
                   ; (but see "Use of encoded-words in message
                   ; headers", section 5)

----------------------------------------------------------------------

Comment By: Harald Hoyer (Red Hat) (saturn_de)
Date: 2006-11-29 23:21

Message:
Logged In: YES
user_id=5540
Originator: NO

Hmm, I don't think "," is allowed unencoded...

    token = 1*<Any CHAR except SPACE, CTLs, and especials>
    especials = "(" / ")" / "<" / ">" / "@" / "," / ";" / ":" / "
                <"> / "/" / "[" / "]" / "?" / "." / "="

----------------------------------------------------------------------

Comment By: David Woodhouse (dwmw2)
Date: 2006-11-29 22:53

Message:
Logged In: YES
user_id=81423
Originator: YES

Hm, good point; thanks. I've fixed the script which generates mail for each commit to the Linux kernel git tree, and it should no longer generate encoded-words longer than 75 characters.

I still see this input...

    Subject: =?UTF-8?Q?[MTD]_NAND:_CAF=C3=89_NAND_driver_cleanup,_fix_ECC_on_reading?= =?UTF-8?Q?_empty_flash?=

and this output...

    Subject: =?UTF-8?Q?[MTD]_NAND:_CAF=C3=89_NAND_driver_cleanup, _fix_ECC_on_reading?= =?UTF-8?Q?_empty_flash?=

The comma is allowed, and doesn't have to be '=2C', does it? See §4.2 (3) and §5 (1).
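The grammar quoted in this thread can be turned into a small check. This is only a sketch: is_valid_encoded_word is a hypothetical helper, the especials set is copied from the list quoted above, and the check covers the generic encoded-word syntax plus the 75-character limit, not the extra restrictions that apply in structured headers.

```python
import re

# especials as listed in the grammar quoted in this thread.
ESPECIALS = '()<>@,;:"/[]?.='
PRINTABLE = [chr(c) for c in range(0x21, 0x7F)]  # printable ASCII, no SPACE

# token: printable ASCII minus especials; encoded-text: printable ASCII minus "?".
TOKEN_CHARS = "".join(c for c in PRINTABLE if c not in ESPECIALS)
TEXT_CHARS = "".join(c for c in PRINTABLE if c != "?")

ENCODED_WORD = re.compile(
    r"=\?[{tok}]+\?[{tok}]+\?[{txt}]+\?=".format(
        tok=re.escape(TOKEN_CHARS), txt=re.escape(TEXT_CHARS)
    )
)


def is_valid_encoded_word(word: str) -> bool:
    """Check one encoded-word against the quoted grammar and the 75-char limit."""
    return len(word) <= 75 and ENCODED_WORD.fullmatch(word) is not None


# First encoded-word of the corrected input: the unencoded comma is accepted.
corrected = "=?UTF-8?Q?[MTD]_NAND:_CAF=C3=89_NAND_driver_cleanup,_fix_ECC_on_reading?="
# The same word after the space was inserted behind the comma.
mangled = "=?UTF-8?Q?[MTD]_NAND:_CAF=C3=89_NAND_driver_cleanup, _fix_ECC_on_reading?="
# The encoded-word from the initial comment, which exceeds 75 characters.
too_long = (
    "=?UTF-8?Q?[MTD]=20NAND:=20CAF=C3=89=20NAND=20driver=20cleanup,"
    "=20fix=20ECC=20on=20reading=20empty=20flash?="
)
```

Under this check the corrected word passes, while the mangled word fails because of the embedded space and the original word fails on length alone.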
----------------------------------------------------------------------

Comment By: Harald Hoyer (Red Hat) (saturn_de)
Date: 2006-11-29 21:44

Message:
Logged In: YES
user_id=5540
Originator: NO

http://www.ietf.org/rfc/rfc2047.txt

An 'encoded-word' may not be more than 75 characters long, including 'charset', 'encoding', 'encoded-text', and delimiters. If it is desirable to encode more text than will fit in an 'encoded-word' of 75 characters, multiple 'encoded-word's (separated by CRLF SPACE) may be used.

----------------------------------------------------------------------
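As an illustration of the "multiple encoded-words" rule quoted above, the following sketch lets Python 3's email.header.Header split the subject text from this report. It assumes the Python 3 standard library, not the Python version in use when the bug was filed, and the exact split points and encoding it picks may differ from the output shown earlier in the thread. Header wraps to a line length rather than to the 75-character word limit, so the word lengths are checked here rather than assumed.

```python
from email.header import Header, decode_header

# Subject text from this report, in decoded form.
TEXT = "[MTD] NAND: CAFÉ NAND driver cleanup, fix ECC on reading empty flash"

# Header.encode() splits text that does not fit on one line into several
# encoded-words, each placed on its own folded line.
encoded = Header(TEXT, charset="utf-8", header_name="Subject").encode()
encoded_words = encoded.split()

# Decoding the encoded-words gives back the original text.
decoded = "".join(
    chunk.decode(charset or "ascii") for chunk, charset in decode_header(encoded)
)
```

For this input each resulting encoded-word stays within 75 characters, and decoding the folded header returns the original subject text.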