[ mailman-Bugs-1605144 ] mailman corrupts RFC2047-encoded headers
SourceForge.net
noreply at sourceforge.net
Thu Nov 30 13:46:51 CET 2006
Bugs item #1605144, was opened at 2006-11-29 10:28
Message generated for change (Comment added) made by tkikuchi
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=100103&aid=1605144&group_id=103
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: mail delivery
Group: 2.1 (stable)
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: David Woodhouse (dwmw2)
Assigned to: Nobody/Anonymous (nobody)
Summary: mailman corrupts RFC2047-encoded headers
Initial Comment:
Given an input like this:
Subject: =?UTF-8?Q?[MTD]=20NAND:=20CAF=C3=89=20NAND=20driver=20cleanup,=20fix=20ECC=20on=20reading=20empty=20flash?=
Mailman appears to emit mail like this:
Subject: =?UTF-8?Q?[MTD]=20NAND:=20CAF=C3=89=20NAND=20driver=20cleanup,
=20fix=20ECC=20on=20reading=20empty=20flash?=
The input was RFC2047-compliant. The output isn't.
----------------------------------------------------------------------
>Comment By: Tokio Kikuchi (tkikuchi)
Date: 2006-11-30 12:46
Message:
Logged In: YES
user_id=67709
Originator: NO
This comes from the behavior of the Python email module, which tries to keep
each header line within 78 characters. Mailman parses the message first, does
things like adding the subject prefix or a message body header/footer, and
then regenerates the RFC-2822 message. The email module thinks your subject
has a two-part structure separated by a comma and splits it with CRLF. I am
not very sure, but the current version of email doesn't distinguish the
structured and unstructured headers defined in 2.2.1 and 2.2.2 of RFC-2822.
Anyway, it is safer to keep header lines within 78 characters.
FYI, email module generates your subject header like this:
Subject:
=?utf-8?q?=5BMTD=5D_NAND=3A_CAF=C3=89_NAND_driver_cleanup=2C_fix?=
=?utf-8?q?_ECC_on_reading_empty_flash?=
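As a rough illustration of the behavior described above, the sketch below encodes the same subject with the standard library's email.header.Header. It assumes a current Python 3; the email module shipped in 2006 may differ in details. The maxlinelen value and the variable names are choices made for this example and are not taken from Mailman's code.

```python
from email.header import Header, decode_header, make_header

# The subject from the report, as plain text (\u00c9 is "É").
subject = "[MTD] NAND: CAF\u00c9 NAND driver cleanup, fix ECC on reading empty flash"

# header_name lets Header account for "Subject: " on the first line;
# maxlinelen=76 is an explicit choice for this sketch.
header = Header(subject, charset="utf-8", header_name="Subject", maxlinelen=76)
encoded = header.encode()

# Whitespace only ever separates encoded-words here, so split() yields them.
words = encoded.split()

# Decoding the folded value should give back the original text.
decoded = str(make_header(decode_header(encoded)))

print("Subject: " + encoded)
for word in words:
    print(len(word), word)
print(decoded == subject)
```

The point of the sketch is that the module produces several complete encoded-words instead of folding in the middle of one, and that the comma never appears literally in its output.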
----------------------------------------------------------------------
Comment By: David Woodhouse (dwmw2)
Date: 2006-11-29 14:00
Message:
Logged In: YES
user_id=81423
Originator: YES
Your thunderbird also refuses to send this:
To: Some people : ;
Bcc: foo at bar.org, baz at turnip.com
Thunderbird isn't necessarily the best test of what's valid :)
The pertinent question is: why is mailman munging this _anyway_? Why can't
it just pass the header through as it was originally sent? If I put line
breaks in and lined things up sensibly like a SpamAssassin report does,
why should that be mangled by mailman?
----------------------------------------------------------------------
Comment By: Harald Hoyer (Red Hat) (saturn_de)
Date: 2006-11-29 13:46
Message:
Logged In: YES
user_id=5540
Originator: NO
Hmm, my thunderbird encodes "," with =2C
----------------------------------------------------------------------
Comment By: David Woodhouse (dwmw2)
Date: 2006-11-29 13:15
Message:
Logged In: YES
user_id=81423
Originator: YES
That's only for the charset (UTF-8) and the encoding (Q). The comma
appears in the encoded-text, and should be fine (since this is a Subject:
header and hence comes under paragraph (1) of §5).
encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
charset = token ; see section 3
encoding = token ; see section 4
token = 1*<Any CHAR except SPACE, CTLs, and especials>
especials = "(" / ")" / "<" / ">" / "@" / "," / ";" / ":" / "
<"> / "/" / "[" / "]" / "?" / "." / "="
encoded-text = 1*<Any printable ASCII character other than "?"
or SPACE>
; (but see "Use of encoded-words in message
; headers", section 5)
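The grammar quoted above can be turned into a small check. This is a minimal sketch, not a full RFC 2047 parser: it treats especials as the plain character set listed in the quote, it ignores the extra restrictions of section 5, and the function name matches_encoded_word is made up for this example.

```python
import re

# especials as listed in the quoted grammar (treated here as a plain
# character set; this is an approximation for the sketch).
ESPECIALS = '()<>@,;:"/[]?.='

# token: one or more printable ASCII characters that are not especials.
TOKEN_CHARS = "".join(chr(c) for c in range(33, 127) if chr(c) not in ESPECIALS)
TOKEN = "[" + re.escape(TOKEN_CHARS) + "]+"

# encoded-text: printable ASCII other than "?" or SPACE.
ENCODED_TEXT = r"[!->@-~]+"

ENCODED_WORD = re.compile(
    r"=\?(" + TOKEN + r")\?(" + TOKEN + r")\?(" + ENCODED_TEXT + r")\?="
)


def matches_encoded_word(word):
    """Return True if word fits the quoted encoded-word grammar."""
    return ENCODED_WORD.fullmatch(word) is not None


good = "=?UTF-8?Q?[MTD]_NAND:_CAF=C3=89_NAND_driver_cleanup,_fix_ECC_on_reading?="
broken = "=?UTF-8?Q?[MTD]_NAND:_CAF=C3=89_NAND_driver_cleanup,"
bad_charset = "=?UTF,8?Q?text?="

print(matches_encoded_word(good))         # comma inside encoded-text
print(matches_encoded_word(broken))       # fragment left after folding at the comma
print(matches_encoded_word(bad_charset))  # comma inside the charset token
```

Under these assumptions the comma is rejected only in the charset and encoding fields, which is the distinction being argued in this comment.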
----------------------------------------------------------------------
Comment By: Harald Hoyer (Red Hat) (saturn_de)
Date: 2006-11-29 12:21
Message:
Logged In: YES
user_id=5540
Originator: NO
Hmm, I don't think "," is allowed unencoded...
token = 1*<Any CHAR except SPACE, CTLs, and especials>
especials = "(" / ")" / "<" / ">" / "@" / "," / ";" / ":" / "
<"> / "/" / "[" / "]" / "?" / "." / "="
----------------------------------------------------------------------
Comment By: David Woodhouse (dwmw2)
Date: 2006-11-29 11:53
Message:
Logged In: YES
user_id=81423
Originator: YES
Hm, good point; thanks. I've fixed the script which generates mail for
each commit to the Linux kernel git tree, and it should no longer generate
encoded-words longer than 75 characters.
I still see this input...
Subject:
=?UTF-8?Q?[MTD]_NAND:_CAF=C3=89_NAND_driver_cleanup,_fix_ECC_on_reading?=
=?UTF-8?Q?_empty_flash?=
and this output...
Subject: =?UTF-8?Q?[MTD]_NAND:_CAF=C3=89_NAND_driver_cleanup,
_fix_ECC_on_reading?= =?UTF-8?Q?_empty_flash?=
The comma is allowed, and doesn't have to be '=2C', does it? See §4.2 (3)
and §5 (1).
----------------------------------------------------------------------
Comment By: Harald Hoyer (Red Hat) (saturn_de)
Date: 2006-11-29 10:44
Message:
Logged In: YES
user_id=5540
Originator: NO
http://www.ietf.org/rfc/rfc2047.txt
An 'encoded-word' may not be more than 75 characters long, including
'charset', 'encoding', 'encoded-text', and delimiters. If it is
desirable to encode more text than will fit in an 'encoded-word' of
75 characters, multiple 'encoded-word's (separated by CRLF SPACE) may
be used.
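The 75-character limit quoted above is easy to check mechanically. The following is a hedged sketch: the pattern only locates things that look like encoded-words and does not validate them, and the helper name overlong_encoded_words is hypothetical.

```python
import re

MAX_ENCODED_WORD_LEN = 75  # limit from the quoted RFC 2047 paragraph

# Loose pattern for locating encoded-words in a header value; it does not
# validate the charset or encoding fields.
ENCODED_WORD_RE = re.compile(r"=\?[^?\s]+\?[^?\s]+\?[^?\s]*\?=")


def overlong_encoded_words(header_value):
    """Return the encoded-words in header_value longer than 75 characters."""
    return [
        word
        for word in ENCODED_WORD_RE.findall(header_value)
        if len(word) > MAX_ENCODED_WORD_LEN
    ]


# The subject from the initial comment: one long encoded-word.
original = (
    "=?UTF-8?Q?[MTD]=20NAND:=20CAF=C3=89=20NAND=20driver=20cleanup,"
    "=20fix=20ECC=20on=20reading=20empty=20flash?="
)

# The same text split into two encoded-words separated by CRLF SPACE.
resplit = (
    "=?UTF-8?Q?[MTD]_NAND:_CAF=C3=89_NAND_driver_cleanup,_fix_ECC_on_reading?="
    "\r\n =?UTF-8?Q?_empty_flash?="
)

print(len(overlong_encoded_words(original)))
print(len(overlong_encoded_words(resplit)))
```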
----------------------------------------------------------------------