Problem with header encoding on 2.1.9 - any ideas?

Hi there,
I have a problem with MailMan and Japanese ISO-2022-JP encoding. When a header includes a ";" as part of the ISO-2022-JP encoding, MailMan seems to replace it with "; " (note the extra space). This messes up the characters.
Real-life example:
Original: Subject: =?ISO-2022-JP?Q?607716139:_=1B$B%a%C%;!<%8%m%1!< %=3FF0:nIT6q9g=1B(B?=
Mailman-sent: Subject: =?ISO-2022-JP?Q?607716139:_=1B$B%a%C%; !<%8%m%1!< %=3FF0:nIT6q9g=1B(B?=
I tried looking at the code, but I couldn't figure it out and so I'm asking here first before I dig deeper. Does anybody know what might cause this?
Thanks!
Wout.

Wout Mertens wrote:
> Hi there,
Hi,
> I have a problem with MailMan and Japanese ISO-2022-JP encoding.
> When a header includes a ";" as part of the ISO-2022-JP encoding, MailMan seems to replace it with "; " (note the extra space). This messes up the characters.
> Real-life example:
> Original: Subject: =?ISO-2022-JP?Q?607716139:_=1B$B%a%C%;!<%8%m%1!< %=3FF0:nIT6q9g=1B(B?=
> Mailman-sent: Subject: =?ISO-2022-JP?Q?607716139:_=1B$B%a%C%; !<%8%m%1!< %=3FF0:nIT6q9g=1B(B?=
This is because the Python email package can't distinguish between structured and unstructured RFC 2822 headers. The Q-encoded ISO-2022-JP string contains a ';' character, which causes the email package to treat it as a syntactic separator and insert a space after it. Most Japanese-capable mailers use B-encoding to avoid such confusion.
A workaround is rather tricky, but try adding a subject_prefix like [listname] in the admin interface, which may trigger normalization by the Mailman CookHeader module.
> I tried looking at the code, but I couldn't figure it out and so I'm asking here first before I dig deeper. Does anybody know what might cause this?
> Thanks!
> Wout.
Cheers,
-- Tokio Kikuchi, tkikuchi@is.kochi-u.ac.jp http://weather.is.kochi-u.ac.jp/
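To illustrate the remark about Q- versus B-encoding, here is a minimal Python 3 sketch. It does not reproduce what Mailman 2.1.9 does when it re-folds the header; it only shows where the ';' comes from and why B-encoding avoids it. The helper lax_q_encode is hypothetical: it assumes the sending mailer Q-encoded only the bytes it had to and left other printable ASCII as-is, which is what the Subject in this thread looks like.

```python
import base64


def lax_q_encode(raw: bytes) -> str:
    """Q-encode only what must be encoded (assumed sender behaviour).

    Space becomes '_'; control bytes, non-ASCII bytes, '=', '?' and '_'
    become =XX; every other printable ASCII byte is left literal.
    """
    out = []
    for byte in raw:
        ch = chr(byte)
        if ch == " ":
            out.append("_")
        elif byte < 0x21 or byte > 0x7E or ch in "=?_":
            out.append("=%02X" % byte)
        else:
            out.append(ch)
    return "".join(out)


# Katakana for "message"; its third character is the byte pair "%;" in
# ISO-2022-JP, the same pair that appears in the Subject from this thread.
text = "\u30e1\u30c3\u30bb\u30fc\u30b8"
raw = text.encode("iso-2022-jp")

q_word = "=?ISO-2022-JP?Q?" + lax_q_encode(raw) + "?="
b_word = "=?ISO-2022-JP?B?" + base64.b64encode(raw).decode("ascii") + "?="

print(q_word)  # =?ISO-2022-JP?Q?=1B$B%a%C%;!<%8=1B(B?=
print(b_word)  # base64 payload: letters, digits, '+', '/', '=' only
```

Because the base64 alphabet contains no ';', a B-encoded word gives a header folder nothing that looks like a separator.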

Hello Tokio,
many thanks for your swift reply!
Looking further at this, I also found a header that wasn't affected, though it wasn't encoded either. This header is identical in the original and the Mailman-processed mail:
X-IronPort-AV: E=Sophos;i="4.24,292,1196668800";
  d="scan'208";a="8300346"
Note the second line: there the ";" has no space after it. So it seems to me that MailMan only changes certain headers?
Do you think there's an easy way to configure MailMan to leave the Subject line alone?
Thanks,
Wout.

Hi,
On Jan 18, 2008, at 1:40 AM, Tokio Kikuchi wrote:
> Wout Mertens wrote:
>> I have a problem with MailMan and Japanese ISO-2022-JP encoding. When a header includes a ";" as part of the ISO-2022-JP encoding, MailMan seems to replace it with "; " (note the extra space). This messes up the characters.
>> Real-life example:
>> Original: Subject: =?ISO-2022-JP?Q?607716139:_=1B$B%a%C%;!<%8%m%1!< %=3FF0:nIT6q9g=1B(B?=
>> Mailman-sent: Subject: =?ISO-2022-JP?Q?607716139:_=1B$B%a%C%; !<%8%m%1!< %=3FF0:nIT6q9g=1B(B?=
> This is because the Python email package can't distinguish between structured and unstructured RFC 2822 headers. The Q-encoded ISO-2022-JP string contains a ';' character, which causes the email package to treat it as a syntactic separator and insert a space after it. Most Japanese-capable mailers use B-encoding to avoid such confusion.
> A workaround is rather tricky, but try adding a subject_prefix like [listname] in the admin interface, which may trigger normalization by the Mailman CookHeader module.
That workaround didn't work. Instead, I used procmail with the following rule:
#####################
# Fix a mailman issue: When encountering Q-encoded charset headers (RFC2047),
# mailman will add a space after ';'. Replace it with the hex encoding instead.
:0 fWh
* Subject: *=\?ISO-2022-JP\?Q
| sed '/=\?ISO-2022-JP\?[qQ]\?/s/;/=3B/g'
#####################
Obviously you need to pipe the mails through procmail before giving them to mailman then, but at least this is a functioning workaround.
I consider this behaviour to be a bug. I'm just not sure where the bug is, inside MailMan or inside the email.Header package. Tokio, do you have any idea?
Thanks,
Wout.
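For anyone who would rather not shell out to sed, the same idea can be sketched in Python 3. This is only an illustration, not part of Mailman or procmail; the names ENCODED_WORD, escape_semicolons and q_payload_bytes are made up for the example. One difference from the sed rule: sed rewrites every ';' on a matching header line, whereas this sketch touches only the payload of Q-encoded ISO-2022-JP encoded-words.

```python
import binascii
import re

# A Q-encoded ISO-2022-JP encoded-word (RFC 2047): prefix, payload, suffix.
ENCODED_WORD = re.compile(r"(=\?ISO-2022-JP\?Q\?)(.*?)(\?=)", re.IGNORECASE)


def escape_semicolons(header_value: str) -> str:
    """Replace literal ';' with '=3B' inside matching encoded-words only."""
    def fix(match):
        prefix, payload, suffix = match.groups()
        return prefix + payload.replace(";", "=3B") + suffix
    return ENCODED_WORD.sub(fix, header_value)


def q_payload_bytes(header_value: str) -> bytes:
    """Decode the Q payload of the first matching encoded-word to raw bytes."""
    payload = ENCODED_WORD.search(header_value).group(2)
    return binascii.a2b_qp(payload.encode("ascii"), header=True)


# The Subject value quoted earlier in this thread.
original = ("=?ISO-2022-JP?Q?607716139:_=1B$B%a%C%;!<%8%m%1!< "
            "%=3FF0:nIT6q9g=1B(B?=")
fixed = escape_semicolons(original)

print(fixed)
# The rewrite changes only the spelling of the byte, not the decoded bytes.
print(q_payload_bytes(fixed) == q_payload_bytes(original))
```

Since '=3B' decodes back to ';', the receiving mailer sees the same ISO-2022-JP bytes, while the header no longer contains a literal ';' for Mailman to put a space after.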