[Mailman-Developers]
[ mailman-Bugs-449677 ] HyperArch.py assumes charsets are in \w+
noreply@sourceforge.net
Fri, 23 Aug 2002 14:57:03 -0700
Bugs item #449677, was opened at 2001-08-09 21:25
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=100103&aid=449677&group_id=103
Category: Pipermail
>Group: 2.0.x
Status: Open
Resolution: None
Priority: 5
Submitted By: Ben Gertzfield (che_fox)
Assigned to: Nobody/Anonymous (nobody)
Summary: HyperArch.py assumes charsets are in \w+
Initial Comment:
Using Mailman 2.0.6, I noticed that Japanese messages
in charset iso-2022-jp are not archived correctly:
their subject lines stay MIME-encoded, for example:

  Subject: =?ISO-2022-JP?B?WxskQiQqTD5BMBsoQi5jb21dLmNvbS8uag==?=
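For illustration, this is roughly what the archiver should turn that
encoded subject into. The sketch uses the modern Python 3 standard
library (email.header.decode_header), not Mailman 2.0.6's own code, so
treat it as an assumption about the intended result rather than the
fix itself.

```python
# Sketch only: Python 3 stdlib, not the Mailman 2.0.6 code path.
from email.header import decode_header

subject = "=?ISO-2022-JP?B?WxskQiQqTD5BMBsoQi5jb21dLmNvbS8uag==?="

parts = []
for chunk, charset in decode_header(subject):
    # Encoded words come back as bytes plus their (lowercased) charset;
    # plain unencoded text comes back as str with charset None.
    if isinstance(chunk, bytes):
        chunk = chunk.decode(charset or "ascii")
    parts.append(chunk)

decoded = "".join(parts)
print(decoded)
```

Decoding only works if the charset name "iso-2022-jp" survives intact,
which is exactly what the regex discussed next prevents.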
I tracked this down to the following line in
HyperArch.py (line 158):

    # content-type charset
    rx_charset = re.compile('charset="(\w+)"')
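A quick check of that pattern shows the failure. The sample header
values below are illustrative assumptions; the pattern is the one quoted
above, written as a raw string so that \w is not treated as a string
escape by newer Python versions.

```python
import re

# Pattern from HyperArch.py, as quoted in the report.
old_rx_charset = re.compile(r'charset="(\w+)"')

samples = [
    'text/plain; charset="Shift_JIS"',        # only [A-Za-z0-9_]: matches
    'text/plain; charset="iso-2022-jp"',      # contains '-': no match
    'text/plain; charset="ISO_8859-1:1987"',  # contains '-' and ':': no match
]

for header in samples:
    m = old_rx_charset.search(header)
    print(header, "->", m.group(1) if m else None)
```

Because \w+ must be followed immediately by the closing quote, any name
containing a hyphen or colon is rejected outright, so the charset is
never found and the subject is left undecoded.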
This is incorrect. According to the IANA registry of
character sets at
http://www.iana.org/assignments/character-sets
charset names can contain many characters outside
[a-zA-Z0-9_], such as -, (, ), :, and . (for example,
ISO_8859-1:1987). So we must accept anything between
the quotes with a non-greedy .+? match instead of
forcing \w+. Patch attached, against Mailman 2.0.6.
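A minimal sketch of the approach described above (this is not the
attached patch, which is not reproduced here; find_charset is a
hypothetical helper name):

```python
import re

# Accept any characters between the quotes. The lazy .+? stops at the
# first closing quote, so a later parameter is not swallowed.
new_rx_charset = re.compile(r'charset="(.+?)"')

def find_charset(content_type):
    """Hypothetical helper: return the quoted charset value, or None."""
    m = new_rx_charset.search(content_type)
    return m.group(1) if m else None

for header in ['text/plain; charset="iso-2022-jp"',
               'text/plain; charset="ISO_8859-1:1987"',
               'text/plain; charset="Shift_JIS"; format="flowed"']:
    print(header, "->", find_charset(header))
```

With a greedy .+ the last header would yield
'Shift_JIS"; format="flowed' instead of Shift_JIS. Note that RFC 2045
also allows an unquoted token (charset=iso-2022-jp), which neither
pattern matches; in modern Python, email.message.Message's
get_content_charset() handles both forms.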
----------------------------------------------------------------------
Comment By: Ben Gertzfield (che_fox)
Date: 2001-08-09 22:53
Message:
Logged In: YES
user_id=89313
Note: this is the solution to the problem in Bug #431511.
----------------------------------------------------------------------