[ mailman-Bugs-449677 ] HyperArch.py assumes charsets are in \w+
Bugs item #449677, was opened at 2001-08-09 21:25
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=100103&aid=449677&group_id=103

Category: Pipermail
Group: 2.0.x
Status: Open
Resolution: None
Priority: 1
Submitted By: Ben Gertzfield (che_fox)
Assigned to: Nobody/Anonymous (nobody)
Summary: HyperArch.py assumes charsets are in \w+
Initial Comment:
Using Mailman 2.0.6, I noticed that Japanese messages in charset iso-2022-jp are not archived correctly; their subject lines stay in MIME-encoded format, like
Subject: =?ISO-2022-JP?B?WxskQiQqTD5BMBsoQi5jb21dLmNvbS8uag==?=
etc.
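For reference, here is a minimal sketch of what a subject line like this looks like once the MIME encoded-word is decoded. It uses the email.header module from a current Python standard library, which is an assumption on my part and not the code path Mailman 2.0.6 uses in the archiver.

```python
from email.header import decode_header

# The encoded subject quoted in the report above.
subject = "=?ISO-2022-JP?B?WxskQiQqTD5BMBsoQi5jb21dLmNvbS8uag==?="

# decode_header returns a list of (decoded_bytes, charset) pairs;
# this subject consists of a single encoded word.
raw, charset = decode_header(subject)[0]

# The charset name comes back lower-cased and contains hyphens.
print(charset)  # iso-2022-jp

# Turn the raw bytes into text using the declared charset.
text = raw.decode(charset)
print(text)
```

Note that the charset name itself contains hyphens, which matters for the pattern discussed below.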
I tracked this down to the following line in HyperArch.py:158

    # content-type charset
    rx_charset = re.compile('charset="(\w+)"')
This is incorrect. According to the list of registered charsets at http://www.iana.org/assignments/character-sets, charset names can contain many characters outside of [a-zA-Z0-9_], such as - ( ) : and . (iso-2022-jp itself contains hyphens, so \w+ never matches it). So we must accept anything between the quotes with a non-greedy .+? match, instead of forcing \w+. Patch attached, against Mailman 2.0.6.
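To illustrate the difference, here is a small sketch comparing the pattern quoted above with a relaxed one along the lines described. The names rx_strict, rx_relaxed and find_charset are hypothetical, and the relaxed pattern is my reading of the description, not a copy of the attached patch.

```python
import re

# Pattern quoted in the report: \w+ only allows [a-zA-Z0-9_].
rx_strict = re.compile(r'charset="(\w+)"')

# Assumed relaxed pattern: non-greedy match of anything between the quotes.
rx_relaxed = re.compile(r'charset="(.+?)"')


def find_charset(rx, header):
    """Return the captured charset name, or None if the pattern does not match."""
    match = rx.search(header)
    return match.group(1) if match else None


header = 'Content-Type: text/plain; charset="iso-2022-jp"'

print(find_charset(rx_strict, header))   # None
print(find_charset(rx_relaxed, header))  # iso-2022-jp
```

Both patterns still require the charset value to be quoted; an unquoted parameter such as charset=iso-2022-jp would not match either one.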
Comment By: Ben Gertzfield (che_fox)
Date: 2001-08-09 22:53

Message:
Logged In: YES
user_id=89313
Note: this is the solution to the problem in Bug #431511.