[Mailman-Developers] [ mailman-Bugs-449677 ] HyperArch.py assumes charsets are in \w+

noreply@sourceforge.net noreply@sourceforge.net
Fri, 23 Aug 2002 14:57:03 -0700


Bugs item #449677, was opened at 2001-08-09 21:25
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100103&aid=449677&group_id=103

Category: Pipermail
>Group: 2.0.x
Status: Open
Resolution: None
Priority: 5
Submitted By: Ben Gertzfield (che_fox)
Assigned to: Nobody/Anonymous (nobody)
Summary: HyperArch.py assumes charsets are in \w+

Initial Comment:
Using Mailman 2.0.6, I noticed that Japanese messages 
in charset iso-2022-jp are not archived correctly; 
their subject lines stay in MIME-encoded format, like 

Subject: =?ISO-2022-JP?B?WxskQiQqTD5BMBsoQi5jb21dLmNvbS8uag==?=

etc.
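
For reference, here is a minimal sketch of what a correctly decoded
subject would look like. It uses the email package from a current
Python 3 standard library, not the code in Mailman 2.0.6, and the
helper name decode_subject is made up for illustration.

```python
from email.header import decode_header

# The encoded subject quoted in the report above.
raw_subject = "=?ISO-2022-JP?B?WxskQiQqTD5BMBsoQi5jb21dLmNvbS8uag==?="


def decode_subject(value):
    """Decode RFC 2047 encoded-words into one text string."""
    pieces = []
    for data, charset in decode_header(value):
        if isinstance(data, bytes):
            # Chunks without a charset label are treated as us-ascii
            # (an assumption made for this sketch).
            pieces.append(data.decode(charset or "us-ascii", errors="replace"))
        else:
            pieces.append(data)
    return "".join(pieces)


# decode_header reports the charset label it found in the encoded-word.
charset = decode_header(raw_subject)[0][1]
subject = decode_subject(raw_subject)
```

After this runs, charset holds the hyphenated name iso-2022-jp and
subject holds readable text with no =?...?= wrapper left in it.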

I tracked this down to the following line in 
HyperArch.py:158

# content-type charset
rx_charset = re.compile('charset="(\w+)"')

This is incorrect.  According to the de-facto list of 
charsets at 
http://www.iana.org/assignments/character-sets 
charset names can contain characters outside of 
[a-zA-Z0-9_], such as - ( ) : and . , so \w+ fails 
on a name like iso-2022-jp.  We must instead accept 
anything between the quotes with a non-greedy .+? 
match.  Patch attached, against Mailman 2.0.6.




----------------------------------------------------------------------

Comment By: Ben Gertzfield (che_fox)
Date: 2001-08-09 22:53

Message:
Logged In: YES 
user_id=89313

Note: this is the solution to the problem in Bug #431511.




----------------------------------------------------------------------
