[ mailman-Bugs-449677 ] HyperArch.py assumes charsets are in \w+

SourceForge.net noreply at sourceforge.net
Mon Feb 2 19:16:09 EST 2004


Bugs item #449677, was opened at 2001-08-10 03:25
Message generated for change (Comment added) made by cfaerber
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=100103&aid=449677&group_id=103

Category: Pipermail
Group: 2.0.x
Status: Open
Resolution: None
Priority: 1
Submitted By: Ben Gertzfield (che_fox)
Assigned to: Nobody/Anonymous (nobody)
Summary: HyperArch.py assumes charsets are in \w+

Initial Comment:
Using Mailman 2.0.6, I noticed that Japanese messages 
in charset iso-2022-jp are not archived correctly: 
their subject lines are left in raw MIME-encoded 
form, for example 

Subject: 
=?ISO-2022-JP?B?WxskQiQqTD5BMBsoQi5jb21dLmNvbS8uag==?=

I tracked this down to the following line in 
HyperArch.py:158

# content-type charset
rx_charset = re.compile('charset="(\w+)"')

This is incorrect.  According to the de facto list of 
charsets at 
http://www.iana.org/assignments/character-sets 
charset names can contain all sorts of characters 
outside of [a-zA-Z0-9_], such as - ( ) : . and so on.  
So we must accept anything between the quotes with a 
non-greedy .+? match, instead of requiring \w+.  Patch 
attached, against Mailman 2.0.6.
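
To make the failure concrete, here is a minimal sketch in modern Python. The attached patch itself is not reproduced in this message, so `rx_loose` is only an assumed reconstruction of the fix from the description above, and `header` is an illustrative value, not taken from a real message.

```python
import re

# Pattern quoted in the report (HyperArch.py, Mailman 2.0.6).
rx_strict = re.compile(r'charset="(\w+)"')
# Assumed shape of the fix, reconstructed from the description above.
rx_loose = re.compile(r'charset="(.+?)"')

# Illustrative Content-Type value.
header = 'text/plain; charset="iso-2022-jp"'

strict_match = rx_strict.search(header)
loose_match = rx_loose.search(header)

print(strict_match)          # None: "-" is not matched by \w
print(loose_match.group(1))  # iso-2022-jp
```

Because `\w` covers only letters, digits and underscore, the strict pattern stops at the first hyphen and never reaches the closing quote, so no charset is found for the message.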




----------------------------------------------------------------------

Comment By: Claus Färber (cfaerber)
Date: 2004-02-03 01:16

Message:
Logged In: YES 
user_id=126984

This is still incorrect:
The quotes are not required.
There can be whitespace and folding around the "=" sign.
A string matching 'charset="(\w+)"' can occur within other 
parameters.
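
One way to sidestep all three problems is to parse the header instead of pattern-matching it. The sketch below uses the `email` package from the modern Python standard library; it is not the code Mailman 2.0.x uses, and the helper name `charset_from_content_type` and the sample values are hypothetical.

```python
from email import message_from_string


def charset_from_content_type(value):
    """Return the lower-cased charset parameter of a Content-Type
    value, or None if the header carries no charset parameter."""
    # Wrap the value in a minimal message so the header parser runs.
    msg = message_from_string("Content-Type: " + value + "\n\n")
    return msg.get_content_charset()


samples = [
    # Unquoted value.
    'text/plain; charset=iso-2022-jp',
    # Whitespace around "=".
    'text/plain; charset = "ISO-2022-JP"',
    # Folding around "=".
    'text/plain;\n charset=\n "iso-2022-jp"',
    # 'charset="bogus"' appears as part of another parameter.
    'text/plain; x-charset="bogus"; charset=utf-8',
]

for value in samples:
    print(repr(value), "->", charset_from_content_type(value))
```

The first three samples should all yield `iso-2022-jp`, and the last should yield `utf-8` rather than `bogus`, since the parser compares whole parameter names instead of searching for a substring.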

----------------------------------------------------------------------

Comment By: Ben Gertzfield (che_fox)
Date: 2001-08-10 04:53

Message:
Logged In: YES 
user_id=89313

Note: this is the solution to the problem in Bug #431511.




----------------------------------------------------------------------
