[ mailman-Bugs-449677 ] HyperArch.py assumes charsets are in \w+

Bugs item #449677, was opened at 2001-08-10 03:25
Message generated for change (Comment added) made by cfaerber
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=100103&aid=449677&group_id=103

Category: Pipermail
Group: 2.0.x
Status: Open
Resolution: None
Priority: 1
Submitted By: Ben Gertzfield (che_fox)
Assigned to: Nobody/Anonymous (nobody)
Summary: HyperArch.py assumes charsets are in \w+

Initial Comment:
Using Mailman 2.0.6, I noticed that Japanese messages in charset iso-2022-jp are not archived correctly; their subject lines stay in MIME-encoded form, like

Subject: =?ISO-2022-JP?B?WxskQiQqTD5BMBsoQi5jb21dLmNvbS8uag==?=

I tracked this down to the following lines at line 158 of HyperArch.py:

    # content-type charset
    rx_charset = re.compile('charset="(\w+)"')

This is incorrect. According to the de facto list of charsets at http://www.iana.org/assignments/character-sets, charset names can contain many characters outside [a-zA-Z0-9_], such as - ( ) : and . (iso-2022-jp itself contains hyphens, so \w+ never matches it). So we must accept anything between the quotes with a non-greedy .+? match instead of forcing \w+.

Patch attached, against Mailman 2.0.6.

----------------------------------------------------------------------

Comment By: Claus Färber (cfaerber)
Date: 2004-02-03 01:16

Message:
Logged In: YES
user_id=126984

This is still incorrect: the quotes are not required, there can be whitespace and folding around the "=" sign, and a string matching 'charset="(\w+)"' can occur within other parameters.

----------------------------------------------------------------------

Comment By: Ben Gertzfield (che_fox)
Date: 2001-08-10 04:53

Message:
Logged In: YES
user_id=89313

Note: this is the solution to the problem in Bug #431511.
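A minimal Python 3 sketch contrasting the behaviors discussed in this report: the original \w+ pattern, the proposed .+? patch, and a parameter-aware parse that addresses the three objections in the 2004 comment (optional quotes, whitespace and folding around "=", look-alike text inside other parameters). The names OLD_RX, PATCHED_RX, first_group, split_params and get_charset, and the x-orig-charset parameter used in the samples, are hypothetical illustrations, not Mailman code and not the patch attached to this report. The sketch ignores RFC 2045 comments and RFC 2231 encoded parameters (charset*=); in current Python, the standard library's email.message.Message.get_content_charset() already performs this kind of parsing.

```python
import re

# Pattern quoted in this report from HyperArch.py (Mailman 2.0.6).
OLD_RX = re.compile(r'charset="(\w+)"')

# The change proposed in this report: anything between the quotes.
PATCHED_RX = re.compile(r'charset="(.+?)"')


def first_group(rx, value):
    """Return group 1 of the first match of rx in value, or None."""
    m = rx.search(value)
    return m.group(1) if m else None


def split_params(value):
    """Split a Content-Type value on ';' outside quoted strings (hypothetical helper)."""
    parts, buf = [], []
    in_quotes = escaped = False
    for ch in value:
        if escaped:
            escaped = False              # character after a backslash is literal
        elif in_quotes and ch == '\\':
            escaped = True
        elif ch == '"':
            in_quotes = not in_quotes
        elif ch == ';' and not in_quotes:
            parts.append(''.join(buf))   # end of one parameter
            buf = []
            continue
        buf.append(ch)
    parts.append(''.join(buf))
    return parts


def get_charset(content_type):
    """Return the lowercased charset parameter of a Content-Type value, or None.

    Quotes are optional, whitespace (including folded line breaks) may
    surround '=', and each parameter name is compared exactly, so text
    inside another parameter is never mistaken for the charset.
    """
    for param in split_params(content_type)[1:]:  # skip type/subtype
        name, sep, val = param.partition('=')
        if not sep or name.strip().lower() != 'charset':
            continue
        val = val.strip()
        if len(val) >= 2 and val[0] == val[-1] == '"':
            # Unquote a quoted-string and undo backslash escapes.
            val = re.sub(r'\\(.)', r'\1', val[1:-1])
        return val.lower() or None
    return None


if __name__ == "__main__":
    samples = [
        'text/plain; charset="iso-2022-jp"',                    # quoted, contains '-'
        'text/plain; charset=iso-2022-jp',                      # unquoted
        'text/plain;\r\n\tcharset = "ISO-2022-JP"',             # folded, spaces around '='
        'text/plain; x-orig-charset="big5"; charset=us-ascii',  # look-alike parameter
    ]
    for ct in samples:
        print(repr(ct))
        print("  original:", first_group(OLD_RX, ct))
        print("  patched: ", first_group(PATCHED_RX, ct))
        print("  parsed:  ", get_charset(ct))
```

In these samples, the original pattern misses even the quoted iso-2022-jp value; the .+? patch fixes that case but still misses the unquoted and folded forms and returns big5 for the look-alike parameter, while get_charset() returns the actual charset parameter each time.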