"Tokio" == Tokio Kikuchi <tkikuchi@is.kochi-u.ac.jp> writes:
Tokio> Consider that Mailman gets spam from a foreign country that
Tokio> causes an error. Mailman may complain of a UnicodeDecodeError
Tokio> and spew an excerpt containing an unknown charset string.
This really should not happen. Mailman should trap *all* UnicodeDecodeErrors at a very low level. (You simply cannot count on "malformed message == SPAM" in all contexts yet. E.g., just last week the Mac users here started flaming the Windows-using administration for distributing mojibake.)
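A minimal sketch of such a low-level trap, in Python 3 for illustration (it is not Mailman's actual code, and the name `safe_decode` is hypothetical). One detail worth noting: an unknown charset label makes `bytes.decode()` raise `LookupError`, not `UnicodeDecodeError`, so a trap aimed at the "unknown charset string" case probably needs to catch both.

```python
from typing import Optional, Tuple


def safe_decode(data: bytes, charset: Optional[str],
                fallback: str = "us-ascii") -> Tuple[str, bool]:
    """Decode data without ever letting a decoding error escape.

    Hypothetical helper. Returns (text, clean): clean is False when the
    declared charset was unknown or the bytes did not conform to it, in
    which case undecodable bytes become U+FFFD replacement characters.
    """
    label = charset or fallback
    try:
        return data.decode(label), True
    except LookupError:
        # Unknown charset name, e.g. a bogus label in a spam message.
        label = fallback
    except UnicodeDecodeError:
        # Known charset, but the bytes do not conform to it.
        pass
    return data.decode(label, errors="replace"), False
```

Callers get usable text either way and can use the `clean` flag to decide whether the message needs washing.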
Then it should wash the message to make it safe: RFC 2047-encode any 8-bit headers, and use a base64 Content-Transfer-Encoding for any 8-bit message bodies or body parts that don't specify a known, approved charset. Bonus points for checking that 8-bit body parts with a specified charset actually conform to it.
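The washing step might look roughly like this with Python 3's standard `email` package. This is a sketch under several assumptions, none of them Mailman's: the allow-list `APPROVED_CHARSETS` and the functions `wash_header` and `wash_body` are hypothetical; raw 8-bit header bytes are assumed to be available to the caller; and since such headers carry no charset label, the sketch guesses UTF-8 and otherwise labels the bytes `unknown-8bit` (the name registered by RFC 1428).

```python
import base64
from email.header import Header
from email.message import Message
from typing import List

# Hypothetical site allow-list of "known, approved" charsets (lowercase).
APPROVED_CHARSETS = {"us-ascii", "utf-8", "iso-8859-1",
                     "iso-2022-jp", "euc-jp", "shift_jis"}


def is_8bit(data: bytes) -> bool:
    return any(b > 0x7F for b in data)


def conforms(data: bytes, charset: str) -> bool:
    """The 'bonus points' check: do the bytes really decode as charset?"""
    try:
        data.decode(charset)
    except (UnicodeDecodeError, LookupError):
        return False
    return True


def wash_header(raw: bytes, guess: str = "utf-8") -> str:
    """RFC 2047-encode a raw header value if it contains 8-bit bytes."""
    if not is_8bit(raw):
        return raw.decode("ascii")
    if conforms(raw, guess):
        return Header(raw.decode(guess), guess).encode()
    # Charset unknown: emit base64 encoded-words labelled unknown-8bit,
    # each at most 75 characters long as RFC 2047 requires.
    prefix, suffix = "=?unknown-8bit?b?", "?="
    chunk = 42  # 42 bytes -> 56 base64 chars; 17 + 56 + 2 == 75
    words = [prefix + base64.b64encode(raw[i:i + chunk]).decode("ascii") + suffix
             for i in range(0, len(raw), chunk)]
    return " ".join(words)


def wash_body(msg: Message) -> List[str]:
    """Base64-encode 8-bit leaf parts lacking an approved, conforming charset.

    Returns a list of problems found, for the policy layer to act on.
    """
    problems = []
    for part in msg.walk():
        if part.is_multipart():
            continue
        data = part.get_payload(decode=True)
        if not data or not is_8bit(data):
            continue
        charset = part.get_content_charset()  # lowercased, or None
        if charset in APPROVED_CHARSETS:
            if conforms(data, charset):
                continue  # approved and honest: leave the part alone
            problems.append(f"body part does not conform to {charset}")
        # Replace any existing CTE header rather than adding a second one.
        del part["Content-Transfer-Encoding"]
        part["Content-Transfer-Encoding"] = "base64"
        part.set_payload(base64.encodebytes(data).decode("ascii"))
    return problems
```

Base64 preserves the original bytes exactly, so nothing is lost: a recipient (or a later, smarter filter) can still try to decode the part. Parts with an approved charset that fails the conformance check are base64-encoded too and reported, which is one possible policy among several.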
Finally, reraise some kind of exception that can be handled at the filtering policy level.
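A sketch of the re-raise, assuming a hypothetical `MalformedMessage` exception and hypothetical `run_handler`/`filter_message` functions (Mailman's real handler pipeline and exception names may differ). The low-level decoding error is translated into one policy-level exception, with `raise ... from` keeping the original error as `__cause__` for the logs.

```python
from typing import Callable


class MalformedMessage(Exception):
    """Hypothetical policy-level exception: the message could not be decoded."""


def run_handler(handler: Callable[[bytes], object], raw: bytes) -> object:
    """Call a low-level handler, translating decoding failures."""
    try:
        return handler(raw)
    except (UnicodeDecodeError, LookupError) as exc:
        # The filtering layer only needs to know about MalformedMessage;
        # the original exception stays attached as __cause__.
        raise MalformedMessage(f"undecodable message ({type(exc).__name__})") from exc


def filter_message(handler: Callable[[bytes], object], raw: bytes,
                   action: str = "hold") -> str:
    """Hypothetical policy layer: pick an action instead of crashing."""
    try:
        run_handler(handler, raw)
    except MalformedMessage as exc:
        return f"{action}: {exc}"
    return "accept"
```

Which action to take (hold, discard, bounce) is left to the list's policy rather than hard-coded at the point where the error occurs.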
-- 
School of Systems and Information Engineering   http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba   Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Ask not how you can "do" free software business;
ask what your business can "do for" free software.