[Mailman-Developers] handling multi-byte characters in templates
Tokio Kikuchi
tkikuchi@is.kochi-u.ac.jp
Fri, 20 Sep 2002 09:28:34 +0900
Jason,
Japanese is the most difficult language when you internationalize
applications. ;-)
1. it is multibyte
2. there are three coding schemes (although the standard is one: JIS);
they are iso-2022-jp, shift-jis, and euc-jp
3. iso-2022-jp is used for mail and news messages.
you will hear many complaints if you use another encoding, even
if it is labeled correctly with MIME.
4. because iso-2022-jp is 7-bit, the bytes of its japanese characters
look like ASCII special characters such as \, %, &, ... (the
multibyte runs are switched on and off with ESC sequences)
5. among the three, euc-jp is the best to use in programming,
because every byte of a japanese character has its msb set to 1
(like UTF-8)
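
Points 4 and 5 can be checked with a few lines of Python. This is only
a sketch using the codecs that ship with Python 3; the sample word is an
arbitrary illustration. Katakana sit in row 5 of JIS X 0208, so in
iso-2022-jp their first byte is 0x25, which is the ASCII % character.

```python
# Sketch: compare the raw bytes of one Japanese word in two encodings.
# The sample word is an arbitrary illustration, not taken from Mailman.
sample = "メール"  # "mail", written in katakana

jis_bytes = sample.encode("iso-2022-jp")
euc_bytes = sample.encode("euc-jp")

# iso-2022-jp is 7-bit: every byte is below 0x80, and some of the
# character bytes coincide with ASCII punctuation such as '%'.
jis_is_7bit = all(b < 0x80 for b in jis_bytes)
jis_has_percent = b"%" in jis_bytes

# euc-jp sets the most significant bit on every byte of a Japanese
# character, so none of them can be mistaken for an ASCII character.
euc_all_high = all(b >= 0x80 for b in euc_bytes)

print(jis_bytes)
print(euc_bytes)
print(jis_is_7bit, jis_has_percent, euc_all_high)
```
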
Therefore, japanese messages are best handled as follows:
1. use euc-jp for the internal processing of messages and patterns.
2. convert the message charset from iso-2022-jp to euc-jp when it
first enters the processing pipeline.
3. convert it back to iso-2022-jp when the message goes out.
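
The three steps above might be sketched like this in Python 3. The
function and constant names are hypothetical and are not Mailman's API;
the conversion simply goes through Python's built-in codecs.

```python
# Sketch of the convert-in / convert-out idea. All names are hypothetical.
WIRE_CHARSET = "iso-2022-jp"   # what travels in mail and news messages
INTERNAL_CHARSET = "euc-jp"    # what the program works with internally


def to_internal(wire_bytes: bytes) -> bytes:
    """Convert an incoming iso-2022-jp byte string to euc-jp."""
    return wire_bytes.decode(WIRE_CHARSET).encode(INTERNAL_CHARSET)


def to_wire(internal_bytes: bytes) -> bytes:
    """Convert an euc-jp byte string back to iso-2022-jp for sending."""
    return internal_bytes.decode(INTERNAL_CHARSET).encode(WIRE_CHARSET)


# A message body as it would arrive from the network.
incoming = "こんにちは".encode(WIRE_CHARSET)

internal = to_internal(incoming)   # process this form
outgoing = to_wire(internal)       # send this form

print(outgoing == incoming)
```

All pattern matching and template substitution would then happen on the
euc-jp form, between the two conversions.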
Jason R. Mastaler wrote:
> When Mailman.Utils.maketext() does string substitution in a template
> containing multi-byte characters (such as in templates/ja/), how does
> it avoid errors during dictionary interpolation?
euc-jp is used in the templates, so no byte of a japanese character
can be mistaken for an ASCII %.
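
A rough way to see the difference, assuming Python 3.5 or later, where
bytes objects support % formatting and so can imitate byte-string
templates. fill_template() is a hypothetical stand-in for a
maketext()-style helper, not Mailman's actual code, and the template text
is invented for the example.

```python
# Sketch: the same template text interpolated as euc-jp and as iso-2022-jp.
# fill_template() is a hypothetical stand-in, not Mailman's maketext().
def fill_template(template: bytes, values: dict) -> bytes:
    """Do dictionary interpolation directly on the encoded bytes."""
    return template % values


template_text = "%(listname)s メーリングリストへようこそ"
values = {b"listname": b"test-list"}

# euc-jp: the only ASCII '%' is the one in front of (listname).
euc_filled = fill_template(template_text.encode("euc-jp"), values)
print(euc_filled.decode("euc-jp"))

# iso-2022-jp: katakana bytes start with 0x25 ('%'), so the formatter
# sees stray format specifiers and raises an exception.
try:
    fill_template(template_text.encode("iso-2022-jp"), values)
    jis_error = None
except (ValueError, TypeError) as exc:
    jis_error = exc
print(type(jis_error).__name__, jis_error)
```
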
>
> TMDA is using a nearly identical function to make text from templates,
> but certain multi-byte characters (Japanese in particular) in the
> templates trigger the following exceptions:
>
> ValueError: incomplete format key
>
> TypeError: not enough arguments for format string
>
> Someone suggested that the Japanese text probably has characters in it
> that include an ascii % as part of the multi-byte character.
>
> I'm wondering how Mailman gets around this problem.
>
--
Tokio Kikuchi, tkikuchi@ is.kochi-u.ac.jp
http://weather.is.kochi-u.ac.jp/