[Python-Dev] Adding Japanese Codecs to the distro
Martin v. Löwis
martin@v.loewis.de
16 Jan 2003 11:05:55 +0100
"M.-A. Lemburg" <mal@lemburg.com> writes:
> Thoughts ?
I'm in favour of adding support for Japanese codecs, but I wonder
whether we shouldn't incorporate the C version of the Japanese codecs
package instead, despite its size.
I would also suggest that it might be more worthwhile to expose
platform codecs, which would give us all CJK codecs on a number of
major platforms, with a minimum increase in the size of the Python
distribution, and with very good performance.
*If* Suzuki's code is incorporated, I'd like to get independent
confirmation that it is actually correct. I know it took Tamito many
iterations before his package was correct, where "correct" is a somewhat fuzzy
term, since there are some really tricky issues for which there is no
single one correct solution (like whether \x5c is a backslash or a Yen
sign, in these encodings). I notice (with surprise) that the actual
mapping tables are extracted from Java, through Jython.
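To make the \x5c question concrete, here is a small check against CPython's built-in shift_jis codec. That codec was added to the standard library after this thread, so the check reflects that codec's choice rather than either package under discussion. Byte 0x5C decodes as a backslash, yet U+00A5 YEN SIGN also encodes to 0x5C, so the yen sign does not survive a round trip.

```python
# Assumes a modern CPython with its built-in shift_jis codec.
raw = b"C:\x5cdata"  # 0x5C: backslash or Yen sign, depending on convention
text = raw.decode("shift_jis")
print(repr(text))  # this codec passes bytes below 0x80 through as ASCII

yen = "\u00a5"  # U+00A5 YEN SIGN
encoded = yen.encode("shift_jis")
print(encoded)  # the yen sign is also mapped to byte 0x5C
print(encoded.decode("shift_jis") == yen)  # it comes back as a backslash
```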
I also dislike the absence of the cp932 encoding in Suzuki's codecs. The
suggestion to equate this to "mbcs" on Windows is not convincing, as
a) "mbcs" does not mean cp932 on all Windows installations, and b)
cp932 needs to be processed on other systems, too. I *think* cp932
could be implemented as a delta to shift-jis, as shown in
http://hp.vector.co.jp/authors/VA003720/lpproj/test/cp932sj.htm
(although I wonder why they don't list the backslash issue as a
difference between shift-jis and cp932)
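A minimal sketch of the delta idea, assuming CPython's built-in shift_jis codec as the base. CP932_DELTA and decode_cp932_via_delta are hypothetical names. The table holds just two illustrative entries, not the complete delta: 0x8160, which the JIS mapping decodes as WAVE DASH but cp932 decodes as FULLWIDTH TILDE, and 0x8740, an NEC extension absent from JIS X 0208.

```python
# Hypothetical delta table: cp932 byte pairs that decode differently from
# (or are missing in) shift_jis. Illustrative only, not the full delta.
CP932_DELTA = {
    b"\x81\x60": "\uff5e",  # FULLWIDTH TILDE (shift_jis gives U+301C WAVE DASH)
    b"\x87\x40": "\u2460",  # CIRCLED DIGIT ONE, NEC extension
}


def is_lead_byte(b: int) -> bool:
    """Shift-JIS lead bytes that start a double-byte character."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def decode_cp932_via_delta(data: bytes) -> str:
    """Decode cp932 as 'shift_jis plus a delta' (sketch; strict errors)."""
    out = []
    i = 0
    while i < len(data):
        # Take two bytes for a lead byte if a trail byte is available.
        width = 2 if is_lead_byte(data[i]) and i + 1 < len(data) else 1
        chunk = data[i:i + width]
        if chunk in CP932_DELTA:
            out.append(CP932_DELTA[chunk])  # cp932-specific mapping wins
        else:
            out.append(chunk.decode("shift_jis"))  # fall back to the base codec
        i += width
    return "".join(out)


print(decode_cp932_via_delta(b"abc\x81\x60\x87\x40"))
```

0x5C needs no delta entry in this sketch, because CPython's shift_jis and cp932 codecs both decode it as a backslash.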
Regards,
Martin