[Python-Dev] Adding Japanese Codecs to the distro
Atsuo Ishimoto
ishimoto@axissoft.co.jp
Thu, 16 Jan 2003 20:08:21 +0900
Hello from Japan,
On 16 Jan 2003 11:05:55 +0100
martin@v.loewis.de (Martin v. Löwis) wrote:
> "M.-A. Lemburg" <mal@lemburg.com> writes:
>
> > Thoughts ?
>
> I'm in favour of adding support for Japanese codecs, but I wonder
> whether we shouldn't incorporate the C version of the Japanese codecs
> package instead, despite its size.
I also vote for JapaneseCodec.
As for its size, the JapaneseCodec package is much larger because it
contains both a C version and a pure Python version. The C version part
of JapaneseCodec is about 160 KB (compiled on Windows), and I
don't think that makes much of a difference.
> *If* Suzuki's code is incorporated, I'd like to get independent
> confirmation that it is actually correct. I know Tamito has taken many
> iterations until it was correct, where "correct" is a somewhat fuzzy
> term, since there are some really tricky issues for which there is no
> single one correct solution (like whether \x5c is a backslash or a Yen
> sign, in these encodings).
Yes, Tamito's JapaneseCodec has been used for years by many Japanese
users, while I've never heard of Suzuki's.
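
The \x5c question above can be checked directly against whichever codec
ends up being used. The following is only a minimal sketch, assuming a
Python 3 interpreter whose standard library ships codecs named
"shift_jis" and "cp932" (an assumption, not something this thread
settles); the helper names describe_0x5c and can_encode_yen are made up
for illustration. It prints what it finds rather than assuming one
"correct" answer.

```python
# Sketch: how a codec treats the ambiguous byte 0x5C (backslash or Yen sign).
# Assumes the stdlib codec names "shift_jis" and "cp932" are available.
import unicodedata


def describe_0x5c(codec_name):
    """Return the Unicode name of the character that byte 0x5C decodes to."""
    ch = b"\x5c".decode(codec_name)
    return unicodedata.name(ch)


def can_encode_yen(codec_name):
    """Return the bytes U+00A5 YEN SIGN encodes to, or None if it cannot be encoded."""
    try:
        return "\u00a5".encode(codec_name)
    except UnicodeEncodeError:
        return None


for name in ("shift_jis", "cp932"):
    print(name, "| 0x5C decodes to:", describe_0x5c(name),
          "| U+00A5 encodes to:", can_encode_yen(name))
```

If a codec decodes 0x5C to a backslash but also encodes the Yen sign to
0x5C, text will not round-trip, which is one concrete form of the
"no single correct solution" problem.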
> mapping tables are extracted from Java, through Jython.
>
> I also dislike absence of the cp932 encoding in Suzuki's codecs. The
> suggestion to equate this to "mbcs" on Windows is not convincing, as
> a) "mbcs" does not mean cp932 on all Windows installations, and b)
> cp932 needs to be processed on other systems, too.
Agreed.
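
Point b) is easy to demonstrate. This is a small sketch, assuming a
Python 3 interpreter that provides "cp932" in its standard library; the
helper name codec_available is hypothetical. The "mbcs" codec exists
only on Windows and follows the active ANSI code page, so it cannot
stand in for cp932 elsewhere.

```python
# Sketch: "mbcs" is platform-dependent, while an explicit "cp932" codec is not.
import codecs
import sys


def codec_available(name):
    """Return True if the codec registry can find a codec with this name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


print("platform:", sys.platform)
print("cp932 available:", codec_available("cp932"))
# On Windows, "mbcs" means the current ANSI code page, which is cp932
# only on Japanese installations; on other systems it is not registered.
print("mbcs available:", codec_available("mbcs"))
```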
> I *think* cp932
> could be implemented as a delta to shift-jis, as shown in
>
> http://hp.vector.co.jp/authors/VA003720/lpproj/test/cp932sj.htm
>
> (although I wonder why they don't list the backslash issue as a
> difference between shift-jis and cp932)
>
http://www.ingrid.org/java/i18n/unicode-utf8.html may be a better
reference. This page is written in English and encoded in UTF-8.
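
The idea of cp932 as a delta to shift-jis can also be explored
empirically. The sketch below assumes a Python 3 interpreter with
standard-library codecs named "shift_jis" and "cp932", whose tables may
not match either of the pages above exactly; the function names are
invented for illustration. It decodes every candidate two-byte sequence
with both codecs and records where the results differ.

```python
# Sketch: list the two-byte sequences that "shift_jis" and "cp932" decode differently.
# Assumes both codecs exist in the running interpreter's standard library.

def decode_or_none(raw, codec_name):
    """Decode raw bytes, returning None when the codec rejects them."""
    try:
        return raw.decode(codec_name)
    except UnicodeDecodeError:
        return None


def shift_jis_cp932_delta():
    """Map each differing two-byte sequence to (shift_jis result, cp932 result)."""
    delta = {}
    # Lead bytes 0x81-0x9F and 0xE0-0xFC; trail bytes 0x40-0xFC except 0x7F.
    lead_bytes = list(range(0x81, 0xA0)) + list(range(0xE0, 0xFD))
    trail_bytes = [b for b in range(0x40, 0xFD) if b != 0x7F]
    for lead in lead_bytes:
        for trail in trail_bytes:
            raw = bytes([lead, trail])
            sj = decode_or_none(raw, "shift_jis")
            cp = decode_or_none(raw, "cp932")
            if sj != cp:
                delta[raw] = (sj, cp)
    return delta


delta = shift_jis_cp932_delta()
print(len(delta), "two-byte sequences decode differently")
print("0x8740:", delta.get(b"\x87\x40"))
```

A None on the shift_jis side means the sequence exists only in cp932
(vendor extensions); an entry with two different characters is a
remapped code point. Single-byte differences such as \x5c are outside
this scan.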
--------------------------
Atsuo Ishimoto
ishimoto@gembook.org
Homepage:http://www.gembook.jp