[Python-Dev] Unicode debate

Christopher Petrilli petrilli@amber.org
Thu, 27 Apr 2000 12:48:16 -0400


Guido van Rossum [guido@python.org] wrote:
> I've heard a few people claim that strings should always be considered
> to contain "characters" and that there should be one character per
> string element.  I've also heard a clamoring that there should only be
> one string type.  You folks have never used Asian encodings.  In
> countries like Japan, China and Korea, encodings are a fact of life,
> and the most popular encodings are ASCII supersets that use a variable
> number of bytes per character, just like UTF-8.  Each country or
> language uses different encodings, even though their characters look
> mostly the same to western eyes.  UTF-8 and Unicode is having a hard
> time getting adopted in these countries because most software that
> people use deals only with the local encodings.  (Sounds familiar?)

Actually, a bigger concern that we hear from our customers in Japan is
that Unicode has *serious* problems with Asian languages.  They chose
to "unify" the Chinese and Japanese characters rather than encode both
sets separately, and therefore cannot represent lots of phrases quite
right.  I can have someone write up a better description, but I was
told by several Japanese people that they wouldn't use Unicode come
hell or high water, basically.

Basically it's JIS, Shift-JIS or nothing for most Japanese
companies.  This was my experience working with Konica a few years ago
as well.
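
For anyone who wants to see the two points above in concrete terms,
here is a rough Python sketch.  It is only an illustration: the sample
characters and the helper names (byte_lengths, legacy_bytes) are made
up for this example, and it assumes a Python with the standard CJK
codecs available.  It shows that the Japanese encodings are
variable-width ASCII supersets, and that one "unified" Unicode code
point stands in for a character that each national encoding stores
with its own bytes.

```python
# Illustrative sketch only: the sample text and helper names are assumptions.
SAMPLE = "A\u3042"   # Latin "A" followed by hiragana "a"
UNIFIED = "\u76f4"   # a Han character used in both Japanese and Chinese


def byte_lengths(text, codecs=("shift_jis", "euc_jp", "utf-8")):
    """Return {codec: number of bytes needed to encode text}."""
    return {name: len(text.encode(name)) for name in codecs}


def legacy_bytes(char, codecs=("shift_jis", "euc_jp", "gb2312", "big5")):
    """Return {codec: encoded bytes} for a single character."""
    return {name: char.encode(name) for name in codecs}


if __name__ == "__main__":
    # Two characters, but more than two bytes in every encoding.
    print(byte_lengths(SAMPLE))
    # Different bytes per national encoding, one shared Unicode code point.
    for name, raw in legacy_bytes(UNIFIED).items():
        print(name, raw.hex(), "U+%04X" % ord(raw.decode(name)))
```

Because the code point is shared, whether that character is drawn in
the Japanese or the Chinese style has to come from somewhere outside
the text itself (the font, or language tagging), which is roughly the
complaint described above.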

Chris
-- 
| Christopher Petrilli
| petrilli@amber.org