[I18n-sig] Re: [Python-Dev] Re: [XML-SIG] Python 1.6a2 Unicode experiences?
Fri, 28 Apr 2000 06:56:50 -0400 (EDT)
M.-A. Lemburg writes:
> > > Unicode has many encodings: Shift-JIS, Big-5, EBCDIC ... You can use
> > > 8-bit encodings of Unicode if you want.
This is meaningless: legacy encodings of national character
sets such as Shift-JIS, Big Five, GB2312, or TIS620 are not "encodings"
of Unicode.
TIS620 is a single-byte, 8-bit encoding: each character is
represented by a single byte. The Japanese and Chinese encodings are
multibyte, 8-bit encodings. ISO-2022 is a multibyte, 7-bit encoding
for multiple character sets.
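You can see these differences by encoding a few characters and counting the bytes. This is a sketch against modern Python 3's built-in codecs (`tis_620`, `shift_jis`, `iso2022_jp`), not the 1.6 alpha under discussion. The helper `byte_lengths` is hypothetical and exists only for illustration.

```python
# Byte widths of a few legacy (non-Unicode) encodings, using Python 3 codecs.

def byte_lengths(text, codec):
    """Hypothetical helper: bytes used by each character of `text` in `codec`."""
    return [len(ch.encode(codec)) for ch in text]

# TIS-620 (Thai): single-byte, 8-bit. Every character is one byte.
print(byte_lengths("\u0e01\u0e02A", "tis_620"))   # [1, 1, 1]

# Shift JIS: multibyte, 8-bit. ASCII takes one byte, kanji take two.
print(byte_lengths("A\u65e5\u672c", "shift_jis"))  # [1, 2, 2]

# ISO-2022-JP: multibyte, 7-bit. Escape sequences switch character sets,
# so every byte of the output stays below 0x80.
encoded = "\u65e5\u672c".encode("iso2022_jp")
print(encoded, all(b < 0x80 for b in encoded))
```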
Unicode has several possible encodings: UTF-8, UCS-2, UCS-4,
UTF-16... You can view all of these as 8-bit encodings, if you
like. Some are multibyte (such as UTF-8, where each character in
the Basic Multilingual Plane is represented in 1 to 3 bytes) while
others (UCS-2 and UCS-4) are fixed length, two or four bytes per
character.
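The same byte-counting idea applied to the Unicode encoding forms, again assuming Python 3. Python 3 ships no separate UCS-2 or UCS-4 codec, so UTF-16 and UTF-32 stand in for them here; the `widths` helper and the sample characters are illustrative choices.

```python
# Bytes per character in three Unicode encoding forms (big-endian, no BOM).
# UTF-32 stands in for UCS-4; UTF-16 matches UCS-2 inside the BMP.

SAMPLES = {
    "LATIN CAPITAL A": "A",          # U+0041
    "E WITH ACUTE": "\u00e9",        # U+00E9
    "CJK 'sun'": "\u65e5",           # U+65E5
    "GRINNING FACE": "\U0001F600",   # outside the Basic Multilingual Plane
}

def widths(ch):
    """Hypothetical helper: (UTF-8, UTF-16, UTF-32) byte counts for one character."""
    return tuple(len(ch.encode(c)) for c in ("utf-8", "utf-16-be", "utf-32-be"))

for name, ch in SAMPLES.items():
    print(f"{name:16} {widths(ch)}")
```

The last sample lies outside the Basic Multilingual Plane: UTF-8 needs four bytes for it, UTF-16 needs a surrogate pair, and UCS-2 cannot represent it at all.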
> > Um, if you go:
> > JIS -> Unicode -> JIS
> > you don't get the same thing out that you put in (at least this is
> > what I've been told by a lot of Japanese developers), and therefore
> > it's not terribly popular because of the nature of the Japanese (and
> > Chinese) language.
This is simply not true any more. The ability to round trip between
Unicode and legacy encodings depends on the software: using code
points in the Private Use Area (PUA) for this is acceptable and
commonly done.
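A round-trip check is simply encode-then-decode. This sketch assumes Python 3; `round_trips` is a hypothetical helper. The cp932 part relies on how CPython's `cp932` codec treats the user-defined lead bytes 0xF0 through 0xF9 (it maps them into the Private Use Area); treat that as an implementation detail, not documented behaviour.

```python
import unicodedata

def round_trips(text, codec):
    """Hypothetical helper: True if `text` survives encode -> decode unchanged."""
    try:
        return text.encode(codec).decode(codec) == text
    except UnicodeError:
        return False

# Ordinary Japanese text round-trips through Shift JIS.
print(round_trips("\u65e5\u672c\u8a9e\u306e\u30c6\u30ad\u30b9\u30c8", "shift_jis"))

# Windows code page 932 reserves lead bytes 0xF0-0xF9 for user-defined
# characters. CPython's cp932 codec (an implementation detail) maps them
# into the Private Use Area, so these vendor-specific bytes survive the trip.
udc = b"\xf0\x40"
ch = udc.decode("cp932")
print(f"U+{ord(ch):04X}", unicodedata.category(ch))  # 'Co' means private use
print(ch.encode("cp932") == udc)
```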
The big advantage is in using Unicode as a pivot when transcoding
between different CJK encodings. It is very difficult to map directly
between, say, Shift JIS and GB2312. However, Unicode provides a good
intermediate representation.
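Pivoting through Unicode is just decode-then-encode. In this Python 3 sketch, `transcode` is a hypothetical helper name, and the sample text uses only characters present in both JIS X 0208 and GB2312.

```python
def transcode(data, src, dst, errors="strict"):
    """Hypothetical helper: convert bytes between legacy encodings via Unicode.

    Decoding `src` yields Unicode text, which is then encoded as `dst`.
    """
    return data.decode(src).encode(dst, errors)

sjis = "\u65e5\u672c".encode("shift_jis")   # "Japan" as Shift JIS bytes
gb = transcode(sjis, "shift_jis", "gb2312")
print(sjis.hex(), "->", gb.hex())
print(gb.decode("gb2312"))
```

The design payoff: with N legacy encodings, pivoting needs only one to-Unicode and one from-Unicode table per encoding, rather than a direct table for every ordered pair of encodings.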
It isn't a panacea: transcoding between legacy encodings like GB2312
(Simplified Chinese) and Big Five (Traditional Chinese) is still
difficult, Unicode or not.
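The GB2312 to Big Five problem shows up as soon as a simplified character has no counterpart in the traditional repertoire. This Python 3 sketch uses the word for "language"; the one-entry `S2T` table is hypothetical and stands in for a real simplified-to-traditional mapping, which is large and not one-to-one.

```python
# Simplified U+8BED and traditional U+8A9E ("language", first character) are
# distinct Unicode code points, so a Unicode pivot alone cannot bridge them.
text = "\u8bed\u8a00".encode("gb2312").decode("gb2312")  # simulate GB2312 input

try:
    text.encode("big5")
except UnicodeEncodeError as exc:
    print("not in Big Five:", repr(exc.object[exc.start:exc.end]))

# Lossy fallback: unmappable characters become '?'.
print(text.encode("big5", errors="replace").decode("big5"))

# What is really needed is a simplified-to-traditional conversion step.
# HYPOTHETICAL one-entry table for illustration only; real tables are large
# and one-to-many (one simplified form can map to several traditional ones).
S2T = str.maketrans({"\u8bed": "\u8a9e"})
big5 = text.translate(S2T).encode("big5")
print(big5.decode("big5"))
```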
> > My experience with Unicode is that a lot of Western people think it's
> > the answer to every problem asked, while most Asian language people
> > disagree vehemently. This says the problem isn't solved yet, even if
> > people wish to deny it.
This is a shame: it is an indication that they don't understand the
technology. Unicode is a tool: nothing more.
Tom Emerson Basis Technology Corp.
Language Hacker http://www.basistech.com
"Beware the lollipop of mediocrity: lick it once and you suck forever"