[I18n-sig] JapaneseCodecs 1.4.8 released

Tom Emerson tree@basistech.com
Fri, 6 Sep 2002 11:44:38 -0400


Martin v. Loewis writes:
> Tom Emerson <tree@basistech.com> writes:
> > The "usual" recommendation is to map 0x5C to U+00A5 when dealing with
> > pure ShiftJIS and to U+005C when dealing with CP932.
> > 
> > There is a similar problem with 0x7E where it maps to different things
> > in ShiftJIS and CP932.
> 
> That indicates that JapaneseCodecs should *not* treat shift-jis and
> cp932 as synonyms, right?

Yes, that is my feeling. However, there are complications: most
Japanese web pages I've seen that claim to be in Shift JIS are in
fact CP932, which is why the two are so often treated as synonymous.
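
To make the 0x5C/0x7E point concrete, here is a minimal Python sketch of
the "usual" recommendation quoted above. It is an illustration under
stated assumptions, not how JapaneseCodecs or any other codec package
does it: decode_sjis and JIS_ROMAN_OVERRIDES are hypothetical names, the
0x7E -> U+203E (OVERLINE) mapping is taken from JIS X 0201 Roman rather
than from this thread, and two-byte sequences are simply handed to
Python's built-in cp932 codec, which ignores the double-byte differences
between the two flavors.

```python
# Single-byte differences under the "usual" recommendation (assumption:
# pure Shift JIS uses JIS X 0201 Roman for 0x5C and 0x7E).
JIS_ROMAN_OVERRIDES = {0x5C: "\u00a5", 0x7E: "\u203e"}


def _is_lead_byte(b: int) -> bool:
    """True if b starts a two-byte sequence in Shift JIS / CP932."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def decode_sjis(data: bytes, flavor: str) -> str:
    """Decode data as 'shift_jis' (pure) or 'cp932' (hypothetical helper)."""
    if flavor not in ("shift_jis", "cp932"):
        raise ValueError("flavor must be 'shift_jis' or 'cp932'")
    overrides = JIS_ROMAN_OVERRIDES if flavor == "shift_jis" else {}
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if _is_lead_byte(b):
            # Two-byte character: the trail byte may itself be 0x5C,
            # so it must never be looked up in the override table.
            out.append(data[i:i + 2].decode("cp932"))
            i += 2
        elif b in overrides:
            out.append(overrides[b])
            i += 1
        else:
            out.append(data[i:i + 1].decode("cp932"))
            i += 1
    return "".join(out)
```

Walking the bytes rather than post-processing the decoded string matters
because 0x5C is also a legal trail byte (0x95 0x5C, for instance), so a
blind byte-level substitution would corrupt two-byte characters.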

The problem is even worse in Chinese, where pages claiming to be
encoded in GB2312 (which isn't even an encoding, it's a character set,
but I digress ;) are actually in CP936, which has a significantly
larger character repertoire (i.e., all of Unicode 2.1's unified
ideographic block) than GB2312.
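
A quick way to see the repertoire gap is to ask whether a string can be
encoded at all. The sketch below assumes Python's built-in "gb2312" and
"gbk" codecs are reasonable stand-ins for GB2312 (in its EUC-CN form) and
CP936 respectively; encodable is a hypothetical helper name.

```python
def encodable(text: str, codec: str) -> bool:
    """Return True if every character of text exists in codec's repertoire."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


simplified = "\u9f99"   # simplified "dragon", part of GB2312
traditional = "\u9f8d"  # traditional "dragon", absent from GB2312

for label, ch in (("simplified", simplified), ("traditional", traditional)):
    print(label, encodable(ch, "gb2312"), encodable(ch, "gbk"))
```

A page labelled GB2312 that contains the traditional form cannot really
be GB2312, which is the kind of mislabelling described above.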

> > You answer your own question, sort of. The Consortium no longer
> > maintains the East Asian mapping tables (with the exception of JIS X
> > 0213, GB 18030, and HKSCS, where mappings are supplied by the
> > Japanese, Chinese, and Hong Kong SAR governments, respectively). This
> > has been a point of contention between me and the UTC, but they don't
> > want to and I don't have time.
> 
> Yes, but they claim that the Unihan database is a replacement. That
> appears to be the case for a lot of code points, but that database
> fails to document mappings for non-Hanzi, right?

That's a cop-out, and they know it. Unihan cannot serve as a mapping
table source, and shouldn't be trusted as one.

-- 
Tom Emerson                                          Basis Technology Corp.
Software Architect                                 http://www.basistech.com
  "Beware the lollipop of mediocrity: lick it once and you suck forever"