[I18n-sig] JapaneseCodecs 1.4.8 released

05 Sep 2002 00:32:12 +0200

Tamito KAJIYAMA <kajiyama@grad.sccs.chukyo-u.ac.jp> writes:

> Japanese characters that will be changed their corresponding
> code points in Unicode are the following 7 characters.
> (Although only Shift_JIS code points are shown below, EUC-JP and
> ISO-2022-JP codecs will also be changed.)
> 
> 1. Shift_JIS 0x81ca
>    japanese.sjis  -> U+00ac (NOT SIGN)
>    japanese.ms932 -> U+ffe2 (FULLWIDTH NOT SIGN)

Can you please elaborate on the rationale for picking the Microsoft
mapping over the Consortium's mapping? It appears that, if only a
single form is available in SJIS, Microsoft picks the FULLWIDTH
form, whereas the Consortium picks the default form.

Methinks that the Consortium does the right thing here: It *should*
be a matter of fonts or presentation how a NOT SIGN is displayed. If
SJIS gives users a choice to pick either the default form or the
full-width form, it is clear that the Unicode mapping should support
that choice. If SJIS users have no choice (as in 0x81ca), the SJIS
character should IMO be considered the default version - despite the
fact that SJIS-based fonts would usually display it in a double-wide
fashion.

I also fail to see the need to align those encodings at all. Why is
it necessary that SJIS -> Unicode -> MS932 works for all SJIS texts?

You might consider supporting "transliteration", either by default or
by means of a sjis//translit (and ms932//translit) encoding: If people
use this encoding, you still have the Shift_JIS->Unicode mapping as
above, but you would *also* map U+ffe2 to 0x81ca in sjis, and U+00ac
to 0x81ca in ms932. That may solve the problems people have with
the status quo, while preserving backwards compatibility (and also
compatibility with, say, Linux glibc codecs - which use the
Consortium's database).
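
To make the transliteration idea concrete, here is a rough sketch in
Python. It is not the JapaneseCodecs implementation: the one-entry
tables, the TRANSLIT pairs and the make_encoder helper are all
hypothetical names made up for illustration, and a real codec would
cover the whole character set and hook into the codec registry.

```python
# Toy decode tables holding only the character discussed above.
SJIS_DECODE = {0x81CA: "\u00ac"}    # japanese.sjis:  NOT SIGN
MS932_DECODE = {0x81CA: "\uffe2"}   # japanese.ms932: FULLWIDTH NOT SIGN

# Hypothetical transliteration pairs: each form may stand in for the other.
TRANSLIT = {"\u00ac": "\uffe2", "\uffe2": "\u00ac"}


def make_encoder(decode_table, translit=False):
    """Build a per-character encoder from a decode table.

    Decoding is never changed. With translit=True the encoder also
    accepts the alternate form of a character when the table only
    knows the other one.
    """
    encode_table = {ch: code for code, ch in decode_table.items()}

    def encode(ch):
        if ch in encode_table:
            return encode_table[ch]
        if translit and TRANSLIT.get(ch) in encode_table:
            return encode_table[TRANSLIT[ch]]
        raise UnicodeEncodeError("toy-codec", ch, 0, 1, "no mapping")

    return encode


sjis_strict = make_encoder(SJIS_DECODE)
sjis_translit = make_encoder(SJIS_DECODE, translit=True)
ms932_translit = make_encoder(MS932_DECODE, translit=True)
```

With these, sjis_translit and ms932_translit both accept either form
of the NOT SIGN and produce 0x81ca, so text decoded by one codec can be
encoded by the other, while sjis_strict still rejects U+ffe2 and the
decode direction stays exactly as it was.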

HTH,
Martin