[I18n-sig] JapaneseCodecs 1.4.8 released

Tamito KAJIYAMA kajiyama@grad.sccs.chukyo-u.ac.jp
Thu, 5 Sep 2002 18:25:57 +0900

martin@v.loewis.de (Martin v. Loewis) writes:
| > Japanese characters that will be changed their corresponding
| > code points in Unicode are the following 7 characters.
| > (Although only Shift_JIS code points are shown below, EUC-JP and
| > ISO-2022-JP codecs will also be changed.)
| > 
| > 1. Shift_JIS 0x81ca
| >    japanese.sjis  -> U+00ac (NOT SIGN)
| >    japanese.ms932 -> U+ffe2 (FULLWIDTH NOT SIGN)
| Can you please elaborate on the rationale for picking the Microsoft
| mapping over the Consortium's mapping?

The only reason for choosing the Microsoft mapping is that it
seems better.  The Consortium's mapping has a problem: both 0x5c
and 0x815f in Shift_JIS are mapped to U+005c, which is in turn
mapped back only to 0x5c in Shift_JIS.  In other words, the
Consortium's mapping is not one-to-one (two Shift_JIS codes share
one Unicode character), so 0x815f does not survive a round trip.
Microsoft's mapping, on the other hand, is one-to-one and has no
such conversion problem.  That's why I think Microsoft's mapping
is better.
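
The round-trip problem described above can be sketched with two toy
tables.  This is only an illustration, not the JapaneseCodecs tables:
the first table uses the values given in this message, the value
U+ff3c for 0x815f in the second table is the CP932 assignment (it is
not stated in this message), and the helper names are hypothetical.

```python
# Toy decode tables (Shift_JIS code -> Unicode character).
# CONSORTIUM_DECODE follows the values described in this message;
# U+FF3C for 0x815f in MICROSOFT_DECODE is the CP932 assignment.
CONSORTIUM_DECODE = {0x5C: "\u005c", 0x815F: "\u005c"}
MICROSOFT_DECODE = {0x5C: "\u005c", 0x815F: "\uff3c"}


def invert(decode_table):
    """Build an encode table; on a collision the lowest code wins."""
    encode_table = {}
    for code in sorted(decode_table, reverse=True):
        encode_table[decode_table[code]] = code
    return encode_table


def round_trips(decode_table):
    """Report, per code, whether decode followed by encode is lossless."""
    encode_table = invert(decode_table)
    return {
        code: encode_table[decode_table[code]] == code
        for code in decode_table
    }


consortium_result = round_trips(CONSORTIUM_DECODE)
microsoft_result = round_trips(MICROSOFT_DECODE)

for name, result in (("consortium", consortium_result),
                     ("microsoft", microsoft_result)):
    for code, ok in result.items():
        print(f"{name}: 0x{code:x} round-trips: {ok}")
```

With the first table, 0x815f comes back as 0x5c; with the second,
both codes come back unchanged.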

To tell the truth, I don't care whether the Unicode character that
corresponds to a character in Shift_JIS is a full-width form or
not.  The only thing I want to solve by choosing Microsoft's
mapping is the problem mentioned above.

| I also fail to see the need to align those encodings, at all. Why is
| it necessary that SJIS -> Unicode -> MS932 works for all SJIS texts?

Interoperability between the MS932 codec and the other codecs is
a plus; I don't think it is necessary.  However, I would rather
not have a small package like JapaneseCodecs suffer from an
interoperability problem caused by differences among
vendor-specific mappings.
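
As a small sketch of the kind of interoperability problem meant here,
take the 0x81ca entry quoted at the top of this message and pass it
through SJIS -> Unicode -> MS932 with strict one-entry tables.  The
tables and the transcode() helper are hypothetical stand-ins, not the
actual codec code.

```python
# One-entry toy tables taken from the 0x81ca example quoted above.
SJIS_DECODE = {0x81CA: "\u00ac"}    # japanese.sjis: NOT SIGN
MS932_ENCODE = {"\uffe2": 0x81CA}   # japanese.ms932: FULLWIDTH NOT SIGN


def transcode(code, decode_table, encode_table):
    """Decode with one table, then encode strictly with the other."""
    ch = decode_table[code]
    if ch not in encode_table:
        raise ValueError(f"U+{ord(ch):04X} has no code in the target")
    return encode_table[ch]


try:
    outcome = transcode(0x81CA, SJIS_DECODE, MS932_ENCODE)
except ValueError as exc:
    outcome = str(exc)

print(outcome)
```

The strict MS932 table has no entry for U+00ac, so the conversion
fails even though both encodings use the same byte pair for what is
visually the same character.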

| You might consider supporting "transliteration", either by default, or
| by means of a sjis//translit (and ms932//translit) encoding: If people
| use this encoding, you still have the Shift_JIS->Unicode mapping as
| above, but you would *also* map U+ffe2 to 0x81ca in sjis, and U+00ac
| to 0x81ca in Shift_JIS. That may solve the problems people have with
| the status-quo, while preserving backwards compatibility (and also
| compatibility with, say, Linux glibc codecs - which use the
| Consortium's database).

Sorry, I'm not sure I've got the picture of what transliteration
support would do.  Transliteration support is meant to solve
interoperability problems caused by differences among
vendor-specific mappings, right?  I believe those problems are
worth tackling, but I don't intend to do so in the next release
of JapaneseCodecs.
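
For reference, one possible reading of the transliteration idea
quoted above is sketched below: decoding stays as it is, while each
encoder additionally accepts the other mapping's code point.  This
assumes the second "Shift_JIS" in the quoted suggestion refers to
ms932; make_encoder() and the table names are hypothetical and do not
exist in JapaneseCodecs.

```python
# Decode tables stay unchanged (values from the 0x81ca example).
SJIS_DECODE = {0x81CA: "\u00ac"}     # NOT SIGN
MS932_DECODE = {0x81CA: "\uffe2"}    # FULLWIDTH NOT SIGN


def make_encoder(decode_table, extra=None):
    """Invert a decode table, then add fallback-only entries."""
    table = {ch: code for code, ch in decode_table.items()}
    for ch, code in (extra or {}).items():
        # A fallback never overrides a native mapping.
        table.setdefault(ch, code)
    return table


# Hypothetical "//translit" encoders: each also accepts the
# character that the other codec produces for the same code.
SJIS_TRANSLIT = make_encoder(SJIS_DECODE, extra={"\uffe2": 0x81CA})
MS932_TRANSLIT = make_encoder(MS932_DECODE, extra={"\u00ac": 0x81CA})

# SJIS -> Unicode -> MS932 now succeeds for this character.
ch = SJIS_DECODE[0x81CA]
print(hex(MS932_TRANSLIT[ch]))
```

Decoding is untouched, so existing behaviour is preserved; only the
encoding direction becomes more permissive.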


KAJIYAMA, Tamito <kajiyama@grad.sccs.chukyo-u.ac.jp>