[I18n-sig] JapaneseCodecs 1.4.8 released
Thu, 5 Sep 2002 18:25:57 +0900
firstname.lastname@example.org (Martin v. Loewis) writes:
| > The Japanese characters whose corresponding Unicode code points
| > will change are the following 7 characters.
| > (Although only Shift_JIS code points are shown below, the EUC-JP and
| > ISO-2022-JP codecs will also be changed.)
| > 1. Shift_JIS 0x81ca
| > japanese.sjis -> U+00ac (NOT SIGN)
| > japanese.ms932 -> U+ffe2 (FULLWIDTH NOT SIGN)
| Can you please elaborate on the rationale for picking the Microsoft
| mapping over the Consortium's mapping?
The only reason for choosing the Microsoft mapping is that it
avoids a conversion problem in the Consortium's mapping: both
0x5c and 0x815f in Shift_JIS are mapped to U+005c, which is in
turn mapped back to 0x5c in Shift_JIS. In other words, the
Consortium's mapping is not one-to-one: two Shift_JIS code points
share a single Unicode code point, so 0x815f does not survive a
round trip. Microsoft's mapping, on the other hand, is one-to-one
and has no such conversion problem. That's why I think Microsoft's
mapping is better.
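The round-trip problem can be checked with a small sketch. The tables below are hypothetical toy subsets containing only 0x5c and 0x815f, not the real JapaneseCodecs tables; the Microsoft-style entry maps 0x815f to U+ff3c (FULLWIDTH REVERSE SOLIDUS), as Microsoft's CP932 table does. Each decode table is inverted into an encode table, and every byte sequence that does not come back unchanged is reported.

```python
# Hypothetical toy subsets of two Shift_JIS -> Unicode decode tables,
# limited to the two code points discussed above (not the real
# JapaneseCodecs tables).
CONSORTIUM_STYLE = {
    b"\x5c": "\u005c",      # 0x5c   -> U+005C REVERSE SOLIDUS
    b"\x81\x5f": "\u005c",  # 0x815f -> U+005C as well
}
MICROSOFT_STYLE = {
    b"\x5c": "\u005c",      # 0x5c   -> U+005C REVERSE SOLIDUS
    b"\x81\x5f": "\uff3c",  # 0x815f -> U+FF3C FULLWIDTH REVERSE SOLIDUS
}


def invert(decode_table):
    """Build an encode table. When several byte sequences decode to the
    same character, the first one listed wins (here 0x5c)."""
    encode_table = {}
    for raw, char in decode_table.items():
        encode_table.setdefault(char, raw)
    return encode_table


def round_trip_failures(decode_table):
    """Return the byte sequences that change after bytes -> Unicode -> bytes."""
    encode_table = invert(decode_table)
    return [raw for raw, char in decode_table.items()
            if encode_table[char] != raw]


print(round_trip_failures(CONSORTIUM_STYLE))  # 0x815f comes back as 0x5c
print(round_trip_failures(MICROSOFT_STYLE))   # empty: every entry survives
```

With the Consortium-style table the encoder can only ever produce 0x5c for U+005c, so 0x815f is silently rewritten; with a one-to-one table the inversion loses nothing.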
To tell the truth, I don't care whether the Unicode character
corresponding to a Shift_JIS character is a full-width form or
not. The only problem I want to solve by choosing Microsoft's
mapping is the one just described.
| I also fail to see the need to align those encodings, at all. Why is
| it necessary that SJIS -> Unicode -> MS932 works for all SJIS texts?
Interoperability between the MS932 codec and the other codecs is
a plus, though I don't think it is strictly necessary. Still, I
would rather a small package like JapaneseCodecs not have
interoperability problems caused by differences among
vendor-specific mappings.
| You might consider supporting "transliteration", either by default, or
| by means of a sjis//translit (and ms932//translit) encoding: If people
| use this encoding, you still have the Shift_JIS->Unicode mapping as
| above, but you would *also* map U+ffe2 to 0x81ca in sjis, and U+00ac
| to 0x81ca in ms932. That may solve the problems people have with
| the status quo, while preserving backwards compatibility (and also
| compatibility with, say, Linux glibc codecs - which use the
| Consortium's database).
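A minimal sketch of what such a //translit variant might do, under loudly stated assumptions: the tables cover only Shift_JIS 0x81ca plus ASCII, and the names `decode`, `encode`, `SJIS_TRANSLIT` and `MS932_TRANSLIT` are hypothetical, not part of JapaneseCodecs or glibc. Decoding is untouched; the translit encoder simply also accepts the other vendor's code point for 0x81ca, so text decoded with sjis can still be encoded with ms932.

```python
# Hypothetical toy tables covering only Shift_JIS 0x81ca (the NOT SIGN
# case quoted earlier) plus ASCII; the real codecs are much larger.
SJIS_DECODE = {b"\x81\xca": "\u00ac"}    # japanese.sjis: NOT SIGN
MS932_DECODE = {b"\x81\xca": "\uffe2"}   # japanese.ms932: FULLWIDTH NOT SIGN

# Hypothetical encode-only entries a //translit variant might accept.
# Decoding is left unchanged, so backwards compatibility is preserved.
SJIS_TRANSLIT = {"\uffe2": b"\x81\xca"}
MS932_TRANSLIT = {"\u00ac": b"\x81\xca"}


def decode(decode_table, data):
    """Toy decoder: ASCII bytes pass through; any other byte starts a
    two-byte sequence that is looked up in decode_table."""
    out, i = [], 0
    while i < len(data):
        if data[i] < 0x80:
            out.append(chr(data[i]))
            i += 1
        else:
            out.append(decode_table[data[i:i + 2]])
            i += 2
    return "".join(out)


def encode(decode_table, text, translit=None):
    """Toy encoder: inverts decode_table and, if given, adds the
    translit entries; unmapped characters raise UnicodeEncodeError."""
    encode_table = {char: raw for raw, char in decode_table.items()}
    if translit:
        encode_table.update(translit)
    out = bytearray()
    for pos, char in enumerate(text):
        if ord(char) < 0x80:
            out.append(ord(char))
        elif char in encode_table:
            out += encode_table[char]
        else:
            raise UnicodeEncodeError("toy", text, pos, pos + 1,
                                     "no mapping")
    return bytes(out)


text = decode(SJIS_DECODE, b"A\x81\xca")   # "A" followed by U+00AC
try:
    encode(MS932_DECODE, text)             # strict ms932 has no U+00AC
except UnicodeEncodeError as exc:
    print("strict ms932 failed:", exc.reason)
print(encode(MS932_DECODE, text, MS932_TRANSLIT))  # bytes 0x41 0x81 0xca
```

The translit entries only widen what each encoder accepts: decoding those bytes with the ms932 table still yields U+ffe2, so the Unicode text is not identical after the trip, but no encoding step fails.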
Sorry, I'm not sure I understand what transliteration support
would do. Transliteration support is meant to solve
interoperability problems caused by differences among
vendor-specific mappings, right? I believe those problems are
worth tackling, but I don't intend to do so in the next release
of JapaneseCodecs.
KAJIYAMA, Tamito <email@example.com>