[I18n-sig] JapaneseCodecs 1.4.8 released

Tamito KAJIYAMA kajiyama@grad.sccs.chukyo-u.ac.jp
Thu, 5 Sep 2002 06:25:08 +0900


Hi,

I've released JapaneseCodecs 1.4.8.  The source tarball is
available at the following locations:

  http://www.asahi-net.or.jp/~rd6t-kjym/python/
  http://www.python.jp/Zope/download/JapaneseCodecs

This release fixes bugs in the EUC-JP, Shift_JIS and MS932 codecs,
which failed to encode U+00A5 and U+203E, characters that originate
from ISO-2022-JP and its variant codecs.  I moved my home page
recently, so the primary distribution site and the author's e-mail
address have also changed.

       *    *    *

By the way, I plan to change the mappings between Unicode and
traditional Japanese encodings such as EUC-JP and Shift_JIS in
the next release of JapaneseCodecs.  The main reasons for the
change are (1) to improve interoperability between the
japanese.ms932 codec and the other codecs, and (2) to eliminate
the non-reversibilities that exist in the mappings between Unicode
and traditional Japanese encodings.

The following 7 Japanese characters will have their corresponding
Unicode code points changed.  (Only Shift_JIS code points are
shown below, but the EUC-JP and ISO-2022-JP codecs will also be
changed.)

1. Shift_JIS 0x81ca
   japanese.sjis  -> U+00ac (NOT SIGN)
   japanese.ms932 -> U+ffe2 (FULLWIDTH NOT SIGN)
2. Shift_JIS 0x815f
   japanese.sjis  -> U+005c (REVERSE SOLIDUS)
   japanese.ms932 -> U+ff3c (FULLWIDTH REVERSE SOLIDUS)
3. Shift_JIS 0x8161
   japanese.sjis  -> U+2016 (DOUBLE VERTICAL LINE)
   japanese.ms932 -> U+2225 (PARALLEL TO)
4. Shift_JIS 0x8160
   japanese.sjis  -> U+301c (WAVE DASH)
   japanese.ms932 -> U+ff5e (FULLWIDTH TILDE)
5. Shift_JIS 0x817c
   japanese.sjis  -> U+2212 (MINUS SIGN)
   japanese.ms932 -> U+ff0d (FULLWIDTH HYPHEN-MINUS)
6. Shift_JIS 0x8191
   japanese.sjis  -> U+00a2 (CENT SIGN)
   japanese.ms932 -> U+ffe0 (FULLWIDTH CENT SIGN)
7. Shift_JIS 0x8192
   japanese.sjis  -> U+00a3 (POUND SIGN)
   japanese.ms932 -> U+ffe1 (FULLWIDTH POUND SIGN)
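
For reference, the seven pairs above can be written down as a small
Python table.  This is only an illustration for Python 3; the names
SEVEN_DIFFS and describe are hypothetical and are not part of
JapaneseCodecs.

```python
import unicodedata

# Illustration only: Shift_JIS code point -> (Unicode code point used by
# japanese.sjis, Unicode code point used by japanese.ms932), copied from
# the list above.  SEVEN_DIFFS is a made-up name, not a JapaneseCodecs API.
SEVEN_DIFFS = {
    0x81CA: (0x00AC, 0xFFE2),
    0x815F: (0x005C, 0xFF3C),
    0x8161: (0x2016, 0x2225),
    0x8160: (0x301C, 0xFF5E),
    0x817C: (0x2212, 0xFF0D),
    0x8191: (0x00A2, 0xFFE0),
    0x8192: (0x00A3, 0xFFE1),
}


def describe(sjis_code):
    """Return the Unicode character names for both mappings of one code point."""
    sjis_cp, ms932_cp = SEVEN_DIFFS[sjis_code]
    return unicodedata.name(chr(sjis_cp)), unicodedata.name(chr(ms932_cp))


for code in sorted(SEVEN_DIFFS):
    print("0x%04x: %s / %s" % ((code,) + describe(code)))
```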

Because of the mapping differences shown above, decoding a byte
string using japanese.ms932 and then encoding the resulting
Unicode string using japanese.sjis, for example, may raise a
UnicodeError saying "no corresponding character in Shift_JIS".
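
The failure described above can be reproduced with a toy model.  The
sketch below does not call JapaneseCodecs; it uses two one-entry
tables for 0x8160 taken from the list above, and the names
MS932_DECODE, SJIS_ENCODE and toy_sjis_encode are hypothetical.

```python
# Toy model, not the JapaneseCodecs API: only Shift_JIS 0x8160 is modelled.
MS932_DECODE = {0x8160: "\uff5e"}  # japanese.ms932 decodes to FULLWIDTH TILDE
SJIS_ENCODE = {"\u301c": 0x8160}   # japanese.sjis only knows WAVE DASH


def toy_sjis_encode(text):
    """Encode text with the toy japanese.sjis table, failing on unknown characters."""
    codes = []
    for ch in text:
        if ch not in SJIS_ENCODE:
            raise UnicodeError(
                "no corresponding character in Shift_JIS: U+%04X" % ord(ch))
        codes.append(SJIS_ENCODE[ch])
    return codes


decoded = MS932_DECODE[0x8160]  # what decoding with japanese.ms932 would yield
try:
    toy_sjis_encode(decoded)
    failed = False
except UnicodeError as exc:
    failed = True
    print(exc)
```

The same text encodes without error if it was decoded with the sjis
table in the first place, which is why only mixed use of the codecs
is affected.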

Also, there are non-reversibilities in the codecs for
traditional Japanese encodings.  For example, the code point
0x815f in Shift_JIS is mapped to U+005c (REVERSE SOLIDUS) when
decoded using japanese.sjis.  The code point U+005c in Unicode
is in turn mapped to the single byte 0x5c in Shift_JIS when
encoded by the same codec, so a round trip does not give back the
original 0x815f.  This non-reversible behavior of the mapping
between Shift_JIS and Unicode may be "correct" from the Unicode
Consortium's viewpoint, but in practice reversible mappings are
desirable.  The same non-reversibility exists in the other codecs
for traditional Japanese encodings.
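
A round-trip check makes the problem concrete.  The sketch below is
a minimal illustration using hand-written tables that contain only
the 0x815f example from this message; SJIS_DECODE, SJIS_ENCODE and
non_reversible are hypothetical names, not part of JapaneseCodecs.

```python
# Toy tables holding only the example above (not the real codec tables).
SJIS_DECODE = {0x815F: "\u005c"}  # 0x815f decodes to REVERSE SOLIDUS
SJIS_ENCODE = {"\u005c": 0x5C}    # REVERSE SOLIDUS encodes to the byte 0x5c


def non_reversible(decode_table, encode_table):
    """Return the code points that do not survive decode followed by encode."""
    return sorted(
        code
        for code, ch in decode_table.items()
        if encode_table.get(ch) != code
    )


print([hex(code) for code in non_reversible(SJIS_DECODE, SJIS_ENCODE)])
```

With a reversible pair of tables, for example one that maps 0x815f to
U+ff3c and back, the function returns an empty list.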

Therefore, I'd like to change the mapping between Unicode and
the traditional Japanese encodings so that all codecs use the
same 7 code points in Unicode as japanese.ms932.

In my plan, for example, 0x815f in Shift_JIS, 0xa1c0 in EUC-JP,
and 0x2140 in ISO-2022-JP (JIS X 0208:1990) will all be mapped
to U+ff3c (FULLWIDTH REVERSE SOLIDUS) instead of U+005c (REVERSE
SOLIDUS).  The corresponding code points in Unicode for the
other 6 characters will be changed similarly.
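
The three code points named above are the same JIS X 0208 character
seen through three encodings.  As a rough sketch, the usual
arithmetic for deriving the EUC-JP and Shift_JIS forms from a
two-byte JIS X 0208 code is shown below; the function names are
hypothetical, and the code handles only the two-byte JIS X 0208
range.

```python
def jis_to_eucjp(jis):
    """EUC-JP sets the high bit of both bytes of a JIS X 0208 code."""
    return jis + 0x8080


def jis_to_sjis(jis):
    """Convert a two-byte JIS X 0208 code (0x2121..0x7e7e) to Shift_JIS."""
    j1, j2 = jis >> 8, jis & 0xFF
    # Two JIS rows are packed into each Shift_JIS lead byte.
    s1 = ((j1 - 0x21) >> 1) + 0x81
    if s1 > 0x9F:
        s1 += 0x40  # skip the single-byte katakana range
    if j1 & 1:
        # Odd row: first half of the trail-byte range, skipping 0x7f.
        s2 = j2 + 0x1F
        if s2 >= 0x7F:
            s2 += 1
    else:
        # Even row: second half of the trail-byte range.
        s2 = j2 + 0x7E
    return (s1 << 8) | s2


print(hex(jis_to_eucjp(0x2140)), hex(jis_to_sjis(0x2140)))
```

Because the byte-level relation between the encodings is fixed, one
Unicode code point per JIS X 0208 character is enough to keep the
three codecs consistent with each other.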

This change means in effect that Microsoft's mappings will be
adopted instead of the Unicode Consortium's.

I think the reversibility of mappings is important.  However,
this change is not backward-compatible, so it may affect
existing systems and data.  I expect both pros and cons, and
I would really appreciate any kind of feedback.

At the moment, I'd prefer not to support the current mappings in
the next release of JapaneseCodecs, since the maintenance cost
would otherwise be high, and anyone who needs the current
mappings can use an older release of JapaneseCodecs.

Any comments and suggestions are welcome.

Thank you,

-- 
KAJIYAMA, Tamito <kajiyama@grad.sccs.chukyo-u.ac.jp>