[I18n-sig] JapaneseCodecs 1.4.8 released
Tamito KAJIYAMA
kajiyama@grad.sccs.chukyo-u.ac.jp
Fri, 6 Sep 2002 11:05:17 +0900
martin@v.loewis.de (Martin v. Loewis) writes:
|
| > One addition: the mapping used in Java is also one-to-one so
| > that it may be another candidate.
|
| That is not true (according to the ICU data). Java maps U+00A5 to
| 0x5c, which it maps back to U+005C.
A test program showed that Java's mapping works as follows:
0x815f -> U+ff3c -> 0x815f
0x5c -> U+005c -> 0x5c
U+00a5 -> 0x5c
So it is not true that Java's mapping is one-to-one. But both
0x815f and 0x5c round-trip, which is what I want to have.
The mapping of U+00a5 to 0x5c seems to be a one-way fallback.
The test program and its execution result are shown below. I've
used Sun's J2SE 1.3 on Linux.
$ cat UnicodeTest1.java
class UnicodeTest1 {
    public static void main(String args[]) {
        try {
            byte[] b = { -127, 95, 92 }; /* 0x815f, 0x5c */
            String s = new String(b, "Shift_JIS") + "\u00a5";
            System.out.print("Unicode: "); dump(s.getBytes("UnicodeBig"));
            System.out.print("Shift_JIS:"); dump(s.getBytes("Shift_JIS"));
        } catch (java.io.UnsupportedEncodingException e) {
            e.printStackTrace();
        }
    }
    public static void dump(byte[] b) {
        for (int i = 0; i < b.length ; i++) {
            String h = "0" + Integer.toHexString(b[i]);
            System.out.print(" " + h.substring(h.length()-2, h.length()));
        }
        System.out.println();
    }
}
$ javac UnicodeTest1.java
$ java UnicodeTest1
Unicode: fe ff ff 3c 00 5c 00 a5
Shift_JIS: 81 5f 5c 5c
$
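The distinction between a round-trip entry and a one-way fallback can
be restated as a small table check. This is only a sketch: the class
name RoundTripSketch and its two tables are hypothetical, and the
tables hold just the three mappings observed in the output above, not
Java's real conversion tables.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class RoundTripSketch {
    // Decoding direction (Shift_JIS bytes -> Unicode), as observed above.
    static final Map<Integer, Integer> DECODE = new LinkedHashMap<>();
    // Encoding direction (Unicode -> Shift_JIS bytes), as observed above.
    static final Map<Integer, Integer> ENCODE = new LinkedHashMap<>();

    static {
        DECODE.put(0x815f, 0xff3c);
        DECODE.put(0x5c, 0x005c);
        ENCODE.put(0xff3c, 0x815f);
        ENCODE.put(0x005c, 0x5c);
        ENCODE.put(0x00a5, 0x5c); // no decoding entry leads back to U+00a5
    }

    // True if decoding and then encoding returns the same byte sequence.
    static boolean bytesRoundTrip(int sjis) {
        Integer u = DECODE.get(sjis);
        return u != null && Integer.valueOf(sjis).equals(ENCODE.get(u));
    }

    // True if encoding and then decoding returns the same code point.
    static boolean charRoundTrips(int cp) {
        Integer b = ENCODE.get(cp);
        return b != null && Integer.valueOf(cp).equals(DECODE.get(b));
    }

    public static void main(String[] args) {
        for (int b : DECODE.keySet()) {
            System.out.println("0x" + Integer.toHexString(b)
                + (bytesRoundTrip(b) ? " round-trips" : " does not round-trip"));
        }
        for (int u : ENCODE.keySet()) {
            System.out.println(String.format("U+%04x", u)
                + (charRoundTrips(u) ? " round-trips" : " is a one-way fallback"));
        }
    }
}
```

With these tables, every byte sequence round-trips while U+00a5 does
not, so the encoding direction is many-to-one even though the decoding
direction is safe.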
Regards,
--
KAJIYAMA, Tamito <kajiyama@grad.sccs.chukyo-u.ac.jp>