[I18n-sig] Re: CJKCodecs 0.9 is released

Martin v. Löwis martin@v.loewis.de
21 Jun 2003 18:53:05 +0200


Hye-Shik Chang <perky@i18n.org> writes:

> So, I changed mapping for them to as cp950 does instead of U+FFFD or
> user-defined area. I think that's affordable.

Indeed. I'd encourage you to list all "critical" cases in the
documentation of your package. This is all tricky stuff, and opinions
vary widely. So users should be able to find out up-front what they
get - they are much angrier if they find out by surprise.

> Quoting Ken Lunde's CJKV Information Processing p.206 table 4-66:
> ] Table 4-66: Shift-JIS to Unicode and EUC-JP for User-Defined Region
> ]
> ] Shift-JIS     Unicode     EUC-JP
> ] F040-F0FC     E000-E0BB   F5A1-F5FE, F6A1-F6FE
> ] F140-F1FC     E0BC-E177   F7A1-F7FE, F8A1-F8FE
> ] F240-F2FC     E178-E233   F9A1-F9FE, FAA1-FAFE
> ] --snip--
> ] F940-F9FC     E69C-E757   8FFDA1-8FFDFE, 8FFEA1-8FFEFE

Is this really necessary? Using PUA characters is evil, IMO, and
should be avoided unless explicitly requested by the application.  If
those characters are not supported in Unicode, they can't really be
important, no?

Or, are you sure that they are still unsupported in Unicode 4.0?
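
For reference, the rows quoted from Table 4-66 look like plain
arithmetic: each Shift-JIS lead byte from F0 to F9 seems to cover 188
trail bytes (40-7E and 80-FC), laid out consecutively from U+E000. A
minimal Python sketch under that assumption follows. The function name
is hypothetical and not part of CJKCodecs, and the snipped rows are
only assumed to follow the same pattern as the ones shown.

```python
PUA_BASE = 0xE000          # first code point used in the quoted table
TRAILS_PER_LEAD = 188      # 0x40-0x7E (63 bytes) plus 0x80-0xFC (125 bytes)


def sjis_udc_to_pua(code):
    """Map a Shift-JIS user-defined code (0xF040-0xF9FC) to a PUA code point.

    Assumes the linear layout suggested by the quoted rows of Table 4-66.
    """
    lead, trail = code >> 8, code & 0xFF
    if not 0xF0 <= lead <= 0xF9:
        raise ValueError("lead byte outside the user-defined region")
    if not (0x40 <= trail <= 0x7E or 0x80 <= trail <= 0xFC):
        raise ValueError("invalid trail byte")
    # 0x7F is not a valid trail byte, so bytes above it shift down by one.
    index = trail - 0x40 if trail < 0x7F else trail - 0x41
    return PUA_BASE + (lead - 0xF0) * TRAILS_PER_LEAD + index


if __name__ == "__main__":
    for code in (0xF040, 0xF0FC, 0xF140, 0xF940, 0xF9FC):
        print("%04X -> U+%04X" % (code, sjis_udc_to_pua(code)))
```

The endpoints of every row quoted above come out as listed in the
table, which is the only check this sketch has had.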

> Okay. Here it is! :)
> 
>         CJK    Japanese GNU     WindowsXP
> 0080    -       -       -       0080
> 00a0    -       -       -       f8f0
> 00fd    -       -       -       f8f1
> 00fe    -       -       -       f8f2
> 00ff    -       -       -       f8f3
> 
> I'll add 0x80, 0xa0, 0xfd, 0xfe, 0xff to CJKCodecs's cp932 to conform
> Windows's real mapping.

This is, in fact, a place where the mapping-to-PUA might be
acceptable. CP932 is Microsoft's "private" encoding, anyway, so they
set the rules :-(
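
To make the "only when explicitly requested" idea concrete, here is a
rough Python sketch of the five single-byte mappings from the WindowsXP
column above, with the PUA results kept behind an opt-in flag. The
names and the allow_pua parameter are hypothetical, not CJKCodecs's
actual interface, and falling back to U+FFFD is just one possible
policy.

```python
# Single-byte values from the WindowsXP column of the table quoted above.
CP932_EXTRA_SINGLE_BYTES = {
    0x80: 0x0080,
    0xA0: 0xF8F0,
    0xFD: 0xF8F1,
    0xFE: 0xF8F2,
    0xFF: 0xF8F3,
}


def is_private_use(code_point):
    """True for code points in the BMP Private Use Area (U+E000-U+F8FF)."""
    return 0xE000 <= code_point <= 0xF8FF


def decode_extra_byte(byte, allow_pua=False):
    """Decode one of the five extra bytes (hypothetical helper).

    PUA results are returned only when allow_pua is true; otherwise
    the replacement character U+FFFD is returned instead.
    """
    if byte not in CP932_EXTRA_SINGLE_BYTES:
        raise ValueError("not one of the extra single-byte codes")
    code_point = CP932_EXTRA_SINGLE_BYTES[byte]
    if is_private_use(code_point) and not allow_pua:
        return "\ufffd"
    return chr(code_point)
```

With this shape, an application that wants Windows-compatible round
trips passes allow_pua=True, and everyone else never sees a PUA
character by surprise.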

Regards,
Martin