[I18n-sig] Re: CJKCodecs 0.9 is released
Martin v. Löwis
martin@v.loewis.de
21 Jun 2003 18:53:05 +0200
Hye-Shik Chang <perky@i18n.org> writes:
> So, I changed the mapping for them to match what cp950 does, instead of
> U+FFFD or the user-defined area. I think that's affordable.
Indeed. I'd encourage you to list all "critical" cases in the
documentation of your package. This is all tricky stuff, and opinions
vary widely. So users should be able to find out up front what they
get; they are much angrier if they find out by surprise.
> Quoting Ken Lunde's CJKV Information Processing p.206 table 4-66:
> ] Table 4-66: Shift-JIS to Unicode and EUC-JP for User-Defined Region
> ]
> ] Shift-JIS   Unicode     EUC-JP
> ] F040-F0FC   E000-E0BB   F5A1-F5FE, F6A1-F6FE
> ] F140-F1FC   E0BC-E177   F7A1-F7FE, F8A1-F8FE
> ] F240-F2FC   E178-E233   F9A1-F9FE, FAA1-FAFE
> ] --snip--
> ] F940-F9FC   E69C-E757   8FFDA1-8FFDFE, 8FFEA1-8FFEFE
Is this really necessary? Using PUA characters is evil, IMO, and
should be avoided unless explicitly requested by the application. If
those characters are not supported in Unicode, they can't be really
important, no?
Or, are you sure that they are still unsupported in Unicode 4.0?
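
For reference, the rows quoted above are consistent with a simple linear
layout: each Shift-JIS lead byte F0-F9 covers 188 cells (trail bytes
40-7E and 80-FC), and the cells are assigned consecutive code points
starting at U+E000. The Python sketch below assumes the snipped rows
follow the same pattern. The function name sjis_udc_to_pua is made up
for illustration and is not taken from CJKCodecs.

```python
import unicodedata

PUA_START = 0xE000
CELLS_PER_ROW = 188  # trail bytes 0x40-0x7E (63 cells) plus 0x80-0xFC (125 cells)


def sjis_udc_to_pua(lead: int, trail: int) -> int:
    """Map a Shift-JIS user-defined byte pair to a Private Use Area code point.

    Assumes the linear layout implied by the quoted table rows.
    """
    if not 0xF0 <= lead <= 0xF9:
        raise ValueError(f"lead byte {lead:#04x} is outside F0-F9")
    if 0x40 <= trail <= 0x7E:
        cell = trail - 0x40
    elif 0x80 <= trail <= 0xFC:
        cell = trail - 0x41  # 0x7F is not a valid trail byte, so skip it
    else:
        raise ValueError(f"trail byte {trail:#04x} is not a valid trail byte")
    return PUA_START + (lead - 0xF0) * CELLS_PER_ROW + cell


def is_private_use(code_point: int) -> bool:
    """True if Unicode classifies the code point as private use (category Co)."""
    return unicodedata.category(chr(code_point)) == "Co"
```

Checking the endpoints of the quoted rows (F040, F0FC, F140, F9FC)
against this function is a quick way to confirm that the formula and
the table agree.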
> Okay. Here it is! :)
>
>        CJK   Japanese   GNU   WindowsXP
> 0080   -     -          -     0080
> 00a0   -     -          -     f8f0
> 00fd   -     -          -     f8f1
> 00fe   -     -          -     f8f2
> 00ff   -     -          -     f8f3
>
> I'll add 0x80, 0xa0, 0xfd, 0xfe, 0xff to CJKCodecs's cp932 to conform
> to Windows's real mapping.
This is, in fact, a place where the mapping-to-PUA might be
acceptable. CP932 is Microsoft's "private" encoding, anyway, so they
set the rules :-(
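
To make the five extra mappings concrete, here is a minimal Python
sketch of a lookup table built from the WindowsXP column quoted above.
The names CP932_EXTRA_SINGLE_BYTES and decode_extra_byte are
hypothetical and only illustrate the idea; this is not how CJKCodecs
implements it.

```python
import unicodedata

# Single-byte values and their code points, copied from the WindowsXP
# column of the quoted table.
CP932_EXTRA_SINGLE_BYTES = {
    0x80: 0x0080,
    0xA0: 0xF8F0,
    0xFD: 0xF8F1,
    0xFE: 0xF8F2,
    0xFF: 0xF8F3,
}


def decode_extra_byte(byte: int) -> str:
    """Return the character for one of the five extra single bytes."""
    try:
        return chr(CP932_EXTRA_SINGLE_BYTES[byte])
    except KeyError:
        raise ValueError(f"byte {byte:#04x} is not one of the extra mappings") from None


def lands_in_pua(byte: int) -> bool:
    """True if the byte decodes to a private-use character (category Co)."""
    return unicodedata.category(decode_extra_byte(byte)) == "Co"
```

With this table, 0x80 maps to the C1 control U+0080, while the other
four bytes land in the Private Use Area, which is the case discussed
above.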
Regards,
Martin