[I18n-sig] Big5 Codecs
Tom Emerson
tree@basistech.com
Tue, 31 Oct 2000 14:39:41 -0500 (EST)
Frank J.S. Chen writes:
> > a) What source did you use for the mapping table?
>
> It follows the proposal issued by M.A. Lemburg.
> BIG5 encoding can map to Unicode encoding and reversely.
But the Unicode Consortium's mapping table does not round-trip Big 5
--- so where did you get the table?
> There are Level 1 and Level 2 in BIG5, so I define them apart.
> This table is complete, but I just make a small test, not well-tested
> indeed.
I have a few megabytes of Big Five encoded text --- I'll test it
out. ;-)
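
A round-trip check of this kind can be sketched as follows. This is only an illustration, not the codec under discussion: it assumes a modern Python whose standard library ships a "big5" codec, and the function name round_trip_failures is hypothetical. It walks the Level 1 and Level 2 lead bytes (0xA1 to 0xF9), decodes every two-byte code that the codec accepts, and reports any code that does not encode back to the same bytes.

```python
def round_trip_failures(codec="big5"):
    """Return (raw, text, back) for every code that fails to round-trip.

    Assumption: `codec` names a codec registered with Python; the default
    is the standard library's "big5", not the table from this thread.
    """
    # Standard Big5 trail bytes: 0x40-0x7E and 0xA1-0xFE.
    trails = list(range(0x40, 0x7F)) + list(range(0xA1, 0xFF))
    failures = []
    for lead in range(0xA1, 0xFA):  # lead bytes below the EUDC block at 0xFA
        for trail in trails:
            raw = bytes([lead, trail])
            try:
                text = raw.decode(codec)
            except UnicodeDecodeError:
                continue  # unassigned in this codec's table, nothing to test
            try:
                back = text.encode(codec)
            except UnicodeEncodeError:
                back = None  # decodes, but has no reverse mapping
            if back != raw:
                failures.append((raw, text, back))
    return failures


if __name__ == "__main__":
    for raw, text, back in round_trip_failures():
        print(raw.hex(), repr(text), back.hex() if back else None)
```

An empty result would mean the table round-trips over the codes it defines; any entries point at the mappings worth inspecting by hand.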
> > b) How do you handle EUDC code-points?
>
> What is EUDC code point? I cannot find this field name in
> the BMP layout.
EUDC is the End-User Defined Character region, the third level of Big
5. Several groups, including HKUST, the Hong Kong government, and the
Taiwan military, define characters in this region. Other Big 5
extensions, such as ETen, also use this block.
EUDC is divided into three segments: 0xFA40 -- 0xFEFE,
0x8E40 -- 0xA0FE, and 0x8140 -- 0x8DFE.
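
For a codec that has to decide what to do with these code points, the three segments can be expressed as a simple range test. The sketch below is a minimal illustration: the names EUDC_SEGMENTS, is_valid_trail, and eudc_segment are hypothetical, and the trail-byte check assumes the usual Big 5 trail ranges of 0x40 to 0x7E and 0xA1 to 0xFE, which the segment boundaries above do not spell out.

```python
# The three EUDC segments, in the order listed above (inclusive bounds).
EUDC_SEGMENTS = (
    (0xFA40, 0xFEFE),
    (0x8E40, 0xA0FE),
    (0x8140, 0x8DFE),
)


def is_valid_trail(trail):
    """Assumed Big 5 trail-byte ranges: 0x40-0x7E and 0xA1-0xFE."""
    return 0x40 <= trail <= 0x7E or 0xA1 <= trail <= 0xFE


def eudc_segment(lead, trail):
    """Return the 1-based EUDC segment number for a byte pair, or None."""
    if not is_valid_trail(trail):
        return None  # not a well-formed two-byte code at all
    code = (lead << 8) | trail
    for index, (low, high) in enumerate(EUDC_SEGMENTS, start=1):
        if low <= code <= high:
            return index
    return None  # a Level 1 / Level 2 code or otherwise outside EUDC
```

A decoder could call eudc_segment(lead, trail) before consulting its main table and, for example, map hits into the Unicode Private Use Area or raise an error, depending on the policy chosen.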
-tree
--
Tom Emerson Basis Technology Corp.
Zenkaku Language Hacker http://www.basistech.com
"Beware the lollipop of mediocrity: lick it once and you suck forever"