[I18n-sig] naming codecs

Andy Robinson andy@reportlab.com
Tue, 2 Jan 2001 10:41:50 -0000

> I found some time to look into this, and it appears that your
> encoding deals with "JIS X 0201 Katakana", which I also found with
> the name "JIS X 0201 (GR)".
> I know you already found a name, but ... if your codec is indeed
> *only* JISX 0201 Katakana, then why not name it that way
> (e.g. "jisx-0201-katakana").
JIS X 0201 Katakana is a character set, not an encoding.  It defines
the half-width katakana characters (about 60 of them).  Japanese
encodings contain multiple character sets.  ISO-2022-JP is a 'way of
making encodings' and within this there can be many variants; he is
talking about a specific encoding which combines two character sets...

(1) The JIS X 0208 character set, 1st and 2nd levels (about 7000
characters including symbols, numeric characters, Latin, Cyrillic and
Greek alphabets, Japanese HIRAGANA, KATAKANA, and KANJI),
(2) The JIS X 0201 Katakana characters (about 60 half-width
variants, distinct from the Katakana listed in JIS X 0208)

...all encoded according to ISO-2022-JP
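
As a rough illustration of the character set / encoding distinction, here
is a minimal Python sketch.  It assumes the codecs shipped with a modern
Python 3 standard library, which postdate this message: "iso2022_jp"
(no JIS X 0201 Katakana) and "iso2022_jp_ext" (which adds it).  Treating
"iso2022_jp_ext" as close to the encoding under discussion is an
assumption, and the helper name try_encode is made up for the example.

```python
# Sketch only: the same ISO-2022-JP framing can carry different sets of
# character sets.  Codec names are those of Python 3's standard library;
# mapping them onto the codec discussed in this thread is an assumption.

# Full-width katakana KA (U+30AB), a member of JIS X 0208.
FULL_WIDTH_KA = "\u30ab"
# Half-width katakana KA (U+FF76), a member of JIS X 0201 Katakana.
HALF_WIDTH_KA = "\uff76"


def try_encode(text, codec):
    """Return the encoded bytes, or None if the codec cannot encode text."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    for codec in ("iso2022_jp", "iso2022_jp_ext"):
        for label, ch in (("full-width", FULL_WIDTH_KA),
                          ("half-width", HALF_WIDTH_KA)):
            # None means the character set is not part of this variant.
            print(codec, label, try_encode(ch, codec))
```

Under these assumptions the plain variant rejects the half-width
character, while the extended variant switches to the JIS X 0201 Katakana
set with its own escape sequence and then back to ASCII.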

The half-width katakana are basically 'deprecated' - they predate the
ability to use Kanji in computers - but won't go away in practice, so
people in Japanese IT frequently need to extend codecs to deal with
them.

I hope this explains a little further.  It is hard to understand this
without knowing a little about Japanese writing systems; Ken Lunde's
"CJKV" book does quite a good job of explaining it.


Andy Robinson