[I18n-sig] Asian Encodings
Brian Takashi Hooper
brian@garage.co.jp
Wed, 22 Mar 2000 11:17:43 +0900
Hi again,
One other thing I forgot to mention is that we'll have to start
thinking about (canonical) normalization, at least on a rudimentary
level, for Asian encodings. One specific example I can think of is
Japanese half-width katakana: a few diacritical marks (dakuten) are
represented as separate characters that follow the base character.
Most encoding packages I've seen special-case these and turn each
base-plus-mark pair into its corresponding canonical representation.
Without normalization, searching and processing text that contains
these characters becomes a bit of a pain.
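
To make the half-width katakana case concrete, here is a minimal sketch.
It assumes the unicodedata module from the standard library of current
Python, which is not part of the codec framework discussed in this
thread, and fold_kana is a hypothetical helper name. In terms of the
Unicode Normalization Forms as they are defined today, folding
half-width kana is a compatibility mapping, so the sketch uses NFKC;
NFC on its own leaves half-width characters untouched.

```python
import unicodedata

# Half-width KA (U+FF76) followed by the half-width dakuten (U+FF9E):
# two code points that together read as "ga".
halfwidth_ga = "\uff76\uff9e"

# Full-width GA (U+30AC): one precomposed code point.
fullwidth_ga = "\u30ac"


def fold_kana(text):
    """Hypothetical helper: fold half-width kana to full-width via NFKC."""
    return unicodedata.normalize("NFKC", text)


# A naive comparison fails because the code point sequences differ.
print(halfwidth_ga == fullwidth_ga)

# After folding, both spellings compare equal, so a search matches.
print(fold_kana(halfwidth_ga) == fold_kana(fullwidth_ga))
```

Normalizing both the search key and the text being searched with the
same function is what makes the comparison reliable.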
So another goal of creating the East Asian codecs should be to add
some normalization support to the existing framework. Other Unicode
packages and implementations mostly use Normalization Form C for
everything.
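
As a rough illustration of what Normalization Form C does, the sketch
below composes a base character and a combining mark into a single
precomposed character. It again assumes the unicodedata module from
current Python rather than anything in the existing framework, and
to_nfc is a hypothetical helper name.

```python
import unicodedata


def to_nfc(text):
    """Hypothetical helper: return text in Normalization Form C."""
    return unicodedata.normalize("NFC", text)


# HIRAGANA KA (U+304B) followed by the COMBINING VOICED SOUND MARK (U+3099).
decomposed = "\u304b\u3099"

# NFC composes the pair into the single code point HIRAGANA GA (U+304C).
composed = to_nfc(decomposed)

print(len(decomposed), len(composed))
print(composed == "\u304c")
```

Text that is already in composed form passes through unchanged, so
applying the helper more than once is harmless.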
For those who aren't familiar with Unicode Normalization Forms, here's
the technical report, which is a good reference:
http://www.unicode.org/unicode/reports/tr15/tr15-18.html
--Brian