Tamito Kajiyama has written pure Python codecs for the two main Japanese
encodings!  Many thanks!

They include the 6879 characters in the JIS0208 character set in literal
Python dictionaries, so it should be trivial to write modified ones which
support vendor-specific extensions with a few extra characters, as long as
the extras are in Unicode.
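
To make the dictionary idea concrete, here is a minimal sketch in present-day
Python.  It is not Tamito's code: the names BASE_DECODING_MAP, VENDOR_EXTRAS,
make_extended_map and decode_two_byte are made up for illustration, the table
holds two entries instead of 6879, and the vendor entry (a circled digit one
in row 13) is only an assumed example of an extension.

```python
# Hypothetical, tiny stand-in for a codec's literal decoding dictionary.
# Keys are EUC-JP byte pairs, values are the Unicode characters they map to.
BASE_DECODING_MAP = {
    b"\xa4\xa2": "\u3042",  # HIRAGANA LETTER A
    b"\xb4\xc1": "\u6f22",  # CJK ideograph "kan" (as in "kanji")
}

# Assumed vendor extension: CIRCLED DIGIT ONE at row 13, cell 1.
VENDOR_EXTRAS = {
    b"\xad\xa1": "\u2460",
}


def make_extended_map(base, extras):
    """Return a new dictionary: the base table plus the vendor extras."""
    merged = dict(base)
    merged.update(extras)
    return merged


def decode_two_byte(data, table):
    """Decode ASCII plus two-byte characters found in `table`.

    A sketch only: half-width katakana and three-byte sequences are not
    handled, and any unmapped or truncated pair raises ValueError.
    """
    out = []
    i = 0
    while i < len(data):
        byte = data[i]
        if byte < 0x80:
            out.append(chr(byte))  # plain ASCII passes straight through
            i += 1
        else:
            pair = bytes(data[i:i + 2])
            if pair not in table:
                raise ValueError("unmapped bytes %r at offset %d" % (pair, i))
            out.append(table[pair])
            i += 2
    return "".join(out)


EXTENDED_DECODING_MAP = make_extended_map(BASE_DECODING_MAP, VENDOR_EXTRAS)
```

The point of copying the dictionary rather than editing it in place is that
the standard table stays untouched, so the plain codec and the vendor variant
can live side by side.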

I'm now rewriting something I did last year in-house for a customer - a
script to generate HTML tables and text files which exactly match the layout
of the code charts for JIS0208 in "CJKV Information Processing".  I ran
these through both codecs and viewed the results in IE5, and as far as I can
see the results are perfect.  I will post up my scripts when they look a bit
prettier :-)
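
Until then, here is a rough sketch of the general idea, not my actual script:
it walks one row of the 94 x 94 grid, decodes each cell, and emits an HTML
table row.  The function names row_cells and row_to_html are made up, it uses
the "euc_jp" codec that ships with present-day Python as a stand-in for the
codec under test, and it makes no attempt to reproduce the book's layout.

```python
import html


def row_cells(row, encoding="euc_jp"):
    """Return the 94 characters of one row, with '' for cells that fail to decode."""
    cells = []
    for cell in range(1, 95):
        # In EUC-JP, row and cell numbers are each offset by 0xA0.
        raw = bytes([0xA0 + row, 0xA0 + cell])
        try:
            cells.append(raw.decode(encoding))
        except UnicodeDecodeError:
            cells.append("")
    return cells


def row_to_html(row, encoding="euc_jp"):
    """Render one row as an HTML <tr>, using &nbsp; for empty cells."""
    tds = "".join(
        "<td>%s</td>" % (html.escape(ch) or "&nbsp;")
        for ch in row_cells(row, encoding)
    )
    return "<tr><th>%d</th>%s</tr>" % (row, tds)
```

Running row_to_html over rows 1 to 94 and wrapping the result in a table
gives a page that can be compared by eye against a printed code chart.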

It would be nice to put this code somewhere 'out there' so people can work
on it - not just codecs, but test suites.  How do people feel about starting
a project on www.sourceforge.net under CVS?

Since lots of us want to work on fast Asian codecs, another thing we need
is a 'benchmark suite' - maybe a megabyte of Japanese text (mixing
everything - ASCII, Kanji, half-width katakana?).  We can then use these pure
Python codecs as a baseline.
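
A benchmark along these lines might look like the sketch below.  The
assumptions are all mine: the sample line is invented, the functions
build_sample and time_round_trip are hypothetical names, and the codecs being
timed are the "euc_jp" and "shift_jis" codecs bundled with present-day Python
rather than the pure Python ones, which would be plugged in the same way.

```python
import time


def build_sample(target_bytes=1_000_000, encoding="euc_jp"):
    """Build roughly `target_bytes` of mixed ASCII, kanji/kana and half-width katakana."""
    line = "Python codecs: 日本語のテキスト ｶﾀｶﾅ\n"
    encoded_length = len(line.encode(encoding))
    repeats = target_bytes // encoded_length + 1
    return line * repeats


def time_round_trip(text, encoding):
    """Encode then decode `text`; return (byte count, encode seconds, decode seconds)."""
    start = time.perf_counter()
    data = text.encode(encoding)
    encode_seconds = time.perf_counter() - start

    start = time.perf_counter()
    decoded = data.decode(encoding)
    decode_seconds = time.perf_counter() - start

    # A benchmark is only meaningful if the codec is also correct.
    if decoded != text:
        raise ValueError("round trip through %s changed the text" % encoding)
    return len(data), encode_seconds, decode_seconds


RESULTS = {}
for name in ("euc_jp", "shift_jis"):
    RESULTS[name] = time_round_trip(build_sample(encoding=name), name)
    size, enc_s, dec_s = RESULTS[name]
    print("%-10s %8d bytes  encode %.4fs  decode %.4fs" % (name, size, enc_s, dec_s))
```

Real text would make a better corpus than one repeated line, since repetition
flatters any codec that caches lookups, but the timing harness would not need
to change.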

- Andy Robinson

From: Tamito KAJIYAMA <kajiyama@grad.sccs.chukyo-u.ac.jp>
To: <andy@reportlab.com>
> | >two Japanese character encodings EUC-JP and Shift_JIS.  The codecs
> | Many thanks for this!  I have copied it to the Internationalisation
> | Special Interest Group, where we discuss this stuff, and taken the
> | liberty of copying your message.
> Good news.  Thanks for the coordination.
> | We need to start coordinating a separate codecs library for
> | Asian languages, and I'd like to use this as a starting point
> | if OK with you.
> That's absolutely okay.  I'm glad if my codecs contribute to
> the i18n SIG.  I joined the i18n-sig@python.org just after I got
> your message.  Please carry on the further discussion about the
> Japanese codecs (if any) in the list.
