[Patches] [ python-Patches-666484 ] Japanese Unicode Codecs

Thu, 16 Jan 2003 01:28:17 -0800

Patches item #666484, was opened at 2003-01-12 04:46
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=666484&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 3
Submitted By: SUZUKI Hisao (suzuki_hisao)
Assigned to: Nobody/Anonymous (nobody)
Summary: Japanese Unicode Codecs

Initial Comment:
This is an implementation of a set of Japanese Unicode codecs
for Python 2.2 and 2.3.  Three major encodings are supported:
EUC-JP, Shift_JIS and ISO-2022-JP.

It is in pure Python, of a reasonable size (< 80KB), and with
an effective means to modify the mapping tables.

----------------------------------------------------------------------

>Comment By: M.-A. Lemburg (lemburg)
Date: 2003-01-16 10:28

Message:
Logged In: YES 
user_id=38388

Sorry for not having read the README earlier. 

You do have a point in that it is useful to be able to modify 
encodings in user-specific ways. Of course, this needs to 
be done by creating new codecs, and Python files certainly
make this process easier.

Now, AFAIK, none of the current Python developers know 
much about Japanese, so we'd need a maintainer for the 
codecs. If you would be able to take over this part, then
I see a good chance of getting the codecs into the Python
core (Tamito's codecs didn't get accepted for the core
distribution because of their size).

Perhaps you could team up with Tamito in this effort ?!

----------------------------------------------------------------------

Comment By: SUZUKI Hisao (suzuki_hisao)
Date: 2003-01-16 01:22

Message:
Logged In: YES 
user_id=495142

Yes, I know KAJIYAMA's work from version 1.0 to
version 1.4.9.  In fact, I contributed a patch
to JapaneseCodecs-1.2.  Please read the README
file included in the tar-ball for the rationale
behind ja-codecs.

As for the efficiency, ja-codecs is fairly fast
and small in practice.  In addition, its mapping
possesses a good mathematical property,
encode(decode(c)) == c for every valid character
c, which is pragmatically useful for many
applications.  (The last version (1.4.9) of
KAJIYAMA's codecs has also remedied it on a
particular character: REVERSE SOLIDUS.  It seems
to lack a validation test like that of
ja-codes-0.6/ja/map_jisx206.py, though.)
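
The round-trip property described above, encode(decode(c)) == c,
can be checked mechanically.  The following is a minimal sketch in
present-day Python 3, assuming the mapping is held as a dict from
byte codes to characters.  The function name roundtrip_failures and
the sample table are hypothetical illustrations, not code or data
taken from ja-codecs or from KAJIYAMA's codecs.

```python
def roundtrip_failures(decode_table):
    """Return the byte codes c for which encode(decode(c)) != c.

    decode_table maps byte codes (bytes) to Unicode characters (str).
    The encode table is derived by inverting it; when two codes share
    a character, the later entry wins, as in a naive table build.
    """
    encode_table = {}
    for code, char in decode_table.items():
        encode_table[char] = code
    return sorted(
        code
        for code, char in decode_table.items()
        if encode_table[char] != code
    )


# Illustrative table only: two different codes decode to the same
# character, so one of them cannot survive a round trip.
sample = {
    b"\x5c": "\\",       # single-byte code
    b"\xa1\xc0": "\\",   # double-byte code mapped to the same character
    b"\xa1\xa1": "\u3000",
}
print(roundtrip_failures(sample))
```

An empty result means every code in the table survives the round
trip; any code listed points at a collision in the mapping.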

As you know, KAJIYAMA's codec set also does not
cover all the encodings used in Japan today.  For
example, it does not support those of Macintosh.
It might be almost impossible to make a perfect
set of codecs in a realistic size.  It would be
best for the standard library to provide a few
"standard" encodings (based on public
specifications and in use across various
platforms), which users/developers can _easily_
modify to adapt them to their specific platforms
(in the spirit of "open source" ;-).

So I think it is essential for the Japanese codecs
in the standard library to be written in Python,
cleanly and efficiently enough, or at least to
let users modify the character mappings
effectively.
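
As an illustration of what a user-modified mapping could look like,
here is a minimal sketch in present-day Python 3 using the standard
codecs module.  It does not use ja-codecs itself; the helper
make_variant, the codec name "ascii_yen" and the choice of ASCII as
the base are all assumptions made for the example.  It derives a new
codec from an existing one by overriding a single character, in the
way JIS X 0201 differs from ASCII at 0x5C (YEN SIGN instead of
REVERSE SOLIDUS).

```python
import codecs


def make_variant(name, base, decode_overrides):
    """Build a codec that behaves like `base` except for some characters.

    decode_overrides maps a character produced by the base codec to the
    character the variant should produce instead (hypothetical helper).
    """
    base_info = codecs.lookup(base)
    to_variant = {ord(a): b for a, b in decode_overrides.items()}
    to_base = {ord(b): a for a, b in decode_overrides.items()}

    def encode(text, errors="strict"):
        # Map variant characters back to what the base codec expects.
        # Note: the base character itself still encodes as before; a
        # stricter variant would reject it.
        data, _ = base_info.encode(text.translate(to_base), errors)
        return data, len(text)

    def decode(data, errors="strict"):
        text, consumed = base_info.decode(bytes(data), errors)
        return text.translate(to_variant), consumed

    return codecs.CodecInfo(encode=encode, decode=decode, name=name)


# Registry of user-defined variants, consulted by codecs.lookup().
VARIANTS = {
    "ascii_yen": make_variant("ascii_yen", "ascii", {"\\": "\u00a5"}),
}


def search(name):
    """Codec search function: return a CodecInfo or None."""
    return VARIANTS.get(name)


codecs.register(search)

print(b"\x5c100".decode("ascii_yen"))
print("\u00a5100".encode("ascii_yen"))
```

Because the variant is just a small table layered on an existing
codec, a platform-specific difference can be expressed in a few
lines without touching the base mapping.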

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2003-01-12 13:33

Message:
Logged In: YES 
user_id=38388

Are you aware of the codecs written by Tamito KAJIYAMA?

   http://www.asahi-net.or.jp/~rd6t-kjym/python/

These are written in C and provide much better performance
than Python-based ones. They cover the same set of encodings you
have in your package and also include a complete test suite
for the codecs.

----------------------------------------------------------------------
