[I18n-sig] Changing case

M.-A. Lemburg mal@lemburg.com
Wed, 12 Apr 2000 11:30:40 +0200

[CCing to i18n too]

Andy Robinson wrote:
> > To make all this work without too many hassles we'd need
> > (at least the most commonly used) CJKV codecs in the core
> > distribution. How big would these be ? Would someone contribute
> > them... Tamito ?
> >
> He may be at home by now, but he indicated to me that he was
> happy for them to be used in any way.  The nice things about
> his codecs are
> (a) one could extract the mapping tables for other codecs
>     from data at www.unicode.org and use a very similar
>     approach.
> (b) the mappings may be 168k, but they at least zip nicely.
>     I'm guessing at 5-6 such codecs in the distribution
>     initially.
> (c) the algorithmic bit can be accelerated later in C or our
>     vaporware state machine, and nobody needs to change
>     any interfaces.
> (d) if we slightly parameterise his codecs so that one could
>     substitute a different mapping table if needed, then
>     all the corporate variations just need to create a
>     new dictionary with the deltas - Microsoft Code Page
>     932 would not be another 168k, but just a few k and
>     could build its mapping on the fly.
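
A rough sketch of what (d) could look like, only to make the idea
concrete. Everything here is an assumption for illustration: the names
BASE_DECODING, VENDOR_DELTAS and make_decoder are hypothetical, the
tables are a tiny hand-picked sample rather than real codec data, and
none of it is taken from Tamito's codecs. The decoder takes its mapping
table as a parameter, and the vendor variant is just the base table
with a small dictionary of deltas layered on top.

```python
from collections import ChainMap

# Hypothetical miniature base table: byte sequence -> Unicode character.
# A real table would hold thousands of entries; this is a sample only.
BASE_DECODING = {
    b"\x5c": "\u00a5",      # YEN SIGN
    b"\x7e": "\u203e",      # OVERLINE
    b"\x81\x60": "\u301c",  # WAVE DASH
    b"\x82\xa0": "\u3042",  # HIRAGANA LETTER A
}

# Hypothetical vendor deltas: only the entries that differ from the base.
VENDOR_DELTAS = {
    b"\x5c": "\\",          # REVERSE SOLIDUS instead of YEN SIGN
    b"\x7e": "~",           # TILDE instead of OVERLINE
    b"\x81\x60": "\uff5e",  # FULLWIDTH TILDE instead of WAVE DASH
}


def make_decoder(table):
    """Return a decode(data) function that uses the given mapping table."""

    def decode(data):
        out = []
        i = 0
        while i < len(data):
            # Simplification: try a two-byte key first, then a one-byte key.
            for width in (2, 1):
                chunk = bytes(data[i:i + width])
                if len(chunk) == width and chunk in table:
                    out.append(table[chunk])
                    i += width
                    break
            else:
                # Not in the table: pass plain ASCII through unchanged.
                if data[i] < 0x80:
                    out.append(chr(data[i]))
                    i += 1
                else:
                    raise ValueError("unmapped byte at offset %d" % i)
        return "".join(out)

    return decode


# The base codec uses the base table as-is.
decode_base = make_decoder(BASE_DECODING)

# The vendor codec looks in the deltas first and falls back to the base
# table, so the large table is shared rather than copied.
decode_vendor = make_decoder(ChainMap(VENDOR_DELTAS, BASE_DECODING))
```

With this layout the vendor variant costs only the size of its delta
dictionary, and the decoding loop itself does not change when a
different table is substituted.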

Sounds ok to me.

> However, I suspect putting it in the core for June 1st may
> be too aggressive; if the compiler is going to use them on
> every source file for a Japanese user, we really want to
> move from byte-level loops in Python to something much faster.

Speed is not an issue now: what we need is a good concept
and some proof-of-concept code to go with it.

BTW, all this will go into 1.7 AFAIK... 1.6 will have to do
with what's there now. I may get a patch done for the -e
command line switch -- but only as an experimental feature
in 1.6.

Unfortunately, Guido's out at the moment, so he can't
comment on this...

Marc-Andre Lemburg
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/