[I18n-sig] Random thoughts on Unicode and Python

Andy Robinson andy@reportlab.com
Mon, 12 Feb 2001 08:19:09 -0000


> -----Original Message-----
> From: M.-A. Lemburg [mailto:mal@lemburg.com]
> Sent: 11 February 2001 13:23
> To: tree@basistech.com
> Cc: Andy Robinson; i18n-sig@python.org
> Subject: Re: [I18n-sig] Random thoughts on Unicode and Python
> 
> 
> Tom Emerson wrote:
> > 
> > Andy Robinson writes:
> > > (1) user defined characters:  the big three Japanese encodings
> > > use the Kuten space of 94x94 characters. There are lots of slight
> > > vendor variations on the basic JIS0208 character set, as well
> > > as people adding new Gaiji in their office workgroups. Generic
> > > conversion routines from, say, EUC to Shift-JIS still work
> > > perfectly whether you use Shift-JIS, cp932, or cp932 plus
> > > ten extra in-house characters.  Conversions to Unicode involve
> > > selecting new codecs, or even making new ones, for all these
> > > situations.
> > 
> > There is no reason that we couldn't provide a set of unified codecs
> > for EUC-JP, Shift JIS, ISO-2022-JP, and CP932 that provide appropriate
> > mappings between the EUDC sections in the legacy character sets and
> > the PUA of Unicode, such that these conversions work.
> 
> Right.

Exactly. Both of the problems I mentioned can and should be solved
properly with Unicode.  I'm just noting that a whole bunch
of people have solved them without Unicode in the past
and that's where to look for code that will break.
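For illustration, here is a minimal sketch of the kind of EUDC-to-PUA mapping Tom describes. It assumes the layout Microsoft uses for cp932, where lead bytes 0xF0-0xF9 (188 trail bytes each) form the user-defined area and map in order onto U+E000-U+E757. The function and constant names are hypothetical, and a real codec would also have to handle the EUC-JP and ISO-2022-JP forms of the same area.

```python
# Hypothetical helpers (not an existing codec API): map the cp932
# user-defined area, lead bytes 0xF0-0xF9, onto the Unicode PUA.
EUDC_LEAD_FIRST = 0xF0
EUDC_LEAD_LAST = 0xF9
PUA_BASE = 0xE000
TRAILS_PER_LEAD = 188  # trail bytes 0x40-0x7E (63) plus 0x80-0xFC (125)
EUDC_SIZE = TRAILS_PER_LEAD * (EUDC_LEAD_LAST - EUDC_LEAD_FIRST + 1)  # 1880


def eudc_to_pua(lead: int, trail: int) -> str:
    """Decode one two-byte cp932 EUDC code to its PUA character."""
    if not EUDC_LEAD_FIRST <= lead <= EUDC_LEAD_LAST:
        raise ValueError(f"lead byte {lead:#04x} is outside the EUDC area")
    if 0x40 <= trail <= 0x7E:
        offset = trail - 0x40          # offsets 0..62
    elif 0x80 <= trail <= 0xFC:
        offset = trail - 0x41          # skip 0x7F; offsets 63..187
    else:
        raise ValueError(f"invalid trail byte {trail:#04x}")
    return chr(PUA_BASE + TRAILS_PER_LEAD * (lead - EUDC_LEAD_FIRST) + offset)


def pua_to_eudc(ch: str) -> bytes:
    """Encode one PUA character back to its two-byte cp932 EUDC code."""
    index = ord(ch) - PUA_BASE
    if not 0 <= index < EUDC_SIZE:
        raise ValueError(f"U+{ord(ch):04X} is outside the mapped PUA range")
    lead, offset = divmod(index, TRAILS_PER_LEAD)
    # Reverse the trail-byte split above, re-inserting the 0x7F gap.
    trail = offset + 0x40 if offset < 0x3F else offset + 0x41
    return bytes([EUDC_LEAD_FIRST + lead, trail])
```

Because the mapping is a fixed arithmetic offset, it round-trips losslessly, so an office's in-house Gaiji survive a trip through Unicode and back even though their glyphs are only meaningful locally.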

- Andy

p.s. and yes, I'm working on those extended codecs now.