[Python-Dev] More Unicode support

M.-A. Lemburg mal@lemburg.com
Mon, 06 Nov 2000 10:14:12 +0100

Guido van Rossum wrote:
> [me]
> > > - Internationalization.  Barry knows what he wants here; I bet Martin
> > >   von Loewis and Marc-Andre Lemburg have ideas too.
> [MAL]
> > We'd need a few more codecs, support for the Unicode compression,
> > normalization and collation algorithms.
> Hm...  There's also the problem that there's no easy way to do Unicode
> I/O.  I'd like to have a way to turn a particular file into a Unicode
> output device (where the actual encoding might be UTF-8 or UTF-16 or a
> local encoding), which should mean that writing Unicode objects to the
> file should "do the right thing" (in particular should not try to
> coerce it to an 8-bit string using the default encoding first, like
> print and str() currently do) and that writing 8-bit string objects to
> it should first convert them to Unicode using the default encoding
> (meaning that at least ASCII strings can be written to a Unicode file
> without having to specify a conversion).  I suppose that reading from
> a "Unicode file" should always return a Unicode string object (even if
> the actual characters read all happen to fall in the ASCII range).
> This requires some serious changes to the current I/O mechanisms; in
> particular str() needs to be fixed, or perhaps a ustr() needs to be
> added that is used in certain cases.  Tricky, tricky!

It's not all that tricky since you can write a StreamRecoder
subclass which implements this. AFAIR, I posted such an implementation
on i18n-sig.
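
To make the idea concrete, here is a minimal sketch of such a "Unicode
file" wrapper. It is not the implementation posted on i18n-sig and it is
not a StreamRecoder subclass: it is a plain wrapper around the codecs
stream reader and writer, written for current Python, where the 8-bit
string type is bytes. The class name UnicodeFile and the parameter
default_encoding are made up for this illustration.

```python
import codecs
import io


class UnicodeFile:
    """Hypothetical wrapper that turns a byte stream into a Unicode device."""

    def __init__(self, stream, encoding="utf-8", default_encoding="ascii"):
        # Writer and reader for the encoding actually used on the stream.
        self._writer = codecs.getwriter(encoding)(stream)
        self._reader = codecs.getreader(encoding)(stream)
        # Assumed stand-in for the interpreter-wide default encoding.
        self._default_encoding = default_encoding

    def write(self, data):
        # 8-bit strings are converted to Unicode first, using the default
        # encoding; Unicode strings are passed through unchanged.
        if isinstance(data, bytes):
            data = data.decode(self._default_encoding)
        self._writer.write(data)

    def read(self, size=-1):
        # Always returns a Unicode string, even for pure ASCII content.
        return self._reader.read(size)


buf = io.BytesIO()
f = UnicodeFile(buf, encoding="utf-8")
f.write("gr\u00fc\u00df")   # Unicode string, encoded directly as UTF-8
f.write(b" hello")          # 8-bit string, decoded as ASCII first
raw = buf.getvalue()

buf.seek(0)
text = f.read()
```

Writing an 8-bit string that contains non-ASCII bytes raises
UnicodeDecodeError in this sketch, which matches the "at least ASCII
strings can be written" behaviour described above.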

BTW, one of my patches on SF adds unistr(). Could be that it's
time to apply it :-)

Marc-Andre Lemburg
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/