[Python-Dev] Unicode debate
Wed, 03 May 2000 10:56:08 +0200
Fredrik Lundh wrote:
> Guido van Rossum <email@example.com> wrote:
> > > What do we do about str( my_unicode_string )? Perhaps escape the Unicode
> > > characters with backslashed numbers?
> > Hm, good question. Tcl displays unknown characters as \x or \u
> > escapes. I think this may make more sense than raising an error.
> but that's on the display side of things, right? similar to
> repr, in other words.
> > But there must be a way to turn on Unicode-awareness on e.g. stdout
> > and then printing a Unicode object should not use str() (as it
> > currently does).
> to throw some extra gasoline on this, how about allowing
> str() to return unicode strings?
> (extra questions: how about renaming "unicode" to "string",
> and getting rid of "unichr"?)
> count to ten before replying, please.
1 2 3 4 5 6 7 8 9 10 ... ok ;-)
Guido's problem with printing Unicode can easily be solved
using the standard codecs.StreamRecoder class as I've done
in the example I posted some days ago.
Basically, what the stdout wrapper would do is take strings
as input, convert them to Unicode and then write them
encoded to the original stdout. For Unicode objects the
conversion can be skipped and the encoded output written
directly to stdout.
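A minimal sketch of such a wrapper, written for present-day Python 3,
where bytes plays the role of the 8-bit string and str the role of the
Unicode object. The class name EncodingStdout and its parameters are
made up for illustration, and an in-memory buffer stands in for the
real stdout; this is not the example referred to above.

```python
import io


class EncodingStdout:
    """Hypothetical wrapper: accepts 8-bit strings or Unicode text and
    writes encoded bytes to the wrapped binary stream."""

    def __init__(self, stream, input_encoding="latin-1",
                 output_encoding="utf-8", errors="strict"):
        self.stream = stream
        self.input_encoding = input_encoding    # assumed encoding of 8-bit input
        self.output_encoding = output_encoding  # encoding written to the stream
        self.errors = errors

    def write(self, data):
        # 8-bit strings are converted to Unicode first ...
        if isinstance(data, bytes):
            data = data.decode(self.input_encoding, self.errors)
        # ... while Unicode text skips that step and is encoded directly.
        self.stream.write(data.encode(self.output_encoding, self.errors))


buffer = io.BytesIO()             # stand-in for the original stdout
out = EncodingStdout(buffer)
out.write(b"caf\xe9 ")            # Latin-1 encoded 8-bit string
out.write("na\u00efve")           # Unicode text
result = buffer.getvalue()
```

Both writes end up in the buffer as UTF-8, whichever type was passed in.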
This can be done for any encoding supported by Python; e.g.
you could do the indirection in site.py and then have
Unicode printed as Latin-1 or UTF-8 or one of the many
code pages supported through the mapping codec.
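To illustrate the choice of output encoding, here is a small sketch
using codecs.EncodedFile, which returns a codecs.StreamRecoder. It is
written for Python 3, the helper name recode is hypothetical, and a
BytesIO object is used in place of stdout; hooking the recoder into
sys.stdout from site.py is left out.

```python
import codecs
import io


def recode(data, data_encoding, file_encoding):
    """Write data through a StreamRecoder and return the bytes that
    reach the underlying stream."""
    sink = io.BytesIO()  # stand-in for the real stdout
    recoder = codecs.EncodedFile(sink, data_encoding, file_encoding)
    recoder.write(data)  # decoded as data_encoding, re-encoded as file_encoding
    return sink.getvalue()


latin1_input = b"Gr\xfc\xdfe"  # 8-bit string in Latin-1

as_utf8 = recode(latin1_input, "latin-1", "utf-8")
as_cp437 = recode(latin1_input, "latin-1", "cp437")  # a charmap-based code page
```

The same input comes out as different byte sequences depending only on
the encoding chosen for the output side.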
About having str() return Unicode objects: I see str()
as the constructor for string objects, and under that
assumption str() will always have to return string objects.
unicode() does the same for Unicode objects, so renaming
it to something else doesn't really help all that much.
BTW, __str__() has to return strings too. Perhaps we
need __unicode__() and a corresponding slot function too ?!
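As an illustration of the two-hook idea only: Python 3 has a pair of
conversion methods of this kind, __bytes__() for the 8-bit type and
__str__() for the Unicode type. The sketch below uses those as an
analogue; it is not the __unicode__() slot suggested here, and the
class name Label and the UTF-8 choice are assumptions.

```python
class Label:
    """Hypothetical object with separate 8-bit and Unicode conversions."""

    def __init__(self, text):
        self.text = text

    def __bytes__(self):
        # 8-bit form; encoding to UTF-8 is an assumption of this sketch
        return self.text.encode("utf-8")

    def __str__(self):
        # Unicode form, returned unchanged
        return self.text


label = Label("\u20ac 10")
as_bytes = bytes(label)  # calls Label.__bytes__
as_text = str(label)     # calls Label.__str__
```

Each constructor calls its own hook, so neither conversion has to be
routed through the other type.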
Python Pages: http://www.lemburg.com/python/