Fredrik Lundh wrote:
> Guido van Rossum wrote:
> > > What do we do about str( my_unicode_string )? Perhaps escape the Unicode characters with backslashed numbers?
> > Hm, good question. Tcl displays unknown characters as \x or \u escapes. I think this may make more sense than raising an error.
> but that's on the display side of things, right? similar to repr, in other words.
> > But there must be a way to turn on Unicode-awareness on e.g. stdout and then printing a Unicode object should not use str() (as it currently does).
> to throw some extra gasoline on this, how about allowing str() to return unicode strings?
> (extra questions: how about renaming "unicode" to "string", and getting rid of "unichr"?)
> count to ten before replying, please.
1 2 3 4 5 6 7 8 9 10 ... ok ;-)
Guido's problem with printing Unicode can easily be solved using the standard codecs.StreamRecoder class as I've done in the example I posted some days ago.
Basically, the stdout wrapper would take strings as input, convert them to Unicode and then write them, encoded, to the original stdout. For Unicode objects the conversion step can be skipped and the encoded output written directly to stdout.
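A minimal sketch of such a wrapper, written for modern Python 3 rather than the API under discussion here; it is not the example I posted earlier. The class name `RecodingWriter`, the default encodings, and the assumption that incoming byte strings are Latin-1 are all illustrative. An `io.BytesIO` stands in for the binary side of stdout (`sys.stdout.buffer`).

```python
import codecs
import io


class RecodingWriter:
    """Hypothetical stdout wrapper (Python 3 sketch).

    Byte strings are assumed to be in `input_encoding`: they are decoded
    to text first and then re-encoded as `output_encoding`. Text (str)
    objects skip the decoding step and are encoded directly.
    """

    def __init__(self, raw, input_encoding="latin-1",
                 output_encoding="utf-8", errors="strict"):
        self._decode = codecs.getdecoder(input_encoding)
        # The StreamWriter encodes every str it receives and writes the bytes to raw.
        self._writer = codecs.getwriter(output_encoding)(raw, errors)
        self._errors = errors

    def write(self, data):
        if isinstance(data, (bytes, bytearray)):
            # Conversion step: bytes -> text using the assumed input encoding.
            data, _consumed = self._decode(data, self._errors)
        # Text (original or freshly decoded) goes straight to the encoder.
        self._writer.write(data)

    def flush(self):
        # StreamWriter delegates unknown attributes such as flush() to the wrapped stream.
        self._writer.flush()


# io.BytesIO stands in for the binary stdout stream (sys.stdout.buffer).
buf = io.BytesIO()
out = RecodingWriter(buf, "latin-1", "utf-8")
out.write(b"caf\xe9 ")   # Latin-1 bytes, recoded to UTF-8
out.write("na\u00efve")  # text, encoded directly
out.flush()
print(buf.getvalue())    # b'caf\xc3\xa9 na\xc3\xafve'
```

The standard library's `codecs.EncodedFile(file, data_encoding, file_encoding)` builds a `codecs.StreamRecoder` that performs the same bytes-to-bytes recoding; the explicit class just makes the shortcut for text objects visible.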
This can be done for any encoding supported by Python; e.g. you could do the indirection in site.py and then have Unicode printed as Latin-1 or UTF-8 or one of the many code pages supported through the mapping codec.
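To illustrate that the wrapper only needs a codec name, the following Python 3 sketch (the helper `encode_via_writer` is hypothetical) pushes the same text through `codecs.getwriter` for three encodings. It uses the `backslashreplace` error handler, so characters the target encoding cannot represent come out as `\u` escapes instead of raising an error. `cp1252` is one of the Windows code pages that Python implements on top of the charmap (character mapping) codec. In a real `site.py` one would wrap `sys.stdout.buffer` instead of a `BytesIO`; on Python 3.7+ `sys.stdout.reconfigure(encoding=..., errors=...)` does this in place.

```python
import codecs
import io

TEXT = "caf\u00e9 \u20ac"  # 'café €'


def encode_via_writer(text, encoding, errors="backslashreplace"):
    """Write text through a StreamWriter for `encoding`; return the raw bytes."""
    buf = io.BytesIO()  # stand-in for sys.stdout.buffer
    codecs.getwriter(encoding)(buf, errors).write(text)
    return buf.getvalue()


for enc in ("utf-8", "latin-1", "cp1252"):
    print(enc, encode_via_writer(TEXT, enc))
# utf-8 b'caf\xc3\xa9 \xe2\x82\xac'
# latin-1 b'caf\xe9 \\u20ac'   (the euro sign is not in Latin-1)
# cp1252 b'caf\xe9 \x80'
```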
About having str() return Unicode objects: I see str() as the constructor for string objects, and under that assumption str() will always have to return string objects. unicode() does the same for Unicode objects, so renaming it to something else doesn't really help all that much.
BTW, __str__() has to return strings too. Perhaps we need __unicode__() and a corresponding slot function too?!