[Python-3000] Displaying strings containing unicode escapes

Guido van Rossum guido at python.org
Wed Apr 16 17:43:05 CEST 2008


I just had a shower, and I think it's cleared my thoughts a bit. :-)

Clearly this is an important problem to those in countries where ASCII
doesn't cut it. And just like in Python 3000 we're using UTF-8 as the
default source encoding and allowing Unicode letters in identifiers, I
think we should bite the bullet and allow repr() of a string to pass
through all characters that the Unicode standard considers printable.
For those of us with less capable IO devices, setting the error flag
for stdout and stderr to backslashreplace is probably the best
solution (and it solves more problems than just repr()).
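For readers who want to try the backslashreplace idea, here is a minimal sketch. It assumes a modern Python 3, where repr() already passes printable non-ASCII characters through, and it imitates an ASCII-only output device with an in-memory stream. The helper name `render_for_ascii_terminal` is hypothetical. On a real interpreter the same effect for stdout comes from `sys.stdout.reconfigure(errors="backslashreplace")` (Python 3.7+) or from setting `PYTHONIOENCODING=ascii:backslashreplace`.

```python
import io

def render_for_ascii_terminal(text):
    # Hypothetical helper: emulate an output device that can only take ASCII.
    buf = io.BytesIO()
    # errors="backslashreplace" turns every character the ASCII encoder cannot
    # handle into a \xhh, \uxxxx or \Uxxxxxxxx escape instead of raising
    # UnicodeEncodeError.
    out = io.TextIOWrapper(buf, encoding="ascii", errors="backslashreplace")
    out.write(text)
    out.flush()
    data = buf.getvalue()
    out.detach()  # release buf without closing it
    return data

# repr("日本") keeps the Japanese characters; only the ASCII-only "device"
# escapes them on the way out.
print(render_for_ascii_terminal(repr("日本")))  # b"'\\u65e5\\u672c'"
```

The point of the design is that the escaping happens at the I/O boundary, so repr() itself no longer has to guess what the terminal can display.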

I will have another look at Atsuo's patch.

I do think we should use some kind of Unicode-standard-endorsed
definition of "printable" (as long as it excludes all ASCII escapes),
since there are plenty of undefined code points that even Japanese
people would probably prefer to see rendered as \uxxxx rather than
completely invisible. I'm also not sure what people would want to
happen for surrogate pairs. (OTOH an unpaired surrogate should be
rendered as \uxxxx.)
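One way to make "printable" concrete is to key it off the Unicode general category. The sketch below assumes the definition CPython's `str.isprintable()` uses: categories Cc, Cf, Cs, Co, Cn ("Other") and Zs, Zl, Zp ("Separator") are non-printable, except the ASCII space. Under that rule unassigned code points (Cn) and unpaired surrogates (Cs) both come out as `\uxxxx`. The function `sketch_repr` is hypothetical and always uses single quotes, unlike the real repr(). In a modern CPython, where `str` holds code points rather than UTF-16 units, a well-formed surrogate pair never shows up as two surrogates, so only lone surrogates reach the escaping branch.

```python
import unicodedata

# Assumed definition: non-printable means general category "Other"
# (Cc, Cf, Cs, Co, Cn) or "Separator" (Zs, Zl, Zp), except the ASCII space.
NON_PRINTABLE_CATEGORIES = {"Cc", "Cf", "Cs", "Co", "Cn", "Zs", "Zl", "Zp"}

# Characters that get a short backslash escape rather than a numeric one.
SIMPLE_ESCAPES = {"\\": "\\\\", "'": "\\'", "\n": "\\n", "\r": "\\r", "\t": "\\t"}

def is_printable(ch):
    return ch == " " or unicodedata.category(ch) not in NON_PRINTABLE_CATEGORIES

def escape_code_point(ch):
    # Use the shortest escape form that fits the code point.
    cp = ord(ch)
    if cp <= 0xFF:
        return f"\\x{cp:02x}"
    if cp <= 0xFFFF:
        return f"\\u{cp:04x}"
    return f"\\U{cp:08x}"

def sketch_repr(s):
    # Hypothetical repr()-like function: printable characters pass through,
    # everything else is escaped.
    parts = []
    for ch in s:
        if ch in SIMPLE_ESCAPES:
            parts.append(SIMPLE_ESCAPES[ch])
        elif is_printable(ch):
            parts.append(ch)
        else:
            parts.append(escape_code_point(ch))
    return "'" + "".join(parts) + "'"

# Unassigned code point, lone surrogate and NUL are all escaped.
print(sketch_repr("\u0378 \ud800 \x00"))  # '\u0378 \ud800 \x00'
```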

I expect that this will require some more research and agreement.
Perhaps someone can produce a draft PEP and attempt to sort out the
details of specification and implementation? It would also be nice if
it could be friendly to Jython, IronPython and PyPy.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

