[Python-3000] Displaying strings containing unicode escapes at the interactive prompt (was Re: Recursive str)

Nick Coghlan ncoghlan at gmail.com
Wed Apr 16 14:11:13 CEST 2008


atsuo ishimoto wrote:
> Using repr() to build output strings is common practice in the Python world,
> so repr() is called everywhere in the Python core and third-party applications
> to print objects, emit logs, etc.
> 
> For example,
> 
>>>> f = open("日本語")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "c:\ww\Python-3.0a4-orig\lib\io.py", line 212, in __new__
>     return open(*args, **kwargs)
>   File "c:\ww\Python-3.0a4-orig\lib\io.py", line 151, in open
>     closefd)
> IOError: [Errno 2] No such file or directory: '\u65e5\u672c\u8a9e'
> 
> This is an annoying error message. Or, in Python 2,
> 
>>>> f = open(u"日本語", "w")
>>>> f
> <open file u'\u65e5\u672c\u8a9e', mode 'w' at 0x009370F8>
> 
> This repr()ed form is difficult to read. When Japanese (or Chinese)
> programmers look at u'\u65e5\u672c\u8a9e', they'll get the strong
> impression that Python is not intended to be used in their country.

This is starting to look to me like something better addressed through
sys.displayhook/sys.excepthook at the interactive interpreter level than
through changes to any __repr__() implementations.

Given the following setup code:

def replace_escapes(escaped_str):
    # Turn literal \uXXXX escape sequences back into the characters they name
    return escaped_str.encode('latin-1').decode('unicode_escape')

def displayhook_unicode(expr_result):
    if expr_result is not None:
        # Keep the usual "_" shortcut for the last displayed result
        __builtins__._ = expr_result
        print(replace_escapes(repr(expr_result)))

from traceback import format_exception
def excepthook_unicode(*exc_details):
    msg = ''.join(format_exception(*exc_details))
    # format_exception() output already ends with a newline
    print(replace_escapes(msg), end='')

import sys
sys.displayhook = displayhook_unicode
sys.excepthook = excepthook_unicode

I get the following behaviour:

 >>> "\u65e5\u672c\u8a9e"
'日本語'
 >>> print("\u65e5\u672c\u8a9e")
日本語
 >>> '日本語'
'日本語'
 >>> print('日本語')
日本語
 >>> 日本語 = 1
 >>> 日本語
1
 >>> dir()
['__builtins__', '__doc__', '__name__', '__package__', 
'displayhook_unicode', 'excepthook_unicode', 'format_exception', 
'replace_escapes', 'sys', '日本語']
 >>> b"\u65e5\u672c\u8a9e"
b'\u65e5\u672c\u8a9e'
 >>> print(b"\u65e5\u672c\u8a9e")
b'\\u65e5\\u672c\\u8a9e'
 >>> f = open("\u65e5\u672c\u8a9e")
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   File "/home/ncoghlan/devel/py3k/Lib/io.py", line 212, in __new__
     return open(*args, **kwargs)
   File "/home/ncoghlan/devel/py3k/Lib/io.py", line 151, in open
     closefd)
IOError: [Errno 2] No such file or directory: '日本語'
 >>> f = open("\u65e5\u672c\u8a9e", 'w')
 >>> f.name
'日本語'
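
The escape-replacement step behind the transcript above can also be tried
outside the interactive prompt. The following is only a minimal sketch: it
reuses replace_escapes() from the setup code and uses the builtin ascii() to
imitate a repr() that escapes every non-ASCII character, as in the quoted
3.0a4 traceback. The names escaped, restored and latin1_failed are
illustrative only. It also shows that the latin-1 step rejects text that is
not already escaped, so the helper relies on repr() producing pure ASCII.

```python
def replace_escapes(escaped_str):
    # Same helper as in the setup code: round trip through latin-1 bytes,
    # then interpret the backslash escapes.
    return escaped_str.encode('latin-1').decode('unicode_escape')

# ascii() always escapes non-ASCII characters, imitating the escaped
# repr() output shown in the quoted traceback.
escaped = ascii("\u65e5\u672c\u8a9e")
restored = replace_escapes(escaped)
print(escaped)
print(restored == "'\u65e5\u672c\u8a9e'")

# Text that still contains the real characters cannot be encoded as
# latin-1, so the helper raises UnicodeEncodeError for it.
try:
    replace_escapes("'\u65e5\u672c\u8a9e'")
    latin1_failed = False
except UnicodeEncodeError:
    latin1_failed = True
print(latin1_failed)
```

Because 'unicode_escape' interprets every backslash sequence, other escapes
in the repr() output (such as a doubled backslash) are collapsed as well.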

Note that even though the bytes object representation is slightly 
different from that for the normal displayhook (which doubles up on the 
backslashes, just like the bytes printing example above), the two 
different representations are equivalent because \u isn't a valid escape 
sequence for bytes literals.
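
The equivalence of the two bytes representations can be checked directly. This
is a hedged sketch rather than part of the setup code above: it evaluates both
spellings of the literal with eval() and compares the results. The names
single and double are illustrative only. Current interpreters warn about the
unrecognised \u escape in a bytes literal, so the warning is silenced here; a
future Python version may reject that spelling outright.

```python
import warnings

# The unrecognised "\u" escape in a bytes literal triggers a warning at
# compile time, so silence it just for this evaluation.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    single = eval(r'b"\u65e5"')

double = eval(r'b"\\u65e5"')

# Both spellings yield the same six bytes: backslash, 'u', '6', '5', 'e', '5'.
print(single == double)
print(len(single))
print(repr(single))
```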

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

