[New-bugs-announce] [issue18870] eval() uses latin-1 to decode str
Merlijn van Deen
report at bugs.python.org
Wed Aug 28 20:18:22 CEST 2013
New submission from Merlijn van Deen:
Steps to reproduce:
-------------------
>>> eval("u'ä'")
# in a UTF-8 console, so this is equivalent to
>>> eval("u'\xc3\xa4'")
Actual result:
----------------
u'\xc3\xa4'
# i.e.: u'ä'
Expected result:
-----------------
SyntaxError: Non-ASCII character '\xc3' in file <string> on line 1, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details
(which is what would happen if it was in a source file)
Or, alternatively:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2: ordinal not in range(128)
(which is what results from decoding the str with sys.getdefaultencoding())
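The second alternative can be illustrated directly on Python 3 (a sketch for illustration; Python 2's default encoding of 'ascii' is assumed from the message above):

```python
# Sketch: decoding the same byte string with the 'ascii' codec fails,
# which is what the str would hit under sys.getdefaultencoding() on 2.x.
try:
    b"u'\xc3\xa4'".decode('ascii')
except UnicodeDecodeError as exc:
    print(exc)  # 'ascii' codec can't decode byte 0xc3 in position 2: ...
```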
Instead, the string is interpreted as latin-1. The same happens with ast.literal_eval(), and even when calling compile() directly.
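The latin-1 interpretation can be reproduced on Python 3 by decoding the bytes explicitly (an illustrative sketch of the effect, not the 2.7 code path itself):

```python
raw = b"'\xc3\xa4'"             # UTF-8 bytes for the source literal 'ä'

# What Python 2's eval() effectively did: treat each byte as latin-1.
mojibake = eval(raw.decode('latin-1'))
print(mojibake)                 # 'Ã¤' — two characters, one per byte

# What Python 3 does with a bytes source: decode as UTF-8 by default.
print(eval(raw))                # 'ä'
```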
In python 3.2, this is the result, as utf-8 is used as default source encoding:
>>> eval(b"'\xc3\xa4'")
'ä'
Workarounds
-----------
>>> eval("# encoding: utf-8\nu'\xc3\xa4'")
u'\xe4'
>>> eval("u'\xc3\xa4'".decode('utf-8'))
u'\xe4'
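For reference, the coding-cookie workaround also steers how Python 3's eval()/compile() decode a bytes source (a hedged sketch assuming a PEP 263 cookie on the first line; the byte strings are the same UTF-8 bytes as above):

```python
# A PEP 263 coding cookie on the first line selects the source encoding
# used to decode a bytes object passed to eval()/compile().
src_utf8 = b"# coding: utf-8\n'\xc3\xa4'"
src_lat1 = b"# coding: latin-1\n'\xc3\xa4'"

print(eval(src_utf8))   # bytes decoded as UTF-8 -> single character
print(eval(src_lat1))   # same bytes decoded as latin-1 -> two characters
```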
I understand this might be closed as WONTFIX, since changing it would break behavior some code may depend on. Nonetheless, documenting the current behavior explicitly seems a sensible thing to do.
----------
messages: 196398
nosy: valhallasw
priority: normal
severity: normal
status: open
title: eval() uses latin-1 to decode str
versions: Python 2.7
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue18870>
_______________________________________