[issue19539] The 'raw_unicode_escape' codec buggy + not appropriate for Python 3.x

Jan Kaliszewski report at bugs.python.org
Sun Nov 10 03:51:45 CET 2013


New submission from Jan Kaliszewski:

It seems that the 'raw_unicode_escape' codec:

1) produces data that could be suitable for Python 2.x raw unicode string literals but not for Python 3.x raw string literals (in Python 3.x, \u... escapes in raw literals are not interpreted and are kept literally);

2) seems to be buggy anyway: characters in the range 128-255 are encoded as single bytes using the 'latin-1' encoding (in Python 3.x it is definitely a bug; and even in Python 2.x the feature is dubious, although at least Py2's eval() and compile() functions officially accept 'latin-1'-encoded byte strings...).

Python 3.3:

>>> b = "zażółć".encode('raw_unicode_escape')
>>> literal = b'r"' + b + b'"'
>>> literal
b'r"za\\u017c\xf3\\u0142\\u0107"'
>>> eval(literal)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1
SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xf3 in position 8: invalid continuation byte
>>> b'\xf3'.decode('latin-1')
'ó'
>>> b = "zaż".encode('raw_unicode_escape')
>>> literal = b'r"' + b + b'"'
>>> literal
b'r"za\\u017c"'
>>> eval(literal)
'za\\u017c'
>>> print(eval(literal))
za\u017c
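
The session above can be condensed into a few checks. This is only a sketch of the behaviour observed under Python 3; the variable names are illustrative and not part of any API:

```python
# Sketch: how 'raw_unicode_escape' treats code points below and above U+00FF.
text = "zażółć"
encoded = text.encode("raw_unicode_escape")
print(encoded)

# 'ó' (U+00F3) is emitted as one Latin-1 byte rather than as an escape.
latin1_byte = "ó".encode("raw_unicode_escape")

# The codec itself round-trips its own output.
round_tripped = encoded.decode("raw_unicode_escape")

# But the output is not valid UTF-8, the default source encoding in Python 3.
try:
    encoded.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print(latin1_byte, round_tripped, valid_utf8)
```

So the encoded bytes can be decoded by the codec itself, but they cannot be embedded in UTF-8 source whenever the text contains characters in the range 128-255.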

I believe that the 'raw_unicode_escape' codec should either be deprecated and later removed or be modified to accept only printable ASCII characters.
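
To illustrate what ASCII-only output might look like, here is a rough sketch. The helper name encode_ascii_only is hypothetical and is not an existing or proposed API; it escapes every non-ASCII code point and, as a simplification, leaves ASCII control characters untouched:

```python
def encode_ascii_only(text):
    """Hypothetical helper: escape every non-ASCII code point as \\uXXXX
    or \\UXXXXXXXX so that the result contains ASCII bytes only."""
    parts = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            parts.append(ch)               # ASCII is passed through unchanged
        elif cp <= 0xFFFF:
            parts.append("\\u%04x" % cp)   # BMP code points
        else:
            parts.append("\\U%08x" % cp)   # code points beyond the BMP
    return "".join(parts).encode("ascii")


ascii_only = encode_ascii_only("zażółć")
print(ascii_only)

# The existing decoder already understands this form.
decoded = ascii_only.decode("raw_unicode_escape")
print(decoded)
```

With this variant, 'ó' becomes \u00f3 instead of the single byte 0xf3, and the existing 'raw_unicode_escape' decoder still reads the result back.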


PS. Also, as a side note: neither 'raw_unicode_escape' nor 'unicode_escape' escapes quotes (see issue #7615) -- shouldn't that at least be documented explicitly?
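
A small sketch of the quoting problem, assuming Python 3 (the sample text and variable names are made up for illustration):

```python
# Sketch: neither codec escapes quote characters.
text = 'say "hi"'
escaped = text.encode("unicode_escape")
raw_escaped = text.encode("raw_unicode_escape")
print(escaped, raw_escaped)

# Wrapping the output in double quotes therefore does not give a valid literal.
literal = '"' + escaped.decode("ascii") + '"'
try:
    eval(literal)
    literal_is_valid = True
except SyntaxError:
    literal_is_valid = False
print(literal_is_valid)
```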

----------
components: Library (Lib), Unicode
messages: 202505
nosy: ezio.melotti, haypo, zuo
priority: normal
severity: normal
status: open
title: The 'raw_unicode_escape' codec buggy + not appropriate for Python 3.x
versions: Python 3.4, Python 3.5

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue19539>
_______________________________________

