[issue1477] UnicodeDecodeError that cannot be caught in narrow unicode builds

Amaury Forgeot d'Arc report at bugs.python.org
Tue Mar 18 02:46:20 CET 2008


Amaury Forgeot d'Arc <amauryfa at gmail.com> added the comment:

The error is not uncatchable, but it is raised while the input is being
compiled, like a SyntaxError. No bytecode is generated for the input, so
the opcodes of the "except" clause are never run at all.
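A minimal sketch of the distinction, written for current Python 3 (an assumption; the report concerns Python 2). In Python 3, an invalid escape inside a string literal is reported as a SyntaxError at compile time. The source text below contains its own try/except, which never runs because compilation fails first; only a handler wrapped around the compile() call sees the error. A UnicodeDecodeError raised at run time, by contrast, is caught normally. The names SOURCE_BAD, SOURCE_OK and compile_and_run are illustrative only.

```python
# Illustrative sources (hypothetical names). Each contains its own try/except.
SOURCE_BAD = (
    "try:\n"
    "    s = \"\\N{NO SUCH CHARACTER NAME}\"\n"  # invalid escape in a literal
    "except Exception:\n"
    "    s = 'caught at run time'\n"
)
SOURCE_OK = (
    "try:\n"
    "    s = b'\\xff'.decode('ascii')\n"  # fails only when executed
    "except UnicodeDecodeError:\n"
    "    s = 'caught at run time'\n"
)


def compile_and_run(source):
    """Compile and execute source, reporting where an error surfaced."""
    try:
        code = compile(source, "<demo>", "exec")
    except SyntaxError:
        # Raised by the compiler: no bytecode exists, so the try/except
        # inside `source` never ran. Only this outer handler sees it.
        return "compile time"
    namespace = {}
    exec(code, namespace)
    return namespace["s"]


print(compile_and_run(SOURCE_BAD))  # compile time
print(compile_and_run(SOURCE_OK))   # caught at run time
```

The same rule applies to any error the compiler raises: to handle it, the handler has to live outside the text being compiled.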

OTOH, there is a bug in PyUnicode_DecodeRawUnicodeEscape(): it should
accept code points > 0xffff. It has another problem:

>>> ur'\U00010000'
u'\x00'
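The u'\x00' result is consistent with the decoder writing the parsed value into a single 16-bit Py_UNICODE slot and keeping only its low 16 bits (0x10000 & 0xFFFF == 0). The sketch below models that assumption with a hypothetical narrow_store() helper and, for comparison, shows that current Python 3 decodes the same escape to one character.

```python
def narrow_store(code_point):
    """Hypothetical model of writing a code point into one 16-bit
    Py_UNICODE slot without a range check: only the low 16 bits survive."""
    return code_point & 0xFFFF


truncated = narrow_store(0x10000)
print(hex(truncated))  # 0x0, matching u'\x00' above

# For comparison, current Python 3 decodes the same escape to one character.
decoded = b"\\U00010000".decode("raw_unicode_escape")
print(decoded == "\U00010000", len(decoded))  # True 1
```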

I attach a patch that makes raw-unicode-escape behave like unicode-escape:
on decoding, characters outside the Basic Multilingual Plane are stored as
a UTF-16 surrogate pair; on encoding, a UTF-16 surrogate pair is written
back as a single \U00xxxxxx escape.
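A minimal Python 3 model of the mapping described above, assuming a narrow build represents text as a sequence of 16-bit code units. to_surrogate_pair, from_surrogate_pair, decode_u_escape and encode_units are hypothetical helpers, not functions from the attached patch; the decoding side handles only a single \U escape and ignores the codec's rules about preceding backslashes.

```python
import string


def to_surrogate_pair(cp):
    """Split a supplementary code point (U+10000..U+10FFFF) into UTF-16 units."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point: %#x" % cp)
    v = cp - 0x10000
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


def from_surrogate_pair(hi, lo):
    """Join a high and a low surrogate back into one code point."""
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)


def decode_u_escape(escape):
    """Model: decode one \\UXXXXXXXX escape into 16-bit code units."""
    digits = escape[2:]
    if (not escape.startswith("\\U") or len(digits) != 8
            or not all(c in string.hexdigits for c in digits)):
        raise ValueError("expected \\U followed by 8 hex digits")
    cp = int(digits, 16)
    if cp > 0x10FFFF:
        raise ValueError("code point out of range: %#x" % cp)
    # Outside the BMP: emit a surrogate pair instead of truncating.
    return list(to_surrogate_pair(cp)) if cp > 0xFFFF else [cp]


def encode_units(units):
    """Model: write 16-bit code units as raw-unicode-escape text, joining
    each high+low surrogate pair into a single \\U00xxxxxx escape."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        nxt = units[i + 1] if i + 1 < len(units) else None
        if 0xD800 <= u <= 0xDBFF and nxt is not None and 0xDC00 <= nxt <= 0xDFFF:
            out.append("\\U%08x" % from_surrogate_pair(u, nxt))
            i += 2
            continue
        # Latin-1 range is written as-is; other BMP units as \uxxxx.
        out.append(chr(u) if u < 0x100 else "\\u%04x" % u)
        i += 1
    return "".join(out)


units = decode_u_escape("\\U00010000")
print([hex(u) for u in units])       # ['0xd800', '0xdc00']
print(encode_units([0x41] + units))  # A\U00010000
```

Joining the pair again on encoding is what makes the round trip lossless; with the truncating behaviour in the session above, the original code point cannot be recovered.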

----------
keywords: +patch
nosy: +amaury.forgeotdarc
Added file: http://bugs.python.org/file9714/raw-unicode-escape.patch

__________________________________
Tracker <report at bugs.python.org>
<http://bugs.python.org/issue1477>
__________________________________


More information about the Python-bugs-list mailing list