[Python-bugs-list] [ python-Bugs-817156 ] invalid \U escape gives 0=length unistr
SourceForge.net
noreply at sourceforge.net
Mon Oct 6 01:08:56 EDT 2003
Bugs item #817156, was opened at 2003-10-03 13:30
Message generated for change (Comment added) made by jhylton
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=817156&group_id=5470
Category: Unicode
Group: None
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Jeff Epler (jepler)
Assigned to: M.-A. Lemburg (lemburg)
Summary: invalid \U escape gives 0=length unistr
Initial Comment:
>>> u'\Ufffffffe' # CORRECT
UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in position 0-9: illegal Unicode character
>>> u'\Uffffffff' # WRONG
u''
>>> len(_)
0
Observed on 2.2.2 (Red Hat wide-unicode build, sys.maxunicode == 1114111)
and on 2.3.1 (custom build, sys.maxunicode == 65535).
I think the problem is due to this logic in
unicodeobject.c:PyUnicode_DecodeUnicodeEscape():
    if (chr == 0xffffffff)
        /* _decoding_error will have already written into the
           target buffer. */
        break;
Perhaps the condition should be (chr == 0xffffffff && PyErr_Occurred()).
I tried this change locally, and it fixes the problem:
>>> u'\Uffffffff'
UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in position 0-9: illegal Unicode character
>>> u'\Ufffffffe'
UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in position 0-9: illegal Unicode character
The change doesn't alter the outcome of the test suite.
Patch against 2.3.1 attached.
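
The sentinel collision described in this report can be modelled outside
CPython. What follows is a minimal Python sketch, not the actual C code:
decode_big_u, error_pending and require_error_flag are hypothetical names,
and error_pending merely stands in for PyErr_Occurred(). It assumes the
decoder uses 0xffffffff in-band to mean "an error was already handled",
which is what the quoted comment suggests.

```python
import string

ERROR_SENTINEL = 0xFFFFFFFF  # in-band marker meaning "error already handled"
MAX_CODE_POINT = 0x10FFFF    # largest legal Unicode code point


def decode_big_u(digits, require_error_flag):
    """Toy model of decoding the 8 hex digits that follow a \\U escape.

    require_error_flag=False mimics the original check (sentinel only);
    require_error_flag=True mimics the proposed check (sentinel AND a
    pending error). Both names are illustrative assumptions.
    """
    error_pending = False
    if len(digits) == 8 and all(c in string.hexdigits for c in digits):
        value = int(digits, 16)
    else:
        # Malformed input: record the error and signal it with the sentinel.
        error_pending = True
        value = ERROR_SENTINEL

    if value == ERROR_SENTINEL and (error_pending or not require_error_flag):
        if error_pending:
            raise ValueError("truncated \\UXXXXXXXX escape")
        # Bug being modelled: a well-formed ffffffff is mistaken for the
        # sentinel and silently dropped, yielding an empty string.
        return ""
    if value > MAX_CODE_POINT:
        raise ValueError("illegal Unicode character")
    return chr(value)


print(repr(decode_big_u("ffffffff", require_error_flag=False)))  # ''
try:
    decode_big_u("ffffffff", require_error_flag=True)
except ValueError as exc:
    print("rejected:", exc)
```

With the sentinel-only check, the well-formed but out-of-range input is
swallowed; once the check also requires a pending error, the same input
falls through to the range test and is rejected like \Ufffffffe.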
----------------------------------------------------------------------
>Comment By: Jeremy Hylton (jhylton)
Date: 2003-10-06 05:08
Message:
Logged In: YES
user_id=31392
Fixed in rev. 2.199 of unicodeobject.c.
----------------------------------------------------------------------