[New-bugs-announce] [issue1477] UnicodeDecodeError that cannot be caught in narrow unicode builds

Sean B. Palmer report at bugs.python.org
Tue Nov 20 22:17:39 CET 2007


New submission from Sean B. Palmer:

The following error is uncatchable:

>>> try: ur'\U0010FFFF'
... except UnicodeDecodeError: pass
... 
UnicodeDecodeError: 'rawunicodeescape' codec can't decode byte 0x5c 
in position 0: \Uxxxxxxxx out of range

This is in a narrow unicode build:

>>> sys.version_info, hex(sys.maxunicode)
((2, 5, 1, 'final', 0), '0xffff')

Of course the r in ur'...' is redundant in the test case above, but
there are cases in which it isn't...

>>> ur'\U0010FFFF\test'
u'\U0010ffff\\test'
- from a wide unicode build

>>> ur'\U0010FFFF\test'
UnicodeDecodeError: 'rawunicodeescape' codec can't decode byte 0x5c 
in position 0: \Uxxxxxxxx out of range
- from the narrow unicode build

The problem occurs with .decode('raw-unicode-escape') too.

>>> '\U0010FFFF\test'.decode('raw-unicode-escape')
Traceback (most recent call last):
[&c.]

Most surprisingly of all, however, this problem doesn't occur when you
don't use a raw string:

>>> u'\U0010ffff\\test'
u'\U0010ffff\\test'

So there is at least a workaround for all cases, which is why this bug
is marked as Severity: minor. It did take a while to work out that what
manifests with ur mightn't apply to u, however; it's usually one's first
thought to think the bug is with you, not with python.

----------
components: Unicode
messages: 57710
nosy: sbp
severity: minor
status: open
title: UnicodeDecodeError that cannot be caught in narrow unicode builds
type: behavior
versions: Python 2.5

__________________________________
Tracker <report at bugs.python.org>
<http://bugs.python.org/issue1477>
__________________________________


More information about the New-bugs-announce mailing list