[Python-3000] Invalid \U escape in source code give hard-to-trace error

Kurt B. Kaiser kbk at shore.net
Wed Jul 18 08:04:13 CEST 2007


"Guido van Rossum" <guido at python.org> writes:

> When a source file contains a string literal with an out-of-range \U
> escape (e.g. "\U12345678"), instead of a syntax error pointing to the
> offending literal, I get this, without any indication of the file or
> line:
>
> UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in
> position 0-9: illegal Unicode character
>
> This is quite hard to track down. (Both the location of the bad
> literal in the source file, and the origin of the error in the parser.
> :-) Can someone come up with a fix?
>
> I note that raw escapes show a slightly different error. I also note
> that the same issue exists for u"..." literals in Python 2.5.

For what it's worth, I posted a patch to ast.c against the 2.6 trunk
which massages the unicode exception into a SyntaxError showing the
location.

That approach lets unicodeobject.c handle the gory details while ast.c
handles the SyntaxError generation.  It might serve as an interim
solution until something deeper, along the lines of Martin's thoughts,
is developed.
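
To illustrate the idea only (this is not the patch, which is C code in
ast.c), here is a minimal Python sketch.  It assumes a hypothetical
helper, decode_literal, that is handed the raw body of a string literal
together with its file name, line number, column offset and source
line.  The codec still does the decoding and still detects the bad
escape; the wrapper only converts the resulting UnicodeDecodeError into
a SyntaxError that carries the location.

```python
import codecs


def decode_literal(body, filename, lineno, offset, line_text):
    """Decode backslash escapes in a string-literal body.

    Hypothetical helper for illustration: `body` is the bytes between
    the quotes; the remaining arguments describe where the literal
    sits in the source file.
    """
    try:
        # The codec handles the gory details of escape processing.
        return codecs.decode(body, "unicode_escape")
    except UnicodeDecodeError as exc:
        # Re-raise as a SyntaxError so the report names the file and
        # line instead of showing a bare codec error.
        raise SyntaxError(
            "(unicode error) %s" % exc,
            (filename, lineno, offset, line_text),
        ) from exc
```

Calling decode_literal(b"\\U12345678", "demo.py", 3, 5, 's = "\\U12345678"')
would then raise a SyntaxError whose filename and lineno attributes
point at the offending literal, with the original codec error kept as
its __cause__.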

I don't think any reference-count adjustments are needed, but someone
should check the patch.

www.python.org/sf/1755885

-- 
KBK

