[Python-Dev] \u and \U escapes in raw unicode string literals

Guido van Rossum guido at python.org
Thu May 10 20:45:57 CEST 2007


I just discovered that, in all versions of Python as far back as I
have access to (2.0), \uXXXX escapes are interpreted inside raw
unicode strings. Thus:

>>> a = ur"\u1234"
>>> len(a)
1
>>>

Contrast this with:

>>> a = ur"\x12"
>>> len(a)
4
>>>

The \U escape has the same behavior, in versions that support it.
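Python 3 dropped the ur prefix, but the standard raw_unicode_escape codec can reproduce the behavior shown above. This is a minimal sketch. It assumes the codec follows the same rule that the Python 2 compiler applies to ur literals. The helper name ur_literal is hypothetical, and its argument is the ASCII source text that would sit between the quotes.

```python
def ur_literal(body: str) -> str:
    # Hypothetical helper: emulate the Python 2 literal ur"<body>", where
    # `body` is the ASCII source text between the quotes. The standard
    # raw_unicode_escape codec decodes \uXXXX and \UXXXXXXXX but leaves
    # every other backslash (and what follows it) in the result.
    return body.encode("ascii").decode("raw_unicode_escape")


print(len(ur_literal(r"\u1234")))      # 1: \u1234 became one character
print(len(ur_literal(r"\x12")))        # 4: backslash, 'x', '1', '2'
print(len(ur_literal(r"\U00012345")))  # 1: \U is processed as well
```

The text goes through bytes because the codec decodes bytes to str. Non-ASCII literal text is outside the scope of this sketch.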

Does anyone remember why it is done this way? The reference manual
describes this behavior, but doesn't give an explanation:

"""
When an "r" or "R" prefix is used in conjunction with a "u" or "U"
prefix, then the \uXXXX and \UXXXXXXXX escape sequences are processed
while all other backslashes are left in the string. For example, the
string literal ur"\u0062\n" consists of three Unicode characters:
`LATIN SMALL LETTER B', `REVERSE SOLIDUS', and `LATIN SMALL LETTER N'.
Backslashes can be escaped with a preceding backslash; however, both
remain in the string. As a result, \uXXXX escape sequences are only
recognized when there are an odd number of backslashes.
"""

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

