[Python-Dev] \u and \U escapes in raw unicode string literals

Thomas Heller theller at ctypes.org
Fri May 11 13:05:05 CEST 2007


M.-A. Lemburg wrote:
> On 2007-05-11 07:52, Martin v. Löwis wrote:
>>> This is what prompted my question, actually: in Py3k, in the
>>> str/unicode unification branch, r"\u1234" changes meaning: before the
>>> unification, this was an 8-bit string, where the \u was not special,
>>> but now it is a unicode string, where \u *is* special.
>> 
>> That is true for non-raw strings also: the meaning of "\u1234" also
>> changes.
>> 
>> However, traditionally, there was *no* escaping mechanism in raw strings
>> in Python, and I feel that this is a good principle, because it is
>> easy to learn (if you leave out the detail that \ can't be the last
>> character in a raw string - which should get fixed also, IMO). So I
>> think in Py3k, "\u1234" should continue to be a string with 6
>> characters. Otherwise, people will complain that
>> os.stat(r"c:\windows\system32\user32.dll") fails. Telling them to write
>> os.stat(r"c:\windows\system32\u005Cuser32.dll") will just cause puzzled
>> faces.
> 
> Using double backslashes won't cause that reaction:
> 
> os.stat("c:\\windows\\system32\\user32.dll")

Sure.  But I want to use raw strings for Windows path names; they are much easier
to type.
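
A minimal sketch of the two spellings, assuming a Python 3 interpreter in which
raw string literals leave \u untouched (the behaviour Martin argues for above).
The variable names are only illustrative:

```python
# Assumes Python 3 semantics, where raw literals do not process \u escapes.
raw_path = r"c:\windows\system32\user32.dll"
escaped_path = "c:\\windows\\system32\\user32.dll"

# Both spellings denote the same sequence of characters.
same = raw_path == escaped_path

# In a raw literal, \u1234 stays as six separate characters...
raw_len = len(r"\u1234")
# ...while in a regular literal it is a single code point.
cooked_len = len("\u1234")

print(same, raw_len, cooked_len)
```

If raw literals did interpret \u, the first assignment would not even compile,
because "\user32" is not a valid \uXXXX escape.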

> Also note that Windows is smart enough nowadays to parse
> the good old Unix forward slash:
> 
> os.stat("c:/windows/system32/user32.dll")

In my opinion this is a Windows bug and not a feature, especially because there
are Windows API functions (the shell functions, IIRC) that do NOT accept
forward slashes.

Would you say that *nix is dumb because it doesn't parse "\\usr\\include"?
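
The asymmetry can be sketched with the standard library's pathlib module. This
is an assumption for illustration only: pathlib postdates this thread, and it
models path parsing rules, not what any particular Windows API function accepts.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Pure paths only parse strings; they never touch the file system,
# so this runs on any platform.
win_forward = PureWindowsPath("c:/windows/system32/user32.dll")
win_back = PureWindowsPath(r"c:\windows\system32\user32.dll")

# Windows parsing rules accept either separator.
windows_same = win_forward == win_back

# POSIX rules treat the backslash as an ordinary file name character,
# so the whole string is a single path component.
posix_parts = PurePosixPath("\\usr\\include").parts

print(windows_same, posix_parts)
```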

>> Windows path names are one of the two primary applications of raw
>> strings (the other being regexes).
> 

Thomas


