[Python-Dev] \u and \U escapes in raw unicode string literals

M.-A. Lemburg mal at egenix.com
Sat May 12 01:30:52 CEST 2007

On 2007-05-12 00:48, Martin v. Löwis wrote:
>> Using double backslashes won't cause that reaction:
>> os.stat("c:\\windows\\system32\\user32.dll")
> Please refer to the subject. We are talking about raw strings.

If you'd leave the context in place, the reason for my suggestion
would become evident.
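
For reference, the two spellings produce the same string: a raw literal keeps each backslash as written, while a regular literal needs it doubled. A trivial check (Python 3; it only compares strings, so it runs on any platform):

```python
# A regular literal needs each backslash doubled ...
regular = "c:\\windows\\system32\\user32.dll"
# ... while a raw literal keeps every backslash as written.
raw = r"c:\windows\system32\user32.dll"

assert regular == raw
assert raw.count("\\") == 3  # three path separators, one character each
```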

>>> Windows path names are one of the two primary applications of raw
>>> strings (the other being regexes).
>> IMHO the primary use case are regexps
> It's not a matter of opinion. It's a statistical fact that these
> are the two cases where people use raw strings most.

Ah, statistics :-) It always depends on who you ask: a Windows
user will obviously have more use for the raw string use-case you
gave than a Unix user. At the end of the day, I still believe
that the regexp use-case is far more common than the Windows
path name one.

FWIW: Zope has 2 uses of raw strings for Windows path names (if I
counted correctly) and around 100 for regexps. Python itself
has maybe 10-20 raw strings for Windows path names and registry
names (in the msi lib and distutils) vs. around 300 for
regexps.
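
Counts like these are easy to redo. A minimal sketch using the standard tokenize module: it counts string literals whose prefix contains r or R. Note the assumptions: the helper name count_raw_string_literals is hypothetical, it does not tell regexps apart from path names (that still takes a manual look), and on Python 3.12+ rf"..." f-strings are tokenized separately and are not counted.

```python
import io
import re
import tokenize


def count_raw_string_literals(source: str) -> int:
    """Hypothetical helper: count literals with an r/R prefix (r"", R"", br"", ...)."""
    count = 0
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.STRING:
            # The prefix is the run of letters before the opening quote.
            prefix = re.match(r"[A-Za-z]*", tok.string).group()
            if "r" in prefix.lower():
                count += 1
    return count


# Source text: p = r"\d+" / q = "plain" / w = R"c:\temp"
sample = 'p = r"\\d+"\nq = "plain"\nw = R"c:\\temp"\n'
print(count_raw_string_literals(sample))
```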

>> and for those you'd
>> definitely want to be able to put Unicode characters into your
>> expressions.
> For regular expressions, you don't need them as part of the
> string literal syntax: The re parser itself could support \u,
> just like it supports \x today.

True, and perhaps that's the right path to follow.
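
This is in fact what Python 3 ended up with: since Python 3.3 the re module itself understands \uXXXX and \UXXXXXXXX in str patterns, next to the \xXX it already supported. A small sketch (the sample word is made up):

```python
import re

# The raw literal leaves "\u00e9" untouched: six plain characters.
assert len(r"\u00e9") == 6

# The re parser, not the literal syntax, turns \u00e9 into U+00E9.
pattern = re.compile(r"caf\u00e9")
assert pattern.fullmatch("caf\u00e9")

# \x escapes are handled the same way for code points below 0x100.
assert re.fullmatch(r"caf\xe9", "caf\u00e9")
```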

You'd still have the problem of writing Windows path names with
embedded Unicode characters, but I guess that's something we
can fix another day ;-)

>> BTW, if you use ur"..." for your expressions today (which you should
>> if you parse text), then nothing will change when removing the
>> 'u' prefix in Py3k.
> How do you know? Py3k hasn't been released, yet.

Sorry, I wasn't clear: if the raw-unicode-escape codec continues
to work the way it does now, you won't run into trouble in Py3k.
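
For readers who haven't used it: raw-unicode-escape only interprets \uXXXX and \UXXXXXXXX sequences and passes every other backslash through unchanged. A sketch of that behaviour, using the codec as it is still available in Python 3 via bytes.decode and str.encode:

```python
# Decoding: \uXXXX becomes the code point; other backslashes survive.
decoded = b"caf\\u00e9".decode("raw_unicode_escape")
assert decoded == "caf\u00e9"

path = b"c:\\windows\\system32".decode("raw_unicode_escape")
assert path == "c:\\windows\\system32"  # "\w" and "\s" are left alone

# Encoding: code points below 0x100 become single Latin-1 bytes,
# anything above is written back as a \uXXXX escape.
assert "caf\u00e9".encode("raw_unicode_escape") == b"caf\xe9"
assert "\u20ac".encode("raw_unicode_escape") == b"\\u20ac"
```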

[and later:]
>> BTW, there's an easy work-around for this special case:
>>
>> os.stat(os.path.join(r"c:\windows\system32", "user32.dll"))
> No matter what the decision is, there are always work-arounds.
> The question is what language suits the users most. Being
> able to specify characters by ordinal IMO has much less value
> than a consistent, concise definition of raw strings has.
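
The os.path.join work-around keeps \u out of the literal: the directory part contains no \u sequence, and the file name is joined on with the separator by the path module. A sketch using ntpath (the Windows flavour of os.path, importable on any OS) so the result is the same everywhere:

```python
import ntpath  # Windows path rules, available on every platform

# The directory literal has no "\u"; the file name carries no backslash.
full = ntpath.join(r"c:\windows\system32", "user32.dll")
assert full == "c:\\windows\\system32\\user32.dll"
```

On Windows itself, os.path is ntpath, so os.path.join behaves the same way.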

I wonder how we managed to survive all these years with
the existing consistent and concise definition of the
raw-unicode-escape codec ;-)

There are two options:

 * no one really uses Unicode raw strings nowadays

 * none of the existing users has ever stumbled across the
   "problem case" that triggered all this

Either way, we're discussing a non-issue.
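
The "problem case" is easy to reproduce. A sketch assuming Python 3 semantics and a made-up path: a raw literal leaves \u alone, while the raw-unicode-escape codec (the one behind Python 2's ur"..." literals, as discussed in this thread) treats \u as the start of an escape and rejects "\users":

```python
# In Python 3 a raw literal never interprets \u, so this is plain text.
path = r"c:\users\me"
assert path == "c:\\users\\me"

# The raw-unicode-escape codec requires four hex digits after "\u",
# so the same characters fail to decode.
failed = False
try:
    b"c:\\users\\me".decode("raw_unicode_escape")
except UnicodeDecodeError as exc:
    failed = True
    print("decode failed:", exc.reason)
```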

>> I think it's nice that you can use forward slashes on Windows -
>> makes writing code that works in both worlds (Unix and Windows)
>> a lot easier.
> But, as Thomas says: you can't. You may be able to do so
> when using the API directly, however, it fails if you
> pass the file name in a command line of some tool that
> takes /foo to mean a command line option "foo".

Strange. I've been doing exactly that for years. Maybe it's just
because I stick to common os module APIs. So far, I've never
run into any problem with it.
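
Where a path does have to go onto a command line, one option is to keep forward slashes in the source and convert to native separators just before handing the path off. A sketch using the standard library's Windows path flavour (runs on any OS):

```python
import ntpath
from pathlib import PureWindowsPath

src = "c:/windows/system32/user32.dll"  # forward slashes, as written in code

# Both helpers convert "/" to the native "\" separator.
assert ntpath.normpath(src) == "c:\\windows\\system32\\user32.dll"
assert str(PureWindowsPath(src)) == "c:\\windows\\system32\\user32.dll"
```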

Marc-Andre Lemburg

Professional Python Services directly from the Source  (#1, May 12 2007)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/

:::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! ::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
