[Python-3000] Raw strings containing \u or \U

Steven Bethard steven.bethard at gmail.com
Wed May 16 20:32:37 CEST 2007


On 5/16/07, Guido van Rossum <guido at python.org> wrote:
> On 5/16/07, Steven Bethard <steven.bethard at gmail.com> wrote:
> > +1 for no escaping of quotes in raw strings.  Python provides so many
> > different ways to quote a string that the cases in which you can't just
> > switch to another quoting style are vanishingly rare.  Examples from
> > the stdlib and their translations::
> >
> >     '\'' --> "'"
> >     '("|\')' --> '''("|')'''
> >     'Can\'t stat' --> "Can't stat"
> >     '(\'[^\']*\'|"[^"]*")?' --> '''('[^']*'|"[^"]*")?'''
> >
> > Note that allowing trailing backslashes could also clean up stuff in
> > modules like ntpath::
> >
> >     path[-1] in "/\\" --> path[-1] in r"/\"
> >     firstTwo == '\\\\' --> firstTwo == r'\\'
>
> Can you also search for how often this feature is *used* (i.e. a raw
> string that has to be raw for other reasons also contains an escaped
> quote)? If that's rare or we can agree on easy fixes it would ease my
> mind about this part of the proposal.
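
For anyone who wants to double-check the quote-switching translations quoted above, here is a small sketch in current Python. The list name `translations` is only illustrative. It covers just the non-raw examples, because trailing-backslash forms like r"/\" are proposed syntax that today's parser rejects.

```python
# Each pair is (original stdlib literal, quote-switched translation) from the
# quoted message; both sides are parsed by today's Python.
translations = [
    ('\'', "'"),
    ('("|\')', '''("|')'''),
    ('Can\'t stat', "Can't stat"),
    ('(\'[^\']*\'|"[^"]*")?', '''('[^']*'|"[^"]*")?'''),
]

for original, translated in translations:
    # Escaping the quote and switching the quote style should give equal strings.
    assert original == translated, (original, translated)
    print(repr(original), "==", repr(translated))
```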

Well, remember that when you escape a quote in a raw string, the
backslash stays in the string regardless of the enclosing quote type,
e.g.::

    r"\"" == r'\"' == r"""\"""" == r'''\"''' == '\\"'

So the question is whether there are any situations where you can't
just switch the quote type. The only strings in the stdlib I could
find[1] where the escaped quote was the same type as the enclosing
quotes were:

    r"^\s*=\s*\"([^\"\\]*(?:\\.[^\"\\]*)*)\""
    r"([\"\\])"
    r'[^\\\'\"%s ]*'
    r'#\s*doctest:\s*([^\n\'"]*)$',
    r'(\'[^\']*\'|"[^"]*"|[-a-zA-Z0-9./,:;+*%?!&$\(\)_#=~@]*))?'
    r"([^.'\"\\#]\b|^)"
    r'(\'[^\']*\'|"[^"]*")\s*'
    r'((\\[\\abfnrtv\'"]|\\[0-9]..|\\x..|\\u....)+)',
    r'(\'[^\']*\'|"[^"]*"|[][\-a-zA-Z0-9./,:;+*%?!&$\(\)_#=~\'"@]*))?'
    r'(?<=[\w\!\"\'\&\.\,\?])-{2,}(?=\w))'
    r'[\"\']?'
    r'[ \(\)<>@,;:\\"/\[\]\?=]'
    r"[&<>\"\x80-\xff]+"

I believe every one of these would continue to work under the proposal
if you simply replaced r'...' or r"..." with r'''...''', that is, if you
used the triple-quoted form. Even some much nastier cases than anything
in the stdlib (e.g. strings that start and end with different quote
types) seem to work out fine when you switch to the appropriate triple
quotes::

    r'\'\"' == r'''\'\"'''
    r'"\'' == r""""\'"""
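
To make the triple-quote argument a bit more mechanical, here is a sketch that, assuming the proposed rule (a backslash in a raw string never escapes a quote, so the literal simply ends at the first occurrence of its delimiter), lists which delimiters could still hold a given string body. The function `usable_delimiters` is hypothetical and only models where a literal would terminate; it is not how the real tokenizer is written.

```python
def usable_delimiters(content):
    """Return the delimiters that could enclose `content` in a raw string
    under the assumed no-escape rule, where the literal ends at the first
    occurrence of its delimiter."""
    usable = []
    for delim in ("'", '"', "'''", '"""'):
        if len(delim) == 1 and "\n" in content:
            continue  # single-quoted literals cannot span lines
        # The body is cut at the first delimiter, so the only occurrence of
        # `delim` in content + delim must be the closing one at the very end.
        if (content + delim).find(delim) == len(content):
            usable.append(delim)
    return usable


# A raw literal's body is the same under both rules (the backslashes are
# kept either way), so today's parser can supply the content to test.
examples = {
    "stdlib": r"([\"\\])",
    "mixed 1": r'\'\"',
    "mixed 2": r'"\'',
    "trailing backslash": "/\\",  # what r"/\" would mean under the proposal
}

for name, content in examples.items():
    print(name, repr(content), usable_delimiters(content))
```

Under this model every stdlib example listed earlier keeps ''' as an option, and each of the two mixed-quote cases ends up with exactly one usable triple-quote form, matching the translations given.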

I wasn't able to find anything I couldn't translate, but another set
of eyes would help if anyone has the time.

[1] I skipped the tests dir because I'm lazy. ;-)
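
The search method isn't described above; a rough sketch of one way to reproduce it (an assumption, not necessarily how the list was built) is to run each source file through the stdlib tokenize module and flag raw literals whose body contains a backslash followed by the literal's own quote character. A scan like this could also be pointed at the tests directory. The helper names `has_escaped_quote` and `same_type_escaped_quotes` are hypothetical.

```python
import io
import tokenize


def has_escaped_quote(body, quote):
    # In a raw literal a backslash still pairs with the character after it
    # (that is what stops a quote from closing the string), so skip two
    # characters whenever a backslash is seen.
    i = 0
    while i < len(body):
        if body[i] == "\\":
            if i + 1 < len(body) and body[i + 1] == quote:
                return True
            i += 2
        else:
            i += 1
    return False


def same_type_escaped_quotes(source):
    """Yield raw string literals in `source` that escape their own quote type."""
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.STRING:
            continue  # on 3.12+, f-strings arrive as FSTRING_* tokens and are skipped
        text = tok.string
        start = 0
        while text[start] not in "'\"":
            start += 1  # walk past the prefix letters (r, b, u, ...)
        prefix, literal = text[:start], text[start:]
        if "r" not in prefix.lower():
            continue  # only raw literals are of interest
        quote = literal[0]
        delim = quote * 3 if literal.startswith(quote * 3) else quote
        body = literal[len(delim):-len(delim)]
        if has_escaped_quote(body, quote):
            yield text


sample = r'''
pat1 = r"([\"\\])"
pat2 = r'[\"\']?'
pat3 = r"no escapes here"
pat4 = "\"not raw\""
'''

for literal in same_type_escaped_quotes(sample):
    print(literal)
# prints:
# r"([\"\\])"
# r'[\"\']?'
```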

STeVe

