inserting \ in regular expressions

John Roth johnroth1 at gmail.com
Thu Oct 27 10:18:25 EDT 2011


On Oct 26, 2:47 pm, Dave Angel <d... at davea.name> wrote:
> On 10/26/2011 03:48 PM, Ross Boylan wrote:
>
> > I want to replace every \ and " (the two characters for backslash and
> > double quotes) with a \ and the same character, i.e.,
> > \ ->  \\
> > " ->  \"
>
> > I have not been able to figure out how to do that.  The documentation
> > for re.sub says "repl can be a string or a function; if it is a string,
> > any backslash escapes in it are processed. That is, \n is converted to a
> > single newline character, \r is converted to a carriage return, and so
> > forth. Unknown escapes such as \j are left alone."
>
> > \\ is apparently unknown, and so is left as is. So I'm unable to get a
> > single \.
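Going by the re.sub documentation quoted above, `\\` in a string replacement is itself a recognized escape for one literal backslash, so it is not left as is. A minimal sketch, assuming Python 3 (the names `template` and `result` are only illustrative):

```python
import re

# re.sub parses a string replacement as a template *after* Python has
# parsed the literal. In the template, \\ becomes one literal backslash
# and \1 becomes the text matched by group 1.
template = r"\\\1"            # 4 characters: \ \ \ 1

# Character class matching a double quote or a backslash.
result = re.sub(r'(["\\])', template, 'Silly " quote')

print(len(template))          # 4
print(result)                 # Silly \" quote
```

Writing the template as a raw string means the only doubling left is the one the regex template itself requires.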
>
> > Here are some tries in Python 2.5.2.  The document suggested the result
> > of a function might not be subject to the same problem, but it seems to
> > be.
> > >>> def f(m):
> > ...    return "\\"+m.group(1)
> > ...
> > >>> re.sub(r"([\\\"])", f, 'Silly " quote')
> > 'Silly \\" quote'
> > <SNIP>
> >>> re.sub(r"([\\\"])", "\\\\\\1", 'Silly " quote')
> > 'Silly \\" quote'
>
> > Or perhaps I'm confused about what the displayed results mean.  If a
> > string has a literal \, does it get shown as \\?
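On the display question: the interactive prompt echoes repr() of a result, and repr() doubles every backslash, while print() writes the characters themselves. A small check, assuming Python 3 (the variable `s` is illustrative):

```python
# The interactive prompt shows repr(value); repr doubles every backslash.
# print() writes the string's actual characters.
s = 'Silly \\" quote'        # this literal contains exactly ONE backslash

print(repr(s))               # 'Silly \\" quote'
print(s)                     # Silly \" quote
print(s.count("\\"))         # 1
print(len(s), len(repr(s)))  # 14 17
```

So the `'Silly \\" quote'` results shown in the tries above already contain a single backslash.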
>
> > I'd appreciate it if you cc me on the reply.
>
> > Thanks.
> > Ross Boylan
>
> I can't really help on the regex aspect of your code, but I can tell you
> a little about backslashes, quote literals, the interpreter, and Python.
>
>
>   Now, one way to cheat, if you know you'll want to put actual
> backslashes in the string, is to use a raw string. That works quite
> well unless you want the string to end with a single backslash (more
> precisely, an odd number of them); there isn't a way to enter that as
> a single raw literal. You'd have to do something like
>       a = r"strange\literal\with\some\stuff" + "\\"
>
> My understanding is that no valid regex ends with a lone, unescaped
> backslash, so this may not affect you.
>
> --
>
> DaveA
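Dave's two points about raw literals can be checked directly. A sketch assuming Python 3; the string contents are his example, and `lone_backslash_compiles` is an illustrative name:

```python
import re

# A raw literal cannot end in an odd number of backslashes (it would not
# parse). Concatenating an ordinary "\\" adds one trailing backslash.
a = r"strange\literal\with\some\stuff" + "\\"
print(a[-6:])                             # stuff\

# An even number of trailing backslashes is fine: r"\\" is two characters,
# which as a regex means "one literal backslash".
print(len(r"\\"))                         # 2
print(re.findall(r"\\", "a\\b\\c"))       # ['\\', '\\']

# A pattern ending in a lone, unescaped backslash is rejected by re.
try:
    re.compile("\\")                      # the pattern is one backslash
    lone_backslash_compiles = True
except re.error as exc:
    lone_backslash_compiles = False
    print("re.error:", exc)
```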

Dave's answer is excellent background. I've snipped everything except
the part I want to emphasize, which is to use raw strings. They were
put into Python specifically for your problem: that is, how to avoid
the double and triple backslashes while writing regexes.
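Applied to the original problem, raw strings keep both the pattern and the replacement readable. A minimal sketch assuming Python 3; `SPECIAL`, `escape_quotes`, `escape_quotes_fn`, and the sample text are hypothetical names chosen for illustration. The function version mirrors the `f(m)` tried in the question: a function's return value is inserted as-is, with no template escape processing.

```python
import re

# Character class matching a backslash or a double quote. Written raw,
# the regex-level escape \\ needs no extra Python-level doubling.
SPECIAL = re.compile(r'([\\"])')

def escape_quotes(text):
    """Prefix every backslash and double quote in text with a backslash."""
    # In the raw template, \\ is a literal backslash and \1 is group 1.
    return SPECIAL.sub(r"\\\1", text)

def escape_quotes_fn(text):
    """Same result via a function replacement (returned text is literal)."""
    return SPECIAL.sub(lambda m: "\\" + m.group(1), text)

sample = 'say "hi" to C:\\temp'   # hypothetical input with one backslash
print(escape_quotes(sample))      # say \"hi\" to C:\\temp
```

Both versions give the same output; the template form is shorter, while the function form avoids template escapes entirely.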

John Roth



