Regex substitution trouble

Cameron Simpson cs at zip.com.au
Wed Oct 29 03:14:54 CET 2014


On 28Oct2014 04:02, massi_srb at msn.com <massi_srb at msn.com> wrote:
>I'm not really sure if this is the right place to ask about regular 
>expressions, but since I'm using Python I thought I could give it a try :-)
>Here is the problem: I'm trying to write a regex in order to substitute all the occurrences of the form $"somechars" with another string. This is what I wrote:
>
>newstring = re.sub(ur"""(?u)(\$\"[\s\w]+\")""", subst, oldstring)
>
>This works pretty well, but it has a problem: I also need it to handle the case in which the internal string contains double quotes, but only if preceded by a backslash, that is, something like $"somechars_with\\"doublequotes".
>Can anyone help me to correct it?

People seem to be making this harder than it should be.

I'd just be fixing up your definition of what's inside the quotes. There seem 
to be 3 kinds of things:

   - not a double quote or backslash
   - a backslash followed by a double quote
   - a backslash followed by not a double quote

Kind 3 is a policy call: does the backslash take the following character with 
it or not? I would treat it like kind 2 myself, so that a backslash always 
escapes whatever character follows it.

So you have:

   1 [^\\"]
   2 \\"
   3 \\[^"]

and fold 2 and 3 into:

   2+3 \\.

So your regexp inner becomes:

   ([^\\"]|\\.)*

and the whole thing becomes:

   \$"(([^\\"]|\\.)*)"

and as a raw string:
   
   ur'\$"(([^\\"]|\\.)*)"'

choosing single quotes to be more readable given the double quotes in the 
regexp.
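
As a quick check of the pattern above, here is a minimal sketch, assuming 
Python 3. There the ur'' prefix no longer exists and a plain r'' raw string 
is used instead. The names PATTERN, substitute and sample are made up for 
illustration and are not from the original question:

```python
import re

# The pattern from above, spelled for Python 3:
#   \$"          literal $ followed by the opening double quote
#   [^\\"]       kind 1: anything that is not a backslash or double quote
#   \\.          kinds 2+3: a backslash plus whatever character follows it
#   "            the closing double quote
PATTERN = re.compile(r'\$"(([^\\"]|\\.)*)"')

def substitute(text, replacement):
    # Hypothetical helper: a callable replacement is used so that re.sub
    # does not interpret backslashes inside the replacement text.
    return PATTERN.sub(lambda m: replacement, text)

# Illustrative input: one plain token and one containing escaped quotes.
sample = r'a $"plain words" b $"with \"escaped\" quotes" c'

result = substitute(sample, "<X>")
print(result)

# Group 1 holds the text between the outer quotes, escapes left intact.
inners = [m.group(1) for m in PATTERN.finditer(sample)]
print(inners)
```

Two things to be aware of with this sketch: "." does not match a newline 
unless re.DOTALL is given, so a backslash directly before a line break would 
not count as an escape, and a token whose closing quote is missing (or is 
itself escaped) is simply left unmatched.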

Cheers,
Cameron Simpson <cs at zip.com.au>
-- 
Language... has created the word "loneliness" to express the pain of
being alone. And it has created the word "solitude" to express the glory
of being alone. - Paul Johannes Tillich


