using re module to find " but not " alone ... is this a BUG in re?

Peter Otten __peter__ at web.de
Thu Jun 12 09:09:53 EDT 2008


anton wrote:

> I want to replace all occurrences of " by \" in a string.
> 
> But I want to leave all occurrences of \" as they are.
> 
> The following should happen:
> 
>   this I want " while I dont want this \"
> 
> should be transformed to:
> 
>   this I want \" while I dont want this \"
> 
> and NOT:
> 
>   this I want \" while I dont want this \\"
> 
> I even tried the (?<=...) construction, but there I get an unbalanced
> parenthesis error.
> 
> It seems that re is not able to do the job due to parsing/compiling
> problems for this sort of string.
> 
> 
> Have you any idea??

The problem is underspecified. Should r'\\"' become r'\\\"' or remain
unchanged? If a backslash is supposed to escape the following character,
including another backslash, a single lookbehind can't do that: the quote
in r'\\"' is preceded by a backslash, yet it is not escaped.

# John's proposal:
>>> import re
>>> print(re.sub(r'(?<!\\)"', r'\"', 'no " one \\", two \\\\"'))
no \" one \", two \\"


One possible fix:

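>>> # Split on escape pairs (backslash + any character); the capturing
>>> # group keeps them in the result, at the odd indices.
>>> parts = re.split(r"(\\.)", 'no " one \\", two \\\\"')
>>> # Only the even-indexed pieces can contain unescaped quotes.
>>> parts[::2] = [p.replace('"', '\\"') for p in parts[::2]]
>>> print("".join(parts))
no \" one \", two \\\"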
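
For comparison, here is a sketch of the same idea as a single re.sub()
pass. It assumes that a backslash always escapes whatever character follows
it. The function name escape_bare_quotes is made up for this example. The
pattern matches either an escape pair or a bare quote, and the callback
rewrites only the bare quote:

```python
import re

# Either an escape pair (backslash + any one character) or a bare quote.
# Assumption: a backslash escapes the next character, whatever it is.
_TOKEN = re.compile(r'\\.|"', re.DOTALL)


def escape_bare_quotes(text):
    """Hypothetical helper: add a backslash before each unescaped quote."""
    def fix(match):
        token = match.group(0)
        # Escape pairs pass through unchanged; a bare quote gains a backslash.
        return '\\"' if token == '"' else token
    return _TOKEN.sub(fix, text)


print(escape_bare_quotes('no " one \\", two \\\\"'))
```

Because the scan runs left to right and consumes each escape pair whole, a
quote following an even number of backslashes is seen as bare, which is the
case the lookbehind misses.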

Peter



