using re module to find " but not " alone ... is this a BUG in re?
Paul McGuire
ptmcg at austin.rr.com
Fri Jun 13 09:46:21 EDT 2008
On Jun 12, 4:11 am, anton <anto... at gmx.de> wrote:
> Hi,
>
> I want to replace all occurrences of " by \" in a string.
>
> But I want to leave all occurrences of \" as they are.
>
> The following should happen:
>
> this I want " while I dont want this \"
>
> should be transformed to:
>
> this I want \" while I dont want this \"
>
> and NOT:
>
> this I want \" while I dont want this \\"
>
A pyparsing version is not as terse as an re, and certainly not as
fast, but it is easy enough to read. Here is my first brute-force
approach to your problem:
from pyparsing import Literal, replaceWith
# an already-escaped quote: matched, but passed through untouched
escQuote = Literal(r'\"')
# a bare quote: matched and replaced with the escaped form
unescQuote = Literal(r'"')
unescQuote.setParseAction(replaceWith(r'\"'))
test1 = r'this I want " while I dont want this \"'
test2 = r'frob this " avoid this \", OK?'
for test in (test1, test2):
    print((escQuote | unescQuote).transformString(test))
And it prints out the desired:
this I want \" while I dont want this \"
frob this \" avoid this \", OK?
This works by defining both patterns, escQuote and unescQuote, but
attaching a transforming parse action only to unescQuote. Because
escQuote is listed first among the alternatives, properly escaped
quotes are matched and passed through unchanged before unescQuote
gets a chance to see them.
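To make the ordering idea concrete without pyparsing, here is a rough
hand-rolled sketch of the same "first alternative wins" scan. The name
escape_bare_quotes is made up for illustration, and the loop only mirrors
what the expression above does; it is not a description of how pyparsing
works internally.

```python
def escape_bare_quotes(text):
    """Scan left to right, trying the escaped-quote pattern before the bare quote."""
    esc_quote = '\\"'  # two characters: a backslash followed by a double quote
    out = []
    i = 0
    while i < len(text):
        if text.startswith(esc_quote, i):
            # Already escaped: copy both characters through unchanged.
            out.append(esc_quote)
            i += 2
        elif text[i] == '"':
            # Bare quote: emit the escaped form instead.
            out.append(esc_quote)
            i += 1
        else:
            # Anything else is copied as-is.
            out.append(text[i])
            i += 1
    return ''.join(out)


print(escape_bare_quotes(r'this I want " while I dont want this \"'))
```

Swapping the two quote branches would make the bare-quote test fire on the
quote inside an escaped pair, which is exactly the double-escaping the
original question wanted to avoid.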
Then I looked at your problem slightly differently - why not find both
'\"' and '"', and replace either one with '\"'? In some cases, I'm
"replacing" '\"' with '\"', but so what? Here is the simplified
transformer:
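from pyparsing import Optional, replaceWith
# '\\' is a single backslash; the raw string r'\\' would be two
# backslashes and would not match the lone backslash in '\"'
quotes = Optional('\\') + '"'
quotes.setParseAction(replaceWith(r'\"'))
for test in (test1, test2):
    print(quotes.transformString(test))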
Again, this prints out the desired output.
Now let's retrofit this altered logic back onto John Machin's
solution:
import re
for test in (test1, test2):
    # optional backslash plus a quote, always rewritten as backslash-quote
    print(re.sub(r'\\?"', r'\"', test))
Pretty short and sweet, and pretty readable for an re.
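As a quick sanity check, here is that one-liner wrapped in a function and
run on its own. The function name escape_quotes_simple is invented for this
sketch. The second call shows an input the simple pattern does not handle:
an escaped backslash followed by a bare quote comes back unchanged, because
the pattern takes the second backslash to be the quote's escape.

```python
import re


def escape_quotes_simple(text):
    # Match an optional backslash plus a quote, and always emit backslash-quote.
    return re.sub(r'\\?"', r'\"', text)


print(escape_quotes_simple(r'this I want " while I dont want this \"'))

# Escaped backslash followed by a bare quote: returned unchanged.
print(escape_quotes_simple(r'two \\"'))
```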
To address Peter Otten's question about what to do with an escaped
backslash, I can't compose this with an re, but I can by adjusting the
first pyparsing version to include an escaped backslash as a "match
but don't do anything with it" expression, just like we did with
escQuote:
from pyparsing import Literal, replaceWith
escQuote = Literal(r'\"')
unescQuote = Literal(r'"')
unescQuote.setParseAction(replaceWith(r'\"'))
backslash = chr(92)
# two backslashes in a row: matched, but passed through untouched
escBackslash = Literal(backslash+backslash)
test3 = r'no " one \", two \\"'
for test in (test1, test2, test3):
    print((escBackslash | escQuote | unescQuote).transformString(test))
Prints:
this I want \" while I dont want this \"
frob this \" avoid this \", OK?
no \" one \", two \\\"
At first I thought the last transform was an error, but on closer
inspection, I see that the input line ends with an escaped backslash,
followed by a lone '"', which must be replaced with '\"'. So in the
transformed version we see '\\\"', the original escaped backslash,
followed by the replacement '\"' string.
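For comparison, the same ordered-alternatives idea can apparently be carried
over to re by passing a function, rather than a string, as the replacement.
This is only a sketch under that assumption and was not part of the thread;
the names TOKEN and escape_quotes are made up here. The pattern lists the
escaped backslash and the escaped quote ahead of the bare quote, in the same
order as the pyparsing expression.

```python
import re

# Alternatives are tried left to right: escaped backslash, escaped quote,
# and only then a bare quote.
TOKEN = re.compile(r'\\\\|\\"|"')


def escape_quotes(text):
    def fix(match):
        tok = match.group(0)
        # Only a bare quote is rewritten; the escaped forms pass through.
        return '\\"' if tok == '"' else tok
    return TOKEN.sub(fix, text)


print(escape_quotes(r'no " one \", two \\"'))
```

Because the replacement is a function, its return value is inserted
literally, so there is no second layer of backslash handling to think about
in the replacement text.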
Cheers,
-- Paul