[Tutor] RE module is working ?

Karim karim.liateni at free.fr
Fri Feb 4 00:23:19 CET 2011


On 02/03/2011 07:47 PM, Karim wrote:
> On 02/03/2011 02:15 PM, Peter Otten wrote:
>> Karim wrote:
>>
>>> I am trying to substitute a '""' pattern with '\"\"', namely to escape 2
>>> consecutive double quotes:
>>>
>>>      * In the Python interpreter:
>>>
>>> $ python
>>> Python 2.7.1rc1 (r271rc1:86455, Nov 16 2010, 21:53:40)
>>> [GCC 4.4.3] on linux2
>>> Type "help", "copyright", "credits" or "license" for more information.
>>> >>> expression = ' "" '
>>> >>> re.subn(r'([^\\])?"', r'\1\\"', expression)
>>> Traceback (most recent call last):
>>>     File "<stdin>", line 1, in <module>
>>>     File "/home/karim/build/python/install/lib/python2.7/re.py", line 162, in subn
>>>       return _compile(pattern, flags).subn(repl, string, count)
>>>     File "/home/karim/build/python/install/lib/python2.7/re.py", line 278, in filter
>>>       return sre_parse.expand_template(template, match)
>>>     File "/home/karim/build/python/install/lib/python2.7/sre_parse.py", line 787, in expand_template
>>>       raise error, "unmatched group"
>>> sre_constants.error: unmatched group
>>>
>>> But if I remove the '?', I get the following:
>>>
>>> >>>  re.subn(r'([^\\])"', r'\1\\"', expression)
>>> (' \\"" ', 1)
>>>
>>> Only one substitution... But this is not the same regex. And count=2
>>> does nothing. By default, all occurrences should be substituted.
>>>
>>>      * On Linux, using my good old sed command, it works with my '?'
>>>        (0 or 1 match):
>>>
>>> $ echo ' "" ' | sed 's/\([^\\]\)\?"/\1\\"/g'
>>>    \"\"
>>>
>>> So what's the matter with the re module!?
>> You should really fix the problem with your email program first;
> It's a Thunderbird issue with bold type (it appears as stars), but I don't
> know how to fix it yet.
>>   afterwards
>> it's probably a good idea to try and explain your goal clearly, in plain
>> English.
>
> I already did (see the earlier mails in the thread). To summarize: I pass
> the expression string to a Tcl command, which delimits strings with double
> quotes only.
> So I get an error with nested double quotes; that's the key problem.
>> Yes. What Steven said ;)
>>
>> Now to your question as stated: if you want to escape two consecutive double
>> quotes, that can be done with
>>
>> s = s.replace('""', r'\"\"')
>>
>>
> I have already done that as a workaround, but I have to add another
> replacement before it to handle all the other cases.
> I want to make the original command work so I can drop the workaround.
>
>
>> but that's probably *not* what you want. Assuming you want to escape two
>> consecutive double quotes and make sure that the first one isn't already
>> escaped,
>
> You got it! :-)
>
>> this is my attempt:
>>
>>>>> def sub(m):
>> ...     s = m.group()
>> ...     return r'\"\"' if s == '""' else s
>> ...
>>>>> print re.compile(r'[\\].|""').sub(sub, r'\\\"" \\"" \"" "" \\\" \\" \"')
>
> That is not what I want. I want to escape any " that is not
> already escaped.
> The sed regex '/\([^\\]\)\?"/\1\\"/g' is exactly what I need (I have been
> writing regexes on Unix for 15 years).
>
> To me, the equivalent Python regex r'([^\\])?"', r'\1\\"' is buggy. Why is
> '?' not accepted? It means a character that is not a backslash, occurring
> 0 or 1 times. This is quite simple.
>
> I am a poor tradesman, but I don't deny the evidence.
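
Peter's callback approach can be stretched to the goal stated above (escape every " that is not already escaped) by treating a backslash and the character after it as a single token. That also gets \\" right (an escaped backslash followed by a bare quote), which a check on only the preceding character cannot do. A minimal Python 3 sketch, not from the thread; TOKEN, _escape_token and escape_unescaped_quotes are hypothetical names:

```python
import re

# A backslash plus whatever character follows it is one token, kept as-is;
# any other double quote is bare and gets a backslash in front of it.
# re.DOTALL lets '.' also take a newline that follows a backslash.
TOKEN = re.compile(r'\\.|"', re.DOTALL)

def _escape_token(m):
    token = m.group()
    # A string returned by a callback is used literally (no \1 expansion).
    return r'\"' if token == '"' else token

def escape_unescaped_quotes(s):
    """Hypothetical helper: escape every double quote not already escaped."""
    return TOKEN.sub(_escape_token, s)

print(escape_unescaped_quotes('""'))     # \"\"
print(escape_unescaped_quotes(r'\""'))   # \"\"
print(escape_unescaped_quotes(r'\\"'))   # \\\"
```

Scanning left to right with non-overlapping matches is what makes this work: a backslash always swallows the next character, so the parity of a run of backslashes is respected.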

Recall:

 >>> re.subn(r'([^\\])?"', r'\1\\"', expression)

Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "/home/karim/build/python/install/lib/python2.7/re.py", line 162, in subn
       return _compile(pattern, flags).subn(repl, string, count)
     File "/home/karim/build/python/install/lib/python2.7/re.py", line 278, in filter
       return sre_parse.expand_template(template, match)
     File "/home/karim/build/python/install/lib/python2.7/sre_parse.py", line 787, in expand_template
       raise error, "unmatched group"
sre_constants.error: unmatched group
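
A minimal Python 3 sketch (not part of the original thread) of why the two patterns behave differently: it collects group 1 from every match of each pattern against the same string. With '?' outside the parentheses, the group can sit out a match entirely and group(1) is None; with '?' inside, the group always takes part and simply matches the empty string.

```python
import re

expression = ' "" '

# '?' outside the group: the group may not take part in a match at all,
# in which case group(1) is None.
outside = [m.group(1) for m in re.finditer(r'([^\\])?"', expression)]

# '?' inside the group: the group always takes part, but may match ''.
inside = [m.group(1) for m in re.finditer(r'([^\\]?)"', expression)]

print(outside)  # [' ', None]
print(inside)   # [' ', '']
```

In Python 2.7 it is that None case that makes expanding \1 in the replacement template raise "unmatched group".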


Found the solution: the '?' needs to be inside the parentheses (the saved
group). When it is outside, the group may not take part in the match at all,
so we don't know whether the saved match referenced by '\1' will exist.

 >>> re.subn(r'([^\\]?)"', r'\1\\"', expression)

(' \\"\\" ', 2)
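
A caveat about this fix, sketched in Python 3: because [^\\]? may match the empty string, the pattern matches every double quote, including one already preceded by a backslash (\" becomes \\", which is also what the sed command does in Peter's comparison). And since it consumes the character before the quote, that character is copied back unescaped when it is itself a quote (a leading "" becomes "\"). The same consumption explains why the earlier ([^\\])" attempt made only one substitution: the first match uses up the quote that the second one needed as its preceding character. A negative lookbehind, (?<!\\), checks the previous character without consuming it. escape_quotes and escape_with_group are hypothetical helper names; like any single-character check, the lookbehind still treats the quote in \\" as already escaped.

```python
import re

def escape_quotes(s):
    r"""Hypothetical helper: put a backslash before every double quote
    that is not already preceded by one.

    (?<!\\) is a negative lookbehind: it inspects the previous character
    without consuming it, so adjacent quotes can each be matched.
    """
    return re.sub(r'(?<!\\)"', r'\\"', s)

def escape_with_group(s):
    # The '?'-inside-the-group pattern from this thread, for comparison.
    return re.sub(r'([^\\]?)"', r'\1\\"', s)

print(escape_with_group('""'))    # "\"   (first quote left bare)
print(escape_quotes('""'))        # \"\"
print(escape_with_group(r'\"'))   # \\"   (already-escaped quote escaped again)
print(escape_quotes(r'\"'))       # \"    (left alone)
```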

The Unix sed command is more permissive: in sed 's/\([^\\]\)\?"/\1\\"/g'
the '?' (escaped for sed) can be outside the parentheses. \1 does not seem
to cause an issue when the group does not match. Perhaps it is only set
when a match occurs.
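
For later readers: this particular difference from sed went away in Python 3.5, where re.sub() and re.subn() replace a reference to a group that did not participate with an empty string. A quick check, assuming Python 3.5 or later (the thread itself uses 2.7), showing that the original pattern and the fixed one then give the same result:

```python
import re
import sys

# Unmatched groups in the replacement became '' in Python 3.5; earlier
# versions, including 2.7, raise "unmatched group" instead.
assert sys.version_info >= (3, 5)

expression = ' "" '
original = re.subn(r'([^\\])?"', r'\1\\"', expression)  # '?' outside the group
fixed = re.subn(r'([^\\]?)"', r'\1\\"', expression)     # '?' inside the group

print(original)  # (' \\"\\" ', 2)
print(fixed)     # (' \\"\\" ', 2)
```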

MORAL:

1) Python's behaviour is logical, and I must understand what I am doing with it.
2) sed is a fantastic tool because it handles a missing match value.
3) I am a really poor tradesman

Regards
Karim

>
> Regards
> Karim
>
>> \\\"" \\\"\" \"" \"\" \\\" \\" \"
>>
>> Compare that with
>>
>> $ echo '\\\"" \\"" \"" "" \\\" \\" \"' | sed 's/\([^\\]\)\?"/\1\\"/g'
>> \\\"\" \\"\" \"\" \"\" \\\\" \\\" \\"
>>
>> Concerning the exception and the discrepancy between sed and Python's re, I
>> suggest that you ask again on comp.lang.python aka the python-list
>> mailing list, where at least one regex guru will read it.
>>
>> Peter
>>
>> _______________________________________________
>> Tutor maillist  -  Tutor at python.org
>> To unsubscribe or change subscription options:
>> http://mail.python.org/mailman/listinfo/tutor
>


