[Tutor] RE module is working ?

Karim karim.liateni at free.fr
Thu Feb 3 19:47:22 CET 2011


On 02/03/2011 02:15 PM, Peter Otten wrote:
> Karim wrote:
>
>> I am trying to substitute a '""' pattern with '\"\"', namely to escape 2
>> consecutive double quotes:
>>
>>      * *In Python interpreter:*
>>
>> $ python
>> Python 2.7.1rc1 (r271rc1:86455, Nov 16 2010, 21:53:40)
>> [GCC 4.4.3] on linux2
>> Type "help", "copyright", "credits" or "license" for more information.
>>   >>>  expression = *' "" '*
>>   >>>  re.subn(*r'([^\\])?"', r'\1\\"', expression*)
>> Traceback (most recent call last):
>>     File "<stdin>", line 1, in<module>
>>     File "/home/karim/build/python/install/lib/python2.7/re.py", line
>> 162, in subn
>>       return _compile(pattern, flags).subn(repl, string, count)
>>     File "/home/karim/build/python/install/lib/python2.7/re.py", line
>> 278, in filter
>>       return sre_parse.expand_template(template, match)
>>     File "/home/karim/build/python/install/lib/python2.7/sre_parse.py",
>> line 787, in expand_template
>>       raise error, "unmatched group"
>> sre_constants.error: unmatched group
>>
>> But if I remove '?' I get the following:
>>
>>   >>>  re.subn(r'([^\\])"', r'\1\\"', expression)
>> (' \\"" ', 1)
>>
>> Only one substitution..._But this is not the same REGEX._ And the
>> count=2 does nothing. By default all occurrences should be substituted.
>>
>>      * *On linux using my good old sed command, it is working with my '?'
>>        (0-1 match):*
>>
>> *$* echo *' "" '* | sed *'s/\([^\\]\)\?"/\1\\"/g*'*
>>    \"\"
>>
>> *Indeed what's the matter with RE module!?*
> You should really fix the problem with your email program first;
It is a Thunderbird issue with bold type (it appears as stars), but I
don't know how to fix it yet.
>   afterwards
> it's probably a good idea to try and explain your goal clearly, in plain
> English.

I already did (see the earlier mails in this thread). To summarize: I pass
the expression string to a Tcl command, which delimits strings with double
quotes only.
I therefore get an error with nested double quotes => That's the key problem.
> Yes. What Steven said ;)
>
> Now to your question as stated: if you want to escape two consecutive double
> quotes that can be done with
>
> s = s.replace('""', r'\"\"')
>
I have already done that as a workaround, but I have to add another
replacement before it to cover all the other cases.
I want to make the original command work so that I can remove the workaround.


> but that's probably *not* what you want. Assuming you want to escape two
> consecutive double quotes and make sure that the first one isn't already
> escaped,

You hit it! :-)

> this is my attempt:
>
> >>> def sub(m):
> ...     s = m.group()
> ...     return r'\"\"' if s == '""' else s
> ...
> >>> print re.compile(r'[\\].|""').sub(sub, r'\\\"" \\"" \"" "" \\\" \\" \"')

That is not what I want. I want to escape every " that is not
already escaped.
The sed regex '/\([^\\]\)\?"/\1\\"/g' is exactly what I need (I have
been writing regexes on Unix for 15 years).
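
One possible way to express "escape every double quote that is not already
escaped" with the re module is a negative lookbehind. This is only a sketch:
it assumes a quote counts as already escaped whenever the character
immediately before it is a backslash, and the name escape_quotes is
hypothetical.

```python
import re

# Assumption: a quote is "already escaped" when the character right
# before it is a backslash. (?<!\\) checks that without consuming it.
UNESCAPED_QUOTE = re.compile(r'(?<!\\)"')

def escape_quotes(text):
    """Put a backslash in front of every double quote not preceded by one."""
    # In the replacement template, \\ stands for one literal backslash.
    return UNESCAPED_QUOTE.sub(r'\\"', text)

print(escape_quotes(' "" '))
```

Because the lookbehind inspects the original string and consumes nothing,
both quotes in ' "" ' are handled in a single pass and no group reference
is needed in the replacement.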

To me the equivalent Python regex is buggy: r'([^\\])?"', r'\1\\"'
Why is '?' not accepted? It means a character that is not a backslash,
with 0 or 1 occurrence. This is quite simple.
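
The "unmatched group" error appears to come from the replacement template
rather than from the '?' itself: when a quote is matched with no character
captured before it, group 1 takes no part in the match, and Python 2.7
refuses to expand \1 for it (Python 3.5 and later substitute an empty string
instead). A minimal sketch that keeps the same pattern but uses a replacement
function, so the empty case is handled explicitly; the name repl is
hypothetical:

```python
import re

# Same pattern as above: an optional non-backslash character, then a quote.
PATTERN = re.compile(r'([^\\])?"')

def repl(match):
    # group(1) is None when the optional group did not take part in the match.
    prefix = match.group(1) or ''
    return prefix + '\\"'

result = PATTERN.subn(repl, ' "" ')
print(result)
```

Note that, because the group is optional, this pattern matches every quote,
including one that already follows a backslash, which is consistent with the
sed output quoted further down.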

I am a poor tradesman but I don't deny evidence.

Regards
Karim

> \\\"" \\\"\" \"" \"\" \\\" \\" \"
>
> Compare that with
>
> $ echo '\\\"" \\"" \"" "" \\\" \\" \"' | sed 's/\([^\\]\)\?"/\1\\"/g'
> \\\"\" \\"\" \"\" \"\" \\\\" \\\" \\"
>
> Concerning the exception and the discrepancy between sed and python's re, I
> suggest that you ask it again on comp.lang.python aka the python-list
> mailing list where at least one regex guru will read it.
>
> Peter
>


