splitting perl-style find/replace regexp using python

James Stroud jstroud at mbi.ucla.edu
Thu Mar 1 11:33:07 CET 2007


Peter Otten wrote:
> James Stroud wrote:
> 
>> James Stroud wrote:
>>> John Pye wrote:
>>>> Hi all
>>>>
>>>> I have a file with a bunch of perl regular expressions like so:
>>>>
>>>> /(^|[\s\(])\*([^ ].*?[^ ])\*([\s\)\.\,\:\;\!\?]|$)/$1'''$2'''$3/ # bold
>>>> /(^|[\s\(])\_\_([^ ].*?[^ ])\_\_([\s\)\.\,\:\;\!\?]|$)/$1''<b>$2<\/b>''$3/ # italic bold
>>>> /(^|[\s\(])\_([^ ].*?[^ ])\_([\s\)\.\,\:\;\!\?]|$)/$1''$2''$3/ # italic
>>>>
>>>> These are all find/replace expressions delimited as '/search/replace/
>>>> # comment' where 'search' is the regular expression we're searching
>>>> for and 'replace' is the replacement expression.
>>>>
>>>> Is there an easy and general way that I can split these perl-style
>>>> find-and-replace expressions into something I can use with Python, eg
>>>> re.sub('search','replace',str) ?
>>>>
>>>> I thought generally it would be good enough to split on '/' but as you
>>>> see the <\/b> messes that up. I really don't want to learn perl
>>>> here :-)
>>>>
>>>> Cheers
>>>> JP
>>>>
>>> This could be more general; in principle a perl regex could end with a
>>> "\", e.g. "\\/", but I'm guessing that won't happen here.
>>>
>>> py> for p in perlish:
>>> ...   print p
>>> ...
>>> /(^|[\s\(])\*([^ ].*?[^ ])\*([\s\)\.\,\:\;\!\?]|$)/$1'''$2'''$3/
>>> /(^|[\s\(])\_\_([^ ].*?[^ ])\_\_([\s\)\.\,\:\;\!\?]|$)/$1''<b>$2<\/b>''$3/
>>> /(^|[\s\(])\_([^ ].*?[^ ])\_([\s\)\.\,\:\;\!\?]|$)/$1''$2''$3/
>>> py> import re
>>> py> splitter = re.compile(r'[^\\]/')
>>> py> for p in perlish:
>>> ...   print splitter.split(p)
>>> ...
>>> ['/(^|[\\s\\(])\\*([^ ].*?[^ ])\\*([\\s\\)\\.\\,\\:\\;\\!\\?]|$',
>>> "$1'''$2'''$", '']
>>> ['/(^|[\\s\\(])\\_\\_([^ ].*?[^ ])\\_\\_([\\s\\)\\.\\,\\:\\;\\!\\?]|$',
>>> "$1''<b>$2<\\/b>''$", '']
>>> ['/(^|[\\s\\(])\\_([^ ].*?[^ ])\\_([\\s\\)\\.\\,\\:\\;\\!\\?]|$',
>>> "$1''$2''$", '']
>>>
>>> (I'm hoping this doesn't wrap!)
>>>
>>> James
>> I realized that threw away the character before each "/" (the closing
>> parenthesis and the "3" of "$3"). This is the corrected version:
>>
>> py> splitter = re.compile(r'(?<!\\)/')
>> py> for p in perlish:
>> ...   print splitter.split(p)
>> ...
>> ['', '(^|[\\s\\(])\\*([^ ].*?[^ ])\\*([\\s\\)\\.\\,\\:\\;\\!\\?]|$)',
>> "$1'''$2'''$3", '']
>> ['', '(^|[\\s\\(])\\_\\_([^ ].*?[^ ])\\_\\_([\\s\\)\\.\\,\\:\\;\\!\\?]|$)',
>> "$1''<b>$2<\\/b>''$3", '']
>> ['', '(^|[\\s\\(])\\_([^ ].*?[^ ])\\_([\\s\\)\\.\\,\\:\\;\\!\\?]|$)',
>> "$1''$2''$3", '']
> 
> There is another problem with escaped backslashes:
> 
> >>> re.compile(r'(?<!\\)/').split(r"/abc\\/def/")
> ['', 'abc\\\\/def', '']
> 
> Peter

Yes, this would be a case of the expression (left side) ending with a 
"\" as I mentioned above.
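
A minimal sketch of a splitter that copes with that case, assuming a
backslash always escapes exactly the next character. The function name
split_perl_subst is made up for illustration; it scans the string instead
of relying on a lookbehind:

```python
def split_perl_subst(expr):
    """Split '/search/replace/' on slashes that are not backslash-escaped."""
    fields, current = [], []
    chars = iter(expr)
    for ch in chars:
        if ch == '\\':
            # Keep the backslash together with the character it escapes,
            # so '\\/' stays inside the field while '\\\\' followed by '/'
            # ends the field.
            current.append(ch + next(chars, ''))
        elif ch == '/':
            fields.append(''.join(current))
            current = []
        else:
            current.append(ch)
    fields.append(''.join(current))
    return fields
```

On Peter's example, r"/abc\\/def/", this returns
['', 'abc\\\\', 'def', '']. Anything after the final slash, such as a
trailing "# comment", ends up in the last field.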

James
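
P.S. Once the search and replace parts are separated, the replacement
still uses perl syntax ($1, \/). A rough sketch of a conversion for
re.sub, assuming only numbered groups and escaped slashes occur in the
replacement (the helper name perl_to_python_replacement is hypothetical):

```python
import re

def perl_to_python_replacement(repl):
    """Translate a simple perl replacement such as $1''$2 for re.sub."""
    # Python does not need the slash escaped, and re.sub would otherwise
    # leave the backslash in the output.
    repl = repl.replace('\\/', '/')
    # $1 -> \g<1>, which stays unambiguous even when a digit follows.
    return re.sub(r'\$(\d+)', r'\\g<\1>', repl)

search = r"(^|[\s\(])\*([^ ].*?[^ ])\*([\s\)\.\,\:\;\!\?]|$)"
replace = perl_to_python_replacement("$1'''$2'''$3")
result = re.sub(search, replace, "this is *bold* text")
```

Here result is "this is '''bold''' text". Perl features such as flags
after the last slash or named captures are not handled.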


