Something confusing about non-greedy reg exp match

Gary Herron gherron at islandtraining.com
Mon Sep 7 05:33:30 CEST 2009


George Burdell wrote:
> On Sep 6, 10:06 pm, "Mark Tolonen" <metolone+gm... at gmail.com> wrote:
>   
>> <gburde... at gmail.com> wrote in message
>>
>> news:f98a6057-c35f-4843-9efb-7f36b05b677c at g19g2000yqo.googlegroups.com...
>>
>>     
>>> If I do this:
>>>       
>>> import re
>>> a=re.search(r'hello.*?money',  'hello how are you hello funny money')
>>>       
>>> I would expect a.group(0) to be "hello funny money", since .*? is a
>>> non-greedy match. But instead, I get the whole sentence, "hello how
>>> are you hello funny money".
>>>       
>>> Is this expected behavior? How can I specify the correct regexp so
>>> that I get "hello funny money" ?
>>>       
>> A non-greedy match matches the fewest characters before matching the text
>> *after* the non-greedy match.  For example:
>>
>>     >>> import re
>>     >>> a=re.search(r'hello.*?money','hello how are you hello funny money and more money')
>>     >>> a.group(0)  # non-greedy stops at the first money
>>     'hello how are you hello funny money'
>>     >>> a=re.search(r'hello.*money','hello how are you hello funny money and more money')
>>     >>> a.group(0)  # greedy keeps going to the last money
>>     'hello how are you hello funny money and more money'
>>
>> This is why it is difficult to use regular expressions to match nested
>> objects like parentheses or XML tags.  In your case you'll need something
>> extra to not match the first hello.
>>
>>     >>> a=re.search(r'(?<!^)hello.*?money','hello how are you hello funny money')
>>     >>> a.group(0)
>>     'hello funny money'
>>
>> -Mark
>>     
>
> I see now. I also understand r's response. But what if there are many
> "hello"'s before "money," and I don't know how many there are? In
> other words, I want to find every occurrence of "money," and for each
> occurrence, I want to scan back to the first occurrence of "hello."
> How can this be done?
>   

This is asking for more power than regular expressions can support.

However, your request reads like an algorithm.  Search for an occurrence
of "money" (using the find string method), and search backwards from
there for "hello" (using the rfind string method).  Two lines of code in
a loop should do it.
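Something along these lines, for the goal you describe (for each "money", scan back to the nearest "hello" before it). This is only a sketch: the function name money_spans and its parameters are made up for illustration.

```python
def money_spans(text, start_word="hello", end_word="money"):
    # Hypothetical helper: for each occurrence of end_word, return the
    # text running back to the nearest start_word that precedes it.
    spans = []
    pos = 0
    while True:
        end = text.find(end_word, pos)          # next "money" at or after pos
        if end == -1:
            break
        start = text.rfind(start_word, 0, end)  # nearest "hello" before it
        if start != -1:                         # skip a "money" with no "hello" before it
            spans.append(text[start:end + len(end_word)])
        pos = end + len(end_word)               # continue after this "money"
    return spans


print(money_spans('hello how are you hello funny money'))
# ['hello funny money']
```

Note that two "money"s following the same "hello" both scan back to that one "hello", so the returned spans can overlap.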


Gary Herron





More information about the Python-list mailing list