Something confusing about non-greedy reg exp match

Mark Tolonen metolone+gmane at gmail.com
Mon Sep 7 05:06:29 CEST 2009


<gburdell1 at gmail.com> wrote in message 
news:f98a6057-c35f-4843-9efb-7f36b05b677c at g19g2000yqo.googlegroups.com...
> If I do this:
>
> import re
> a=re.search(r'hello.*?money',  'hello how are you hello funny money')
>
> I would expect a.group(0) to be "hello funny money", since .*? is a
> non-greedy match. But instead, I get the whole sentence, "hello how
> are you hello funny money".
>
> Is this expected behavior? How can I specify the correct regexp so
> that I get "hello funny money" ?

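A regular expression match always starts at the leftmost position where a 
match is possible.  A non-greedy match only matches the fewest characters 
before matching the text *after* the non-greedy match; it does not move the 
start of the match to the right.  For example: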

>>> import re
>>> a=re.search(r'hello.*?money','hello how are you hello funny money and more money')
>>> a.group(0)  # non-greedy stops at the first money
'hello how are you hello funny money'
>>> a=re.search(r'hello.*money','hello how are you hello funny money and more money')
>>> a.group(0)  # greedy keeps going to the last money
'hello how are you hello funny money and more money'
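
To make the point about the start position concrete, here is a small sketch 
(the variable names are mine, not from the original question) comparing 
where the two matches begin and end.  Both begin at the first hello; only 
the end differs.

```python
import re

text = 'hello how are you hello funny money and more money'

lazy = re.search(r'hello.*?money', text)    # non-greedy
greedy = re.search(r'hello.*money', text)   # greedy

# Both matches begin at the first 'hello' (index 0).
print(lazy.start(), greedy.start())

# The non-greedy match ends at the first 'money' after that start,
# the greedy match at the last one.
print(lazy.end() < greedy.end())
```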

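This is why it is difficult to use regular expressions to match nested 
objects like parentheses or XML tags.  In your case you'll need something 
extra to keep the match from starting at the first hello.  Below, (?<!^) is 
a negative lookbehind that rejects a match starting at the very beginning 
of the string: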

>>> a=re.search(r'(?<!^)hello.*?money','hello how are you hello funny money')
>>> a.group(0)
'hello funny money'
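
Note that the lookbehind above only skips a hello at the very start of the 
string.  If the unwanted hello could appear anywhere, two more general 
patterns might help.  This is only a sketch of alternatives, not something 
from the original question, and the variable names are my own: one pattern 
refuses to let the middle part cross another hello, the other uses a greedy 
prefix to push the captured hello as far right as possible.

```python
import re

text = 'hello how are you hello funny money'

# (?:(?!hello).)*? consumes one character at a time, but only at positions
# where another 'hello' does not begin, so the match cannot span a later
# 'hello' and the search has to restart from the second one.
tempered = re.search(r'hello(?:(?!hello).)*?money', text)
print(tempered.group(0))

# Alternative: the greedy .* prefix swallows as much as it can, so the
# captured group starts at the last 'hello' that is still followed by 'money'.
last = re.search(r'.*(hello.*?money)', text)
print(last.group(1))
```

The first form finds the shortest hello...money stretch at the earliest 
position where one exists; the second always picks the last hello in the 
string, which may or may not be what you want when there are several 
candidates.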

-Mark 




