re.match -- not greedy?

Noah Rawlins noah.rawlins at comcast.net
Sun Nov 19 19:59:45 EST 2006


Carl Banks wrote:
> EXI-Andrews, Jack wrote:
>> the '*' and '+' don't seem to be greedy.. they will consume less in
>> order to match a string:
>>
>>>>> import re;re.match('(a+)(ab)','aaab').groups()
>> ('aa', 'ab')
>>
>> this is the sort of behaviour i'd expect from
>>    '(a+?)(ab)'
>>
>> a+ should greedily consume a's at the expense of the string not matching
> 
> It's called backtracking.  If a match fails, the regexp engine
> recursively unwinds some of the greedy matching and tries again.
> 
>> in real life, i am trying to match #defines:
>>
>>>>> re.match( '#define ([A-Za-z0-9_]+)([^(])', '#define
>> abc(d)').groups()
>> ('ab', 'c')
>>
>> i want this example to fail because the first character after a string
>> of letters is a '('
>> i want to match only #defines without parameters.
> 
> Probably the easiest thing to do is to attempt to match zero or one
> opening parentheses, and bail out if it did end up matching the
> parenthesis.
> 
> m = re.match(r"#define ([A-Za-z0-9_]+)(\(?)","#define abc(d)")
> if m and m.groups(2) != "(":
>     ...
> 
> (BTW, it's good practice to use a raw strings for the regular
> expressions as I did.)
> 

Another way that seems clearer to me is to use negative lookahead 
assertions.

 >>>defPatt = re.compile(r'#define\s+(\w+)\b(?!\()')
 >>> defPatt.match("#define abc(x)")
 >>> defPatt.match("#define foo_BarBaz")
<_sre.SRE_Match object at 0xb7dd5820>
 >>> defPatt.match("#define foo_BarBaz").groups()
('foo_BarBaz',)

In general '\w' is the same as [A-Za-z0-9_]

There are other considerations too... I don't know if

#define abc (x)

But the main thing here is the use of '\b' to require there to be a word 
boundary at the end your match and a (?!\() to require that the match is 
not followed by a '('

see http://docs.python.org/lib/re-syntax.html

noah


>> so what's the definition of greedy?
> 
> It means match as much you can *while getting a match*.
> 
> Just remember regexp parsing is not a one way street: if it hits a dead
> end, it'll go back and try another path.  In fact, greediness usually
> doesn't affect *whether* a regexp matches; it only affects the
> groupings.  I'm suddenly curious if there are any cases at all where
> greediness changes whether it finds a match.
> 
>> (pls copy me on responses)
> 
> Ah, too bad you won't see it then....
> 
> 
> Carl Banks
> 



More information about the Python-list mailing list