re.match -- not greedy?
pavlovevidence at gmail.com
Mon Nov 20 01:11:43 CET 2006
EXI-Andrews, Jack wrote:
> the '*' and '+' don't seem to be greedy.. they will consume less in
> order to match a string:
> >>> import re;re.match('(a+)(ab)','aaab').groups()
> ('aa', 'ab')
> this is the sort of behaviour i'd expect from
> a+ should greedily consume a's at the expense of the string not matching
It's called backtracking. If a match fails, the regexp engine
recursively unwinds some of the greedy matching and tries again.
> in real life, i am trying to match #defines:
> >>> re.match( '#define ([A-Za-z0-9_]+)([^(])', '#define
> ('ab', 'c')
> i want this example to fail because the first character after a string
> of letters is a '('
> i want to match only #defines without parameters.
Probably the easiest thing to do is to attempt to match zero or one
opening parentheses, and bail out if it did end up matching the
m = re.match(r"#define ([A-Za-z0-9_]+)(\(?)","#define abc(d)")
if m and m.groups(2) != "(":
(BTW, it's good practice to use a raw strings for the regular
expressions as I did.)
> so what's the definition of greedy?
It means match as much you can *while getting a match*.
Just remember regexp parsing is not a one way street: if it hits a dead
end, it'll go back and try another path. In fact, greediness usually
doesn't affect *whether* a regexp matches; it only affects the
groupings. I'm suddenly curious if there are any cases at all where
greediness changes whether it finds a match.
> (pls copy me on responses)
Ah, too bad you won't see it then....
More information about the Python-list