How to get the "longest possible" match with Python's RE module?
fanglicheng at gmail.com
Tue Sep 12 06:51:03 CEST 2006
> Licheng Fang wrote:
> > Basically, the problem is this:
> > >>> p = re.compile("do|dolittle")
> > >>> p.match("dolittle").group()
> > 'do'
> >From what I understand, this isn't python specific, it is the expected
> behavior of that pattern in any implementation. You are using
> alternation, which means "either, or", and you have the shorter
> subexpression first, so the condition is satisfied by just 'do' and the
> matching terminates.
> > There's another example:
> > >>> p = re.compile("one(self)?(selfsufficient)?")
> > >>> p.match("oneselfsufficient").group()
> > 'oneself'
> Again, I don't think this has anything to do with python. You pattern
> basically means "match 'one' whether it is followed by 'self' or not,
> and whether it is followed by 'selfsufficient' or not". For this
> particular example, you'd want something like
> I think you could construct a pattern that would do what you want in
> python without any problem. If you post a (short) example of your data,
> I'm sure someone could help you with it.
Hi, according to these regexp engine discussions, it's NOT a behavior
true to any implementation.
Python's NFA engine reads along the input string, matching it to the
pattern, and backtracking when needed. By contrast a DFA engine, to my
understanding, constructs a DFA and uses it to munch as many characters
as possible. Maybe it's like this:
PYTHON'S NFA ENGINE:
one self, none selfsufficient, none
I want to know if there is some way to make Python RE behave like grep
does, or do I have to change to another engine?
More information about the Python-list