Something confusing about non-greedy reg exp match
ptmcg at austin.rr.com
Mon Sep 7 16:00:04 CEST 2009
On Sep 6, 11:23 pm, Ben Finney <ben+pyt... at benfinney.id.au> wrote:
> George Burdell <gburde... at gmail.com> writes:
> > I want to find every occurrence of "money," and for each
> > occurrence, I want to scan back to the first occurrence
> > of "hello." How can this be done?
> By recognising the task: not expression matching, but lexing and
> parsing. For which you might find the ‘pyparsing’ library of use
Even pyparsing has to go through some gyrations to do this sort of
"match, then back up" parsing. Here is my solution:
>>> from pyparsing import SkipTo, originalTextFor
>>> expr = originalTextFor("hello" + SkipTo("money", failOn="hello", include=True))
>>> print(expr.searchString('hello how are you hello funny money'))
[['hello funny money']]
SkipTo is analogous to the OP's .*?, but the failOn argument adds the
logic "if this string is found before matching the target string, then
fail". So pyparsing scans through the string, matches the first
"hello", attempts to skip to the next occurrence of "money", but finds
another "hello" first, so this parse fails. Then the scan continues
until the next "hello" is found, and this time, SkipTo successfully
finds "money" without first hitting a "hello". I then had to wrap the
whole thing in the helper function originalTextFor; otherwise I get an
ugly grouping of separate strings instead of the single matched
substring.
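The scan described above can be sketched in plain Python with str.find. This is only an illustration of the logic, not how pyparsing implements SkipTo; the function name search_qualified and its parameters are made up for this example, and it ignores pyparsing's whitespace skipping.

```python
def search_qualified(text, start="hello", target="money"):
    """Hypothetical helper: find each `start` ... `target` span that has
    no other `start` in between (the effect of failOn=start)."""
    results = []
    pos = text.find(start)
    while pos != -1:
        body = pos + len(start)
        end = text.find(target, body)
        if end == -1:
            break  # no target left, so no later start can succeed either
        blocker = text.find(start, body)
        if blocker != -1 and blocker < end:
            # Another start shows up before the target: this attempt
            # fails, and the scan resumes from that later start.
            pos = blocker
            continue
        stop = end + len(target)
        results.append(text[pos:stop])
        pos = text.find(start, stop)
    return results


print(search_qualified("hello how are you hello funny money"))
```

Nothing here ever backs up: each attempt either reaches the target cleanly or is abandoned in favor of the next starting point.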
So I still don't really have any kind of "back up after matching"
parsing; I just turned this into a qualified forward match. One could
do a similar thing with a parse action. If you could attach some kind
of validating function to a field within a regex, you could have done
the same thing there.
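For this particular case, a regex can get the same qualified forward match without a validating function, by putting a negative lookahead in front of every skipped character. The following is a minimal sketch with the standard re module, not a general replacement for a validating callback; note that "." does not match newlines unless re.DOTALL is passed.

```python
import re

# (?:(?!hello).)*? consumes one character at a time, non-greedily, but
# only at positions where "hello" does not start. That restriction plays
# the role of failOn="hello" in the pyparsing version.
pattern = re.compile(r"hello(?:(?!hello).)*?money")

matches = pattern.findall("hello how are you hello funny money")
print(matches)
```

The attempt starting at the first "hello" fails once the lookahead hits the second "hello", so the regex engine moves on and matches from there.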