shortest match regexp operator anyone?

Harald Kirsch kirschh at lionbioscience.com
Thu Jul 12 09:14:34 CEST 2001


"Steve Holden" <sholden at holdenweb.com> writes:

> "Harald Kirsch" <kirschh at lionbioscience.com> wrote in ...
> >
> > SHORT STORY:
> > Does anyone know of a regular expression library which has an operator
> > that forces a subexpression to insist on its shortest match, even if
> > that ruins the overall match?
[snip]
> Had you thought about using lookahead assertions, which don't actually match
> anything, but fail unless the specified pattern is (or, for a negative
> lookahead assertion, is not) present? Combined with non-greedy matching this
> might get you where you want to be.

No. Friedl's book has an example similar to

   (.*?)(?=<A>)<A>B

but that matches "xx<A>x<A>B", i.e. the match contains an <A> in the
part covered by ".*?". Again, I cannot force "(.*?)(?=<A>)<A>" to insist
on its "shortest match" rather than give it up for an overall match.
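
A minimal sketch of this behaviour, assuming Python's standard re module
(the variable names are only for illustration):

```python
import re

# Pattern similar to the one quoted from Friedl's book.
pattern = re.compile(r"(.*?)(?=<A>)<A>B")

m = pattern.match("xx<A>x<A>B")

# The non-greedy group first stops at "xx", but "<A>B" does not follow
# there, so the engine backtracks and lets the group grow past the
# first "<A>" until the overall match succeeds.
prefix = m.group(1)
print(repr(prefix))  # 'xx<A>x'
```

The captured prefix contains an <A>, which is exactly what a true
"shortest match" operator would forbid.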

I tried other combinations, e.g. "(.(?!<A>))*?<A>", but none of them
really works.
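
A small sketch of where that particular attempt falls short, again
assuming Python's re module (names are hypothetical). The negative
lookahead is tested after each character is consumed, so the character
directly in front of an <A> can never be consumed:

```python
import re

attempt = re.compile(r"(.(?!<A>))*?<A>")

# Anchored at the start, the wanted result would be "xx<A>", but the
# second "x" is followed by "<A>" and is rejected by the lookahead.
anchored = attempt.match("xx<A>")
print(anchored)  # None

# An unanchored search only succeeds by skipping the prefix altogether.
found = attempt.search("xx<A>")
print(found.group(0))  # <A>
```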

Advocacy: The `shortest match' operator is really missing from regexp
languages.

  Harald Kirsch
-- 
----------------+------------------------------------------------------
Harald Kirsch   | kirschh at lionbioscience.com | "How old is the epsilon?"
LION bioscience | +49 6221 4038 172          |        -- Paul Erdös
       *** Please do not send me copies of your posts. ***



More information about the Python-list mailing list