shortest match regexp operator anyone?

Steve Holden sholden at holdenweb.com
Wed Jul 11 15:58:10 EDT 2001


"Harald Kirsch" <kirschh at lionbioscience.com> wrote in ...
>
> SHORT STORY:
> Does anyone know of a regular expression library which has an operator
> that forces a subexpression to insist on its shortest match, even if
> that ruins the overall match?
>
> LONG STORY:
>
> Look how different the following tasks are, although they look so
> similar.
>
> 1) TASK: Find the first 'A' and match, if it is followed by a 'B'?
>    SOLUTION: '^[^A]+AB'
>
> 2) TASK: Find the first '<A>' and match, if it is followed by a 'B'
>    SOLUTION: ???
>
> An approximation for (2) is '^[^<>A]+<A>B', but it does not match
> 'A<A>B', which it should.
>
> With non-greedy matching, another approximation is '^.*?<A>B', however
> this matches 'xx<A>y<A>B', although it should not.
>
> The third approximation is '^(.*<A>){1,1}?B' and the Tcl's manual page
> `re_syntax' seems to indicate that '{1,1}?' forces a subexpression on
> a shortest match. However the `?' indicates `non-greedy' which only
> means that the `shortest match not ruining an overall match' is taken
> and not the `shortest match!'
>
> The solution would in fact be a *shortest match operator*. For the
> sake of discussion let it have postfix syntax like `*' and be written
> as `%'. The solution for (2) would be '^(.*<A>)%B' which would *not*
> match the string 'xx<A>y<A>B' since the shortest '.*<A>' found is not
> followed by 'B'.
>
> Again the question: Is there a re-library with a true shortest match
> operator?
>
> REMARK: The implementation would be rather simple since it suffices to
> delete all outgoing transitions from stop states of the automaton
> equivalent to the subexpression.
>
Had you thought about using lookahead assertions, which don't actually match
anything, but fail unless the specified pattern is (or, for a negative
lookahead assertion, is not) present? Combined with non-greedy matching this
might get you where you want to be.

regards
 Steve
--
http://www.holdenweb.com/








More information about the Python-list mailing list