shortest match regexp operator anyone?

Harald Kirsch kirschh at lionbioscience.com
Thu Jul 12 02:26:52 EDT 2001


"Richard.Suchenwirth" <Richard.Suchenwirth at kst.siemens.de> writes:

> Harald Kirsch wrote:
> ...
> > 2) TASK: Find the first '<A>' and match, if it is followed by a 'B'
> >    SOLUTION: ???
> > 
> > An approximation for (2) is '^[^<>A]+<A>B', but it does not match
> > 'A<A>B', which it should.
> > 
> > With non-greedy matching, another approximation is '^.*?<A>B', however
> > this matches 'xx<A>y<A>B', although it should not.
>  
> Maybe I don't understand the exact problem, but wouldn't
> 
> regexp  {(<A>)B} foo<A>Cbar<A>B -> matched
> 
> fulfill task 2?

Oops, yes. Sorry, I stripped the example down too much. The typical
application is of course that you want something resembling

  <A>(.*</A>)

as part of a larger regexp, except that the parenthesized expression
shall be the shortest match found, i.e. '.*' may match any string
that does not contain "</A>". I insist on `part of a larger regexp'
because otherwise a step-by-step approach would still be easy to
perform. (Yes, I *have* an application for that :-<).
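
One possible way to express "any string that does not contain </A>" inside
a larger regexp is a negative lookahead repeated per character. The sketch
below uses Python's re module and is only an illustration, not necessarily
what the application needs: the names BODY, SHORTEST, LARGER, LAZY and
FIRST_A_THEN_B, the trailing 'END' context, and the sample strings are all
invented for the example. It also shows why a plain non-greedy '.*?' is not
enough once more pattern follows.

```python
import re

# Assumed building block: match one character at a time, but only where
# "</A>" does NOT start, so the body can never run past the first "</A>".
BODY = r"(?:(?!</A>).)*"

# The fragment on its own: group 1 is the shortest "...</A>" after "<A>".
SHORTEST = re.compile(r"<A>(" + BODY + r"</A>)", re.DOTALL)

# The same fragment as part of a larger regexp. 'END' is a made-up stand-in
# for whatever follows in the real pattern.
LARGER = re.compile(r"<A>(" + BODY + r"</A>)END", re.DOTALL)

# Non-greedy approximation for comparison: backtracking lets '.*?' grow
# across an inner "</A>" to make the trailing 'END' match.
LAZY = re.compile(r"<A>(.*?</A>)END", re.DOTALL)

# Same idea applied to task 2 from the quoted text: nothing before the
# matched '<A>' may contain '<A>', so only the first '<A>' is considered.
FIRST_A_THEN_B = re.compile(r"^(?:(?!<A>).)*<A>B", re.DOTALL)

text = "<A>x</A>y<A>z</A>END"

print(SHORTEST.search(text).group(1))  # body stops at the first </A>
print(LARGER.search(text).group(1))    # only the second <A>...</A> is followed by END
print(LAZY.search(text).group(1))      # spans both elements
print(bool(FIRST_A_THEN_B.match("A<A>B")))
print(bool(FIRST_A_THEN_B.match("xx<A>y<A>B")))
```

The lookahead version costs one assertion per character, which may matter
on long inputs, but it keeps the constraint local to the sub-expression so
it can be dropped into a bigger pattern without a step-by-step approach.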

  Harald Kirsch

-- 
----------------+------------------------------------------------------
Harald Kirsch   | kirschh at lionbioscience.com | "How old is the epsilon?"
LION bioscience | +49 6221 4038 172          |        -- Paul Erdös
       *** Please do not send me copies of your posts. ***


