Python Regex Question

Arnaud Delobelle arnodel at googlemail.com
Wed Oct 29 15:37:13 EDT 2008


On Oct 29, 7:01 pm, Tim Chase <python.l... at tim.thechases.com> wrote:
> > I need a regex expression which returns the start to the x=ANIMAL for
> > only the x=Dog fragments so all my entries should be start ...
> > (something here) ... x=Dog .  So I am really interested in fragments 1
> > and 3 only.
>
> > My idea (primitive) ^start.*?x=Dog doesn't work because clearly it
> > would return results
>
> > start
> > x=Dog  # (good)
>
> > and
>
> > start
> > x=Cat
> > stop
> > start
> > x=Dog # bad since I only want start ... x=Dog portion
>
> Looks like the following does the trick:
>
>  >>> s = """start      #frag 1 start
> ... x=Dog # frag 1 end
> ... stop
> ... start    # frag 2 start
> ... x=Cat # frag 2 end
> ... stop
> ... start     #frag 3 start
> ... x=Dog #frag 3 end
> ... stop"""
>  >>> import re
>  >>> r = re.compile(r'^start.*\nx=Dog.*\nstop.*', re.MULTILINE)
>  >>> for i, result in enumerate(r.findall(s)):
> ...     print i, repr(result)
> ...
> 0 'start      #frag 1 start\nx=Dog # frag 1 end\nstop'
> 1 'start     #frag 3 start\nx=Dog #frag 3 end\nstop'
>
> -tkc

This will only work if the 'x=Dog' line directly follows the 'start' line
(which happens to be the case in the given example).  If that's not
guaranteed, I would do it in two steps, with data holding the text to
search (in fact I probably wouldn't use regexps at all, but...):

>>> for chunk in re.split(r'\nstop', data):
...     m = re.search('^start.*^x=Dog', chunk, re.DOTALL | re.MULTILINE)
...     if m: print repr(m.group())
...
'start      #frag 1 start \nx=Dog'
'start     #frag 3 start \nx=Dog'
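
For anyone reading this later, here is a rough Python 3 sketch of the two-step idea above, together with one possible regex-free variant. The sample text, the function names and the non-greedy `.*?` are my own assumptions for illustration and are not part of the session above. Fragment 3 of the sample has an extra line between 'start' and 'x=Dog', which is the case the single-regex version would miss.

```python
import re

# Hypothetical sample: fragment 3 has an extra line between 'start' and 'x=Dog'.
SAMPLE = """start      #frag 1 start
x=Dog # frag 1 end
stop
start    # frag 2 start
x=Cat # frag 2 end
stop
start     #frag 3 start
y=1
x=Dog #frag 3 end
stop"""

# DOTALL lets '.' cross newlines; MULTILINE lets '^' match at each line start.
# The non-greedy '.*?' stops at the first 'x=Dog' line within a chunk.
FRAGMENT = re.compile(r'^start.*?^x=Dog', re.DOTALL | re.MULTILINE)


def dog_fragments_regex(text):
    """Two-step approach: split at each 'stop' line, then search each chunk."""
    found = []
    for chunk in re.split(r'\nstop', text):
        m = FRAGMENT.search(chunk)
        if m:
            found.append(m.group())
    return found


def dog_fragments_lines(text):
    """Regex-free variant: collect lines from 'start' up to 'x=Dog'."""
    found, current = [], None
    for line in text.splitlines():
        if line.startswith('start'):
            current = [line]          # begin a new fragment
        elif current is not None:
            if line.startswith('x=Dog'):
                found.append('\n'.join(current + ['x=Dog']))
                current = None
            elif line.startswith('stop'):
                current = None        # fragment ended without 'x=Dog'
            else:
                current.append(line)
    return found


for fragment in dog_fragments_regex(SAMPLE):
    print(repr(fragment))
```

Both functions should return the same two fragments for this sample. The line-based version is longer, but it avoids relying on how the flags interact.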

--
Arnaud



