Python Regex Question
Arnaud Delobelle
arnodel at googlemail.com
Wed Oct 29 15:37:13 EDT 2008
On Oct 29, 7:01 pm, Tim Chase <python.l... at tim.thechases.com> wrote:
> > I need a regex that returns everything from start up to x=ANIMAL,
> > but only for the x=Dog fragments, so every match should look like
> > start ... (something here) ... x=Dog. So I am really only interested
> > in fragments 1 and 3.
>
> > My (primitive) idea, ^start.*?x=Dog, doesn't work, because it would
> > clearly return the results
>
> > start
> > x=Dog # (good)
>
> > and
>
> > start
> > x=Cat
> > stop
> > start
> > x=Dog # bad since I only want start ... x=Dog portion
>
> Looks like the following does the trick:
>
> >>> s = """start #frag 1 start
> ... x=Dog # frag 1 end
> ... stop
> ... start # frag 2 start
> ... x=Cat # frag 2 end
> ... stop
> ... start #frag 3 start
> ... x=Dog #frag 3 end
> ... stop"""
> >>> import re
> >>> r = re.compile(r'^start.*\nx=Dog.*\nstop.*', re.MULTILINE)
> >>> for i, result in enumerate(r.findall(s)):
> ...     print(i, repr(result))
> ...
> 0 'start #frag 1 start\nx=Dog # frag 1 end\nstop'
> 1 'start #frag 3 start\nx=Dog #frag 3 end\nstop'
>
> -tkc
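A minimal sketch of why the original ^start.*?x=Dog idea over-matches, assuming it was compiled with re.DOTALL | re.MULTILINE (which is what the behaviour described in the question implies). The lazy .*? is still free to cross "stop" lines, so a match that begins at fragment 2 runs on to fragment 3's x=Dog. The second pattern is a swapped-in technique, not something proposed in the thread: a "tempered" token that consumes one character at a time but refuses to step onto a line beginning with "stop". The names naive, fixed, naive_matches and fixed_matches are illustrative only.

```python
import re

# Sample text copied from the thread: fragments 1 and 3 end in x=Dog,
# fragment 2 ends in x=Cat.
s = """start #frag 1 start
x=Dog # frag 1 end
stop
start # frag 2 start
x=Cat # frag 2 end
stop
start #frag 3 start
x=Dog #frag 3 end
stop"""

FLAGS = re.DOTALL | re.MULTILINE

# The original idea: with DOTALL, ".*?" can cross the "stop" line, so the
# match that begins at fragment 2 continues into fragment 3.
naive = re.compile(r'^start.*?x=Dog', FLAGS)
naive_matches = naive.findall(s)

# Tempered token: each character is consumed only if the current position
# is not the start of a "stop" line, so a match cannot leave its fragment.
fixed = re.compile(r'^start(?:(?!^stop\b).)*?^x=Dog', FLAGS)
fixed_matches = fixed.findall(s)

for m in naive_matches:
    print('naive:', repr(m))
for m in fixed_matches:
    print('fixed:', repr(m))
```

Here the naive pattern yields two matches, the second of which spans fragment 2 (including x=Cat) and fragment 3, while the tempered pattern yields only the fragment 1 and fragment 3 prefixes. It assumes every fragment is closed by a line starting with "stop".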
Your regex only works if the 'x=Dog' line directly follows the 'start'
line (as it does in the given example). If that's not necessarily the
case, I would do it in two steps (in fact I probably wouldn't use
regexps at all, but...):
>>> for chunk in re.split(r'\nstop', data):
...     m = re.search('^start.*^x=Dog', chunk, re.DOTALL | re.MULTILINE)
...     if m: print(repr(m.group()))
...
'start #frag 1 start \nx=Dog'
'start #frag 3 start \nx=Dog'
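A self-contained Python 3 sketch of the two approaches mentioned above, run on a hypothetical sample in which fragment 3 has an extra line (colour=brown, invented for illustration) between "start" and "x=Dog". The function names dog_fragments_regex and dog_fragments_lines are hypothetical. The first function follows the split-then-search idea; the second is one possible regex-free line scanner, an assumption about what "not using regexps" could look like rather than code from the thread.

```python
import re

# Hypothetical sample: fragment 3 has an extra line between "start" and
# "x=Dog", the case a single start / x=Dog / stop regex does not cover.
data = """start #frag 1 start
x=Dog # frag 1 end
stop
start # frag 2 start
x=Cat # frag 2 end
stop
start #frag 3 start
colour=brown
x=Dog #frag 3 end
stop"""


def dog_fragments_regex(text):
    """Two steps: split on the stop lines, then search each chunk."""
    found = []
    for chunk in re.split(r'\nstop', text):
        m = re.search(r'^start.*^x=Dog', chunk, re.DOTALL | re.MULTILINE)
        if m:
            found.append(m.group())
    return found


def dog_fragments_lines(text):
    """No regex: collect lines from each start line; keep the fragment
    if an x=Dog line arrives, drop it if a stop line comes first."""
    found = []
    current = None  # lines of the fragment in progress, or None
    for line in text.splitlines():
        if line.startswith('start'):
            current = [line]
        elif current is not None:
            if line.startswith('stop'):
                current = None  # fragment ended without x=Dog
            else:
                current.append(line)
                if line.startswith('x=Dog'):
                    found.append('\n'.join(current))
                    current = None


    return found


# The single regex from earlier in the thread, for comparison.
single = re.compile(r'^start.*\nx=Dog.*\nstop.*', re.MULTILINE)

print(single.findall(data))  # misses fragment 3
print(dog_fragments_regex(data))
print(dog_fragments_lines(data))
```

The split-based version stops each match at "x=Dog", as in the session above, while the line scanner returns whole lines up to and including the x=Dog line; it never backtracks, and it drops a fragment as soon as its stop line is reached.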
--
Arnaud
More information about the Python-list mailing list