[Tutor] Parsing multiple lines from text file using regex
Peter Otten
__peter__ at web.de
Sun Oct 27 15:46:27 CET 2013
Marc wrote:
> Hi,
> I am having an issue with something that would seem to have an easy solution,
> which escapes me. I have configuration files that I would like to parse.
> The data I am having an issue with is a multi-line attribute that has the
> following structure:
>
> banner <option> <banner text delimiter>
> Banner text
> Banner text
> Banner text
> ...
> <banner text delimiter>
>
> The regex 'banner\s+(\w+)\s+(.+)' captures the command nicely and
> banner.group(2) captures the delimiter nicely.
>
> My issue is that I need to capture the lines between the delimiters (both
> delimiters are the same).
>
> I have tried various permutations of
>
> Delimiter=banner.group(2)
> re.findall(Delimiter'(.*?)'Delimiter, line, re.DOTALL|re.MULTILINE)
>
> with no luck
>
> Examples I have found online all assume that the starting and ending
> delimiters are different and are defined directly in re.findall(). I
> would like to use the original regex extracting the banner.group(2), since
> it is already done.
>
> Any help in pointing me in the right direction would be most appreciated.
You can reference an earlier group from within the same regex with a
backreference \N, where N is the group number, e.g.:
>>> import re
>>> text = """banner option delim
... banner text
... banner text
... banner text
... delim
... """
>>> re.compile(r"banner\s+(\w+)\s+(\S+)\s+(.+?)\2", re.MULTILINE |
...            re.DOTALL).findall(text)
[('option', 'delim', 'banner text\nbanner text\nbanner text\n')]
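
The same idea can be wrapped in a small helper. What follows is only a
sketch: the names parse_banners and banner_body and the sample
configuration are made up for illustration and are not from the original
post. The first function uses the \2 backreference as above. The second
follows the two-step approach from the question, building a second
pattern from the captured delimiter; it assumes the delimiter should be
matched literally, so it is passed through re.escape() first.

```python
import re

# Backreference version: \2 must match whatever group 2 (the delimiter) matched.
BANNER_RE = re.compile(
    r"banner\s+(\w+)\s+(\S+)\s+(.+?)\2",
    re.DOTALL,  # let "." match newlines so the body can span several lines
)


def parse_banners(config_text):
    """Return a list of (option, banner_lines) tuples (hypothetical helper)."""
    return [
        (m.group(1), m.group(3).splitlines())
        for m in BANNER_RE.finditer(config_text)
    ]


def banner_body(config_text):
    """Two-step variant: find the delimiter first, then search with it."""
    head = re.search(r"banner\s+(\w+)\s+(\S+)", config_text)
    if head is None:
        return None
    # Escape the delimiter so characters like "^" or "$" are taken literally.
    delim = re.escape(head.group(2))
    # Start searching at the opening delimiter.
    body = re.search(
        delim + r"\s+(.*?)" + delim,
        config_text[head.start(2):],
        re.DOTALL,
    )
    return body.group(1) if body else None


# Made-up sample data for illustration.
sample = """banner motd #
Welcome
Authorized users only
#
"""

print(parse_banners(sample))
print(repr(banner_body(sample)))
```

Note that both variants stop at the first occurrence of the delimiter
after the banner line, so a delimiter that also appears inside the
banner text will cut the match short.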