[Tutor] Parsing multiple lines from text file using regex

Sun Oct 27 15:46:27 CET 2013

Marc wrote:

> Hi,
> I am having an issue with something that would seem to have an easy solution,
> which escapes me.  I have configuration files that I would like to parse.
> The data I am having an issue with is a multi-line attribute that has the
> following structure:
> 
> banner <option> <banner text delimiter>
> Banner text
> Banner text
> Banner text
> ...
> <banner text delimiter>
> 
> The regex 'banner\s+(\w+)\s+(.+)' captures the command nicely and
> banner.group(2) captures the delimiter nicely.
> 
> My issue is that I need to capture the lines between the delimiters (both
> delimiters are the same).
> 
> I have tried various permutations of
> 
> Delimiter=banner.group(2)
> re.findall(Delimiter'(.*?)'Delimiter, line, re.DOTALL|re.MULTILINE)
> 
> with no luck
> 
> Examples I have found online all assume that the starting and ending
> delimiters are different and are defined directly in re.findall().  I
> would like to use the original regex extracting the banner.group(2), since
> it is already done.
> 
> Any help in pointing me in the right direction would be most appreciated.

You can refer back to an earlier group inside the same regex with a
backreference \N, where N is the group number, e.g.:

>>> import re
>>> text = """banner option delim
... banner text
... banner text
... banner text
... delim
... """
>>> re.compile(r"banner\s+(\w+)\s+(\S+)\s+(.+?)\2",
...            re.MULTILINE | re.DOTALL).findall(text)
[('option', 'delim', 'banner text\nbanner text\nbanner text\n')]
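
If you would rather keep the header regex you already have and reuse
banner.group(2), a two-pass variant is also possible. The following is only
a minimal sketch under a few assumptions: the function name parse_banner is
made up for illustration, the delimiter is matched with (\S+) as above, and
the closing delimiter is assumed to sit on a line of its own, as in your
sample. re.escape() is used because a delimiter may contain characters that
are special in a regex.

```python
import re

# Header line: "banner <option> <delimiter>". Assumes the delimiter
# contains no whitespace, hence (\S+) rather than (.+).
HEADER_RE = re.compile(r"^banner\s+(\w+)\s+(\S+)[ \t]*\n", re.MULTILINE)


def parse_banner(text):
    """Return (option, delimiter, banner_lines), or None if nothing matches.

    Hypothetical helper, not part of any library.
    """
    header = HEADER_RE.search(text)
    if header is None:
        return None
    option, delimiter = header.groups()

    # Second pass: build the body regex from the captured delimiter.
    # re.escape() keeps delimiters such as "^C" or "$" from being
    # interpreted as regex syntax. The closing delimiter is assumed to
    # stand alone on its line.
    body_re = re.compile(
        r"(.*?)^" + re.escape(delimiter) + r"[ \t]*$",
        re.DOTALL | re.MULTILINE,
    )
    # Start matching right after the header line.
    body = body_re.match(text, header.end())
    if body is None:
        return None
    return option, delimiter, body.group(1).splitlines()


text = """banner option delim
banner text
banner text
banner text
delim
"""
print(parse_banner(text))
# ('option', 'delim', ['banner text', 'banner text', 'banner text'])
```

Compared with the single-regex \2 approach, this only accepts the delimiter
when it fills a whole line, so a banner line that merely contains the
delimiter text does not end the banner early. Whether that is what you want
depends on how your configuration files are written.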