How to access multiple group matches?

Gary Herron gherron at islandtraining.com
Fri Apr 6 11:10:49 EDT 2007


Christoph Krammer wrote:
> Hello,
>
> I want to use the re module to split a data stream that consists of
> several blocks of data. I use the following code:
>
> iter = re.finditer('^(HEADER\n.*)+$', data)
>
> The data variable contains binary data that has the word HEADER in it
> in some places and binary data after this word till the next
> appearance of header or the end of the file. But if I iterate over
> iter, I only get one match and this match only contains one group. How
> to access the other matches? Data may contain tens of them.
>
> Thanks in advance,
>  Christoph
>
>   

Use the non-greedy .*? instead of the greedy .* in your regular expression.
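The sketch below applies the non-greedy idea to the block-splitting problem. It makes several assumptions that go beyond the original reply. It uses Python 3, holds the data as bytes, and runs on a made-up sample stream. It also adds re.DOTALL and a lookahead for the next HEADER or the end of the data. The flag is needed because . does not match newlines by default. The lookahead is needed because a repeated group such as (...)+ keeps only its last repetition, so each block is made its own finditer match instead:

```python
import re

# Hypothetical sample stream: the word HEADER followed by arbitrary bytes,
# repeated a few times (the real data in the question is binary).
data = b"HEADER\nabc\x00\x01HEADER\ndef\nghiHEADER\n\xff"

# Non-greedy .*? stops at the earliest point where the lookahead succeeds:
# either the next HEADER or the very end of the data (\Z).
# re.DOTALL lets . match newline bytes too, which binary data may contain.
lazy_pattern = re.compile(rb"HEADER\n(.*?)(?=HEADER|\Z)", re.DOTALL)
blocks = [m.group(1) for m in lazy_pattern.finditer(data)]

# The same pattern with a greedy .* swallows everything after the first
# HEADER, so finditer yields only a single match.
greedy_pattern = re.compile(rb"HEADER\n(.*)(?=HEADER|\Z)", re.DOTALL)
greedy_blocks = [m.group(1) for m in greedy_pattern.finditer(data)]

print(blocks)              # three separate blocks
print(len(greedy_blocks))  # 1
```

The one-match result from the greedy variant mirrors the symptom described in the question.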


From the manual page:

*?, +?, ??
    The "*", "+", and "?" qualifiers are all /greedy/; they match as
    much text as possible. Sometimes this behaviour isn't desired; if
    the RE <.*> is matched against '<H1>title</H1>', it will match the
    entire string, and not just '<H1>'. Adding "?" after the qualifier
    makes it perform the match in /non-greedy/ or /minimal/ fashion; as
    /few/ characters as possible will be matched. Using .*? in the
    previous expression will match only '<H1>'.
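The manual's <H1> example can be checked directly with Python's re module. This is a small illustration added for clarity and is not part of the quoted manual text:

```python
import re

text = '<H1>title</H1>'

# Greedy: .* grabs as much as possible, then backtracks only far enough
# for the final > to match, so the whole string is consumed.
greedy = re.match(r'<.*>', text).group()

# Non-greedy: .*? grabs as little as possible, so the match ends at the
# first > that lets the pattern succeed.
lazy = re.match(r'<.*?>', text).group()

print(greedy)  # <H1>title</H1>
print(lazy)    # <H1>
```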

Gary Herron
