How do I get to *all* of the groups of an re search?

Cameron Laird claird at
Fri Jan 10 16:34:30 CET 2003

In article <oe23f-ta3.ln1 at>,
Kyler Laird  <Kyler at> wrote:
>Sure, I can do that but then I have to parse the second group
>again.  In this case it's fairly trivial, but in my application
>there is a lot of junk in between each of the groups.
>The text I'm matching is more like this.
>	<a href="foo.html">
>	blah blah blah
>	<img src="fooabc.jpg">
>	blah blah
>	<img src="foocde.jpg">
>	more stuff
>	</a>
>I want [('foo', ['fooabc', 'foocde'])].  I have no problem with
>getting the RE to match everything.  It's just getting to all of
>the matched groups that's stopping me.
>If I use the RE you gave, I'll end up with something like this.
>	[('foo', ' blah blah blah <img src="fooabc.jpg"> blah blah <img
>src="foocde.jpg">', 'foocde')]
>That's going to require me to reprocess the second element.  It's
>inefficient and ugly.  Worse, it's not what I expected from the
>description in the documentation.

Got it.

1.  Harvey Thomas, in a nearby follow-up (how're
    gateway propagation delays today?), has summarized
    the main point far more aptly than anything I
    wrote:  "You can't return a variable number of
    groups from a regex."  (But can Perl people?  They
    apparently tried to cram cement mixers, kitchen
    toasters, and turbojets inside their REs, so who
    knows?)
2.  Oh, what you *really* want is HTML parsing.
    There are serious limits to RE's applicability
    in that role, as the columnists of <URL: http:// >
    assert.  Get an HTML parser--then be ready to
    tweak it to accept all the junk that roams
    around in the wild.
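
To make both points concrete, here is a minimal sketch, not a
definitive implementation.  It assumes a current Python with the
standard-library html.parser module (not what shipped in 2003), and
the names SAMPLE, stem, LinkImageCollector, and collect are invented
for illustration.  The first part shows a repeated group keeping only
its last match; the second gathers every <img> inside each <a> with a
parser instead.

```python
import os
import re
from html.parser import HTMLParser

# Sample text modelled on the quoted example (hypothetical data).
SAMPLE = """<a href="foo.html">
blah blah blah
<img src="fooabc.jpg">
blah blah
<img src="foocde.jpg">
more stuff
</a>"""

# Point 1: a group inside a repetition reports only its final match.
m = re.search(r'<a href="(\w+)\.html">(?:.*?<img src="(\w+)\.jpg">)+',
              SAMPLE, re.S)
print(m.groups())


def stem(name):
    """Drop the extension: 'fooabc.jpg' becomes 'fooabc'."""
    return os.path.splitext(name)[0]


class LinkImageCollector(HTMLParser):
    """Collect (href stem, [img src stems]) for each <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current = None  # the (stem, images) pair being built

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self._current = (stem(attrs["href"]), [])
        elif tag == "img" and self._current is not None and attrs.get("src"):
            self._current[1].append(stem(attrs["src"]))

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.links.append(self._current)
            self._current = None


def collect(html_text):
    """Point 2: let a parser walk the markup and gather the images."""
    parser = LinkImageCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


print(collect(SAMPLE))
```

The sketch ignores nested or unclosed <a> tags, which is exactly the
sort of junk from the wild that a real version would need tweaking
to accept.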

Cameron Laird <Cameron at>
