How do I get to *all* of the groups of an re search?

maney at pobox.com
Sat Jan 11 16:47:50 CET 2003


Tim Peters <tim.one at comcast.net> wrote:
> [Kyler Laird, discovers that a regexp group "in a loop" captures only
> the last place it matched]
>> ...
>> Yes, and that surprises me.  It seems so obvious that it
>> should return all matched pieces and so arbitrary that it
>> only returns the last one.
> 
> It would take potentially unbounded storage to remember all matches, and
> would also (at least) complicate the meaning of backreferences (what is \1
> supposed to match then?  "a list" of all strings ever matched by group 1?
> the concatenation of them?  at least one of them?  a contiguous slice of the
> string spanning the first and last places it matched?  etc).
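To make the behavior under discussion concrete, here is a small sketch. It assumes a modern Python 3 re module (re.fullmatch did not exist in 2003). It shows that a group inside a repetition keeps only its final capture, and that \1 refers to that last capture.

```python
import re

# A group inside a repeated construct keeps only its final capture.
m = re.match(r"(\w)+", "abc")
print(m.group(1))   # 'c': the earlier captures 'a' and 'b' are discarded
print(m.span(1))    # (2, 3): the span of that last capture only

# A backreference therefore means "whatever group 1 captured last".
print(bool(re.fullmatch(r"(\d)+-\1", "123-3")))  # True: \1 is '3'
print(bool(re.fullmatch(r"(\d)+-\1", "123-1")))  # False: no split makes \1 equal '1'
```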

The .NET engine appears to do something like this, although the only
example I have of it (p. 431 of _Mastering Regular Expressions_, 2nd
edition) shows it only for a contiguous match, so it may or may not
actually do what Kyler wants; I just don't know.

Not that I'm really interested in promoting this, but... if it were to
be added, I would hope that backreferences would access the last of the
set of matches, preserving the existing behavior.  While this might not
be entirely surprise-free, it does score compatibility points.  For
that matter, the existing API could be left unchanged, and the
additional information made available through an added interface; this
is more or less what .NET does.  Although the exact syntax

  aMatch.Groups(2).Captures(n).Value

is a poor fit for Python's existing match object interface, it does
suggest the sort of addition that would be needed to handle such an
extension.
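Lacking such an interface, one common workaround with the stock re module is to match the full pattern first and then re-scan the matched span with only the repeated piece. The sketch below illustrates what a Captures-style list for one group might return. OUTER, ITEM and captures are hypothetical names chosen for illustration, not part of any library, and the re-scan assumes the item pattern splits the span the same way the repetition did. That holds for this example but not for every pattern.

```python
import re
from typing import List

# The group (\w) sits inside (?:...)+, so re keeps only its last capture.
OUTER = re.compile(r"(?:(\w)=\d;?)+")
# Hypothetical "item" pattern: the body of the repetition on its own.
ITEM = re.compile(r"(\w)=\d;?")


def captures(text: str) -> List[str]:
    """Hypothetical helper approximating a .NET-style Captures list for group 1.

    Match OUTER at the start of text, then re-scan only the matched span
    with ITEM to recover every value group 1 took.  Assumes ITEM, applied
    repeatedly from the start of the span, splits it exactly as the
    repetition did.
    """
    m = OUTER.match(text)
    if m is None:
        return []
    # Pattern.finditer(string, pos, endpos) restricts the scan to the span.
    return [item.group(1) for item in ITEM.finditer(text, m.start(), m.end())]


print(OUTER.match("a=1;b=2;c=3").group(1))  # 'c' (last capture only)
print(captures("a=1;b=2;c=3"))              # ['a', 'b', 'c']
```

The design choice here is to leave the existing match object untouched and compute the extra information separately, in the spirit of an added interface rather than a change to group() or to backreferences.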

> gratuitous-novelty-is-harmful-ly y'rs  - tim

If only we could all agree on what was gratuitous and what was a Good
Idea... then again, maybe better to have the variety after all.  <wink>

-- 
A delicate balance is necessary between sticking with the things
you know and can rely upon, and exploring things which have the
potential to be better.  Assuming that either of these strategies
is the one true way is silly.  -- Graydon Hoare