How do I get to *all* of the groups of an re search?

Kyler Laird Kyler at news.Lairds.org
Fri Jan 10 22:25:59 EST 2003


Andrew Dalke <adalke at mindspring.com> writes:

>> As it is, I am resigned to understanding that Python's re
>> module makes an arbitrary and undocumented decision to return
>> the last instance of a match for a group.  I'm embarrassed.

>It is documented, and behaves as documented.

Ah!  It's documented in multiple places.  The passage I
quoted before did not mention any further restrictions on
"the contents of a group can be retrieved after a match
has been performed."

>http://www.python.org/doc/current/lib/match-objects.html

Thank you.  Any idea how I should have known to find that from
here?
	http://www.python.org/doc/current/lib/re-syntax.html

>] If a group number is negative or larger than the number of
>] groups defined in the pattern, an IndexError exception is
>] raised. If a group is contained in a part of the pattern that did not 
>] match, the corresponding result is None. If a group is contained
>] in a part of the pattern that matched multiple times, the last
>] match is returned.
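
For anyone following along, here is a minimal sketch of the three
rules quoted above.  The patterns and the input strings are made up
for illustration; they are not from the original question.

```python
import re

# Hypothetical pattern: group 1 sits inside a repeated non-capturing group.
m = re.match(r"(?:(\w+),?)+", "a,b,c")
whole = m.group(0)   # the entire match, "a,b,c"
last = m.group(1)    # only the last repetition of group 1 survives: "c"

# A group in a part of the pattern that did not match yields None.
unmatched = re.match(r"(a)|(b)", "b").group(1)

# A group number larger than the number of groups raises IndexError.
try:
    m.group(2)
    raised = False
except IndexError:
    raised = True
```

Here "a" and "b" were matched along the way, but the match object
keeps only "c" for group 1.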

>As far as my research went, no standard regexp library could provide
>that sort of information.  They only give the last group which
>matched a pattern.

Yes, and that surprises me.  It seems so obvious that it
should return all matched pieces and so arbitrary that it
only returns the last one.

>I ended up writing my own regexp engine (!), Martel, which is at
>http://www.dalkescientific.com/Martel/ and based on mxTextTools.

Hmmmm...that's tempting.  I try to limit myself to built-in
tools when I can, but it's still tempting.  I appreciate
the reference.

>> At the very least, the documentation should be changed to say
>> that only the last match of a group will be returned.  Better
>> still would be an explanation of why the last one was chosen
>> and how that makes Python's behavior more predictable.

>It is documented.  

It's not documented where a user is likely to look for the
RE syntax.  The RE syntax page gives what appears to be a
very straightforward explanation of groups.  I don't know
why a beginning Python user would think to look elsewhere
for some strange behavior.

>And it is consistent with other regexp libs,

I'm not as concerned about Python being like lots of other
languages as I am with it behaving the way a new programmer
might expect it to work after reading what appears to be a
canonical explanation.

>eg, I know Perl's works that way.  I have 2nd ed. of Friedl's
>regexp book, but I haven't read it yet and I can't find where
>he talks about it.  Still, this behaviour is highly consistent
>with the other regexp packages.

Regardless, do you find it useful?  Can you think of any time
when you want to match a bunch of things and just end up with
the last one?

>You can also solve this without regexps.

I can solve it lots of ways.  I went for what I thought was
going to be an elegant solution.  I'd like to have a tool
that works the way I expected the re module to work.
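
One workaround that stays within the built-in re module, sketched
here as an assumption rather than anything proposed in this thread:
match the overall structure first, then run re.findall over the
matched text to collect every piece.  The helper name
all_group_matches and both patterns are hypothetical.

```python
import re

def all_group_matches(outer, inner, text):
    """Return every piece matching `inner` within the span matched by `outer`.

    `outer` validates the overall shape at the start of `text`;
    `inner` describes one repeated item.  Both are illustrative.
    """
    m = re.match(outer, text)
    if m is None:
        return []
    # findall rescans the matched span and returns all non-overlapping hits.
    return re.findall(inner, m.group(0))

pieces = all_group_matches(r"(?:\w+,?)+", r"\w+", "a,b,c")
```

This costs a second pass over the matched text, and the inner
pattern has to be kept in sync with the outer one by hand.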

>Show me a module besides Martel which lets you get access to
>the parse tree.  I looked at about a dozen packages, read
>through Friedl's 1st edition book, and posted to various newsgroups
>looking for one.

I'm not at all interested in how popular the solution is.

--kyler



