How do I get to *all* of the groups of an re search?
Kyler Laird
Kyler at news.Lairds.org
Fri Jan 10 22:25:59 EST 2003
Andrew Dalke <adalke at mindspring.com> writes:
>> As it is, I am resigned to understanding that Python's re
>> module makes an arbitrary and undocumented decision to return
>> the last instance of a match for a group. I'm embarrassed.
>It is documented, and behaves as documented.
Ah! It's documented in multiple places. The quote I
gave before did not mention any further restrictions to
"the contents of a group can be retrieved after a match
has been performed,"
>http://www.python.org/doc/current/lib/match-objects.html
Thank you. Any idea how I should have known to find that from
here?
http://www.python.org/doc/current/lib/re-syntax.html
>] If a group number is negative or larger than the number of
>] groups defined in the pattern, an IndexError exception is
>] raised. If a group is contained in a part of the pattern that did not
>] match, the corresponding result is None. If a group is contained
>] in a part of the pattern that matched multiple times, the last
>] match is returned.
>As far as my research went, no standard regexp library could provide
>that sort of information. They only give the last group which
>matched a pattern.
Yes, and that surprises me. It seems so obvious that it
should return all matched pieces and so arbitrary that it
only returns the last one.
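For reference, a minimal sketch of the behavior that documentation passage describes. The patterns and input strings are made-up illustrations, not taken from the thread, and the syntax is that of current Python rather than the release of the time:

```python
import re

# Group 1 sits inside a repeated part of the pattern (hypothetical example).
m = re.match(r"(?:(\w+),?)+", "a,b,c")
print(m.group(0))  # the whole match: a,b,c
print(m.group(1))  # only the last repetition: c

# A group in a part of the pattern that did not match yields None.
unmatched = re.match(r"(a)|(b)", "b")
print(unmatched.group(1))  # None

# A group number larger than the number of groups raises IndexError.
try:
    m.group(2)
except IndexError:
    print("IndexError, as documented")
```

The earlier repetitions ("a" and "b") take part in the overall match but are not retrievable from the match object.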
>I ended up writing my own regexp engine (!), Martel, which is at
>http://www.dalkescientific.com/Martel/ and based on mxTextTools.
Hmmmm...that's tempting. I try to limit myself to built-in
tools when I can, but it's still tempting. I appreciate
the reference.
>> At the very least, the documentation should be changed to say
>> that only the last match of a group will be returned. Better
>> still would be an explanation of why the last one was chosen
>> and how that makes Python's behavior more predictable.
>It is documented.
It's not documented where a user is likely to look for the
RE syntax. The RE syntax page gives what appears to be a
very straightforward explanation of groups. I don't know
why a beginning Python user would think to look elsewhere
for some strange behavior.
>And it is consistent with other regexp libs,
I'm not as concerned about Python being like lots of other
languages as I am with it behaving the way a new programmer
might expect it to work after reading what appears to be a
canonical explanation.
>eg, I know Perl's works that way. I have 2nd ed. of Friedl's
>regexp book, but I haven't read it yet and I can't find where
>he talks about it. Still, this behaviour is highly consistent
>with the other regexp packages.
Regardless, do you find it useful? Can you think of any time
when you want to match a bunch of things and just end up with
the last one?
>You can also solve this without regexps.
I can solve it lots of ways. I went for what I thought was
going to be an elegant solution. I'd like to have a tool
that works the way I expected the re module to work.
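One way to approximate that with only the built-in module, as a rough sketch: match the overall structure first, then scan the matched text again for the repeated piece. The helper name all_repeats and the sample pattern are hypothetical, and this assumes the repeated piece can be recognized on its own, outside the larger pattern:

```python
import re

def all_repeats(outer, inner, text):
    """Return every piece of the outer match that matches `inner`.

    `outer` validates the overall structure; `inner` is the sub-pattern
    whose repetitions we want to collect. Both are assumed to be supplied
    by the caller so that `inner` matches exactly the repeated piece.
    """
    m = re.match(outer, text)
    if m is None:
        return []
    # findall returns whole matches because `inner` has no groups here.
    return re.findall(inner, m.group(0))

items = all_repeats(r"(?:\w+,?)+", r"\w+", "a,b,c")
print(items)  # ['a', 'b', 'c']
```

This is two passes rather than one, and it does not give a parse tree, so it only covers the simple case of a flat, repeated group.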
>Show me a module besides Martel which lets you get access to
>the parse tree. I looked at about a dozen packages, read
>through Friedl's 1st edition book, and posted to various newsgroups
>looking for one.
I'm not at all interested in how popular the solution is.
--kyler