suggestion for module re

Mon Oct 22 14:54:22 EDT 2001

Skip Montanaro wrote:
> 
>     Jose> My first goal is to get all the *named* fields of a regex, for all
>     Jose> of the matches in a source string.  The re.RegexObject.findall
>     Jose> method is "almost" good, but it returns a tuple of matched
>     Jose> *groups*, leading to a dumb parenthesis-counting task and to code
>     Jose> hard to maintain.
> 
> I suspect that in most situations you will want all the interesting groups
> to be named or none of the interesting groups to be named.

Yep, I believe so.

> If you want
> named groups, use the "(?:...)" construct to create "throwaway" groups.  See
> the re syntax docs for more info:

Right.  Good suggestion.  That solves the problem when we want to change a
regex (for example, inserting a group for branching purposes, such as (X|Y)
instead of X) and don't need to keep track of what was grouped.
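To illustrate the branching case, here is a small sketch.  The x|y prefix
and the sample string are made up for the example; only the named groups
come from the regex under discussion.

```python
import re

# Capturing branch: the extra (x|y) group shows up in every findall tuple
# and shifts the positions of the groups that follow it.
capturing = re.compile(r'(x|y)(?P<a>a|A)(?P<b>b|B)')

# Non-capturing branch: (?:x|y) still branches but adds no group.
non_capturing = re.compile(r'(?:x|y)(?P<a>a|A)(?P<b>b|B)')

source = "xab yAB"  # made-up sample input

capturing_result = capturing.findall(source)
non_capturing_result = non_capturing.findall(source)

print(capturing_result)      # [('x', 'a', 'b'), ('y', 'A', 'B')]
print(non_capturing_result)  # [('a', 'b'), ('A', 'B')]
```

With the non-capturing form, the tuples keep the same shape no matter how
many branching groups get added later.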

>     http://www.python.org/doc/current/lib/re-syntax.html

Actually, http://localhost/doc/python/html/lib/re-syntax.html :-)

> Your example would thus become:
> 
>     rx = re.compile(r'(?:(?P<a>a|A)(?P<b>b|B))')
> 
> (though why you needed the outer parens in the first place is not clear).

(Right, my outer parentheses were a dumb mistake.)

Anyway, there still remains the need for a function that travels through a
string and collects all the matches in one single object, *while preserving the
names*.  I mean, we have named groups in the regex engine, so we shouldn't need
to keep counting parentheses...
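
Something along these lines is what I have in mind.  This is only a rough
sketch: find_named is a hypothetical helper name, not anything in the re
module, and the sample string is made up.  It walks the string with
search() and keeps one groupdict() per match.

```python
import re

def find_named(pattern, source):
    """Hypothetical helper: return one groupdict() per non-overlapping match."""
    results = []
    pos = 0
    while pos <= len(source):
        match = pattern.search(source, pos)
        if match is None:
            break
        results.append(match.groupdict())
        # Step past an empty match so the loop always advances.
        if match.end() > match.start():
            pos = match.end()
        else:
            pos = match.end() + 1
    return results

rx = re.compile(r'(?P<a>a|A)(?P<b>b|B)')
named = find_named(rx, "ab xx AB aB")  # made-up sample input
print(named)
# [{'a': 'a', 'b': 'b'}, {'a': 'A', 'b': 'B'}, {'a': 'a', 'b': 'B'}]
```

The caller then refers to fields by name (d['a'], d['b']) and never by
position.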

Merging the good things of groupdict and findall seems natural.  Better still
if we can merge the good things of findall and *every other* match object
method/attribute, present or future!  So eventually the Good Thing is to have a
method similar to findall that returns the match *objects* themselves instead
of a specific view of them.
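
A sketch of that idea, assuming an re module that provides finditer (Python
2.2 and later), which yields match objects rather than tuples.  The sample
string is made up.  Each view is then derived from the same objects:

```python
import re

rx = re.compile(r'(?P<a>a|A)(?P<b>b|B)')

# Collect the match objects themselves; no particular view is chosen yet.
matches = list(rx.finditer("ab xx AB"))  # made-up sample input

# Any view can now be built from the same objects.
dicts = [m.groupdict() for m in matches]  # named fields
spans = [m.span() for m in matches]       # positions in the source
texts = [m.group(0) for m in matches]     # whole matched text

print(dicts)  # [{'a': 'a', 'b': 'b'}, {'a': 'A', 'b': 'B'}]
print(spans)  # [(0, 2), (6, 8)]
print(texts)  # ['ab', 'AB']
```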

Thanks,
Sebrosa