re findall mod for issue of side effects

Tim Peters tim.one at home.com
Mon Jan 15 03:07:43 EST 2001


[Andrew Henshaw]
> I've changed a line, added a line, and added a 'grammaticGrouping'
> parameter to the definition of RegexObj.findall (the parameter gets
> modified in the user api, also).  This appears to change the
> behavior of findall to be consistent with what I was desiring; that
> is, a way to specify grouping in a regex pattern, without returning
> tuples in the findall result.

Except that's what non-capturing parens are for, in the context of
re.findall() and *everywhere else*.  Adding a unique wart to findall() is
probably a poor idea unless it's astonishingly useful.
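
A minimal sketch of the difference, assuming the sample string from the
example quoted below; the two patterns are illustrative assumptions and are
not taken from the proposed patch.  A capturing group makes findall() return
the group's text, while a non-capturing (?:...) group keeps the grouping but
returns the whole match:

```python
import re

s = '..abcabcxyz..'

# Capturing parens: with exactly one group in the pattern, findall()
# returns the text of that group (its last repetition), not the whole match.
capturing = re.findall(r'(abc)+xyz', s)

# Non-capturing parens: same grouping for the + operator, but with no
# capturing groups findall() returns each whole match as a plain string.
non_capturing = re.findall(r'(?:abc)+xyz', s)

print(capturing)      # ['abc']
print(non_capturing)  # ['abcabcxyz']
```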

> An example:
>
> >>> s='..abcabcxyz..'
>
> # Try a simple pattern
> >>> r=re.compile('abcxyz')
> >>> r.findall(s)
> ['abcxyz']
>
> # Now add some grouping to the pattern
> >>> r=re.compile('(abc)*(xyz)*')
> >>> r.findall(s)
> [('', ''), ('', ''), ('abc', 'xyz'), ('', ''), ('', ''), ('', '')]
> # Wow, that changed the return value dramatically

Sure did.  But your pattern after "grouping" is also radically different in
another way:  it can match an empty string, where your original pattern
could not.  And a pattern that can match nothing everywhere is a very
strange pattern for findall() (what is it you're trying to find then?  a
bunch of nothings?  that's what you're *telling* it to find).  You do
strange things, you get strange results.
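
One rough way to spot such a pattern before handing it to findall() is to try
it against the empty string.  This is only a sketch: can_match_empty is a
hypothetical helper, not part of the re module, and it tests only the empty
string itself, so a pattern built from lookaround or word-boundary assertions
could still match nothing somewhere inside a longer string.

```python
import re

def can_match_empty(pattern):
    """Return True if `pattern` matches the empty string.

    Hypothetical helper for illustration; a heuristic, not a full analysis.
    """
    return re.match(pattern, '') is not None

# The original pattern needs at least 'abcxyz', so it cannot match nothing.
print(can_match_empty('abcxyz'))        # False

# Both starred groups may repeat zero times, so nothing at all is a match.
print(can_match_empty('(abc)*(xyz)*'))  # True
```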

A pattern matching what it appears you *intended* to search for here:

>>> r = re.compile('(?:abc)+(?:xyz)*|(?:xyz)+')
>>> r.findall(s)
['abcabcxyz']
>>>

That is, don't hand findall a pattern that matches empty strings, and it
won't return empty matches.

> ...
> Does anybody else see this [grammaticGrouping] to be as useful
> as I do?

Sorry, I don't:  I see it as misusing findall(), and then adding a wart to
cover that up.  But then I'm always generous in my assessments <wink>.

More generally useful would be a new flag on regexp compilation meaning "all
my parens are non-capturing".  Then that part of it could be enjoyed by all
uses of regexps, not just findall.  I don't see a need for that, but it
wouldn't be particularly damaging.

If you're going to ask findall() to match empty strings, though, filter 'em
out yourself.
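
A sketch of that filtering, reusing the pattern and string from the quoted
example.  The second variant assumes re.finditer(), which is not mentioned
anywhere in this thread but is available in current Python; the variable
names are illustrative only.

```python
import re

s = '..abcabcxyz..'
r = re.compile('(abc)*(xyz)*')

# Variant 1: keep findall()'s tuples, dropping those where every group is
# empty.  Note this would also drop a non-empty match whose groups all
# happened to be empty, which cannot occur with this particular pattern.
non_empty_tuples = [t for t in r.findall(s) if any(t)]

# Variant 2 (assumes re.finditer): m.group(0) is the whole match no matter
# how many capturing groups the pattern has, so filter on that directly.
non_empty_matches = [m.group(0) for m in r.finditer(s) if m.group(0)]

print(non_empty_tuples)   # [('abc', 'xyz')]
print(non_empty_matches)  # ['abcabcxyz']
```

Variant 2 also sidesteps the tuple-versus-string question entirely, since it
never looks at the groups.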

cruel-but-fair-ly y'rs  - tim




