re findall mod for issue of side effects

Andrew Henshaw andrew_dot_henshaw_at_earthling_dot_net
Mon Jan 15 08:11:43 EST 2001


"Tim Peters" <tim.one at home.com> wrote in message
news:mailman.979546103.2078.python-list at python.org...
> Except that's what non-capturing parens are for, in the context of
> re.findall() and *everywhere else*.  Adding a unique wart to findall() is
> probably a poor idea unless it's astonishingly useful.

...snip...

>
> Sure did.  But your pattern after "grouping" is also radically different in
> another way:  it can match an empty string, where your original pattern
> could not.  And a pattern that can match nothing everywhere is a very
> strange pattern for findall() (what is it you're trying to find then?  a
> bunch of nothings?  that's what you're *telling* it to find).  You do
> strange things, you get strange results.
>
> A pattern matching what it appears you *intended* to search for here:
>
> >>> r = re.compile('(?:abc)+(?:xyz)*|(?:xyz)+')
> >>> r.findall(s)
> ['abcabcxyz']
> >>>
>
> That is, don't hand findall a pattern that matches empty strings, and it
> won't return empty matches.

I introduced the empty matches by mistake.  I should have left the pattern as
(abc)+xyz
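
For reference, a minimal sketch of the difference under discussion.  The
sample string is an assumption (the original s is not shown in this message),
and the names capturing and non_capturing are only for illustration:

```python
import re

# Hypothetical sample string; the original 's' is not quoted in this message.
s = '..abcabcxyz..'

# Capturing group: findall returns only what the group captured
# (the last repetition of the group), not the whole match.
capturing = re.compile('(abc)+xyz').findall(s)

# Non-capturing group: findall returns the whole match.
non_capturing = re.compile('(?:abc)+xyz').findall(s)

print(capturing)      # ['abc']
print(non_capturing)  # ['abcabcxyz']
```

Neither pattern can match an empty string, so no empty entries appear.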

There is a problem with (?:) that I brought up in 'Using re -side effects or
misunderstanding'.

...snip...

> > Does anybody else see this [grammaticGrouping] to be as useful
> > as I do?
>
> Sorry, I don't:  I see it as misusing findall(), and then adding a wart to
> cover that up.  But then I'm always generous in my assessments <wink>.
>
> More generally useful would be a new flag on regexp compilation meaning "all
> my parens are non-capturing".  Then that part of it could be enjoyed by all
> uses of regexps, not just findall.  I don't see a need for that, but it
> wouldn't be particularly damaging.

This is what I had suggested (well, maybe not, see below) in yesterday's
take on this subject (see:
'Using re -side effects or misunderstanding') and would be my preferred
design.  I looked at the code and became worried that the effect of that
flag would have deep consequences that I wasn't going to foresee in my quick
examination.  Therefore, I thought I'd limit the 'wart', as you call it, to
the area I was particularly interested in, for demonstration purposes.

As to your suggestion (a new flag on regexp compilation meaning "all my
parens are non-capturing"), I'd still like to retain the ability to use the
non-capturing (?:) parens to exclude portions from the returned string.  This
may be what you're stating, but I'd like the flag to indicate that parens are
for grammatical grouping only: they do not force a tuple return.

Thus,
s='..abcxyz..'
r=re.compile('(ab)+(?:c)(xyz)+')
r.findall(s)

would return

['abxyz']

(I should put a fractional wink in here about null strings)
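
The proposed flag does not exist, so the following is only a rough sketch of
the intended result using what re already provides.  The helper
findall_joined is hypothetical; it joins each match's captured groups into
one string:

```python
import re

s = '..abcxyz..'
r = re.compile('(ab)+(?:c)(xyz)+')

# Current behaviour: one tuple of captured groups per match.
current = r.findall(s)

def findall_joined(pattern, string):
    """Hypothetical helper: join each match's captured groups into one string."""
    return [''.join(g or '' for g in m.groups())
            for m in pattern.finditer(string)]

proposed = findall_joined(r, s)

print(current)   # [('ab', 'xyz')]
print(proposed)  # ['abxyz']
```

One caveat: a repeated group such as (ab)+ keeps only its last repetition, so
this emulation may drop text that a real "grammatical grouping" flag would
presumably keep.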


>
> If you're going to ask findall() to match empty strings, though, filter 'em
> out yourself.

Yes, I agree.  That bit of code shouldn't be in there.  I realized that late
last night, when I was playing with the patch.  Also, the patch is flawed in
that it doesn't handle the '(?:)' type of parens correctly.  Such are the
problems one generates when one tries to rush a 'product' out the door.
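
Filtering in the caller is simple enough.  A small sketch, assuming an
illustrative pattern that can match the empty string (it is not the pattern
from the earlier message) and a made-up sample string:

```python
import re

# Hypothetical sample string and pattern, for illustration only.
s = '..abcabcxyz..'
r = re.compile('(?:abc)*(?:xyz)*')  # every part is optional, so '' matches

raw = r.findall(s)                  # contains empty strings
non_empty = [m for m in raw if m]   # drop the empty matches

print(non_empty)  # ['abcabcxyz']
```

The exact number of empty strings in raw can differ between Python versions,
which is one more reason to filter them out rather than depend on them.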


>
> cruel-but-fair-ly y'rs  - tim
>
Not cruel at all.

AH




