[Tutor] re.findall() weirdness. [looks like a bug!]

Danny Yoo dyoo@hkn.eecs.berkeley.edu
Tue, 26 Jun 2001 19:23:31 -0700 (PDT)


On Wed, 27 Jun 2001, Dan Tropp wrote:

> Thanks for your excellent reply. I didn't actually check the
> documentation, I just assumed that you could include flags like
> re.search(). I guess these sort of bugs are hard to stop in an untyped
> language which allows default arguments!
> 
> I'll register it on sourceforge.

Ok.  Thanks for pointing that out; it was a fun bug to catch!  It's
something that Guido and Friends most likely need to fix, so you don't
need to apologize for assuming this.  I would assume the same thing, since
re.search() and re.findall() do very similar things.

By the way, to get findall() to behave the way you wanted (accepting
flags for case insensitivity and so on, just as re.search() does), we
could make the following enhancement to sre.py:

###
# An enhanced findall:
def findall(pattern, string, flags=0):
    """Return a list of all non-overlapping matches in the string.

    If one or more groups are present in the pattern, return a
    list of groups; this will be a list of tuples if the pattern
    has more than one group.

    Empty matches are included in the result."""
    return _compile(pattern, flags).findall(string)
###

Then things behave the way they probably should have in the first place:

###
## Using the enhanced sre.py
>>> print re.findall('<.*?>','<1> </2> \n<3> </4>', re.I|re.S)
['<1>', '</2>', '<3>', '</4>']
>>> print re.findall('python','pythonPythonPYTHON', re.I)
['python', 'Python', 'PYTHON']
###
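
If you'd rather not edit sre.py at all, roughly the same effect can be had
from your own code by compiling the pattern with the flags first and then
calling the compiled object's findall() method.  Here is a minimal sketch;
the name findall_with_flags is made up for illustration and is not part of
the re module:

```python
import re

def findall_with_flags(pattern, string, flags=0):
    """Hypothetical helper (not in the standard library): compile the
    pattern with the given flags, then ask the compiled pattern object
    for all non-overlapping matches."""
    return re.compile(pattern, flags).findall(string)

# Same two examples as above, but without touching the library source.
tags = findall_with_flags('<.*?>', '<1> </2> \n<3> </4>', re.I | re.S)
words = findall_with_flags('python', 'pythonPythonPYTHON', re.I)
print(tags)
print(words)
```

The flags go to re.compile(), which is documented to take them, so this
does not depend on how the module-level findall() treats a third argument.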

Of course, this makes our enhanced findall() completely incompatible with
Python 1.5.2.  Maybe it can make it into Python 2.2!  *grin*
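
Another route that avoids the flags argument entirely is to put the flags
inside the pattern itself, using the (?i) and (?s) inline notation.  This
is only a sketch, and it assumes the regex engine you are running supports
inline flags at the start of the pattern:

```python
import re

# (?i) turns on case-insensitive matching for the whole pattern.
words = re.findall('(?i)python', 'pythonPythonPYTHON')

# (?s) lets '.' match a newline, so a tag split across lines is found.
spanning = re.findall('(?s)<.*>', '<a\nb>')

# Without (?s), '.' stops at the newline and nothing matches.
not_spanning = re.findall('<.*>', '<a\nb>')

print(words)
print(spanning)
print(not_spanning)
```

Since the flags travel with the pattern string, findall() only needs its
usual two arguments here.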

Hope this helps!