re module: findall erroneously remembering matches from previous groups?

Greg Jorgensen gregj at pobox.com
Mon Oct 2 06:55:56 EDT 2000


I'm using Python 1.6.

>>> import re
>>> s = 'The {speed|quick|slow} brown {animal} jumps over the {z|lazy} dogs back.'
>>> p = r'\{(.*?)(?:\|(.*?)(?:\|(.*?))?)?\}'
>>> pat = re.compile(p)

The pattern p matches occurrences of:

    {aaa} or {aaa|xxx} or {aaa|xxx|yyy}

In other words, aaa may be followed by |xxx or |xxx|yyy. The entire pattern
must be enclosed in {...}.
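
As a quick check of what each optional group captures, here is a minimal sketch that runs the same pattern against one sample of each form. It assumes a current Python 3 interpreter rather than 1.6, and the names samples and groups_by_sample are made up for illustration.

```python
import re

# Same pattern as in the post: {aaa}, {aaa|xxx} or {aaa|xxx|yyy}.
pat = re.compile(r'\{(.*?)(?:\|(.*?)(?:\|(.*?))?)?\}')

# Hypothetical sample inputs, one per form.
samples = ['{aaa}', '{aaa|xxx}', '{aaa|xxx|yyy}']

# Map each sample to the groups captured by a single search().
groups_by_sample = {text: pat.search(text).groups() for text in samples}

for text, groups in groups_by_sample.items():
    print(text, groups)
```

With search(), a group that takes no part in the match is reported as None, which is the behavior the rest of this post relies on.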

>>> pat.findall(s)
[('speed', 'quick', 'slow'), ('animal', 'quick', 'slow'), ('z', 'lazy', 'slow')]

I expected the second tuple to be ('animal', None, None), and the third
tuple to be ('z', 'lazy', None). If I use search instead I get the expected
results:

>>> m = pat.search(s, 0)
>>> m.groups()
('speed', 'quick', 'slow')
>>> m = pat.search(s, m.end())
>>> m.groups()
('animal', None, None)
>>> m = pat.search(s, m.end())
>>> m.groups()
('z', 'lazy', None)


So is this a bug in findall(), or is this how it is supposed to work? I
expected findall() to return the same list I'd get from this code:

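>>> def findall(pat, s):
        # Collect the groups of every non-overlapping match, left to right.
        f = []
        i = 0
        while 1:
            m = pat.search(s, i)
            if not m: break
            f.append(m.groups())
            # Resume searching where this match ended.
            i = m.end()
        return f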

>>> findall(pat, s)
[('speed', 'quick', 'slow'), ('animal', None, None), ('z', 'lazy', None)]
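
For comparison, the same search loop can be sketched with re.finditer, which yields one match object per non-overlapping match. This assumes a later Python than 1.6 (finditer is not available there), and findall_groups is a hypothetical helper name, not part of the re module.

```python
import re

def findall_groups(pat, s):
    """Return m.groups() for every non-overlapping match of pat in s."""
    # Each match object carries only its own groups, so nothing can
    # leak over from an earlier match.
    return [m.groups() for m in pat.finditer(s)]

pat = re.compile(r'\{(.*?)(?:\|(.*?)(?:\|(.*?))?)?\}')
s = 'The {speed|quick|slow} brown {animal} jumps over the {z|lazy} dogs back.'

result = findall_groups(pat, s)
print(result)
```

Unlike the hand-written loop, finditer also advances past zero-width matches by itself, although this particular pattern can never match an empty string.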


I haven't looked at the source for the re module yet... I am hoping that
someone has seen this before. Thanks.


Greg Jorgensen
Deschooling Society
Portland, Oregon, USA
gregj at pobox.com




