re module: findall erroneously remembering matches from previous groups?
Greg Jorgensen
gregj at pobox.com
Mon Oct 2 06:55:56 EDT 2000
I'm using Python 1.6.
>>> s = 'The {speed|quick|slow} brown {animal} jumps over the {z|lazy} dogs back.'
>>> p = r'\{(.*?)(?:\|(.*?)(?:\|(.*?))?)?\}'
>>> pat = re.compile(p)
The pattern p matches occurrences of:
{aaa} or {aaa|xxx} or {aaa|xxx|yyy}
In other words, aaa may be followed by |xxx or |xxx|yyy. The entire pattern
must be enclosed in {...}.
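A quick way to check that description is a minimal sketch like the following. The three sample strings are made up for illustration, and the sketch assumes an interpreter where groups() reports a group that took no part in the match as None.

```python
import re

# Same pattern as above.
pat = re.compile(r'\{(.*?)(?:\|(.*?)(?:\|(.*?))?)?\}')

# Hypothetical sample inputs, one per form listed above.
samples = ['{aaa}', '{aaa|xxx}', '{aaa|xxx|yyy}']

# Collect the captured groups for each sample; a group that does not
# take part in the match is reported as None by groups().
groups = [pat.match(t).groups() for t in samples]

for t, g in zip(samples, groups):
    print(t, g)
```

Each sample is matched on its own here, so no state from an earlier match can carry over into a later one.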
>>> pat.findall(s)
[('speed', 'quick', 'slow'), ('animal', 'quick', 'slow'), ('z', 'lazy', 'slow')]
I expected the second tuple to be ('animal', None, None), and the third
tuple to be ('z', 'lazy', None). If I use search instead I get the expected
results:
>>> m = pat.search(s, 0)
>>> m.groups()
('speed', 'quick', 'slow')
>>> m = pat.search(s, m.end())
>>> m.groups()
('animal', None, None)
>>> m = pat.search(s, m.end())
>>> m.groups()
('z', 'lazy', None)
So is this a bug in findall(), or is this how it is supposed to work? I
expected findall() to return the same list I'd get from this code:
>>> def findall(pat, s):
...     f = []
...     i = 0
...     while 1:
...         m = pat.search(s, i)
...         if not m: break
...         f.append(m.groups())
...         i = m.end()
...     return f
...
>>> findall(pat, s)
[('speed', 'quick', 'slow'), ('animal', None, None), ('z', 'lazy', None)]
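For anyone who wants to reproduce the expected list as a script, here is a minimal sketch. It assumes an interpreter whose re module provides finditer(), which is not the case for the 1.6 used above, and the helper name groups_via_finditer is made up for illustration. For this pattern it collects the same m.groups() tuples as the search loop.

```python
import re

def groups_via_finditer(pat, s):
    """Collect m.groups() for every non-overlapping match of pat in s."""
    # Assumption: pat.finditer() is available (later Python versions only).
    return [m.groups() for m in pat.finditer(s)]

pat = re.compile(r'\{(.*?)(?:\|(.*?)(?:\|(.*?))?)?\}')
s = 'The {speed|quick|slow} brown {animal} jumps over the {z|lazy} dogs back.'

result = groups_via_finditer(pat, s)
print(result)
```

Because each tuple comes from its own match object, an optional group that did not match in the second or third occurrence cannot pick up a value from the first.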
I haven't looked at the source for the re module yet... I am hoping that
someone has seen this before. Thanks.
Greg Jorgensen
Deschooling Society
Portland, Oregon, USA
gregj at pobox.com