[Python-bugs-list] [ python-Bugs-833137 ] re.matchobject.findall() adds an extra element

Thu Oct 30 10:55:01 EST 2003

Bugs item #833137, was opened at 2003-10-30 16:41
Message generated for change (Comment added) made by effbot
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=833137&group_id=5470

Category: Regular Expressions
Group: Python 2.3
>Status: Closed
>Resolution: Invalid
Priority: 5
Submitted By: Greg Kochanski (gpk)
Assigned to: Fredrik Lundh (effbot)
Summary: re.matchobject.findall() adds an extra element

Initial Comment:
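import re

# Group 1: "[" or "]" followed by a digit.
# Group 2: zero or more characters that are not brackets.
_sfp = re.compile(r'([][][0-9])|([^][]*)')

print(_sfp.findall('test[1]2'))
print(_sfp.findall('test]1[2again'))
print(_sfp.findall('test[1'))
print(_sfp.findall(']2'))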

Yields:

[('', 'test'), ('[1', ''), (']2', ''), ('', '')]
[('', 'test'), (']1', ''), ('[2', ''), ('', 'again'), ('', '')]
[('', 'test'), ('[1', ''), ('', '')]
[(']2', ''), ('', '')]

Where do those empty matches at the end
come from?

Admittedly the [^][]* pattern can match
a zero-length string, but if it's going to match
a zero-length string at the end,
why doesn't it also match one at the beginning?
Or in between every pair of non-empty matches?
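To see where the empty tuples come from, it can help to print the offsets of each match instead of only the groups. The sketch below assumes Python 3 and uses `finditer`, which walks the same sequence of matches that `findall` reports; `match_spans` is a hypothetical helper written only for this illustration.

```python
import re

# The pattern from the report: group 1 is "[" or "]" followed by a digit,
# group 2 is a run of zero or more characters that are not brackets.
sfp = re.compile(r'([][][0-9])|([^][]*)')

def match_spans(pattern, text):
    """Hypothetical helper: (start, end, matched text) for each successive match."""
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(text)]

for text in ['test[1]2', ']2']:
    print(repr(text), match_spans(sfp, text))
```

For 'test[1]2' this lists (0, 4, 'test'), (4, 6, '[1'), (6, 8, ']2') and finally (8, 8, ''). At offsets 0, 4 and 6 a non-empty match is available (group 1 is tried first, and the `*` in group 2 is greedy), so no empty match is reported there. Only at offset 8, the end of the string, is the empty string the only thing `[^][]*` can match.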

One would think, in the interests of economy and
sanity, that zero-length matches should be
avoided unless they are needed to use up
all of the input string.
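One possible workaround, not taken from the report or the reply, is to make the second alternative consume at least one character by using `+` instead of `*`. A minimal sketch, assuming Python 3; `sfp_nonempty` is a name chosen for this illustration:

```python
import re

# Same alternation as the report, but "+" instead of "*", so the second
# alternative must consume at least one non-bracket character.
sfp_nonempty = re.compile(r'([][][0-9])|([^][]+)')

for text in ['test[1]2', 'test]1[2again', 'test[1', ']2']:
    print(sfp_nonempty.findall(text))
```

With `+`, neither alternative can match the empty string, so the trailing ('', '') disappears. A bracket that is not followed by a digit still matches neither alternative and is skipped silently: `sfp_nonempty.findall('a[b')` gives [('', 'a'), ('', 'b')].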

----------------------------------------------------------------------

>Comment By: Fredrik Lundh (effbot)
Date: 2003-10-30 16:55

Message:
Logged In: YES 
user_id=38376

a|b always checks pattern a before it checks pattern b, and 
findall returns all non-overlapping matches it can find.  

See the library reference for more information.
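The two rules in the reply can be seen with smaller patterns. This sketch assumes Python 3; the patterns and strings are made up for illustration.

```python
import re

# Alternatives are tried left to right; the first one that matches at a
# position wins, even if a later alternative would match more text.
print(re.findall(r'a|ab', 'ab'))  # ['a']
print(re.findall(r'ab|a', 'ab'))  # ['ab']

# Matches never overlap: scanning resumes where the previous match ended.
print(re.findall(r'aa', 'aaaa'))  # ['aa', 'aa']
```

Applied to the report's pattern: at each position findall tries `[][][0-9]` first and falls back to `[^][]*`. At the end of the string only the second alternative can succeed, and it does so by matching the empty string, which findall reports like any other match.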
