[Python-bugs-list] [ python-Bugs-833137 ] re.matchobject.findall() adds an extra element

SourceForge.net noreply at sourceforge.net
Thu Oct 30 10:41:50 EST 2003


Bugs item #833137, was opened at 2003-10-30 15:41
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=833137&group_id=5470

Category: Regular Expressions
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Greg Kochanski (gpk)
Assigned to: Fredrik Lundh (effbot)
Summary: re.matchobject.findall() adds an extra element

Initial Comment:
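import re

# First alternative: one bracket character followed by one digit.
# Second alternative: a possibly empty run of non-bracket characters.
_sfp = re.compile(r'([][][0-9])|([^][]*)')

print _sfp.findall('test[1]2')
print _sfp.findall('test]1[2again')
print _sfp.findall('test[1')
print _sfp.findall(']2')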



Yields:

[('', 'test'), ('[1', ''), (']2', ''), ('', '')]
[('', 'test'), (']1', ''), ('[2', ''), ('', 'again'), ('', '')]
[('', 'test'), ('[1', ''), ('', '')]
[(']2', ''), ('', '')]


Where do those empty matches at the end come from?

Admittedly, the [^][]* pattern can match a zero-length string.
But if it is going to match a zero-length string at the end,
why doesn't it also match one at the beginning, or between
every pair of nonzero-length matches?
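
To see where each match sits, the spans can be printed with
finditer(). This is a minimal sketch, assuming a current Python 3
interpreter rather than the Python 2.3 of the report; match_spans is
a hypothetical helper name, not part of the re module.

```python
import re

# Same pattern as in the report.
_sfp = re.compile(r'([][][0-9])|([^][]*)')

def match_spans(text):
    """Return (start, end, matched_text) for every match finditer() reports."""
    return [(m.start(), m.end(), m.group(0)) for m in _sfp.finditer(text)]

# Print one line per match so zero-length matches are easy to spot.
for start, end, piece in match_spans('test[1]2'):
    print(start, end, repr(piece))
```

If this behaves as I expect, the only zero-length match is at
offset 8, the end of the string, which is the one position where
neither alternative can consume a character. At offset 0 the second
alternative can take 'test', and at offsets 4 and 6 the first
alternative takes a bracket plus a digit, so no empty match is
reported at those positions.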

One would think that, in the interests of economy and sanity,
zero-length matches should be avoided unless they are needed
to use up all of the input string.

----------------------------------------------------------------------
