[Python-bugs-list] [ python-Bugs-505997 ] string.split docs are inconsistent
noreply@sourceforge.net
Sun, 20 Jan 2002 20:30:10 -0800
Bugs item #505997, was opened at 2002-01-20 01:24
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=505997&group_id=5470
Category: Documentation
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Matt Zimmerman (mzimmerman)
Assigned to: Fred L. Drake, Jr. (fdrake)
Summary: string.split docs are inconsistent
Initial Comment:
string.split.__doc__ says:
split(s [,sep [,maxsplit]]) -> list of strings
Return a list of the words in the string s, using sep as the
delimiter string. If maxsplit is given, splits into at most
maxsplit words. If sep is not specified, any whitespace string
is a separator.
(split and splitfields are synonymous)
This implies that len(split(s, sep, maxsplit)) <= maxsplit. In reality,
however, it is <= maxsplit+1. This seems to be explained by the library
documentation:
<quote>
split(s[, sep[, maxsplit]])
Return a list of the words of the string s. If the optional second argument
sep is absent or None, the words are separated by arbitrary strings of
whitespace characters (space, tab, newline, return, formfeed). If the second
argument sep is present and not None, it specifies a string to be used as
the word separator. The returned list will then have one more item than the
number of non-overlapping occurrences of the separator in the string. The
optional third argument maxsplit defaults to 0. If it is nonzero, at most
maxsplit number of splits occur, and the remainder of the string is returned
as the final element of the list (thus, the list will have at most
maxsplit+1 elements).
</quote>
Which indicates that maxsplit is in units of "splits" rather than "words",
where words = splits + 1. Personally, I find the "number of splits"
behaviour very counter-intuitive, and would much prefer "number of words".
At any rate, the inconsistency needs to be corrected.
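
The "splits" versus "words" distinction can be illustrated with a small
sketch. It uses the str.split method rather than the string.split function
from the report, on the assumption that both count maxsplit the same way;
the sample string and variable names are illustrative only.

```python
# maxsplit counts separators consumed, not words returned.
s = "1x2x3x4"
maxsplit = 2

parts = s.split("x", maxsplit)

# Two splits happen; the unsplit remainder becomes the final element.
print(parts)       # ['1', '2', '3x4']
print(len(parts))  # 3, i.e. maxsplit + 1
```

With maxsplit=2 the result has three elements, which matches the
"at most maxsplit+1 elements" wording of the library documentation rather
than the "at most maxsplit words" wording of the docstring.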
Also, the sentence "The optional third argument maxsplit defaults to 0"
implies that specifying maxsplit=0 is the same as not specifying it at all.
This is not the case, however:
Python 2.2 (#1, Jan 8 2002, 01:13:32)
[GCC 2.95.4 20011006 (Debian prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> print "1x2x3".split('x')
['1', '2', '3']
>>> print "1x2x3".split('x',0)
['1x2x3']
Instead, it seems to cause sep to be disregarded, making split(anything,0)
equivalent to split().
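
For comparison, the two calls can be checked side by side. This is a
minimal sketch using the str.split method (not the string.split function),
and the sample string is an assumption chosen so that it contains both
whitespace and the separator. In this check the results differ, which
suggests that maxsplit=0 simply performs no splits rather than falling back
to whitespace splitting.

```python
s = "1 2x3"

# maxsplit=0: no separators are consumed, so the string comes back whole.
no_splits = s.split("x", 0)

# No arguments: split on runs of whitespace, with no limit.
whitespace_split = s.split()

print(no_splits)         # ['1 2x3']
print(whitespace_split)  # ['1', '2x3']
```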
I don't have the python2.1 documentation installed at the moment, so I can't
check the library reference for that version, but at least the
string.split.__doc__ there is inconsistent with behaviour.
This was originally submitted as Debian bug #129272
----------------------------------------------------------------------
Comment By: Nobody/Anonymous (nobody)
Date: 2002-01-20 20:30
Message:
Logged In: NO
Tim was looking at the doc string for the split *method* of
string objects, which is correct. But the complaint was
about the split *function* in the (no longer needed, but
still supported) string *module*, which is indeed wrong --
still in 2.2.
--Guido (not logged in)
----------------------------------------------------------------------
Comment By: Matt Zimmerman (mzimmerman)
Date: 2002-01-20 20:20
Message:
Logged In: YES
user_id=196786
Thanks for responding.
The docstring was from Python 2.1.2 (Debian 2.1.2-2):
Python 2.1.2 (#1, Jan 18 2002, 18:05:45)
[GCC 2.95.4 (Debian prerelease)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> import string
>>> print string.split.__doc__
split(s [,sep [,maxsplit]]) -> list of strings
Return a list of the words in the string s, using sep as the
delimiter string. If maxsplit is given, splits into at most
maxsplit words. If sep is not specified, any whitespace string
is a separator.
(split and splitfields are synonymous)
In 2.2, it seems to be corrected:
Python 2.2 (#1, Jan 8 2002, 01:13:32)
[GCC 2.95.4 20011006 (Debian prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import string
>>> print string.split.__doc__
split(s [,sep [,maxsplit]]) -> list of strings
Return a list of the words in the string s, using sep as the
delimiter string. If maxsplit is given, splits into at most
maxsplit words. If sep is not specified, any whitespace string
is a separator.
(split and splitfields are synonymous)
The library documentation for 2.2 still says that maxsplit defaults to 0,
though apparently it defaults to -1, so that needs to be fixed.
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2002-01-20 19:38
Message:
Logged In: YES
user_id=31435
I don't know which version of Python they're using, but the
docstring doesn't match what's claimed here in 2.0.1, 2.1
or 2.2. Assigned to Fred for resolution (probably "Fixed").
----------------------------------------------------------------------
Comment By: Nobody/Anonymous (nobody)
Date: 2002-01-20 07:05
Message:
Logged In: NO
The docs and docstring seem wrong; the behavior is correct.
maxsplit is the number of *separators* recognized; it
defaults to -1. Specifying maxsplit=0 makes it a no-op.
--Guido (can't log in right now)
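
The behavior described in this comment can be sketched as follows. The
example uses the str.split method and a sample string taken from the
original report; treating -1 as "no limit" is what the comment states, not
something read from the implementation here.

```python
s = "1x2x3"

# Omitting maxsplit and passing -1 both mean "no limit on splits".
default_result = s.split("x")
explicit_unlimited = s.split("x", -1)

# maxsplit=0 recognizes no separators, so nothing is split.
no_op = s.split("x", 0)

print(default_result)      # ['1', '2', '3']
print(explicit_unlimited)  # ['1', '2', '3']
print(no_op)               # ['1x2x3']
```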
----------------------------------------------------------------------