[Python-bugs-list] [ python-Bugs-505997 ] string.split docs are inconsistent

noreply@sourceforge.net noreply@sourceforge.net
Sun, 20 Jan 2002 01:24:15 -0800


Bugs item #505997, was opened at 2002-01-20 01:24
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=505997&group_id=5470

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Matt Zimmerman (mzimmerman)
Assigned to: Nobody/Anonymous (nobody)
Summary: string.split docs are inconsistent

Initial Comment:
string.split.__doc__ says:

split(s [,sep [,maxsplit]]) -> list of strings

    Return a list of the words in the string s, using sep as the
    delimiter string.  If maxsplit is given, splits into at most
    maxsplit words.  If sep is not specified, any whitespace string
    is a separator.

    (split and splitfields are synonymous)

This implies that len(split(s, sep, maxsplit)) <= maxsplit.  In reality,
however, it is <= maxsplit+1.  This seems to be explained by the library
documentation:

<quote>
split(s[, sep[, maxsplit]])
Return a list of the words of the string s. If the optional second argument
sep is absent or None, the words are separated by arbitrary strings of
whitespace characters (space, tab, newline, return, formfeed). If the second
argument sep is present and not None, it specifies a string to be used as
the word separator. The returned list will then have one more item than the
number of non-overlapping occurrences of the separator in the string. The
optional third argument maxsplit defaults to 0. If it is nonzero, at most
maxsplit number of splits occur, and the remainder of the string is returned
as the final element of the list (thus, the list will have at most
maxsplit+1 elements).
</quote>

This indicates that maxsplit is in units of "splits" rather than "words",
where words = splits + 1.  Personally, I find the "number of splits"
behaviour very counter-intuitive, and would much prefer "number of words".
At any rate, the inconsistency needs to be corrected.
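
To make the splits/words distinction concrete, here is a minimal sketch,
assuming a current Python 3 interpreter and the str.split method rather
than the string module function.  The helper names split_counts and
split_max_words are hypothetical, introduced only for illustration; the
second one shows what "number of words" semantics could look like and is
not part of any library.

```python
def split_counts(s, sep, maxsplit):
    """Return (splits performed, words returned) for s.split(sep, maxsplit)."""
    words = s.split(sep, maxsplit)
    return len(words) - 1, len(words)


def split_max_words(s, sep, maxwords):
    """Hypothetical helper: cap the number of *words* instead of splits."""
    if maxwords < 1:
        raise ValueError("maxwords must be at least 1")
    # A limit of N words corresponds to at most N - 1 splits.
    return s.split(sep, maxwords - 1)


# maxsplit=2 performs two splits and therefore yields three words.
print(split_counts("a,b,c,d", ",", 2))
# With word-based semantics, a limit of 2 yields two words.
print(split_max_words("a,b,c,d", ",", 2))
```

With maxsplit=2 the list has three elements, which matches the library
reference (at most maxsplit+1) and not the docstring.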

Also, the sentence "The optional third argument maxsplit defaults to 0"
implies that specifying maxsplit=0 is the same as not specifying it at all.
This is not the case, however:

Python 2.2 (#1, Jan  8 2002, 01:13:32) 
[GCC 2.95.4 20011006 (Debian prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> print "1x2x3".split('x')
['1', '2', '3']
>>> print "1x2x3".split('x',0)
['1x2x3']

Instead, it seems to cause sep to be disregarded, making split(anything,0)
equivalent to split().
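
One way to test that reading is to try an input that contains whitespace.
The sketch below assumes a current Python 3 interpreter, where the default
for maxsplit is documented as -1; the variable names are illustrative only.
There, maxsplit=0 appears simply to perform no splits, so the result only
coincides with split() when the string contains no whitespace.

```python
text = "1x2x3"
no_limit = text.split("x")       # maxsplit omitted: split on every 'x'
zero_limit = text.split("x", 0)  # maxsplit=0: no splits are performed

spaced = "1 x 2"
spaced_zero = spaced.split("x", 0)  # still one element, whitespace intact
spaced_default = spaced.split()     # whitespace splitting gives three words

print(no_limit)
print(zero_limit)
print(spaced_zero)
print(spaced_default)
```

If the same holds on Python 2.2, the documentation problem is the stated
default of 0, not that sep is ignored.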

I don't have the python2.1 documentation installed at the moment, so I can't
check the library reference for that version, but at least the
string.split.__doc__ there is inconsistent with behaviour.

This was originally submitted as Debian bug #129272
