
Python 2 and 3 both exhibit this behavior:

    >>> "".split()
    []
    >>> "".split("*")
    ['']
    >>> "".split(" ")
    ['']

It's not at all clear to me why splitting an empty string on implicit whitespace should yield an empty list but splitting it with a non-whitespace character or explicit whitespace should yield a list with an empty string as its lone element. I realize this is documented behavior, but I can't for the life of me understand what the rationale might be for the different behaviors. Seems like a wart which might best be removed sometime in 3.x.

Skip

On Thu, Dec 11, 2008 at 6:18 AM, <skip@pobox.com> wrote:
Python 2 and 3 both exhibit this behavior:
    >>> "".split()
    []
    >>> "".split("*")
    ['']
    >>> "".split(" ")
    ['']
It's not at all clear to me why splitting an empty string on implicit whitespace should yield an empty list but splitting it with a non-whitespace character or explicit whitespace should yield a list with an empty string as its lone element. I realize this is documented behavior, but I can't for the life of me understand what the rationale might be for the different behaviors. Seems like a wart which might best be removed sometime in 3.x.
Which of the two would you choose for all? The empty string is the only reasonable behavior for split-with-argument; it is the logical consequence of how it behaves when the string is not empty. E.g. "x:y".split(":") -> ["x", "y"], "x::y".split(":") -> ["x", "", "y"], ":".split(":") -> ["", ""]. OTOH split-on-whitespace doesn't behave this way; it extracts the non-empty non-whitespace-containing substrings.

If anything is wrong, it's that they share the same name. This wasn't always the case. Do you really want to go back to .split() and .splitfields(sep)?

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
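The distinction drawn above can be checked directly. The following is a minimal sketch using only built-in str methods; the names sep_results and ws_results are illustrative. It shows that splitting on an explicit separator always yields one more field than there are separators (so joining the fields restores the input), while splitting with no argument only extracts non-empty words.

```python
# Split with an explicit separator: n separators always give n + 1 fields,
# so the empty string (zero separators) yields exactly one empty field.
samples = ["", "x:y", "x::y", ":"]
for s in samples:
    fields = s.split(":")
    assert len(fields) == s.count(":") + 1
    assert ":".join(fields) == s  # lossless round trip

sep_results = {s: s.split(":") for s in samples}

# Split with no argument: a run of whitespace is a single delimiter and
# leading/trailing whitespace is dropped, so only non-empty words remain.
ws_results = {s: s.split() for s in ["", "   ", " a  b ", "a\t\nb"]}

print(sep_results)
print(ws_results)
```

Under the separator rule, [''] for the empty string is the n = 0 case of "n separators, n + 1 fields", which is why it cannot simply be changed to [] without breaking the round trip.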

Guido> Which of the two would you choose for all? The empty string is the
Guido> only reasonable behavior for split-with-argument, it is the logical
Guido> consequence of how it behaves when the string is not empty. E.g.
Guido> "x:y".split(":") -> ["x", "y"], "x::y".split(":") -> ["x", "", "y"],
Guido> ":".split(":") -> ["", ""]. OTOH split-on-whitespace doesn't behave
Guido> this way; it extracts the non-empty non-whitespace-containing
Guido> substrings.

In my feeble way of thinking I go from something which evaluates to false to something which doesn't. It's almost like making matter out of empty space:

    bool("") -> False
    bool("".split()) -> False
    bool("".split("n")) -> True

Guido> If anything it's wrong, it's that they share the same name. This
Guido> wasn't always the case. Do you really want to go back to .split()
Guido> and .splitfields(sep)?

That might be preferable. The same method having such strikingly different behavior throws me every time I try splitting a possibly empty string with a non-whitespace character. It's a relatively uncommon case. Most of the time when you split a string with a non-whitespace character I think you know that the input can't be empty.

Skip
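For code that does have to split a possibly empty string on an explicit separator, one common workaround is to special-case the empty input. The helper below is a hypothetical sketch (split_fields is not part of str or the standard library); it only changes the result for "" and leaves every other input to str.split.

```python
def split_fields(text, sep):
    """Split text on sep, but return [] when text is empty.

    Hypothetical helper for illustration; it is not part of str.
    Every non-empty input is handled by str.split unchanged.
    """
    return text.split(sep) if text else []


print(split_fields("", ","))       # empty input gives an empty list
print(split_fields("a,,b", ","))   # empty fields inside are preserved
```

With this wrapper a false input stays false after splitting, at the cost of no longer being able to tell "no fields" from "one empty field".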

skip@pobox.com wrote:
Guido> Which of the two would you choose for all? The empty string is the
Guido> only reasonable behavior for split-with-argument, it is the logical
Guido> consequence of how it behaves when the string is not empty. E.g.
Guido> "x:y".split(":") -> ["x", "y"], "x::y".split(":") -> ["x", "", "y"],
Guido> ":".split(":") -> ["", ""]. OTOH split-on-whitespace doesn't behave
Guido> this way; it extracts the non-empty non-whitespace-containing
Guido> substrings.
In my feeble way of thinking I go from something which evaluates to false to something which doesn't. It's almost like making matter out of empty space:
    bool("") -> False
    bool("".split()) -> False
    bool("".split("n")) -> True
Guido> If anything it's wrong, it's that they share the same name. This
Guido> wasn't always the case. Do you really want to go back to .split()
Guido> and .splitfields(sep)?
That might be preferable. The same method having such strikingly different behavior throws me every time I try splitting a possibly empty string with a non-whitespace character. It's a relatively uncommon case. Most of the time when you split a string with a non-whitespace character I think you know that the input can't be empty.
Skip
It looks like there are several behaviors involved in split, and you want to split those behaviors out.

Behaviors of string split:

1. Split on whitespace characters by giving no argument. This has the effect of splitting on multiple characters. Strings with multiple whitespace characters are not multiply split.

    >>> ' '.split()
    []
    >>> ' \t\n'.split()
    []

2. Split on a word by giving an argument. (A word can be one char.) In this case, the split is strict and does not combine/remove null string results.

    >>> '       '.split(' ')
    ['', '', '', '', '', '', '', '']
    >>> ' \t\n'.split(' ')
    ['', '\t\n']

There doesn't seem to be an obvious way to split on different characters. A programmer new to Python might try:

    >>> '1 (123) 456-7890'.split(' ()-')
    ['1 (123) 456-7890']

Expecting: ['1', '123', '456', '7890']

    >>> '1 (123) 456-7890'.split([' ', '(', ')', '-'])
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: expected a character buffer object

When I needed to split on multiple chars other than the default whitespace, I have used .replace() to replace the different splitting characters with one single char sequence which I could then split on.

It might be nice to have a .splitonchars() version of split with the default being whitespace chars, and an argument to specify other multiple characters to split on.

The other behavior could be called .splitonwords(arg). The .splitonwords() method could possibly also accept a list of words.

That leaves the possibility to leave the current .split() behavior alone and would not break current code.

Alternatively, these could be functions in the string module. In that case the current .split() could just continue to exist as is.

I find the name 'splitfields' to not be as intuitive as 'splitonwords' and 'splitonchars'. While both of those require more letters to type than split, they are more readable, and when you do need the capability of splitting on more than one char or word, they are far shorter and less prone to errors than rolling your own function.

Ron
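The .replace() workaround described above might look like the sketch below. The function name split_on_chars_via_replace is hypothetical, and the approach assumes the fields themselves contain no whitespace, since every delimiter is mapped to a space before the final whitespace split.

```python
def split_on_chars_via_replace(text, chars):
    """Split text on any character in chars (hypothetical helper).

    Mirrors the .replace() workaround: map each delimiter to a space,
    then let str.split() with no argument collapse the runs.
    Assumes the fields themselves contain no whitespace.
    """
    for ch in chars:
        text = text.replace(ch, " ")
    return text.split()


phone = split_on_chars_via_replace("1 (123) 456-7890", " ()-")
print(phone)
```

Because the last step is a whitespace split, adjacent delimiters collapse and no empty strings appear in the result, which is the behavior the phone-number example was hoping for.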

On Thu, Dec 11, 2008 at 4:58 PM, Ron Adam <rrr@ronadam.com> wrote:
There doesn't seem to be an obvious way to split on different characters.
A new to python programmer might try:
    >>> '1 (123) 456-7890'.split(' ()-')
    ['1 (123) 456-7890']
Expecting: ['1', '123', '456', '7890']
    >>> '1 (123) 456-7890'.split([' ', '(', ')', '-'])
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: expected a character buffer object

    >>> re.split('[ ()-]', '1 (123) 456-7890')
    ['1', '', '123', '', '456', '7890']
    >>> re.split('[ ()-]+', '1 (123) 456-7890')
    ['1', '123', '456', '7890']

str.split() handles the simplest, most common cases. Let's not clutter it up with a bad[1] impersonation of regex.

[1] And if you thought regex was ugly enough to begin with...

-- Adam Olsen, aka Rhamphoryncus
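For delimiter sets that are only known at run time, the character class can be built with re.escape, which is a real function in the re module. The wrapper split_on_any below is a hypothetical name used for illustration, and it assumes chars is non-empty (an empty character class is not a valid pattern).

```python
import re


def split_on_any(text, chars):
    """Split text on runs of any character in chars (hypothetical helper).

    re.escape keeps characters such as '-', ']' or '^' from being read
    as character-class syntax. Assumes chars is a non-empty string.
    """
    pattern = "[" + re.escape(chars) + "]+"
    return re.split(pattern, text)


print(split_on_any("1 (123) 456-7890", " ()-"))
print(split_on_any("(123) 456", " ()-"))
```

Unlike str.split() with no argument, re.split keeps an empty string when the text starts or ends with a delimiter, so the second call returns a leading ''.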

Adam Olsen wrote:
On Thu, Dec 11, 2008 at 4:58 PM, Ron Adam <rrr@ronadam.com> wrote:
There doesn't seem to be an obvious way to split on different characters.
A new to python programmer might try:
    >>> '1 (123) 456-7890'.split(' ()-')
    ['1 (123) 456-7890']
Expecting: ['1', '123', '456', '7890']
    >>> '1 (123) 456-7890'.split([' ', '(', ')', '-'])
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: expected a character buffer object
    >>> re.split('[ ()-]', '1 (123) 456-7890')
    ['1', '', '123', '', '456', '7890']
    >>> re.split('[ ()-]+', '1 (123) 456-7890')
    ['1', '123', '456', '7890']
str.split() handles the simplest, most common cases. Let's not clutter it up with a bad[1] impersonation of regex.
[1] And if you thought regex was ugly enough to begin with...
These examples were just what a "new" programmer might attempt. I have a feeling that most new programmers do not attempt regular expressions, i.e. the re module, until sometime after they have learned the basics of Python.

Ron

Ron Adam writes:
These examples were just what a "new" programmer might attempt. I have a feeling that most new programmers do not attempt regular expressions, i.e. the re module, until sometime after they have learned the basics of Python.
Adding a str.split_on_any_of() would violate TOOWDTI, though. I think this is best addressed by an xref to re.split in the doc for str.split.

On Thu, Dec 11, 2008 at 7:32 PM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
Ron Adam writes:
These examples were just what a "new" programmer might attempt. I have a feeling that most new programmers do not attempt regular expressions, i.e. the re module, until sometime after they have learned the basics of Python.
Adding a str.split_on_any_of() would violate TOOWDTI, though.
I think this is best addressed by an xref to re.split in the doc for str.split.
+1

-- Adam Olsen, aka Rhamphoryncus

From: "Adam Olsen" <rhamph@gmail.com>
str.split() handles the simplest, most common cases. Let's not clutter it up with a bad[1] impersonation of regex.
I concur and am -1 on *any* change to str.split(). It has been around for a very long time and is widely used. If there were any subtle change, even in 3.0, it would create migration problems that are very hard to diagnose and repair.

Raymond

I think string.split(list) probably won't do what people expect either. Here's what I would expect it to do:

    >>> '1 (123) 456-7890'.split([' ', '(', ')', '-'])
    ['1', '', '123', '', '456', '7890']

but what you probably want is:

    >>> re.split(r'[ ()-]+', '1 (123) 456-7890')
    ['1', '123', '456', '7890']

Using re.split allows you to do that and avoids ambiguity about what it does.

--- Bruce
_______________________________________________ Python-ideas mailing list Python-ideas@python.org http://mail.python.org/mailman/listinfo/python-ideas

Bruce Leban wrote:
I think string.split(list) probably won't do what people expect either. Here's what I would expect it to do:
    >>> '1 (123) 456-7890'.split([' ', '(', ')', '-'])
    ['1', '', '123', '', '456', '7890']
but what you probably want is:
    >>> re.split(r'[ ()-]+', '1 (123) 456-7890')
    ['1', '123', '456', '7890']
Using re.split allows you to do that and avoids ambiguity about what it does.
--- Bruce

Without getting into regular expressions, it's easier to just allow adjacent char matches to act as one match so the following is true.

    longstring.splitchars(string.whitespace) == longstring.split()

-inf

That breaks existing code in two different ways which I don't think makes it easy.

it does NOT collapse adjacent characters:

    >>> "a&&b".split("&")
    ['a', '', 'b']

the separator it splits on is a string, not a character:

    >>> "a<b><c>d".split("><")
    ['a<b', 'c>d']

--- Bruce

Bruce Leban wrote:
-inf
That breaks existing code in two different ways which I don't think makes it easy.
Correct, it would break existing code. Which is why it should have a different name rather than altering the existing split function.
it does NOT collapse adjacent characters:

    >>> "a&&b".split("&")
    ['a', '', 'b']

Also correct. But that is the behavior when splitting on the default whitespace, i.e. split() with no argument. ' '.split() is not the same as ' '.split(' ').

Q: Would it be good to have a new method or function which extends the same behavior of whitespace splitting to other user-specified characters? I would find it useful at times.

the separator it splits on is a string, not a character:

    >>> "a<b><c>d".split("><")
    ['a<b', 'c>d']

Yes, I know. To split on multiple chars in a given argument string it will need to be called something other than .split(), such as .splitchars(), as in the example equality I gave.

    longstring.splitchars(string.whitespace) == longstring.split()

Note: longstring.split() has no arguments. .split(arg) splits on a string as you stated.
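The proposed equality can be sketched as a plain function. Nothing named splitchars exists on str or in the string module; the function below is a hypothetical illustration built on re.split and re.escape, and the equality is only checked here for ASCII whitespace (str.split() with no argument also treats other Unicode whitespace as delimiters).

```python
import re
import string


def splitchars(text, chars=string.whitespace):
    """Function form of the proposed .splitchars() method (hypothetical).

    A run of characters from chars acts as one delimiter, and empty
    fields at either end are dropped, as str.split() does with no
    argument. Assumes chars is a non-empty string.
    """
    pattern = "[" + re.escape(chars) + "]+"
    return [field for field in re.split(pattern, text) if field]


for sample in ["", "   ", " a \t b\n", "one two"]:
    assert splitchars(sample) == sample.split()

print(splitchars("a&&b", "&"))
```

The filter on empty fields is what makes the equality hold for inputs with leading or trailing delimiters; without it, re.split would return empty strings at the ends.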

Guido van Rossum wrote:
On Thu, Dec 11, 2008 at 6:18 AM, <skip@pobox.com> wrote:
If anything it's wrong, it's that they share the same name. This wasn't always the case. Do you really want to go back to .split() and .splitfields(sep)?
I hope not. I consider the current situation to be a definite improvement. I sometimes forgot which was which.

It seems to me like splitting an empty string is something that makes little sense to do, similar to dividing by zero. How about str.split, partition and friends just raise a ValueError exception when the value is the empty string?

Regards,
Matt

On Thu, Dec 11, 2008 at 12:11 PM, Matthew Russell <matt.horizon5@gmail.com> wrote:
It seems to me like splitting an empty string is something that makes little sense to do, similar to dividing by zero in terms of an analogy.
I guess you have never had the need. Let me assure you that you are mistaken. :-)
How about str.split, partition and friends just raise ValueError exception when the value is the empty string?
Absolutely not.
-- --Guido van Rossum (home page: http://www.python.org/~guido/)

I think splitting an empty string is more like dividing zero in half. No one expects that to raise a value exception.

--- Bruce

Sorry for wasting your brain power - I just read through some of my code and realised how stupid an idea it really was... I haven't even got the excuse of being remotely new to the language (or programming in general, come to think of it).

lesson(self): dont_post_at_end_of_day_without(thinking_a_lot_first)

sorely-mistaken-and-embarrassed
Matt

--
Cheers,
Matt
participants (10)
- Adam Olsen
- Bruce Leban
- Greg Ewing
- Guido van Rossum
- Matthew Russell
- Raymond Hettinger
- Ron Adam
- skip@pobox.com
- Stephen J. Turnbull
- Terry Reedy