skip@pobox.com wrote:
Guido> Which of the two would you choose for all? The empty string is the
Guido> only reasonable behavior for split-with-argument, it is the logical
Guido> consequence of how it behaves when the string is not empty. E.g.
Guido> "x:y".split(":") -> ["x", "y"], "x::y".split(":") -> ["x", "", "y"],
Guido> ":".split(":") -> ["", ""]. OTOH split-on-whitespace doesn't behave
Guido> this way; it extracts the non-empty non-whitespace-containing
Guido> substrings.
In my feeble way of thinking I go from something which evaluates to false to something which doesn't. It's almost like making matter out of empty space:
    bool("") -> False
    bool("".split()) -> False
    bool("".split("n")) -> True
Guido> If anything is wrong, it's that they share the same name. This
Guido> wasn't always the case. Do you really want to go back to .split()
Guido> and .splitfields(sep)?
That might be preferable. The same method having such strikingly different behavior throws me every time I try splitting a possibly empty string with a non-whitespace character. It's a relatively uncommon case. Most of the time when you split a string with a non-whitespace character I think you know that the input can't be empty.
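A minimal sketch of the guard this implies for the uncommon case, assuming the caller wants an empty input to give an empty list. The name split_fields is made up for illustration and is not an existing method:

```python
def split_fields(text, sep):
    """Hypothetical helper: like text.split(sep), but empty input gives []."""
    if not text:
        # Without this check, "".split(sep) returns [''], which is truthy.
        return []
    return text.split(sep)


print(split_fields("", ":"))
print(split_fields("x::y", ":"))
```

Non-empty input is passed straight to .split(sep), so empty fields in the middle of a string are still preserved.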
Skip
It looks like there are several behaviors involved in split, and you want to split those behaviors out.

Behaviors of string split:

1. Split on whitespace characters by giving no argument. This has the effect of splitting on any of several characters. A run of consecutive whitespace characters counts as a single separator, so no empty strings are produced.
    >>> ' '.split()
    []
    >>> ' \t\n'.split()
    []
2. Split on a word by giving an argument. (A word can be one character.) In this case the split is strict: empty-string results are neither combined nor removed.
    >>> '       '.split(' ')
    ['', '', '', '', '', '', '', '']
    >>> ' \t\n'.split(' ')
    ['', '\t\n']
There doesn't seem to be an obvious way to split on several different characters. A programmer new to Python might try:
    >>> '1 (123) 456-7890'.split(' ()-')
    ['1 (123) 456-7890']
Expecting: ['1', '123', '456', '7890']
    >>> '1 (123) 456-7890'.split([' ', '(', ')', '-'])
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: expected a character buffer object
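For comparison, the standard re module can already produce the expected list, though it is arguably not obvious to a newcomer. A minimal sketch, assuming the separators are exactly the four characters tried above:

```python
import re

number = '1 (123) 456-7890'

# Character class listing the four separators: space, '(', ')' and '-'.
# The trailing '+' treats a run of separators as a single split point.
parts = re.split(r'[ ()-]+', number)

print(parts)  # ['1', '123', '456', '7890']
```

Note that a separator at the very start or end of the input would still leave an empty string at that end of the result.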
When I needed to split on multiple characters other than the default whitespace, I used .replace() to replace each of the different splitting characters with one single character sequence, which I could then split on.

It might be nice to have a .splitonchars() version of split, with the default being whitespace characters and an argument to specify other characters to split on. The other behavior could be called .splitonwords(arg). The .splitonwords() method could possibly also accept a list of words.

That would make it possible to leave the current .split() behavior alone and would not break existing code. Alternatively, these could be functions in the string module. In that case the current .split() could just continue to exist as is.

I find the name 'splitfields' less intuitive than 'splitonwords' and 'splitonchars'. While both of those require more letters to type than split, they are more readable, and when you do need to split on more than one character or word, they are far shorter and less error-prone than rolling your own function.

Ron
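The .replace() workaround can be sketched as a function. This is only an illustration: splitonchars does not exist in Python, and dropping empty results (to mirror split with no argument) is an assumption about how it might behave.

```python
import string


def splitonchars(text, chars=string.whitespace):
    """Hypothetical helper: split text on any single character in chars.

    Empty results are dropped, mirroring split() with no argument.
    """
    if not chars:
        raise ValueError("empty separator set")
    marker = chars[0]
    # Collapse every separator character into one marker character.
    for ch in chars[1:]:
        text = text.replace(ch, marker)
    # Split on the marker and discard the empty strings left by runs.
    return [piece for piece in text.split(marker) if piece]


print(splitonchars('1 (123) 456-7890', ' ()-'))  # ['1', '123', '456', '7890']
print(splitonchars(' \t\n'))                     # []
```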