[Python-ideas] This seems like a wart to me...
rrr at ronadam.com
Fri Dec 12 00:58:38 CET 2008
skip at pobox.com wrote:
> Guido> Which of the two would you choose for all? The empty string is the
> Guido> only reasonable behavior for split-with-argument, it is the logical
> Guido> consequence of how it behaves when the string is not empty. E.g.
> Guido> "x:y".split(":") -> ["x", "y"], "x::y".split(":") -> ["x", "", "y"],
> Guido> ":".split(":") -> ["", ""]. OTOH split-on-whitespace doesn't behave
> Guido> this way; it extracts the non-empty non-whitespace-containing
> Guido> substrings.
> In my feeble way of thinking I go from something which evaluates to false to
> something which doesn't. It's almost like making matter out of empty space:
> bool("") -> False
> bool("".split()) -> False
> bool("".split("n")) -> True
> Guido> If anything is wrong, it's that they share the same name. This
> Guido> wasn't always the case. Do you really want to go back to .split()
> Guido> and .splitfields(sep)?
> That might be preferable. The same method having such strikingly different
> behavior throws me every time I try splitting a possibly empty string with a
> non-whitespace character. It's a relatively uncommon case. Most of the
> time when you split a string with a non-whitespace character I think you
> know that the input can't be empty.
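For the possibly-empty case described above, a small guard is one way to get the "empty in, empty out" result. This is only a sketch: split_fields is a hypothetical name, not an existing str method, and returning an empty list for empty input is an assumption about what the caller wants.

```python
def split_fields(text, sep):
    """Split text on sep, but map an empty string to an empty list.

    Hypothetical helper for illustration; not part of the str API.
    """
    if not text:
        # Plain "".split(sep) would return [''], which is truthy.
        return []
    return text.split(sep)


print(split_fields("", ":"))
print(split_fields("x::y", ":"))
```

For non-empty input the result is the same as calling text.split(sep) directly, so empty fields between separators are still kept.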
It looks like there are several behaviors involved in split, and you want
to split those behaviors out.
Behaviors of string split:
1. Split on whitespace characters by giving no argument.
This has the effect of splitting on runs of characters: consecutive
whitespace characters count as a single separator, so no empty strings are
produced.
>>> ' '.split()
[]
>>> ' \t\n'.split()
[]
2. Split on word by giving an argument. (A word can be one char.)
In this case, the split is strict and does not combine or remove null strings.
>>> '       '.split(' ')
['', '', '', '', '', '', '', '']
>>> ' \t\n'.split(' ')
['', '\t\n']
There doesn't seem to be an obvious way to split on several different
characters at once. A programmer new to Python might try:
>>> '1 (123) 456-7890'.split(' ()-')
['1 (123) 456-7890']
Expecting: ['1', '123', '456', '7890']
>>> '1 (123) 456-7890'.split([' ', '(', ')', '-'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: expected a character buffer object
When I needed to split on multiple chars other than the default whitespace,
I have used .replace() to replace each of the different splitting characters
with one single char sequence, which I could then split on.
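The .replace() workaround can be sketched roughly as follows. The function name split_via_replace and the marker parameter are made up for illustration, and dropping the empty pieces at the end is an assumption chosen to match the result expected in the phone-number example. The sketch is written for Python 3.

```python
def split_via_replace(text, chars, marker=" "):
    """Replace every char in chars with marker, then split on marker.

    Hypothetical helper; marker is assumed not to occur in the data
    unless it is itself one of the splitting characters.
    """
    for ch in chars:
        text = text.replace(ch, marker)
    # Adjacent separators leave empty strings behind; drop them here.
    return [piece for piece in text.split(marker) if piece]


print(split_via_replace('1 (123) 456-7890', ' ()-'))
```

It takes one pass over the string per splitting character, which is fine for short inputs but is part of why a built-in alternative would be convenient.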
It might be nice to have a .splitonchars() version of split with the
default being whitespace chars, and an argument to specify other multiple
characters to split on.
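As a rough illustration of what .splitonchars() might do, here is a sketch built on the re module. No such method exists; the signature, the choice to treat runs of splitting characters as one separator, and the choice to drop empty pieces are all assumptions modeled on how no-argument .split() behaves. Written for Python 3.

```python
import re


def splitonchars(text, chars=None):
    """Split text on any run of the characters in chars.

    Hypothetical function sketching the proposal. With chars=None it
    falls back to the existing whitespace behavior of str.split().
    """
    if chars is None:
        return text.split()
    if not chars:
        raise ValueError("empty set of splitting characters")
    # re.escape keeps characters such as '-' or ']' literal in the class.
    pattern = "[" + re.escape(chars) + "]+"
    return [piece for piece in re.split(pattern, text) if piece]


print(splitonchars('1 (123) 456-7890', ' ()-'))
print(splitonchars(' a \t b\n'))
```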
The other behavior could be called .splitonwords(arg). The .splitonwords()
method could possibly also accept a list of words.
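A comparable sketch for .splitonwords(), again hypothetical: it keeps the strict behavior of split-with-argument (empty strings are preserved) and accepts either one word or a list of words. Trying longer words first when one word is a prefix of another is a design assumption, not something the proposal specifies. Written for Python 3.

```python
import re


def splitonwords(text, words):
    """Split text strictly on a word, or on any word from a list.

    Hypothetical function sketching the proposal; empty strings
    between adjacent separators are kept, as with str.split(sep).
    """
    if isinstance(words, str):
        words = [words]
    if not words or any(not word for word in words):
        raise ValueError("empty separator")
    # Longest alternatives first, so 'ab' is tried before 'a'.
    ordered = sorted(words, key=len, reverse=True)
    pattern = "|".join(re.escape(word) for word in ordered)
    return re.split(pattern, text)


print(splitonwords('x::y', ':'))
print(splitonwords('a, b; c', [', ', '; ']))
```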
That makes it possible to leave the current .split() behavior alone, so
existing code would not break.
Alternatively, these could be functions in the string module. In that
case the current .split() could just continue to exist as is.
I find the name 'splitfields' less intuitive than 'splitonwords' and
'splitonchars'. While both of those take more letters to type than
split, they are more readable, and when you do need to split on more than
one char or word, they are far shorter and less error-prone than rolling
your own function.