I think string.split(list) probably won't do what people expect either. Here's what I would expect it to do:
It looks like there are several behaviors involved in split, and you want to split those behaviors out.
skip@pobox.com wrote:
Guido> Which of the two would you choose for all? The empty string is the
Guido> only reasonable behavior for split-with-argument, it is the logical
Guido> consequence of how it behaves when the string is not empty. E.g.
Guido> "x:y".split(":") -> ["x", "y"], "x::y".split(":") -> ["x", "", "y"],
Guido> ":".split(":") -> ["", ""]. OTOH split-on-whitespace doesn't behave
Guido> this way; it extracts the non-empty non-whitespace-containing
Guido> substrings.
In my feeble way of thinking I go from something which evaluates to false to
something which doesn't. It's almost like making matter out of empty space:
bool("") -> False
bool("".split()) -> False
bool("".split("n")) -> True
Guido> If anything it's wrong, it's that they share the same name. This
Guido> wasn't always the case. Do you really want to go back to .split()
Guido> and .splitfields(sep)?
That might be preferable. The same method having such strikingly different
behavior throws me every time I try splitting a possibly empty string with a
non-whitespace character. It's a relatively uncommon case. Most of the
time when you split a string with a non-whitespace character I think you
know that the input can't be empty.
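
For the possibly-empty case described above, a minimal sketch of a guard. The helper name split_fields is hypothetical, not an existing method, and returning an empty list for empty input is an assumption about what the caller wants:

```python
def split_fields(text, sep):
    """Strict split on sep, but map an empty input to an empty list.

    Hypothetical helper for illustration only.
    """
    if not text:
        # "".split(sep) would return [''], which is truthy
        return []
    return text.split(sep)


print(split_fields("", ","))
print(split_fields("a,,b", ","))
```

With this guard, the result for an empty string is falsy, matching what "".split() gives.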
Skip
Behaviors of string split:
1. Split on whitespace characters by giving no argument.
This splits on any of several whitespace characters. A run of consecutive whitespace characters counts as a single separator, so it does not produce empty strings.
>>> ' '.split()
[]
>>> ' \t\n'.split()
[]
2. Split on a word by giving an argument. (A word can be a single character.)
In this case, the split is strict: it does not merge adjacent separators or drop empty-string results.
>>> '       '.split(' ')
['', '', '', '', '', '', '', '']
>>> ' \t\n'.split(' ')
['', '\t\n']
There doesn't seem to be an obvious way to split on several different characters at once.
A programmer new to Python might try:
>>> '1 (123) 456-7890'.split(' ()-')
['1 (123) 456-7890']
Expecting: ['1', '123', '456', '7890']
>>> '1 (123) 456-7890'.split([' ', '(', ')', '-'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: expected a character buffer object
When I needed to split on multiple characters other than the default whitespace, I used .replace() to replace each of the different splitting characters with one single character sequence, which I could then split on.
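
A minimal sketch of that workaround, assuming the pieces themselves never contain whitespace. The helper name split_via_replace is made up for illustration:

```python
def split_via_replace(text, chars):
    """Split text on any character in chars (and on whitespace).

    Hypothetical helper: each splitting character is replaced with a
    space, then the no-argument split() collapses runs of whitespace.
    """
    for ch in chars:
        text = text.replace(ch, " ")
    return text.split()


print(split_via_replace("1 (123) 456-7890", "()-"))
```

Because the final step is the no-argument split(), runs of separators are merged and no empty strings appear in the result.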
It might be nice to have a .splitonchars() version of split that defaults to whitespace characters and takes an argument specifying other characters to split on.
The other behavior could be called .splitonwords(arg). The .splitonwords() method could possibly also accept a list of words.
That would make it possible to leave the current .split() behavior alone, so existing code would not break.
Alternatively, these could be functions in the string module. In that case too, the current .split() could simply continue to exist as is.
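
As a rough sketch of what the two proposed functions might do, here they are as plain module-level functions built on re.split. The names come from the proposal above, but the exact semantics are assumptions: splitonchars drops empty pieces the way whitespace splitting does, splitonwords keeps them the way split-with-argument does, and the str check assumes Python 3.

```python
import re


def splitonchars(text, chars=None):
    """Split on runs of any character in chars; default is whitespace."""
    if chars is None:
        return text.split()
    if not chars:
        raise ValueError("empty separator")
    # Character class matching one or more of the given characters
    pattern = "[" + re.escape(chars) + "]+"
    # Drop empty pieces, mirroring the no-argument split()
    return [piece for piece in re.split(pattern, text) if piece]


def splitonwords(text, words):
    """Strict split on a word or a list of words; empty strings are kept."""
    if isinstance(words, str):
        words = [words]
    if not words or "" in words:
        raise ValueError("empty separator")
    # Try longer words first so overlapping words prefer the longest match
    ordered = sorted(words, key=len, reverse=True)
    pattern = "|".join(re.escape(word) for word in ordered)
    return re.split(pattern, text)


print(splitonchars("1 (123) 456-7890", " ()-"))
print(splitonwords("x::y", ":"))
print(splitonwords("a, b;c", [", ", ";"]))
```

Keeping the two behaviors in separate functions means neither has to guess whether the caller wants empty strings preserved.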
I find the name 'splitfields' less intuitive than 'splitonwords' and 'splitonchars'. While both of those take more letters to type than split, they are more readable, and when you do need to split on more than one character or word, they are far shorter and less error-prone than rolling your own function.
Ron