[Python-ideas] This seems like a wart to me...

Bruce Leban bruce at leapyear.org
Fri Dec 12 01:23:52 CET 2008


I think string.split(list) probably won't do what people expect either.
Here's what I would expect it to do:

>>> '1 (123) 456-7890'.split([' ', '(', ')', '-'])
['1', '', '123', '', '456', '7890']

but what you probably want is:

>>> import re
>>> re.split(r'[ ()-]+', '1 (123) 456-7890')
['1', '123', '456', '7890']

Using re allows you to do that and avoids ambiguity about what it does.
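To make the two readings concrete, here is a minimal sketch of both using only the re module. The helper names `split_each` and `split_runs` are hypothetical, chosen for illustration; neither is a str method, and `split_runs` assumes single-character separators.

```python
import re

def split_each(s, seps):
    # Hypothetical helper: every occurrence of any separator is a split point,
    # so adjacent separators produce empty strings, like str.split(sep).
    pattern = '|'.join(re.escape(sep) for sep in seps)
    return re.split(pattern, s)

def split_runs(s, seps):
    # Hypothetical helper (single-character separators only): a run of
    # separator characters counts as one split point. The '+' quantifier
    # keeps the pattern from matching the empty string, which newer
    # versions of re.split would otherwise split on.
    pattern = '[' + ''.join(re.escape(sep) for sep in seps) + ']+'
    return re.split(pattern, s)

phone = '1 (123) 456-7890'
print(split_each(phone, [' ', '(', ')', '-']))  # ['1', '', '123', '', '456', '7890']
print(split_runs(phone, [' ', '(', ')', '-']))  # ['1', '123', '456', '7890']
```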

--- Bruce

On Thu, Dec 11, 2008 at 3:58 PM, Ron Adam <rrr at ronadam.com> wrote:

>
>
> skip at pobox.com wrote:
>
>>    Guido> Which of the two would you choose for all? The empty string is the
>>    Guido> only reasonable behavior for split-with-argument, it is the logical
>>    Guido> consequence of how it behaves when the string is not empty. E.g.
>>    Guido> "x:y".split(":") -> ["x", "y"], "x::y".split(":") -> ["x", "", "y"],
>>    Guido> ":".split(":") -> ["", ""]. OTOH split-on-whitespace doesn't behave
>>    Guido> this way; it extracts the non-empty non-whitespace-containing
>>    Guido> substrings.
>>
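The examples in the paragraph above can be checked directly; the last two lines contrast the explicit-separator form with the no-argument form on the same input. Outputs are shown as comments.

```python
# Explicit separator: every separator is a split point, so empties survive.
print("x:y".split(":"))      # ['x', 'y']
print("x::y".split(":"))     # ['x', '', 'y']
print(":".split(":"))        # ['', '']

# No argument: runs of whitespace are one separator and empties are dropped.
print(" x  y ".split())      # ['x', 'y']
print(" x  y ".split(" "))   # ['', 'x', '', 'y', '']
```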
>> In my feeble way of thinking I go from something which evaluates to false to
>> something which doesn't. It's almost like making matter out of empty space:
>>
>>    bool("") -> False
>>    bool("".split()) -> False
>>    bool("".split("n")) -> True
>>
>>    Guido> If anything it's wrong, it's that they share the same name. This
>>    Guido> wasn't always the case. Do you really want to go back to .split()
>>    Guido> and .splitfields(sep)?
>>
>> That might be preferable.  The same method having such strikingly different
>> behavior throws me every time I try splitting a possibly empty string with a
>> non-whitespace character.  It's a relatively uncommon case.  Most of the time
>> when you split a string with a non-whitespace character I think you know that
>> the input can't be empty.
>>
>> Skip
>>
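One workaround for the possibly-empty case Skip describes is to special-case the empty string so that splitting on a separator, like splitting on whitespace, yields an empty (falsy) list. The name `split_or_empty` is hypothetical; this is a sketch, not something proposed in the thread.

```python
def split_or_empty(s, sep):
    """Split s on sep, but return [] for an empty s (hypothetical helper).

    Plain str.split(sep) returns [''] for an empty string, which is truthy;
    this mirrors the whitespace form, where ''.split() == [].
    """
    return s.split(sep) if s else []

print(bool("".split("n")))            # True
print(bool(split_or_empty("", "n")))  # False
print(split_or_empty("a,b", ","))     # ['a', 'b']
```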
>
>
> It looks like there are several behaviors involved in split, and you want
> to split those behaviors out.
>
>
>
> Behaviors of string split:
>
>
> 1. Split on white space characters by giving no argument.
>
> This has the effect of splitting on multiple characters. A run of
> consecutive white space characters counts as a single separator, and
> leading or trailing white space produces no empty strings.
>
> >>> '       '.split()
> []
> >>> ' \t\n'.split()
> []
>
>
>
> 2. Split on word by giving an argument. (A word can be one char.)
>
> In this case, the split is strict and does not combine or remove
> empty-string results.
>
> >>> '       '.split(' ')
> ['', '', '', '', '', '', '', '']
> >>> ' \t\n'.split(' ')
> ['', '\t\n']
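The difference between these two behaviors can be restated with re: for ordinary ASCII whitespace, the no-argument form behaves like splitting on runs of whitespace and then discarding empty pieces, while the one-argument form keeps every piece. The function name below is hypothetical and only illustrates the semantics.

```python
import re

def whitespace_split(s):
    # Split on runs of whitespace, then drop the empty strings that appear
    # when s starts or ends with whitespace (or consists only of whitespace).
    return [piece for piece in re.split(r'\s+', s) if piece]

for s in ['       ', ' \t\n', '  a b\tc\n', '']:
    assert whitespace_split(s) == s.split()
print(whitespace_split('  a b\tc\n'))  # ['a', 'b', 'c']
```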
>
>
> There doesn't seem to be an obvious way to split on different characters.
>
>
> A programmer new to Python might try:
>
> >>> '1 (123) 456-7890'.split(' ()-')
> ['1 (123) 456-7890']
>
> Expecting: ['1', '123', '456', '7890']
>
>
> >>> '1 (123) 456-7890'.split([' ', '(', ')', '-'])
> Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
> TypeError: expected a character buffer object
>
>
> When I have needed to split on multiple characters other than the default
> white space, I have used .replace() to replace each of the different
> splitting characters with a single character sequence, which I could then
> split on.
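The .replace() workaround described above might look like the following sketch: map every separator onto one of them, then split once. `split_via_replace` and its `drop_empty` parameter are hypothetical; it assumes the separators do not overlap, and it keeps the strict behavior of split(sep), so adjacent separators produce empty strings unless they are filtered out.

```python
def split_via_replace(s, seps, drop_empty=True):
    # Hypothetical helper: rewrite every separator as the first one,
    # then perform a single ordinary str.split().
    target = seps[0]
    for sep in seps[1:]:
        s = s.replace(sep, target)
    pieces = s.split(target)
    # Adjacent separators leave empty strings behind; optionally remove them.
    return [p for p in pieces if p] if drop_empty else pieces

print(split_via_replace('1 (123) 456-7890', [' ', '(', ')', '-']))
# ['1', '123', '456', '7890']
print(split_via_replace('1 (123) 456-7890', [' ', '(', ')', '-'], drop_empty=False))
# ['1', '', '123', '', '456', '7890']
```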
>
>
> It might be nice to have a .splitonchars() version of split, with white
> space characters as the default and an argument to specify other
> characters to split on.
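A minimal sketch of the proposed splitonchars, written as a plain function since built-in str methods cannot be added from Python code. The name comes from the proposal; the behavior chosen here (runs of the given characters act as one separator and empty pieces are dropped, as in whitespace splitting) is an assumption about what was intended.

```python
import re
import string

def splitonchars(s, chars=string.whitespace):
    # Hypothetical function named after the proposal: treat any run of the
    # given characters as one separator and never return empty strings,
    # mirroring str.split() with no argument.
    if not chars:
        return [s] if s else []
    pattern = '[' + ''.join(re.escape(c) for c in chars) + ']+'
    return [piece for piece in re.split(pattern, s) if piece]

print(splitonchars('  a b\tc '))                 # ['a', 'b', 'c']
print(splitonchars('1 (123) 456-7890', ' ()-'))  # ['1', '123', '456', '7890']
```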
>
> The other behavior could be called .splitonwords(arg). The .splitonwords()
> method could possibly also accept a list of words.
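Likewise, a sketch of splitonwords that accepts either a single word or a list of words. The strict, non-collapsing behavior matches str.split(sep); trying longer words first, so that '::' wins over ':', is a design choice of this sketch rather than something specified in the thread.

```python
import re

def splitonwords(s, words):
    # Hypothetical function named after the proposal. A single string behaves
    # exactly like str.split(word); a list splits on any of its words.
    if isinstance(words, str):
        return s.split(words)
    # Longest words first, so the regex alternation prefers '::' over ':'.
    alternatives = sorted(words, key=len, reverse=True)
    pattern = '|'.join(re.escape(w) for w in alternatives)
    return re.split(pattern, s)

print(splitonwords('x::y', ':'))                              # ['x', '', 'y']
print(splitonwords('a::b:c', ['::', ':']))                    # ['a', 'b', 'c']
print(splitonwords('red, green and blue', [', ', ' and ']))   # ['red', 'green', 'blue']
```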
>
>
> That would make it possible to leave the current .split() behavior alone
> without breaking current code.
>
> Alternatively, these could be functions in the string module.  In that
> case the current .split() could simply continue to exist as is.
>
> I find the name 'splitfields' less intuitive than 'splitonwords' and
> 'splitonchars'.   While both of those take more letters to type than
> split, they are more readable, and when you do need to split on more than
> one character or word, they are far shorter and less error-prone than
> rolling your own function.
>
> Ron
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>