Behaviour of str.split
Bengt Richter
bokr at oz.net
Wed Apr 20 06:19:24 EDT 2005
On Wed, 20 Apr 2005 10:55:18 +0200, David Fraser <davidf at sjsoft.com> wrote:
>Greg Ewing wrote:
>> Will McGugan wrote:
>>
>>> Hi,
>>>
>>> I'm curious about the behaviour of the str.split() when applied to
>>> empty strings.
>>>
>>> "".split() returns an empty list, however..
>>>
>>> "".split("*") returns a list containing one empty string.
>>
>>
>> Both of these make sense as limiting cases.
>>
>> Consider
>>
>> >>> "a b c".split()
>> ['a', 'b', 'c']
>> >>> "a b".split()
>> ['a', 'b']
>> >>> "a".split()
>> ['a']
>> >>> "".split()
>> []
>>
>> and
>>
>> >>> "**".split("*")
>> ['', '', '']
>> >>> "*".split("*")
>> ['', '']
>> >>> "".split("*")
>> ['']
>>
>> The split() method is really doing two somewhat different things
>> depending on whether it is given an argument, and the end-cases
>> come out differently.
>>
>You don't really explain *why* they make sense as limiting cases, as
>your examples are quite different.
>
>Consider
> >>> "a*b*c".split("*")
>['a', 'b', 'c']
> >>> "a*b".split("*")
>['a', 'b']
> >>> "a".split("*")
>['a']
> >>> "".split("*")
>['']
>
>Now how is this logical when compared with split() above?
The trouble is that s.split(arg) and s.split() are really two different functions.
The first is 1:1 and reversible: for a non-empty separator arg, arg.join(s.split(arg)) == s always holds.
The second is neither 1:1 nor reversible: '<<various whitespace>>'.join(s.split()) == s ?? Not usually,
because split() discards the whitespace runs themselves, along with any leading or trailing whitespace.
I think you can do it with the equivalent whitespace regex, preserving the split-out whitespace
substrings and ''.join-ing those back with the others, but not with split(). I.e.,
>>> def splitjoin(s, splitter=None):
...     return (splitter is None and '<<whitespace>>' or splitter).join(s.split(splitter))
...
>>> splitjoin('a*b*c', '*')
'a*b*c'
>>> splitjoin('a*b', '*')
'a*b'
>>> splitjoin('a', '*')
'a'
>>> splitjoin('', '*')
''
>>> splitjoin('a b c')
'a<<whitespace>>b<<whitespace>>c'
>>> splitjoin('a b ')
'a<<whitespace>>b'
>>> splitjoin(' b ')
'b'
>>> splitjoin('')
''
>>> splitjoin('*****','*')
'*****'
Note why that works:
>>> '*****'.split('*')
['', '', '', '', '', '']
>>> '*a'.split('*')
['', 'a']
>>> 'a*'.split('*')
['a', '']
>>> splitjoin('*a','*')
'*a'
>>> splitjoin('a*','*')
'a*'
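For what it's worth, the regex idea mentioned above could be sketched roughly like this. It is only a sketch: the names split_keep_ws and rejoin are made up for illustration, and it assumes that the regex \s+ is an acceptable stand-in for whatever split() with no argument treats as whitespace.

```python
import re

def split_keep_ws(s):
    # Hypothetical helper: the capturing group makes re.split() return
    # the whitespace runs as list items instead of discarding them.
    return re.split(r'(\s+)', s)

def rejoin(s):
    # Nothing was thrown away, so a plain ''.join() restores the input.
    return ''.join(split_keep_ws(s))

print(split_keep_ws('a b'))
print(rejoin(' b ') == ' b ')
```

With the separators kept, the non-whitespace pieces sit at the even indices (possibly as empty strings at either end), so this split is reversible in the same way s.split(arg) is.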
Regards,
Bengt Richter