Behaviour of str.split

Bengt Richter bokr at oz.net
Wed Apr 20 06:19:24 EDT 2005


On Wed, 20 Apr 2005 10:55:18 +0200, David Fraser <davidf at sjsoft.com> wrote:

>Greg Ewing wrote:
>> Will McGugan wrote:
>> 
>>> Hi,
>>>
>>> I'm curious about the behaviour of the str.split() when applied to 
>>> empty strings.
>>>
>>> "".split() returns an empty list, however..
>>>
>>> "".split("*") returns a list containing one empty string.
>> 
>> 
>> Both of these make sense as limiting cases.
>> 
>> Consider
>> 
>>  >>> "a b c".split()
>> ['a', 'b', 'c']
>>  >>> "a b".split()
>> ['a', 'b']
>>  >>> "a".split()
>> ['a']
>>  >>> "".split()
>> []
>> 
>> and
>> 
>>  >>> "**".split("*")
>> ['', '', '']
>>  >>> "*".split("*")
>> ['', '']
>>  >>> "".split("*")
>> ['']
>> 
>> The split() method is really doing two somewhat different things
>> depending on whether it is given an argument, and the end-cases
>> come out differently.
>> 
>You don't really explain *why* they make sense as limiting cases, as 
>your examples are quite different.
>
>Consider
> >>> "a*b*c".split("*")
>['a', 'b', 'c']
> >>> "a*b".split("*")
>['a', 'b']
> >>> "a".split("*")
>['a']
> >>> "".split("*")
>['']
>
>Now how is this logical when compared with split() above?

The trouble is that s.split(arg) and s.split() are two different functions.

The first is 1:1 and reversible, i.e. arg.join(s.split(arg)) == s for any non-empty arg.
The second is neither 1:1 nor reversible: '<<various whitespace>>'.join(s.split()) == s ?? Not usually.

I think you can make the whitespace case reversible with the equivalent whitespace regex, by preserving
the split-out whitespace substrings and ''.joining them back with the others, but not with split() itself.
To compare the two forms of split():

 >>> def splitjoin(s, splitter=None):
 ...     return ('<<whitespace>>' if splitter is None else splitter).join(s.split(splitter))
 ...
 >>> splitjoin('a*b*c', '*')
 'a*b*c'
 >>> splitjoin('a*b', '*')
 'a*b'
 >>> splitjoin('a', '*')
 'a'
 >>> splitjoin('', '*')
 ''
 >>> splitjoin('a b    c')
 'a<<whitespace>>b<<whitespace>>c'
 >>> splitjoin('a b    ')
 'a<<whitespace>>b'
 >>> splitjoin('  b    ')
 'b'
 >>> splitjoin('')
 ''
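
The whitespace cases above lose information because split() with no argument is many-to-one: several different strings produce the same list, so no join can tell them apart afterwards. A minimal sketch of that point follows; the names inputs, results, all_same and explicit are only illustrative.

```python
# Three different strings, differing only in their whitespace.
inputs = ['a b', 'a    b', '  a\tb\n']

# split() with no argument collapses whitespace runs and drops
# leading/trailing whitespace, so all three give the same list.
results = [s.split() for s in inputs]
all_same = all(r == ['a', 'b'] for r in results)
print(all_same)        # True

# With an explicit separator every separator occurrence is kept,
# so the three inputs stay distinguishable.
explicit = [s.split(' ') for s in inputs]
print(explicit[1])     # ['a', '', '', '', 'b']
print(explicit[2])     # ['', '', 'a\tb\n']
```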

 >>> splitjoin('*****','*')
 '*****'
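
Note why that works: with an explicit separator, split() keeps the empty strings
at the edges and between adjacent separators: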

 >>> '*****'.split('*')
 ['', '', '', '', '', '']
 >>> '*a'.split('*')
 ['', 'a']
 >>> 'a*'.split('*')
 ['a', '']

 >>> splitjoin('*a','*')
 '*a'
 >>> splitjoin('a*','*')
 'a*'
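
The regex round trip mentioned earlier could look roughly like this. It is only a sketch: split_keep_whitespace is a made-up helper name, and it assumes that the pattern \s+ matches the same runs that split() treats as whitespace, which holds for the plain ASCII samples used here.

```python
import re

def split_keep_whitespace(s):
    """Split s on whitespace runs but keep the runs in the result.

    The capturing group in the pattern makes re.split return the
    separators too, so fields sit at even indices and whitespace
    runs at odd indices.
    """
    return re.split(r'(\s+)', s)

for s in ['a b    c', 'a b    ', '  b    ', '', 'a\tb\nc']:
    parts = split_keep_whitespace(s)
    # Joining with the empty string restores the original exactly.
    assert ''.join(parts) == s
    # Dropping the separators and the empty edge fields gives the
    # same words as the no-argument split().
    assert [p for p in parts[::2] if p] == s.split()

print(split_keep_whitespace('a b    c'))   # ['a', ' ', 'b', '    ', 'c']
print(split_keep_whitespace('  b    '))    # ['', '  ', 'b', '    ', '']
```

Because the whitespace runs are kept in the list, nothing is thrown away, which is exactly what the no-argument split() cannot offer.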

Regards,
Bengt Richter


