Bug or feature? 'abc'.split('') rejects empty separator

David C. Ullrich ullrich at math.okstate.edu
Mon Feb 11 19:51:28 CET 2002

On Sun, 10 Feb 2002 18:05:01 -0500, "Tim Peters" <tim.one at home.com> wrote:

>[Bengt Richter]
>>  >>> 'abc'.split('')
>>  Traceback (most recent call last):
>>    File "<stdin>", line 1, in ?
>>  ValueError: empty separator
>> Wouldn't it make sense to return list('abc') ?
>[Neil Schemenauer]
>> It would also make sense to return ['a', 'b', 'c'].
>Well, that's what list('abc') does return, so you're in violent agreement.
>However, this still stands:
>> Since it's not obvious what you want Python raises an error.
>Indeed, my first thought was "OK, if it has to return *something*, then
>since we're asking it to split on nothing, it shouldn't split at all:
>    ['abc']
>is what it should return." 

Makes more sense to me for it to be an error - deciding
what it "should" return seems like deciding what 1/0
should return.

I'm surprised nobody's said that what it should return
is an infinite list of empty strings (which would be
inconvenient, so an error is better). That seems to
me to be "obviously" what it "really is" - when I
think about how to write a "split" in the "obvious"
way that's what comes out:

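def split(data, sep):
  res = []
  pos = 0
  tok = ''
  while pos < len(data):
    # with sep == '' this test is always true and pos never
    # advances, so the loop appends '' forever
    if data[pos:pos+len(sep)] == sep:
      res.append(tok)
      tok = ''
      pos = pos + len(sep)
    else:
      tok = tok + data[pos]
      pos = pos + 1
  # whatever follows the last separator is the final token
  res.append(tok)
  return res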
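
To see the "infinite list of empty strings" without hanging the
interpreter, here is a rough sketch of the same loop written as a
generator. The name naive_split is only illustrative, and islice is
there purely to cut the output off after a few items:

```python
from itertools import islice

def naive_split(data, sep):
    """Lazy version of the 'obvious' split loop; yields tokens one at a time."""
    pos = 0
    tok = ''
    while pos < len(data):
        if data[pos:pos + len(sep)] == sep:
            # separator found here: emit the token collected so far
            yield tok
            tok = ''
            pos += len(sep)   # adds 0 when sep == '', so pos never moves
        else:
            tok += data[pos]
            pos += 1
    yield tok                 # text after the last separator

print(list(naive_split('a,b,c', ',')))          # ['a', 'b', 'c']
print(list(islice(naive_split('abc', ''), 5)))  # ['', '', '', '', '']
```

With a non-empty separator the generator terminates as usual; with ''
it never gets past position 0, which is the sense in which an error
is the more convenient answer.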

I don't know what re.split does. What's "natural"
code for a split function that returns something
reasonable when the separator is ''? Without

>That's also compatible with
>    ''.join(['abc']) == 'abc'
>People who think '' should split "everywhere" instead of "nowhere" should
>really be arguing for 'abc'.split('') to return
>    ['', 'a', 'b', 'c', '']
>instead, since in any sense that '' could be said to "match", it matches at
>4 slice positions in 'abc', not 2.
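
The slice-position counting can be made concrete with a small sketch.
split_at is a hypothetical helper, not anything in the library: it
cuts a string at a given list of slice positions. The re.split line
reflects what Python 3.7 and later do, which is an assumption about
a newer interpreter and not about the one discussed in this thread:

```python
import re

def split_at(s, cuts):
    """Cut s at each slice position in cuts (ascending); return the pieces."""
    pieces, prev = [], 0
    for cut in cuts:
        pieces.append(s[prev:cut])
        prev = cut
    pieces.append(s[prev:])   # remainder after the last cut
    return pieces

s = 'abc'
nowhere = split_at(s, [])                      # split "nowhere"
inner = split_at(s, [1, 2])                    # only the 2 interior positions
everywhere = split_at(s, range(len(s) + 1))    # all 4 positions: 0, 1, 2, 3

print(nowhere)      # ['abc']
print(inner)        # ['a', 'b', 'c']
print(everywhere)   # ['', 'a', 'b', 'c', '']

# every candidate answer round-trips through ''.join
for pieces in (nowhere, inner, everywhere):
    assert ''.join(pieces) == s

# Python 3.7+ only: re.split accepts a pattern that matches the empty string
print(re.split('', s))   # ['', 'a', 'b', 'c', '']
```

Since all three candidates satisfy the join identity, that identity
alone does not pick a winner among them.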

David C. Ullrich
