join and split with empty delimiter
Ben Bacarisse
ben.usenet at bsb.me.uk
Thu Jul 18 15:52:33 EDT 2019
Danilo Coccia <daniloco at acm.org> writes:
> On 18/07/2019 12:27, Ben Bacarisse wrote:
>> Irv Kalb <Irv at furrypants.com> writes:
>>
>>> I have always thought that split and join are opposite functions. For
>>> example, you can use a comma as a delimiter:
>>>
>>>>>> myList = ['a', 'b', 'c', 'd', 'e']
>>>>>> myString = ','.join(myList)
>>>>>> print(myString)
>>> a,b,c,d,e
>>>
>>>>>> myList = myString.split(',')
>>>>>> print(myList)
>>> ['a', 'b', 'c', 'd', 'e']
>>>
>>> Works great.
>>
>> Note that join and split do not always recover the same list:
>>
>>>>> ','.join(['a', 'b,c', 'd']).split(',')
>> ['a', 'b', 'c', 'd']
>>
>> You don't even have to have the delimiter in one of the strings:
>>
>>>>> '//'.join(['a', 'b/', 'c']).split('//')
>> ['a', 'b', '/c']
>>
>>> But I've found a case where they don't work that way. If
>>> I join the list with the empty string as the delimiter:
>>>
>>>>>> myList = ['a', 'b', 'c', 'd']
>>>>>> myString = ''.join(myList)
>>>>>> print(myString)
>>> abcd
>>>
>>> That works great. But attempting to split using the empty string
>>> generates an error:
>>>
>>>>>> myString.split('')
>>> Traceback (most recent call last):
>>>   File "<pyshell#9>", line 1, in <module>
>>>     myString.split('')
>>> ValueError: empty separator
>>>
>>> I know that this can be accomplished using the list function:
>>>
>>>>>> myString = list(myString)
>>>>>> print(myString)
>>> ['a', 'b', 'c', 'd']
>>>
>>> But my question is: Is there any good reason why the split function
>>> should give an "empty separator" error? I think the meaning of trying
>>> to split a string into a list using the empty string as a delimiter is
>>> unambiguous - it should just create a list of single-character
>>> strings like the list function does here.
>>
>> One reason might be that str.split('') is not unambiguous. For example,
>> there's a case to be made that there is a '' delimiter at the start and
>> the end of the string as well as between letters. '' is a very special
>> delimiter because every string that gets joined using it includes it!
>> It's a wild version of ','.join(['a', 'b,c', 'd']).split(',').
>>
>> Of course str.split('') could be defined to work the way you expect, but
>> it's possible that the error is there to prompt the programmer to be
>> more explicit.
>
> It is even more ambiguous if you consider that any string starts with an
> infinite number of empty strings, followed by a character, followed by
> an infinite number of empty strings, followed by ...
> The result wouldn't fit on screen, or in memory for that matter!
Right, but that can be finessed by saying that two delimiters can't
overlap, which is the usual rule. A reasonable interpretation of "not
overlapping" might well exclude having more than one delimiter in the
same place.
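
For what it's worth, here is a minimal sketch of the two readings. The
name split_any is made up for illustration and is not anything in the
standard library. It takes the "one delimiter between each pair of
characters" view, while re.split (on Python 3.7 or later, where it
accepts patterns that match the empty string) also finds a delimiter
at each end:

```python
import re


def split_any(s, sep):
    """Hypothetical helper: like s.split(sep), but accepts sep == ''."""
    if sep == "":
        # Assumed rule: exactly one empty delimiter between each adjacent
        # pair of characters, none at the ends, never two in one place.
        return list(s)
    return s.split(sep)


# Reading 1: delimiters only between characters.
print(split_any("abcd", ""))

# Reading 2: re.split also sees an empty match at the start and the end,
# so the result gains a leading and a trailing empty string.
print(re.split("", "abcd"))
```

Under the first reading split_any('', '') gives [], whereas
''.split(',') gives [''], so even the "obvious" definition does not
line up with the existing behaviour in every case.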
--
Ben.