[Python-ideas] str.split() oddness
Terry Reedy
tjreedy at udel.edu
Sat Feb 26 19:52:19 CET 2011
On 2/26/2011 9:03 AM, Mart Sõmermaa wrote:
> IMHO, x.join(a).split(x) should be "idempotent"
> in regard to a.
Given that x.join is *not* 1 to 1,
>>> 'a'.join([])
''
>>> 'a'.join([''])
''
it cannot have an inverse for all outputs.
In particular, ''.split('a') cannot be both [] and [''].
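An interpreter check makes the collision concrete. `[]` and `['']` both join to `''`, so `split()` can hand back only one of them, and Python returns `['']`. The helper `joins_collide` is a hypothetical name used only for this illustration.

```python
def joins_collide(sep, first, second):
    """Return True if two different lists join to the same string (hypothetical helper)."""
    return first != second and sep.join(first) == sep.join(second)

print(joins_collide('a', [], ['']))  # True: both join to ''
print(''.split('a'))                 # ['']: split can return only one of the two preimages
```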
This could only be fixed by changing the definition of join to not allow
joining on [], but that would not be convenient. I believe joining is
otherwise 1 to 1 and invertible for non-empty lists.
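The claim about non-empty lists can be spot-checked with a small helper, `round_trips` (a hypothetical name). This sketch assumes that no piece contains the separator itself. Without that assumption joining is not one-to-one either, since `'a'.join(['xa', 'b'])` and `'a'.join(['x', '', 'b'])` are both `'xaab'`.

```python
def round_trips(sep, pieces):
    """Return True if sep.join(pieces).split(sep) gives pieces back."""
    return sep.join(pieces).split(sep) == pieces

print(round_trips('a', ['x', '', 'y']))  # True
print(round_trips('a', []))              # False: '' splits to ['']
print(round_trips('a', ['xa', 'b']))     # False: a piece contains the separator
```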
Of course, join input a can be any iterable of strings, whereas split
produces a list, so your equality test can only work for list inputs
unless generalized to c.join(a).split(c) == list(a).
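Because `split()` always returns a list, comparing its result with a tuple (or any other non-list iterable) fails even when the pieces match. Comparing with `list(a)` removes that mismatch. A generator should be materialized once before the test, because joining it consumes it.

```python
c = ','
a = ('x', 'y')
print(c.join(a).split(c) == a)        # False: a list never equals a tuple
print(c.join(a).split(c) == list(a))  # True

# A generator can be consumed only once, so turn it into a list first.
gen = (s for s in ('x', 'y'))
items = list(gen)
print(c.join(items).split(c) == items)  # True
```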
''.split('a') == [''], not [], by the definition of s.split(c):
a list of pieces of s that were previously joined by c.
In particular, string_not_containing_sep.split(sep) ==
[string_not_containing_sep].
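Both consequences of that definition are easy to confirm. Splitting a joined list recovers its pieces, and a string without the separator comes back as a one-element list.

```python
pieces = ['ab', 'cd', 'ef']
joined = '|'.join(pieces)           # 'ab|cd|ef'
print(joined.split('|') == pieces)  # True
print('abc'.split('|'))             # ['abc']: no separator, so a single piece
print(''.split('|'))                # ['']: the empty string is one (empty) piece
```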
Note that empty pieces are inserted for repeated seps so that splitting
on seps (unlike splitting on 'whitespace') *is* 1 to 1.
'abc'.split('b') == ['a','c']
'abbc'.split('b') == ['a','','c']
(whereas 'a c'.split() and 'a  c'.split() are both ['a','c'])
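Running the examples side by side shows the difference. An explicit separator yields an empty piece for every repeated separator, while the no-argument form treats any run of whitespace as a single separator and produces no empty pieces at the ends.

```python
print('abc'.split('b'))    # ['a', 'c']
print('abbc'.split('b'))   # ['a', '', 'c']: the repeated sep yields an empty piece
print('a c'.split())       # ['a', 'c']
print('a  c'.split())      # ['a', 'c']: whitespace runs collapse
print('a  c'.split(' '))   # ['a', '', 'c']: an explicit ' ' does not collapse
print(' a '.split())       # ['a']: no empty pieces at the ends
```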
Therefore, sep splitting does have an inverse:
c.join(s.split(c)) == s
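The inverse property can be checked over a few sample strings, including ones that start or end with the separator or consist only of it. The separator must be non-empty, since `str.split('')` raises `ValueError`. Whitespace splitting, by contrast, discards the original spacing, so joining its result with a single space need not restore the string. `is_inverse` is a hypothetical helper name.

```python
def is_inverse(s, c):
    """Return True if c.join(s.split(c)) restores s (c must be non-empty)."""
    return c.join(s.split(c)) == s

samples = ['', 'a', 'abc', 'abbc', 'bab', 'bbb']
print(all(is_inverse(s, 'b') for s in samples))  # True

# Whitespace splitting is not invertible: the double space is lost.
spaced = 'a  c'
print(' '.join(spaced.split()) == spaced)  # False
```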
The doc for str.split specifies the above and makes clear that splitting
with and without a separator are slightly different functions.
>>>> assert ' '.join(foo).split() == foo
You have pulled a fast one here. ' ' does not equal 'whitespace' ;-)
If x in your original expression is nothing (to indicate 'whitespace'),
then your desired equality becomes
.join(a).split() == a
which is not legal ;-).
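Read literally with a real `' '` separator, the quoted assertion holds only when every piece is non-empty and contains no whitespace. The hypothetical helper below tries a few inputs. Whitespace splitting maps `''` to `[]`, so here `[]` round-trips while `['']` does not, the opposite of what separator splitting does.

```python
def whitespace_round_trip(foo):
    """Return True if ' '.join(foo).split() == foo (hypothetical helper)."""
    return ' '.join(foo).split() == foo

print(whitespace_round_trip(['a', 'b']))      # True
print(whitespace_round_trip(['a', '', 'b']))  # False: empty pieces vanish
print(whitespace_round_trip(['a b']))         # False: a piece contains a space
print(whitespace_round_trip([]))              # True: ''.split() == []
print(whitespace_round_trip(['']))            # False: '' splits to [], not ['']
```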
Some of the above is a rewording and expansion upon what Joao already
said, which was all correct.
--
Terry Jan Reedy