[Python-ideas] str.split() oddness

Terry Reedy tjreedy at udel.edu
Mon Mar 7 03:07:36 CET 2011

On 3/6/2011 1:32 PM, Mart Sõmermaa wrote:

> On Mon, Feb 28, 2011 at 2:13 AM, Guido van Rossum wrote:

Two minutes before that, I posted a more extensive reply and refutation 
that you have not replied to.

> But let me remind that the behaviour of foo.split(x) where
> foo is not an empty string is not questioned at all, only
> behaviour when splitting the empty string is.
>
>            Python            Ruby
> join1    [''] => ''       [''] => ''
> join2    [  ] => ''       [  ] => ''
> split    [''] <= ''       [  ] <= ''
>
> As you can see, join1 and join2 are identical in both
> languages. Python has chosen to make split the inverse of
> join1, Ruby, on the other hand, the inverse of join2.
>> In Python the generalization is that since "xx".split(",") is ["xx"],
>> and "x".split(",") is ["x"], it naturally follows that "".split(",")
>> is [""].

Which I wrote as:   (n*c).split(c) == (n+1)*['']
The generalization: len(s.split(c)) == s.count(c)+1
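
As a quick check of the two rules above, here is a minimal sketch. The helper name count_rule_holds and the sample strings are made up for illustration, and only a single-character separator is tried.

```python
def count_rule_holds(s, c=","):
    """Return True if len(s.split(c)) == s.count(c) + 1 for this s."""
    return len(s.split(c)) == s.count(c) + 1

# First rule: n copies of the separator split into n+1 empty strings.
# n == 0 is the empty-string case and needs no special handling.
for n in range(5):
    assert (n * ",").split(",") == (n + 1) * [""]

# Second rule, tried on a few hand-picked strings, including ''.
samples = ["", "x", "xx", "a,b", ",", "a,,b,", ",,,"]
assert all(count_rule_holds(s) for s in samples)
```

Both rules hold for '' without any conditional.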

You want to change these into

(n*c).split(c) == ((n+1)*[''] if n else [])
len(s.split(c)) == (s.count(c)+1 if s else 0)

which is to say, you want to add an easily forgotten conditional and 
alternative to the definition of split.
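
To make the extra conditional concrete, here is a rough sketch of the proposed behaviour. The function split_special_cased is hypothetical (it is not part of Python) and only wraps str.split to mimic the Ruby-style result for ''.

```python
def split_special_cased(s, c):
    """Hypothetical variant of s.split(c) that maps '' to []."""
    return s.split(c) if s else []

assert split_special_cased("", ",") == []
assert split_special_cased("x", ",") == ["x"]
assert split_special_cased(",", ",") == ["", ""]

# The count rule now needs the same exception for the empty string.
for s in ["", "x", "a,b", ",", "a,,b,"]:
    expected = (s.count(",") + 1) if s else 0
    assert len(split_special_cased(s, ",")) == expected
```

Every caller that relies on the count rule would have to remember the 'if s' branch.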

> That is one line of reasoning that emphasizes the
> "string-nature" of ''.

I do not see that particularly. I emphasize the algorithmic nature of 
functions and prefer simpler definitions/algorithms to more complicated 
ones with unnecessary special cases.

> However, I myself, the Ruby folks and Nick would rather
> emphasize the "zero-element-nature" [1] of ''.

Which says nothing in itself. Saying that one member of the domain of a 
function is the identity element under some particular operation 
(concatenation, in this case) says nothing about what that member should 
be mapped to by any particular function.

You seem to emphasize the mapping (set of ordered pairs) nature of 
functions and are hence willing to change one of the mappings (ordered 
pairs) without regard to its relation to all the other pairs. This is a 
consequence of the set view, which by itself denies any relation between 
its members (the mapping pairs).

> "Applying the split operator to the zero element of
> strings should result in the zero element of lists"

To repeat, 'should' has no justification; it is just hand waving.

Would you really say that every function should map identities to 
identities (and what if domain and range have more than one)? I hope 
not. Would you even say that every *string* function should map '' to 
the identity element of the range set? Or more specifically, should 
every string->list function map '' to []? Nonsense. It depends on the 

To also repeat, if split produced an iterable, then there would be no 
'zero element of lists' to talk about.

Anyway, it is a moot point as change would break code.

> The general problem stems from the fact that my initial
> expectation that
>   f_a(x) = x.join(a).split(x), where x in strings, a in lists
> should be an identity function can not be satisfied as join
> is non-injective (because of the surjective example above).

Since I was the first to point this out, I am glad you now agree.
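
For reference, the non-injectivity is easy to see in current Python. This is only an illustrative sketch; the helper name roundtrip is made up here.

```python
def roundtrip(items, sep=","):
    """Join the items with sep, then split the result on sep again."""
    return sep.join(items).split(sep)

# Two different lists join to the same string, so join is not injective
# and no definition of split can invert it for both of them.
assert ",".join([]) == ",".join([""]) == ""

assert roundtrip([""]) == [""]             # recovered
assert roundtrip([]) == [""]               # not recovered: [] comes back as ['']
assert roundtrip(["a", "b"]) == ["a", "b"]

# Items that contain the separator are not recovered either.
assert roundtrip(["a,b"]) == ["a", "b"]
```

Whichever value ''.split(',') returns, one of [] and [''] fails to round-trip.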

Terry Jan Reedy
