[Python-ideas] str.split() oddness

Terry Reedy tjreedy at udel.edu
Mon Feb 28 01:11:47 CET 2011


On 2/27/2011 5:18 PM, Mart Sõmermaa wrote:
> On Sat, Feb 26, 2011 at 11:31 PM, Arnaud Delobelle wrote:
>> On 26 February 2011 14:03, Mart Sõmermaa wrote:
>>> IMHO, x.join(a).split(x) should be invertible
>>> with respect to a.
>
> Terry, thanks for pointing out that as
>    string_not_containing_sep.split(sep) == [string_not_containing_sep],
> therefore
>   ''.split('b') == [''].

Let me generalize this as follows:
   len(s.split(c)) == s.count(c)+1
and specialize this as follows:
   (n*c).split(c) == (n+1)*['']
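A quick check of both statements follows. The (string, separator) pairs are arbitrary samples chosen here for illustration, not taken from the thread:

```python
# Arbitrary sample (string, separator) pairs, chosen for illustration only.
samples = [("", "b"), ("abc", "b"), ("a,b,,c", ","), ("xx", "x"), ("no sep here", ";")]

for s, c in samples:
    # General invariant: always one more piece than there are separators.
    assert len(s.split(c)) == s.count(c) + 1

# Special case: n copies of the separator split into n+1 empty strings.
c = "x"
for n in range(5):
    assert (n * c).split(c) == (n + 1) * [""]

# n == 0 is exactly the case under discussion.
print("".split("b"))  # -> ['']
```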

> That's the gist of it.

That, and the fact that .join is not one-to-one and is therefore inherently
not completely invertible, despite your wish that it be so.
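To see why .join cannot be fully inverted, note that distinct lists can join to the same string, so no split() can give back both of them. The separator and lists below are illustrative:

```python
sep = "x"

# Two distinct lists that join to the same empty string.
assert sep.join([]) == ""
assert sep.join([""]) == ""

# Likewise, an element that contains the separator is indistinguishable
# from two separate elements once joined.
assert sep.join(["axb"]) == sep.join(["a", "b"]) == "axb"

# split() can return only one of the candidates; today it returns [''].
print("".split(sep))  # -> ['']
```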

> I would like to question that reasoning though.

Even though it is coherent and sound? Why?

> '' (the
> empty string) is "nothing", the zero element [2] of strings.

So what? That is no reason in itself to break the general pattern.

> The problem is that it is treated as "something".

In what sense? Of course, it is a string object.

> I would say that precisely because it is the zero element,
>    ''.split('b')
> should read
>    "applying the split operator with any argument to the zero
>     element of strings results in the zero element of lists"

Sorry, I do not see that at all. This ad hoc special-case rule
1. makes no particular sense to me, except to produce the result you want;
2. breaks the invariant above, and all special cases thereof;
3. requires the addition of a special case in the algorithm;
4. causes << 'x'.join(['']).split('x') == [''] >> to become False, when
you say it should be True, as it is now.
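The cost of the proposed rule can be made concrete. `ruby_like_split` below is a hypothetical helper, not part of Python, that models only the suggested empty-string special case (it does not reproduce Ruby's full split semantics). Where str.split returns [''], it returns [], which breaks both the count invariant and the round trip in point 4:

```python
def ruby_like_split(s: str, sep: str) -> list[str]:
    """Hypothetical split that special-cases the empty string, as proposed."""
    if s == "":
        return []  # the proposed special case
    return s.split(sep)

sep = "x"
a = [""]

# Current behaviour: the round trip holds for a == [''].
assert sep.join(a).split(sep) == a

# With the special case, the round trip fails ...
assert ruby_like_split(sep.join(a), sep) != a

# ... and so does len(s.split(c)) == s.count(c) + 1 for s == ''.
assert len(ruby_like_split("", sep)) != "".count(sep) + 1
```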

> and therefore
>    ''.split('b') == ''.split() == []
> (like in Ruby).
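For reference, in Python the no-argument form already returns an empty list for '', because whitespace splitting (sep=None) discards empty pieces. Only the explicit-separator form returns a single empty string:

```python
# Whitespace splitting (sep=None) drops empty strings entirely.
assert "".split() == []
assert "   ".split() == []

# An explicit separator keeps empty pieces, including the lone one for ''.
assert "".split("b") == [""]
assert "  ".split(" ") == ["", "", ""]
```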

> Knowing that reasoning

I do not see any reasoning other than 'do what Ruby does'.
Why did Ruby change? Was it really thought out, or an accident?

> and the inconvenient special casing that it causes in actual code,

I do not remember even one example, let alone a broad survey of use cases.

> would you still design split() as
> ''.split('b') == [''] today?

I did not design it, but as you can guess from the above...
yes.

What I might change today is to make split lazy by returning an
iterator rather than a list. Otherwise, the definition of s.split(c) as
s split at each occurrence of c is quite coherent and needs no
arbitrary special case.
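A minimal sketch of what a lazy variant might look like, assuming a generator is an acceptable form; `isplit` is a hypothetical name, not an existing str method. It applies the same rule (split at each occurrence of sep, with no special case), so list(isplit(s, sep)) should equal s.split(sep):

```python
from collections.abc import Iterator

def isplit(s: str, sep: str) -> Iterator[str]:
    """Lazily yield the pieces of s between occurrences of sep (hypothetical)."""
    if not sep:
        raise ValueError("empty separator")  # mirrors str.split('')
    start = 0
    while True:
        idx = s.find(sep, start)
        if idx == -1:
            # No more separators: the remainder (possibly '') is the last piece.
            yield s[start:]
            return
        yield s[start:idx]
        start = idx + len(sep)

# Agrees with str.split on a few sample inputs, including ''.
for s in ["", "b", "abc", "a,,b,", "xx"]:
    for sep in ["b", ",", "x"]:
        assert list(isplit(s, sep)) == s.split(sep)
```

Because it yields pieces one at a time, a caller that stops early never scans the rest of the string.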

I see this as somewhat similar to 0**0 == 1, which results from a uniform,
coherent rule: for n a count, x**n is 1 multiplied by x n times.
Some, however, claim that it should be special-cased as 0 or disallowed.
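The uniform rule can be written directly as repeated multiplication starting from 1. `power_by_count` is a hypothetical illustration, not a library function; Python's own ** operator agrees with it at 0**0:

```python
def power_by_count(x: float, n: int) -> float:
    """x**n for a non-negative integer count n: 1 multiplied by x, n times."""
    if n < 0:
        raise ValueError("n must be a non-negative count")
    result = 1
    for _ in range(n):
        result *= x
    return result

# The empty product is 1, so 0**0 == 1 falls out without a special case.
assert power_by_count(0, 0) == 1
assert power_by_count(0, 3) == 0
assert power_by_count(2, 10) == 1024

# Python's built-in operator gives the same answer.
assert 0 ** 0 == 1
```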

-- 
Terry Jan Reedy