[Python-ideas] str.split() oddness
Terry Reedy
tjreedy at udel.edu
Mon Feb 28 01:11:47 CET 2011
On 2/27/2011 5:18 PM, Mart Sõmermaa wrote:
> On Sat, Feb 26, 2011 at 11:31 PM, Arnaud Delobelle<arnodel-Re5JQEeQqe8AvxtiuMwx3w at public.gmane.org> wrote:
>> On 26 February 2011 14:03, Mart Sõmermaa<mrts.pydev-Re5JQEeQqe8AvxtiuMwx3w at public.gmane.org> wrote:
>>> IMHO, x.join(a).split(x) should be [invertible] with respect to a.
>
> Terry, thanks for pointing out that as
> string_not_containing_sep.split(sep) == [string_not_containing_sep],
> therefore
> ''.split('b') == [''].
Let me generalize this as follows:
len(s.split(c)) == s.count(c)+1
and specialize this as follows:
(n*c).split(c) == (n+1)*['']
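
As a quick sanity check, both identities can be run directly. This is only a sketch: the helper name check_split_invariants is made up for illustration, and c is assumed to be a non-empty separator.

```python
def check_split_invariants(s, c, n):
    # Hypothetical helper, not part of any library.
    # General pattern: one more piece than there are separators.
    assert len(s.split(c)) == s.count(c) + 1
    # Special case: n separators in a row give n+1 empty strings.
    assert (n * c).split(c) == (n + 1) * ['']
    return True

# Try a few strings, including the empty one, with separator 'b'.
for s in ['', 'b', 'abc', 'abcb', 'bb']:
    for n in range(4):
        check_split_invariants(s, 'b', n)
```

With n == 0 the special case reduces to ''.split(c) == [''], which is the result under discussion.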
> That's the gist of it.
That, and the fact that .join is not 1 to 1 and therefore inherently not
completely invertible, despite your wishes that it be so.
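
To illustrate "not 1 to 1", here are two pairs of distinct lists that join to the same string. The separator 'x' and the sample lists are arbitrary choices for the example.

```python
# Two different lists that join to the same string with separator 'x'.
collide_a = 'x'.join(['axb'])
collide_b = 'x'.join(['a', 'b'])
assert collide_a == collide_b == 'axb'

# The empty list and the list holding one empty string collide too.
assert 'x'.join([]) == 'x'.join(['']) == ''

# split can return only one answer per string, so it cannot
# recover both members of a colliding pair.
assert 'axb'.split('x') == ['a', 'b']
assert ''.split('x') == ['']
```

Whatever ''.split('x') returns, at least one of [] and [''] cannot survive the round trip.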
> I would like to question that reasoning though.
Even though it is coherent and sound? Why?
> '' (the
> empty string) is "nothing", the zero element [2] of strings.
So what? That is no reason in itself to break the general pattern.
> The problem is that it is treated as "something".
In what sense? Of course, it is a string object.
> I would say that precisely because it is the zero element,
> ''.split('b')
> should read
> "applying the split operator with any argument to the zero
> element of strings results in the zero element of lists"
Sorry, I do not see that at all. This ad hoc special case rule
1. makes no particular sense to me, except to produce the result you want;
2. breaks the invariant above, and all special cases thereof;
3. requires the addition of a special case in the algorithm;
4. causes << 'x'.join(['']).split('x') == [''] >> to become False, when
you say it should be True, as it is now.
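
Points 2 to 4 can be seen in a small sketch. The function proposed_split is hypothetical: it is only my reading of the suggested rule (return [] for the empty string, otherwise behave like str.split), not an existing method.

```python
def proposed_split(s, sep):
    # Hypothetical variant: the extra branch is the special case (point 3).
    if s == '':
        return []
    return s.split(sep)

# Current behaviour: the round trip holds for [''].
assert 'x'.join(['']).split('x') == ['']

# Proposed behaviour: the same round trip now fails (point 4) ...
assert proposed_split('x'.join(['']), 'x') == []

# ... and the count invariant fails for the empty string (point 2).
assert len(proposed_split('', 'x')) != ''.count('x') + 1
```

The variant does make the round trip work for the empty list, but only by giving it up for [''].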
> and therefore
> ''.split('b') == ''.split() == []
> (like in Ruby).
> Knowing that reasoning
I do not see any reasoning other than 'do what Ruby does'.
Why did Ruby change? Really thought out? or accident?
> and the inconvenient special casing that it causes in actual code,
I do not remember even one example, let alone a broad survey of use cases.
> would you still design split() as
> ''.split('b') == [''] today?
I did not design it, but as you can guess from the above...
yes.
What I might change today is to make split lazy by returning an
iterator rather than a list. Otherwise, the definition of s.split(c) as
s split at each occurrence of c is quite coherent and without need of an
arbitrary special case.
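
A lazy version might look roughly like the sketch below. The name isplit is made up for illustration (str has no such method), and only the explicit-separator form is covered, not the no-argument whitespace form.

```python
def isplit(s, sep):
    # Hypothetical lazy counterpart of s.split(sep).
    if not sep:
        raise ValueError("empty separator")
    start = 0
    while True:
        end = s.find(sep, start)
        if end == -1:
            # No more separators: the rest of s is the last piece.
            yield s[start:]
            return
        yield s[start:end]
        start = end + len(sep)

# The lazy version yields the same pieces as str.split.
for s in ['', 'b', 'abc', 'abcb', 'bb']:
    assert list(isplit(s, 'b')) == s.split('b')
```

Note that the loop needs no branch for the empty string: the final yield produces '' on its own, so list(isplit('', 'b')) is [''].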
I see this as somewhat similar to 0**0==1 resulting from a uniform
coherent rule: for n a count, x**n is 1 multiplied by x n times.
Whereas some claim that it should be special cased as 0 or disallowed.
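
The uniform rule for powers can be written out the same way. The function name power is an arbitrary choice for this sketch, and n is assumed to be a non-negative integer count.

```python
def power(x, n):
    # x**n for a count n: start from 1 and multiply by x, n times.
    result = 1
    for _ in range(n):
        result *= x
    return result

# With n == 0 the loop body never runs, so the result is 1 for any x.
assert power(0, 0) == 1 == 0**0
assert power(0, 3) == 0
assert power(2, 5) == 32
```

As with split, the edge case falls out of the general rule with no extra branch.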
--
Terry Jan Reedy