On 2/27/2011 5:18 PM, Mart Sõmermaa wrote:
On Sat, Feb 26, 2011 at 11:31 PM, Arnaud Delobelle<arnodel-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
On 26 February 2011 14:03, Mart Sõmermaa<mrts.pydev-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
IMHO, x.join(a).split(x) should be invertible with respect to a.
Terry, thanks for pointing out that, since string_not_containing_sep.split(sep) == [string_not_containing_sep], it follows that ''.split('b') == [''].
Let me generalize this as follows: len(s.split(c)) == s.count(c)+1 and specialize this as follows: (n*c).split(c) == (n+1)*['']
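The two identities above can be checked directly in an interpreter. The following is only a sketch: the helper names and the sample strings are made up for illustration and do not come from the thread.

```python
def count_invariant_holds(s, sep):
    """General form: s.split(sep) has one more piece than s has separators."""
    return len(s.split(sep)) == s.count(sep) + 1


def repeat_invariant_holds(n, sep):
    """Special form: n separators in a row split into n + 1 empty strings."""
    return (n * sep).split(sep) == (n + 1) * [""]


# Arbitrary sample strings chosen for illustration.
samples = ["", "a", "b", "abc", "abcb", "bb", "xbybz"]
assert all(count_invariant_holds(s, "b") for s in samples)
assert all(repeat_invariant_holds(n, "b") for n in range(6))

# n == 0 is the disputed case: zero separators, hence one (empty) piece.
assert "".split("b") == [""]
```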
That's the gist of it.
That, and the fact that .join is not 1 to 1 and therefore inherently not completely invertible, despite your wishes that it be so.
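A small illustration of why .join cannot be fully inverted, as a sketch with an arbitrarily chosen separator 'x': distinct lists can join to the same string, and split can hand back only one of them.

```python
sep = "x"

# Two different lists join to the same string, so join is not one-to-one.
assert sep.join([]) == sep.join([""]) == ""
assert sep.join(["a", "b"]) == sep.join(["axb"]) == "axb"

# split can return only one list per string, so it inverts join for
# at most one of the lists in each colliding pair.
assert "".split(sep) == [""]           # round-trips [''], not []
assert "axb".split(sep) == ["a", "b"]  # round-trips ['a', 'b'], not ['axb']
```

Whichever list split returns for '', the other one of [] and [''] fails the round trip; the current behaviour is the choice that keeps the counting invariant intact.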
I would like to question that reasoning though.
Even though it is coherent and sound? Why?
'' (the empty string) is "nothing", the zero element [2] of strings.
So what? That is no reason in itself to break the general pattern.
The problem is that it is treated as "something".
In what sense? Of course, it is a string object.
I would say that precisely because it is the zero element, ''.split('b') should read "applying the split operator with any argument to the zero element of strings results in the zero element of lists"
Sorry, I do not see that at all. This ad hoc special case rule 1. makes no particular sense to me, except to produce the result you want; 2. breaks the invariant above, and all special cases thereof; 3. requires the addition of a special case in the algorithm; 4. causes << 'x'.join(['']).split('x') == [''] >> to become False, when you say it should be True, as it is now.
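Points 2 to 4 can be made concrete with a sketch. The function proposed_split below is hypothetical: it is one reading of the proposal (identical to str.split with an explicit separator, except that the empty string splits to an empty list) and is not an existing or planned API.

```python
def proposed_split(s, sep):
    """Hypothetical variant of str.split: '' splits to [] instead of ['']."""
    if s == "":  # the added special case in the algorithm (point 3)
        return []
    return s.split(sep)


a = [""]

# Current behaviour: the round trip through join and split returns a.
assert "x".join(a).split("x") == a

# Proposed behaviour: the same round trip no longer returns a (point 4).
assert proposed_split("x".join(a), "x") != a

# Proposed behaviour: the counting invariant fails for the empty string (point 2).
assert len(proposed_split("", "x")) != "".count("x") + 1
```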
and therefore ''.split('b') == ''.split() == [] (like in Ruby).
Knowing that reasoning
I do not see any reasoning other than 'do what Ruby does'. Why did Ruby change? Really thought out? or accident?
and the inconvenient special casing that it causes in actual code,
I do not remember even one example, let alone a broad survey of use cases.
would you still design split() as ''.split('b') == [''] today?
I did not design it, but as you can guess from the above... yes. What I might change today is to make split lazy by returning an iterator rather than a list. Otherwise, the definition of s.split(c) as s split at each occurrence of c is quite coherent and without need of an arbitrary special case. I see this as somewhat similar to 0**0==1 resulting from a uniform coherent rule: for n a count, x**n is 1 multiplied by x n times. Whereas some claim that it should be special cased as 0 or disallowed. -- Terry Jan Reedy
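Both closing remarks can be sketched in a few lines. The names isplit and power are hypothetical helpers written for this illustration; Python's str type has no lazy split method, and the sketch assumes a non-empty separator.

```python
def isplit(s, sep):
    """Hypothetical lazy split: yield s cut at each occurrence of sep."""
    if not sep:
        raise ValueError("empty separator")
    start = 0
    while True:
        idx = s.find(sep, start)
        if idx == -1:
            # No separator left: the remainder is the last piece, even if empty.
            yield s[start:]
            return
        yield s[start:idx]
        start = idx + len(sep)


def power(x, n):
    """x**n for a count n: start from 1 and multiply by x n times."""
    result = 1
    for _ in range(n):
        result *= x
    return result


# The uniform rule gives the same pieces as str.split, with no special case.
for text in ["", "b", "bb", "abcb", "xyz"]:
    assert list(isplit(text, "b")) == text.split("b")

# The uniform rule for powers gives 0**0 == 1, with no special case.
assert power(0, 0) == 1 == 0 ** 0
```

In both functions the disputed value falls out of the general loop: zero separators leave one piece, and zero multiplications leave the initial 1.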