[Python-ideas] str.split() oddness
tjreedy at udel.edu
Mon Mar 7 03:07:36 CET 2011
On 3/6/2011 1:32 PM, Mart Sõmermaa wrote:
> On Mon, Feb 28, 2011 at 2:13 AM, Guido van Rossum<guido-+ZN9ApsXKcEdnm+yROfE0A at public.gmane.org> wrote:
Two minutes before that, I posted a more extensive reply and refutation
that you have not replied to.
> But let me remind that the behaviour of foo.split(x) where
> foo is not an empty string is not questioned at all, only
> behaviour when splitting the empty string is.
>            Python         Ruby
> join1   [''] => ''     [''] => ''
> join2   [ ] => ''      [ ] => ''
>
>            Python         Ruby
> split   [''] <= ''     [ ] <= ''
> As you can see, join1 and join2 are identical in both
> languages. Python has chosen to make split the inverse of
> join1, Ruby, on the other hand, the inverse of join2.
>> In Python the generalization is that since "xx".split(",") is ["xx"],
>> and "x".split(",") is ["x"], it naturally follows that "".split(",")
>> is [""].
Which I wrote as: (n*c).split(c) == (n+1)*['']
The generalization: len(s.split(c)) == s.count(c)+1
You want to change these into
(n*c).split(c) == (n+1)*[''] if n else []
len(s.split(c)) == s.count(c)+1 if s else 0
which is to say, you want to add an easily forgotten conditional and
alternative to the definition of split.
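For anyone who wants to check the two unconditional forms at the interpreter, here is a minimal sketch. The helper name and the sample strings are mine and purely illustrative; only str.split and str.count are real.

```python
def split_invariant_holds(s, c):
    """Check the generalization len(s.split(c)) == s.count(c) + 1."""
    return len(s.split(c)) == s.count(c) + 1

# Sample inputs, including the empty string under discussion.
samples = ["", ",", ",,", "a,b", "xx", "a,,b,"]
invariant_results = {s: split_invariant_holds(s, ",") for s in samples}

# The other form: n separators split into n+1 empty strings, n = 0 included.
repeat_results = [(n * ",").split(",") == (n + 1) * [""] for n in range(5)]

print(invariant_results)
print(repeat_results)
```

Both checks pass for the empty string with no conditional, which is the point: the current behaviour needs no special case.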
> That is one line of reasoning that emphasizes the
> "string-nature" of ''.
I do not see that particularly. I emphasize the algorithmic nature of
functions and prefer simpler definitions/algorithms to more complicated
ones with unnecessary special cases.
> However, I myself, the Ruby folks and Nick would rather
> emphasize the "zero-element-nature"  of ''.
Which says nothing in itself. Saying that one member of the domain of a
function is the identity element under some particular operation
(concatenation, in this case) says nothing about what that member should
be mapped to by any particular function.
You seem to emphasize the mapping (set of ordered pairs) nature of
functions and are hence willing to change one of the mappings (ordered
pairs) without regard to its relation to all the other pairs. This is a
consequence of the set view, which by itself denies any relation between
its members (the mapping pairs).
> "Applying the split operator to the zero element of
> strings should result in the zero element of lists"
To repeat, 'should' has no justification; it is just hand waving.
Would you really say that every function should map identities to
identities (and what if domain and range have more than one)? I hope
not. Would you even say that every *string* function should map '' to
the identity element of the range set? Or more specifically, should
every string->list function map '' to []? Nonsense. It depends on the
To also repeat, if split produced an iterable, then there would be no
'zero element of lists' to talk about.
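To make that concrete, here is a rough sketch of what a lazy split could look like. The name iter_split is hypothetical; nothing like it exists in the stdlib, and this is only one way such a function might be written.

```python
def iter_split(s, sep):
    """Lazily yield the pieces of s between occurrences of sep.

    Hypothetical helper for illustration; it mirrors s.split(sep)
    for a non-empty separator.
    """
    if not sep:
        raise ValueError("empty separator")
    start = 0
    while True:
        i = s.find(sep, start)
        if i == -1:
            # No more separators: yield the final (possibly empty) piece.
            yield s[start:]
            return
        yield s[start:i]
        start = i + len(sep)

pieces_of_empty = list(iter_split("", ","))
pieces_of_sample = list(iter_split("a,,b", ","))
```

With an iterator, the only question left is how many pieces are yielded for '', and the same loop that handles every other string yields exactly one.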
Anyway, it is a moot point as change would break code.
> The general problem stems from the fact that my initial
> expectation that
> f_a(x) = x.join(a).split(x), where x in lists, a in strings
> should be an identity function can not be satisfied as join
> is non-injective (because of the surjective example above).
Since I was the first to point this out, I am glad you now agree.
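For the record, the non-injectivity is easy to see at the interpreter. A small sketch (the variable names are mine):

```python
sep = ","

# Two different lists join to the same string ...
joined_from_empty_list = sep.join([])
joined_from_one_empty = sep.join([""])

# ... so no definition of split can recover both of them.
round_trip = [sep.join(a).split(sep) for a in ([], [""])]

print(repr(joined_from_empty_list), repr(joined_from_one_empty))
print(round_trip)
```

Whichever list split returns for '', the round trip fails for the other one; Python's choice fails for [] and Ruby's for [''].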
Terry Jan Reedy