str.split() oddness - Python-ideas

Mart Sõmermaa

26 Feb 2011

2:03 p.m.

IMHO, x.join(a).split(x) should be "idempotent" in regard to a.

>>> foo = ['a', 'b', 'c']
>>> assert '|'.join(foo).split('|') == foo
>>> foo = ['a']
>>> assert '|'.join(foo).split('|') == foo
>>> foo = []
>>> assert ' '.join(foo).split() == foo

And now the odd exception to the rule:

>>> assert '|'.join(foo).split('|') == foo
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError

That forces one to write special-case code when using custom separators. Consider:

# clean
baz = dict(chunk.split('=') for chunk in baz.split())

# ugly
baz = (dict(chunk.split('=') for chunk in baz.split("|"))
       if baz else {})

Our younger cousin Ruby has no such idiosyncrasies:

foo = []
foo.join('|').split('|') == foo
=> true

What is the reason for that oddity? Can we amend it?

Best regards,
Mart Sõmermaa
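The behaviour described in this message can be reproduced in a few lines. This is only a minimal sketch; the helper name `roundtrip` is made up for illustration and is not part of the thread.

```python
def roundtrip(items, sep):
    """Join a list of strings with sep, then split the result on sep again."""
    return sep.join(items).split(sep)

# Non-empty lists whose items do not contain the separator survive the round trip.
assert roundtrip(['a', 'b', 'c'], '|') == ['a', 'b', 'c']
assert roundtrip(['a'], '|') == ['a']

# The empty list does not: ''.split('|') is [''], not [].
assert roundtrip([], '|') == ['']

# Splitting on whitespace (no argument) is a different function: it drops empty pieces.
assert ' '.join([]).split() == []
```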


Joao S. O. Bueno

26 Feb

3:44 p.m.

On Sat, Feb 26, 2011 at 11:03 AM, Mart Sõmermaa <mrts.pydev@gmail.com> wrote:

> IMHO, x.join(a).split(x) should be "idempotent" in regard to a.
>
> >>> foo = ['a', 'b', 'c']
> >>> assert '|'.join(foo).split('|') == foo
> >>> foo = ['a']
> >>> assert '|'.join(foo).split('|') == foo
> >>> foo = []
> >>> assert ' '.join(foo).split() == foo
>
> And now the odd exception to the rule:
>
> >>> assert '|'.join(foo).split('|') == foo
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> AssertionError
>
> That forces one to write special case code when using custom separators. Consider:
>
> # clean
> baz = dict(chunk.split('=') for chunk in baz.split())
>
> # ugly
> baz = (dict(chunk.split('=') for chunk in baz.split("|")) if baz else {})
>
> Our younger cousin Ruby has no such idiosyncrasies:

It is no idiosyncrasy -- split returns what it should return, a list with an empty string:

>>> ''.split("|")
['']

and it would break a lot of code if it didn't. Filtering out lists with an empty string does not seem a big issue compared to the inconsistencies that would arise from any different behavior for split.

Any list of strings does the roundtrip with a join->split sequence. Lists of any other elements, or empty lists, don't.

js
-><-

> foo = []
> foo.join('|').split('|') == foo
> => true
>
> What is the reason for that oddity? Can we amend it?
>
> Best regards,
> Mart Sõmermaa

Terry Reedy

6:52 p.m.

On 2/26/2011 9:03 AM, Mart Sõmermaa wrote:

> IMHO, x.join(a).split(x) should be "idempotent" in regard to a.

Given that x.join is *not* 1 to 1,

>>> 'a'.join([])
''
>>> 'a'.join([''])
''

it cannot have an inverse for all outputs. In particular, ''.split('a') cannot be both [] and ['']. This could only be fixed by changing the definition of join to not allow joining on [], but that would not be convenient. I believe joining is otherwise 1 to 1 and invertible for non-empty lists.

Of course, join input a can be any iterable of strings, whereas split produces a list, so your equality test can only work for list inputs unless generalized to c.join(a).split(c) == list(a).

''.split('a') == [''], not [], by the definition of s.split(c): a list of pieces of s that were previously joined by c. In particular, string_not_containing_sep.split(sep) == [string_not_containing_sep].

Note that empty pieces are inserted for repeated seps so that splitting on seps (unlike splitting on 'whitespace') *is* 1 to 1.

'abc'.split('b') == ['a','c']
'abbc'.split('b') == ['a','','c']
(whereas 'a c'.split() and 'a  c'.split() are both ['a','c'])

Therefore, sep splitting does have an inverse: c.join(s.split(c)) == s

The doc for str.split specifies the above and makes clear that splitting with and without a separator are slightly different functions.
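The claims in the preceding paragraphs are easy to check interactively. The following is a small sketch, not something posted in the thread; the sample strings are arbitrary.

```python
sep = 'b'

# join is not one-to-one: two different lists produce the same string.
assert sep.join([]) == sep.join(['']) == ''

# Splitting on an explicit separator keeps empty pieces ...
assert 'abc'.split('b') == ['a', 'c']
assert 'abbc'.split('b') == ['a', '', 'c']

# ... so joining the pieces back always restores the original string.
for s in ['', 'abc', 'abbc', 'b', 'bb', 'xyz']:
    assert sep.join(s.split(sep)) == s

# Whitespace splitting collapses runs of spaces, so it cannot be inverted.
assert 'a c'.split() == 'a  c'.split() == ['a', 'c']
```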

> assert ' '.join(foo).split() == foo

You have pulled a fast one here. ' ' does not equal 'whitespace' ;-) If x in your original expression is nothing (to indicate 'whitespace'), then your desired equality becomes

.join(a).split() == a

which is not legal ;-).

Some of the above is a rewording and expansion upon what Joao already said, which was all correct.

--
Terry Jan Reedy

Arnaud Delobelle

9:31 p.m.

On 26 February 2011 14:03, Mart Sõmermaa <mrts.pydev@gmail.com> wrote:

> IMHO, x.join(a).split(x) should be "idempotent" in regard to a.

Idempotent is the wrong word here. A function f is idempotent if f(f(x)) == f(x) for all x. What you are stating is that given:

    f_s(x) = s.join(x)
    g_s(x) = x.split(s)

Then for all s and x, g_s(f_s(x)) == x. If this condition is satisfied then f_s and g_s are said to be each other's inverse.

First you have to define clearly the domain of both functions for this to make sense. It seems that you consider the following domains:

    Domain of g_s = all strings
    Domain of f_s = all lists of strings which do not contain s

Note that the domain of f_s is already quite complicated.

As you point out, it can't work. As f_s([]) == f_s(['']) == '', g_s('') can't be both [] and [''].

But if you change the domain of f_s to:

    Domain of f_s = all non-empty lists of strings which do not contain s

Then f_s and g_s are indeed the inverse of each other.

Note also that in Ruby, [''].join(s).split(s) == [''] evaluates to false. So the problem is also present with Ruby. Ruby decided that ''.split(s) is [], whereas Python decided that ''.split(s) is [''].

The only solution would be to raise an exception when joining an empty list, which I guess is not very desirable.

--
Arnaud
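The restricted-domain claim can be spot-checked by brute force. This is a sketch under the assumption that a handful of sample pieces is representative; `f`, `g`, and `pieces` are illustrative names mirroring f_s and g_s above.

```python
from itertools import product

def f(s, x):
    """f_s(x) = s.join(x): list of strings -> string."""
    return s.join(x)

def g(s, x):
    """g_s(x) = x.split(s): string -> list of strings."""
    return x.split(s)

s = '|'
pieces = ['', 'a', 'bc']  # sample strings that do not contain s

# Every non-empty list of up to three sample pieces survives g_s(f_s(x)).
for n in range(1, 4):
    for combo in product(pieces, repeat=n):
        x = list(combo)
        assert g(s, f(s, x)) == x

# The empty list is the one input that does not round-trip.
assert g(s, f(s, [])) == ['']
```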

Mart Sõmermaa

27 Feb

10:18 p.m.

On Sat, Feb 26, 2011 at 11:31 PM, Arnaud Delobelle <arnodel@gmail.com> wrote:

> On 26 February 2011 14:03, Mart Sõmermaa <mrts.pydev@gmail.com> wrote:
> > IMHO, x.join(a).split(x) should be "idempotent" in regard to a.
>
> Idempotent is the wrong word here.

I should have said "identity function" instead. Sorry for the confusion. (Identity function is idempotent though [1].)

~

Terry, thanks for pointing out that as

string_not_containing_sep.split(sep) == [string_not_containing_sep],

therefore

''.split('b') == [''].

That's the gist of it.

I would like to question that reasoning though. '' (the empty string) is "nothing", the zero element [2] of strings. The problem is that it is treated as "something". I would say that precisely because it is the zero element,

''.split('b')

should read

"applying the split operator with any argument to the zero element of strings results in the zero element of lists"

and therefore

''.split('b') == ''.split() == []

(like in Ruby). And sorry for using "zero element" loosely, I hope it's understandable what I mean from context.

~

Knowing that reasoning and the inconvenient special casing that it causes in actual code, would you still design split() as ''.split('b') == [''] today?

[1] http://en.wikipedia.org/wiki/Idempotence
[2] http://en.wikipedia.org/wiki/Zero_element

Nick Coghlan

11:50 p.m.

On Mon, Feb 28, 2011 at 8:18 AM, Mart Sõmermaa <mrts.pydev@gmail.com> wrote:

> Knowing that reasoning and the inconvenient special casing that it causes in actual code, would you still design split() as ''.split('b') == [''] today?

No, but that isn't really the question we need to ask. The more important question is, given that it *does* behave this way now, is changing it worth the inevitable hassle? How would we get there from here without gratuitously breaking working programs?

So, even though I agree that Ruby's semantics are probably better in this case, I don't see it as sufficiently important to justify the breakage involved in fixing it.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Terry Reedy

28 Feb

12:11 a.m.

On 2/27/2011 5:18 PM, Mart Sõmermaa wrote:

> On Sat, Feb 26, 2011 at 11:31 PM, Arnaud Delobelle <arnodel@gmail.com> wrote:
> > On 26 February 2011 14:03, Mart Sõmermaa <mrts.pydev@gmail.com> wrote:
> > > IMHO, x.join(a).split(x) should be [invertible] with respect to a.
>
> Terry, thanks for pointing out that as string_not_containing_sep.split(sep) == [string_not_containing_sep], therefore ''.split('b') == [''].

Let me generalize this as follows:

len(s.split(c)) == s.count(c)+1

and specialize this as follows:

(n*c).split(c) == (n+1)*['']
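Both invariants can be verified directly, including for the empty string. A short sketch with arbitrary sample inputs:

```python
c = ','

# len(s.split(c)) == s.count(c) + 1 for every string, including ''.
for s in ['', 'x', 'x,y', 'x,,y', ',', ',,']:
    assert len(s.split(c)) == s.count(c) + 1

# Special case: a string made of n separators splits into n + 1 empty strings.
for n in range(5):
    assert (n * c).split(c) == (n + 1) * ['']
```

For n == 0 the second invariant is exactly the disputed case: ''.split(',') == [''].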

> That's the gist of it.

That, and the fact that .join is not 1 to 1 and therefore inherently not completely invertible, despite your wishes that it be so.

> I would like to question that reasoning though.

Even though it is coherent and sound? Why?

> '' (the empty string) is "nothing", the zero element [2] of strings.

So what? That is no reason in itself to break the general pattern.

> The problem is that it is treated as "something".

In what sense? Of course, it is a string object.

> I would say that precisely because it is the zero element, ''.split('b') should read "applying the split operator with any argument to the zero element of strings results in the zero element of lists"

Sorry, I do not see that at all. This ad hoc special case rule
1. makes no particular sense to me, except to produce the result you want;
2. breaks the invariant above, and all special cases thereof;
3. requires the addition of a special case in the algorithm;
4. causes << 'x'.join(['']).split('x') == [''] >> to become False, when you say it should be True, as it is now.

> and therefore ''.split('b') == ''.split() == [] (like in Ruby).
>
> Knowing that reasoning

I do not see any reasoning other than 'do what Ruby does'. Why did Ruby change? Really thought out? Or accident?

> and the inconvenient special casing that it causes in actual code,

I do not remember even one example, let alone a broad survey of use cases.

> would you still design split() as ''.split('b') == [''] today?

I did not design it, but as you can guess from the above... yes. What I might change today is to make split lazy by returning an iterator rather than a list. Otherwise, the definition of s.split(c) as s split at each occurrence of c is quite coherent and without need of an arbitrary special case.

I see this as somewhat similar to 0**0==1 resulting from a uniform coherent rule: for n a count, x**n is 1 multiplied by x n times. Whereas some claim that it should be special cased as 0 or disallowed.

--
Terry Jan Reedy
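To illustrate the "lazy split" idea mentioned above, here is one way such a function could look. This is a hypothetical sketch, not an existing str method; the name `isplit` is invented. Note that the uniform rule needs no special case to yield a single '' for the empty string.

```python
def isplit(s, sep):
    """Yield the pieces of s between occurrences of sep, one at a time."""
    if not sep:
        raise ValueError("empty separator")  # same restriction as str.split
    start = 0
    while True:
        end = s.find(sep, start)
        if end == -1:
            yield s[start:]  # final piece, possibly empty
            return
        yield s[start:end]
        start = end + len(sep)

# The lazy version agrees with str.split for an explicit separator.
for text in ['', 'a', 'a,b', 'a,,b', ',', ',a,']:
    assert list(isplit(text, ',')) == text.split(',')
```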

Guido van Rossum

12:13 a.m.

Does Ruby in general leave out empty strings from the result? What does it return when "x,,y" is split on ","? ["x", "", "y"] or ["x", "y"]?

In Python the generalization is that since "xx".split(",") is ["xx"], and "x".split(",") is ["x"], it naturally follows that "".split(",") is [""].

--
--Guido van Rossum (python.org/~guido)

Mart Sõmermaa

6 Mar

6:32 p.m.

First, sorry for such a big delay in replying.

On Mon, Feb 28, 2011 at 2:13 AM, Guido van Rossum <guido@python.org> wrote:
> Does Ruby in general leave out empty strings from the result? What does it return when "x,,y" is split on ","? ["x", "", "y"] or ["x", "y"]?

"x,,y".split(",")
=> ["x", "", "y"]

But let me remind you that the behaviour of foo.split(x) where foo is not an empty string is not questioned at all; only the behaviour when splitting the empty string is.

         Python        Ruby
join1    [''] => ''    [''] => ''
join2    [ ] => ''     [ ] => ''

         Python        Ruby
split    [''] <= ''    [ ] <= ''

As you can see, join1 and join2 are identical in both languages. Python has chosen to make split the inverse of join1, Ruby, on the other hand, the inverse of join2.

> In Python the generalization is that since "xx".split(",") is ["xx"], and "x".split(",") is ["x"], it naturally follows that "".split(",") is [""].

That is one line of reasoning that emphasizes the "string-nature" of ''.

However, I myself, the Ruby folks and Nick would rather emphasize the "zero-element-nature" [1] of ''.

Both approaches are based on solid reasoning, the latter just happens to be more practical. And I would still claim that

"Applying the split operator to the zero element of strings should result in the zero element of lists"

wins on theoretical grounds as well.

The general problem stems from the fact that my initial expectation that

f_a(x) = x.join(a).split(x), where x in lists, a in strings

should be an identity function can not be satisfied as join is non-injective (because of the surjective example above).

[1] http://en.wikipedia.org/wiki/Zero_element

Georg Brandl

7:35 p.m.

On 06.03.2011 19:32, Mart Sõmermaa wrote:

> > In Python the generalization is that since "xx".split(",") is ["xx"], and "x".split(",") is ["x"], it naturally follows that "".split(",") is [""].
>
> That is one line of reasoning that emphasizes the "string-nature" of ''.
>
> However, I myself, the Ruby folks and Nick would rather emphasize the "zero-element-nature" [1] of ''.
>
> Both approaches are based on solid reasoning, the latter just happens to be more practical.

I think we haven't seen any proof of that (and no, the property of x.join(a).split(x) == a does not show me why it would be practical).

Georg

Mart Sõmermaa

9:06 p.m.

On Sun, Mar 6, 2011 at 9:35 PM, Georg Brandl <g.brandl@gmx.net> wrote:

> On 06.03.2011 19:32, Mart Sõmermaa wrote:
> > > In Python the generalization is that since "xx".split(",") is ["xx"], and "x".split(",") is ["x"], it naturally follows that "".split(",") is [""].
> >
> > That is one line of reasoning that emphasizes the "string-nature" of ''.
> >
> > However, I myself, the Ruby folks and Nick would rather emphasize the "zero-element-nature" [1] of ''.
> >
> > Both approaches are based on solid reasoning, the latter just happens to be more practical.
>
> I think we haven't seen any proof of that (and no, the property of x.join(a).split(x) == a does not show me why it would be practical).

I referred to the practical example in my first message, but let me repeat it.

Which do you prefer:

bar = dict(chunk.split('=') for chunk in foo.split(","))

or

bar = (dict(chunk.split('=') for chunk in foo.split(",")) if foo else {})

?

I'm afraid there are other people besides me who fail to think of the `if foo else {}` part on the first shot (assuming there will be an empty list when foo='' and that `for` will not be entered at all).

Best,
Mart Sõmermaa
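To make the failure mode concrete, the sketch below wraps the guarded form in a helper and shows what the unguarded one-liner does with an empty string. The function name `parse_pairs` is an assumption for illustration only.

```python
def parse_pairs(text, sep=','):
    """Parse 'k=v' chunks separated by sep into a dict; '' gives {}."""
    if not text:
        return {}
    return dict(chunk.split('=') for chunk in text.split(sep))

assert parse_pairs('x=1,y=4') == {'x': '1', 'y': '4'}
assert parse_pairs('') == {}

# Without the guard, the single '' chunk is not a key/value pair,
# so dict() raises ValueError instead of returning {}.
try:
    dict(chunk.split('=') for chunk in ''.split(','))
except ValueError as exc:
    print('unguarded version fails:', exc)
```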

Joao S. O. Bueno

9:54 p.m.

On Sun, Mar 6, 2011 at 6:06 PM, Mart Sõmermaa <mrts.pydev@gmail.com> wrote:


> I referred to the practical example in my first message, but let me repeat it.
>
> Which do you prefer:
>
> bar = dict(chunk.split('=') for chunk in foo.split(","))
>
> or
>
> bar = (dict(chunk.split('=') for chunk in foo.split(",")) if foo else {})
>
> ?
>
> I'm afraid there are other people besides me that fail to think of the `if foo else {}` part on the first shot (assuming there will be an empty list when foo='' and that `for` will not be entered at all).

Mart, I don't know about you, but in my code, for example, there are plenty, and I mean __plenty__, of places where I assume that after a split I will have at least one element in a list.

Python simply does not break backwards compatibility like that, much less for such little things as this. Such a behavior as you describe, while apparently not bad, simply is not that way in Python, and cannot be changed without a break of compatibility.

The current behavior has advantages as well: one can always refer to the first ([0]) element of the split return value. If I want to strip "#"-style comments in a simple file:

line = line.split("#")[0]

Under your new and modified method, this code would break, and would have to contain one extra "if" upon rewriting.

In my opinion it makes no sense to break the rules for such a small change.

Moreover, if you come to think of it, while parsing lines in a text file that might contain some kind of assignment interspersed with blank lines, as the one you describe, nearly any code dealing with that will have to check for blank lines containing white space as well. And in this case, with or without your changes:

a = line.split("=")
if len(a)==2:
    ...

The situation you hit, where you avoid writing that "if" in the generator expression, is more likely very peculiar to the program you were writing in that moment -- it is not the case I encounter in real day-to-day coding.

js
-><-
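The two idioms from this message combine naturally into a small line parser. What follows is a sketch, assuming a simple "key = value  # comment" file format; `parse_line` is a made-up name.

```python
def parse_line(line):
    """Return (key, value) for a 'key = value' line, or None otherwise."""
    code = line.split('#')[0]      # always safe: split returns at least one piece
    parts = code.split('=')
    if len(parts) != 2:            # blank, whitespace-only or malformed line
        return None
    key, value = parts
    return key.strip(), value.strip()

assert parse_line('a = 1  # comment') == ('a', '1')
assert parse_line('') is None
assert parse_line('   ') is None
assert parse_line('# only a comment') is None
```

If ''.split('#') returned [], the `[0]` index would raise IndexError on an empty line, which is the breakage described above.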


Steven D'Aprano

11:21 p.m.

On Mon, 7 Mar 2011 08:06:08 am Mart Sõmermaa wrote:

> Which do you prefer:
>
> bar = dict(chunk.split('=') for chunk in foo.split(","))
>
> or
>
> bar = (dict(chunk.split('=') for chunk in foo.split(",")) if foo else {})
>
> ?

Which would you prefer?

line = line.split("#")[0].rstrip()

line = line.split("#")[0].rstrip() if line else ""

Whichever behaviour we give split, we're going to complicate something. Since there's no overwhelming reason to prefer one use-case over the other, the status quo wins. Any change to the behaviour of split will break code which is currently working, and that alone is reason enough to stick with the current behaviour.

By the way, your dict() examples are not robust against minor whitespace changes in foo. Consider what happens with either of:

foo = "x=1, y = 4, z = 2"
foo = "x=1,y=4,z=2,"

--
Steven D'Aprano
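With the dict() one-liner, the first sample string yields keys and values with stray spaces (for example ' y '), and the second raises ValueError because of the trailing comma. A more tolerant parser might look like the sketch below; `parse_pairs_tolerant` is a hypothetical name, and skipping blank chunks is one possible policy, not the only reasonable one.

```python
def parse_pairs_tolerant(text, sep=','):
    """Parse 'k=v' chunks, ignoring blank chunks and surrounding spaces."""
    result = {}
    for chunk in text.split(sep):
        if not chunk.strip():
            continue  # covers '' input and a trailing separator
        key, _, value = chunk.partition('=')
        result[key.strip()] = value.strip()
    return result

expected = {'x': '1', 'y': '4', 'z': '2'}
assert parse_pairs_tolerant("x=1, y = 4, z = 2") == expected
assert parse_pairs_tolerant("x=1,y=4,z=2,") == expected
assert parse_pairs_tolerant("") == {}
```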

Terry Reedy

7 Mar

2:13 a.m.

On 3/6/2011 4:06 PM, Mart Sõmermaa wrote:

> Which do you prefer:
>
> bar = dict(chunk.split('=') for chunk in foo.split(","))
>
> or
>
> bar = (dict(chunk.split('=') for chunk in foo.split(",")) if foo else {})

Others have pointed out that one example is not representative of the universe of use cases of split. However, the irony of this example is that *you* are the one who prefers to add 'if s != '' else []' to the definition of s.split(c) ;-).

--
Terry Jan Reedy

Nick Coghlan

6 Mar

11:24 p.m.

On Mon, Mar 7, 2011 at 4:32 AM, Mart Sõmermaa <mrts.pydev@gmail.com> wrote:

> However, I myself, the Ruby folks and Nick would rather emphasize the "zero-element-nature" [1] of ''.

I did say maybe. As Jesse notes, there's another pattern based line of argument that goes:

len(',,'.split(',')) == 3
len(','.split(',')) == 2
len(''.split(',')) == ???

(Well, 1 "obviously", since the pattern suggests that even when there is no other text in the string, the length of the split result is always 1 more than the number of separators occurring in the string)

There are reasonable arguments for "''.split(sep)" as the inverse of either "sep.join([''])" or "sep.join([])", but once *either* has been chosen for a given language, none of the arguments are strong enough to justify switching to the other behaviour.

Note that, independent of which is chosen, the following identity will hold for an explicit separator:

sep.join(text.split(sep)) == text

It's only composing them the other way around as "sep.join(data).split(sep)" that will convert either [] to [''] (as in Python) or [''] to [] (as in Ruby).

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
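The point about the two compositions can be sketched by modelling the alternative convention in Python. The function `split_empty_as_empty_list` is hypothetical and models only the empty-string case discussed here, not the rest of Ruby's split behaviour.

```python
def split_empty_as_empty_list(text, sep):
    """Hypothetical variant: like str.split(sep), except '' gives []."""
    return text.split(sep) if text else []

sep = ','
for text in ['', 'x', 'x,y', 'x,,y', ',']:
    # join-after-split restores the text under either convention
    assert sep.join(text.split(sep)) == text
    assert sep.join(split_empty_as_empty_list(text, sep)) == text

# split-after-join is where the two conventions differ
assert sep.join([]).split(sep) == ['']                        # [] becomes ['']
assert split_empty_as_empty_list(sep.join(['']), sep) == []   # [''] becomes []
```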

Terry Reedy

7 Mar

2:07 a.m.

On 3/6/2011 1:32 PM, Mart Sõmermaa wrote:

> On Mon, Feb 28, 2011 at 2:13 AM, Guido van Rossum <guido@python.org> wrote:

Two minutes before that, I posted a more extensive reply and refutation that you have not replied to.

> But let me remind that the behaviour of foo.split(x) where foo is not an empty string is not questioned at all, only behaviour when splitting the empty string is.
>
>          Python        Ruby
> join1    [''] => ''    [''] => ''
> join2    [ ] => ''     [ ] => ''
>
>          Python        Ruby
> split    [''] <= ''    [ ] <= ''
>
> As you can see, join1 and join2 are identical in both languages. Python has chosen to make split the inverse of join1, Ruby, on the other hand, the inverse of join2.
>
> > In Python the generalization is that since "xx".split(",") is ["xx"], and "x".split(",") is ["x"], it naturally follows that "".split(",") is [""].

Which I wrote as:

(n*c).split(c) == (n+1)*['']

The generalization:

len(s.split(c)) == s.count(c)+1

You want to change these into

(n*c).split(c) == (n+1)*[''] if n else []
len(s.split(c)) == s.count(c)+1 if s else 0

which is to say, you want to add an easily forgotten conditional and alternative to the definition of split.

> That is one line of reasoning that emphasizes the "string-nature" of ''.

I do not see that particularly. I emphasize the algorithmic nature of functions and prefer simpler definitions/algorithms to more complicated ones with unnecessary special cases.

> However, I myself, the Ruby folks and Nick would rather emphasize the "zero-element-nature" [1] of ''.

Which says nothing in itself. Saying that one member of the domain of a function is the identity element under some particular operation (concatenation, in this case) says nothing about what that member should be mapped to by any particular function.

You seem to emphasize the mapping (set of ordered pairs) nature of functions and are hence willing to change one of the mappings (ordered pairs) without regard to its relation to all the other pairs. This is a consequence of the set view, which by itself denies any relation between its members (the mapping pairs).

> "Applying the split operator to the zero element of strings should result in the zero element of lists"

To repeat, 'should' has no justification; it is just hand waving. Would you really say that every function should map identities to identities (and what if domain and range have more than one)? I hope not. Would you even say that every *string* function should map '' to the identity element of the range set? Or more specifically, should every string->list function map '' to []? Nonsense. It depends on the function.

To also repeat, if split produced an iterable, then there would be no 'zero element of lists' to talk about.

Anyway, it is a moot point as change would break code.

> The general problem stems from the fact that my initial expectation that
>
> f_a(x) = x.join(a).split(x), where x in lists, a in strings
>
> should be an identity function can not be satisfied as join is non-injective (because of the surjective example above).

Since I was the first to point this out, I am glad you now agree.

--
Terry Jan Reedy

Guido van Rossum

3:27 a.m.

Well, I'm sorry, but this is not going to change, so I don't see much point in continuing to discuss it. We can explain the reasoning that leads to the current behavior (as you note, it's solid), we can discuss an alternative that could be considered just as solid, but it can't prevail in this universe. The cost of change is just too high, so we'll just have to live with the current behavior (and we might as well accept that it's solid instead of trying to fight it).

--Guido

--
--Guido van Rossum (python.org/~guido)

Mart Sõmermaa

6:50 a.m.

That's a well-balanced summary that I entirely agree with. However, I suggest that we keep the pros and cons in mind and perhaps re-discuss the behaviour during the Python 4 design phase.

Thank you all for your input,
best regards,
MS

Bruce Leban

8:48 a.m.

On Sun, Mar 6, 2011 at 7:27 PM, Guido van Rossum <guido@python.org> wrote:

> Well, I'm sorry, but this is not going to change ... The cost of change is just too high, so we'll just have to live with the current behavior (and we might as well accept that it's solid instead of trying to fight it).

Completely agree.

It's interesting that the one thing that annoys me about string.split hasn't been mentioned here. I'm not bothered by the inconsistency in handling of the degenerate cases because frequently I need code to handle the degenerate case specially anyway. What *does* annoy me is the inconsistency of what the count parameter means between different languages. That is, str.split(delimiter, count) means different things in different languages:

Python/Javascript = max number of splits
Java/C#/Ruby = max number of results

Obviously, it would break things badly to switch from one to the other (in any language). An alternative would be first changing to:

str.split(sep, maxsplits=None)

and modifying pylint to complain if maxsplits is used as a non-keyword argument. Eventually, change to

str.split(sep, deprecated=None, maxsplits=None)

where this throws an exception if deprecated is not None. This would also open up having a maxresults keyword if it's desirable to allow either variant.

--- Bruce
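For reference, Python's second argument to str.split counts splits, and in current Python 3 it can be passed by keyword as `maxsplit` (singular, unlike the `maxsplits` spelling proposed above). A "max results" variant is a one-line wrapper; `split_max_results` below is a hypothetical helper, not a real str method.

```python
def split_max_results(s, sep, max_results):
    """Split s on sep into at most max_results pieces (hypothetical helper)."""
    if max_results < 1:
        raise ValueError("max_results must be at least 1")
    return s.split(sep, max_results - 1)

# Python's second argument counts splits, so 2 splits give up to 3 pieces.
assert 'a,b,c,d'.split(',', 2) == ['a', 'b', 'c,d']
assert 'a,b,c,d'.split(',', maxsplit=2) == ['a', 'b', 'c,d']

# The helper counts results instead.
assert split_max_results('a,b,c,d', ',', 2) == ['a', 'b,c,d']
```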
