[Python-ideas] str.split() oddness

Joao S. O. Bueno jsbueno at python.org.br
Sun Mar 6 22:54:48 CET 2011


On Sun, Mar 6, 2011 at 6:06 PM, Mart Sõmermaa <mrts.pydev at gmail.com> wrote:
> On Sun, Mar 6, 2011 at 9:35 PM, Georg Brandl <g.brandl at gmx.net> wrote:
>> On 06.03.2011 19:32, Mart Sõmermaa wrote:
>>
>>>> In Python the generalization is that since "xx".split(",") is ["xx"],
>>>> and "x".split(",") is ["x"], it naturally follows that "".split(",")
>>>> is [""].
>>>
>>> That is one line of reasoning that emphasizes the
>>> "string-nature" of ''.
>>>
>>> However, I myself, the Ruby folks and Nick would rather
>>> emphasize the "zero-element-nature" [1] of ''.
>>>
>>> Both approaches are based on solid reasoning, the latter
>>> just happens to be more practical.
>>
>> I think we haven't seen any proof of that (and no, the property
>> of x.join(a).split(x) == a does not show me why it would be practical).
>
> I referred to the practical example in my first message,
> but let me repeat it.
>
> Which do you prefer:
>
>  bar = dict(chunk.split('=') for chunk in foo.split(","))
>
> or
>
>  bar = (dict(chunk.split('=') for chunk in foo.split(",")) if foo else {})
>
> ?
>
> I'm afraid there are other people besides me that fail to think
> of the `if foo else {}` part on the first shot (assuming there will be an
> empty list when foo='' and that `for` will not be entered at all).
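For reference, here is a small sketch of the current `str.split()` behaviour both sides are arguing about, with Mart's guarded one-liner wrapped in a function. It assumes Python 3; `parse_pairs` is a hypothetical helper name introduced only for illustration, not something from the thread.

```python
# Behaviour of str.split() as discussed above (Python 3).

# With an explicit separator, the result always has at least one element:
assert "xx".split(",") == ["xx"]
assert "x".split(",") == ["x"]
assert "".split(",") == [""]

# With no separator, an empty or all-whitespace string gives an empty list:
assert "".split() == []
assert "   ".split() == []

# Hence the round trip sep.join(a).split(sep) == a fails for a == []:
sep = ","
assert sep.join(["a", "b"]).split(sep) == ["a", "b"]
assert sep.join([]).split(sep) == [""]  # not []


def parse_pairs(foo):
    """Parse 'k1=v1,k2=v2' into a dict (hypothetical helper name).

    The `if foo else {}` guard is required: for foo == '' the generator
    yields [''] as its only item, and dict() rejects any item that is
    not a 2-element pair.
    """
    return dict(chunk.split("=") for chunk in foo.split(",")) if foo else {}


try:
    dict(chunk.split("=") for chunk in "".split(","))  # unguarded version
except ValueError as exc:
    print("unguarded version fails:", exc)

print(parse_pairs("a=1,b=2"))  # {'a': '1', 'b': '2'}
print(parse_pairs(""))         # {}
```

The unguarded form only works because every non-empty `foo` is assumed to be well formed; the guard exists solely for the empty-string case.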


Mart, I don't know about you, but in my code, for example, there
are plenty, and I mean __plenty__, of places where I assume that
after a split I will have at least one element in the list.

Python simply does not break backwards compatibility like that,
especially for something as small as this.

Such a behavior, as you describe, while apparently not bad, simply is
not how Python works, and it cannot be changed without breaking
compatibility. The current behavior has advantages as well: one can
always refer to the first ([0]) element of the split return value.
If I want to strip "#"-style comments from a simple file:

line = line.split("#")[0]

Under your proposed behavior, this code would break on empty lines,
and it would need one extra "if" when rewritten. In my opinion it
makes no sense to break the rules for such a small change.
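A minimal sketch of the comment-stripping idiom, assuming Python 3. `strip_comment` and `proposed_split` are hypothetical names; `proposed_split` only models the suggested change (an empty string splitting to an empty list) and is not a real API.

```python
def strip_comment(line):
    """Return `line` without a trailing '#' comment."""
    # Indexing [0] is always safe today: split() with an explicit
    # separator returns at least one element, even for ''.
    return line.split("#")[0]


def proposed_split(s, sep):
    """Hypothetical model of the proposal: '' splits to []."""
    return [] if s == "" else s.split(sep)


assert strip_comment("x = 1  # set x") == "x = 1  "
assert strip_comment("# whole-line comment") == ""
assert strip_comment("") == ""

# Under the proposed semantics, the same one-liner fails on empty lines.
try:
    proposed_split("", "#")[0]
except IndexError:
    print("proposed semantics: IndexError on an empty line")
```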

Moreover, if you think about it, when parsing lines of a text file
that might contain some kind of assignments interspersed with blank
lines, like the case you describe, nearly any code dealing with that
will have to check for blank lines containing whitespace as well.

And in this case, with or without your changes:

a = line.split("=")
if len(a) == 2:
    ...
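Spelled out, that length check already covers blank lines. A minimal sketch assuming Python 3; `parse_assignments` is a hypothetical name, and the `strip()` calls are an added assumption about how surrounding whitespace should be handled.

```python
def parse_assignments(lines):
    """Collect 'key=value' lines into a dict (hypothetical helper).

    Blank, whitespace-only and malformed lines are all skipped by the
    same len(a) == 2 check, whether '' splits to [''] (today) or to [].
    """
    result = {}
    for line in lines:
        a = line.strip().split("=")
        if len(a) == 2:
            key, value = a
            result[key.strip()] = value.strip()
    return result


print(parse_assignments(["a = 1", "", "   ", "b=2", "junk"]))
# {'a': '1', 'b': '2'}
```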

The situation you hit, where you avoid writing that "if" in the
generator expression, is most likely peculiar to the program you
were writing at that moment; it is not a case I encounter in real
day-to-day coding.


  js
 -><-

> Best,
> Mart Sõmermaa


