restriction on sum: intentional bug?

Stephen Hansen apt.shansen at gmail.com
Fri Oct 16 13:31:36 EDT 2009


On Fri, Oct 16, 2009 at 9:59 AM, Tim Chase <python.list at tim.thechases.com> wrote:

> Stephen Hansen wrote:
>
>> Why doesn't duck typing apply to `sum`?
>>>
>>
>> Because it would be so hideously slow and inefficient that it'd be way too
>> easy a way for people to program something they think should work fine but
>> really doesn't... alternatively, the function would have to do two
>> /completely/ different implementations based on what you're passing in,
>> and that violates duck typing too :)
>>
>
>
> But that's the issue...supporting strings does NOT involve "two
> /completely/ different implementations based on what you're passing in" but
> rather just a reduction with a starting point (whether 0 or '') and objects
> that have an __add__ method. Strings meet these requirements.  Specifically
> disqualifying strings is where you get the two code-paths/implementations.
>

Except implementing:
    sum(["one", "two", "three"], "")
as:
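    def sum_strings(my_list, start=""):  # what a str-accepting sum() would do
        for x in my_list:
            start = start + x  # builds a brand-new string on every pass
        return start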

is bad. It's significantly slower than doing it the correct way, which would
be:

    return "".join(my_list)

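The former creates and throws away a new immutable string on every pass,
copying everything accumulated so far each time, so the total copying grows
roughly quadratically with the number of strings; the latter builds just one
string. There's never a reason why you'd want to do the former in lieu of the
latter -- correct me if I'm wrong on that?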
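A rough way to see the difference is to count characters copied rather than time anything. This is a simplified cost model, not a benchmark: it assumes every `acc + piece` allocates a new string and copies both operands, while join sizes the result once and copies each piece once. The function names are made up for illustration.

```python
def chars_copied_by_plus(pieces):
    """Characters copied if each `acc = acc + piece` allocates a fresh string."""
    copied = 0
    acc_len = 0
    for piece in pieces:
        acc_len += len(piece)  # the new string holds everything so far ...
        copied += acc_len      # ... and all of it gets copied into place
    return copied


def chars_copied_by_join(pieces):
    """Characters copied if the result is sized once and filled in one pass."""
    return sum(len(piece) for piece in pieces)


pieces = ["abc"] * 1000
print(chars_copied_by_plus(pieces))  # 1501500
print(chars_copied_by_join(pieces))  # 3000
```

Doubling the number of pieces roughly quadruples the first figure but only doubles the second. Real interpreters can blur this: CPython can sometimes extend a string in place for `s = s + t` or `s += t`, but that is an implementation detail, and PEP 8 advises not relying on it and using ''.join() in performance-sensitive code.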

There really is just a right way versus a wrong way to join strings
together; using + is always the wrong way. Sometimes that level of 'wrong'
is so tiny that no one cares, like if you are using it to join together two
small strings. But when joining together a sequence of strings, the wrongness
is amplified until it becomes /clearly/ wrong.

> So I agree with Alan & Peter that this creates an unfortunate language wart
> among (as Peter aptly puts it) "consenting adults".  I'd feel similarly if
> certain classes anomalously prevented access to internal "private" data.  We
> know that if we go mucking around like this, it's our own fault if things
> break or get slow.  Is Python going to prevent me from typing
>
>  count = 0
>  for i in range(1000000):
>    if i % 1000 == 0: count += 1
>
> instead of specifying the step-size?  Or even forcing me to precompute this
> constant value for `count` because looping is inefficient in this case?
>
>
That comparison is apples to... rocket launchers.

The case with sum has nothing at all to do with the above example, or with
Python maybe one day trying to "force" you into doing one thing or the other
in the name of Efficiency, or starting to go down some data-hiding road.

Yes, sum() is doing some "hand holding" here, but only in one specific case,
and only because it's -always-wrong- to use it in that case.
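Concretely, the hand holding is a type check on the start value. A small sketch, assuming CPython (the helper name is hypothetical): sum() raises TypeError as soon as it is handed a str start value, and the message itself points at ''.join(). Current CPython 3 applies the same check to bytes and bytearray start values.

```python
def sum_error_message(items, start):
    """Hypothetical helper: the TypeError message sum() raises, or None."""
    try:
        sum(items, start)
    except TypeError as exc:
        return str(exc)
    return None


parts = ["one", "two", "three"]

# sum() checks the *start* value up front and refuses str outright,
# before it ever looks at the items.
print(sum_error_message(parts, ""))

# The sanctioned spelling.
print("".join(parts))  # onetwothree

# Numeric start values are fine, so no error message here.
print(sum_error_message([1, 2, 3], 0))  # None
```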

And more importantly, it's *extremely* easy to not really know that it is
wrong, or why. So it's extremely easy to write code which you think is
perfectly fine, but isn't.

The "consenting adults" argument sort of applies, sure. But these general
principles aren't absolutes. None of them are. In this case, someone decided
that it was way too easy for someone to NOT know that this is wrong and make
a mistake.

You may know why using + to concatenate bunches of strings is wrong and for
some reason decide to do it anyway, and in that case, you're a consenting
adult, you should get to do what you want, sure. But there are subtleties here
that relate to the immutability of Python strings and the overhead of
creating and destroying them that a lot of people *aren't* going to
understand right away. They won't know it's wrong.

They'll just use sum() because hey, they can use +, so that is obviously the
right way to join together lots of strings! Especially since ''.join() and
methods-on-literals (something I have grown to quite like, but which at first
made me blink) look unusual to them.
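For anyone in that position: join is a method on the *separator* string, which is why the literal comes first. A short sketch using only plain str methods:

```python
parts = ["one", "two", "three"]

# join() belongs to the separator; the empty string just glues pieces together.
print("".join(parts))    # onetwothree
print(", ".join(parts))  # one, two, three

# The same call spelled through the class, which may read more naturally at first.
print(str.join("", parts) == "".join(parts))  # True

# Any iterable of str works, including a generator expression ...
print("-".join(str(n) for n in range(3)))  # 0-1-2

# ... but every item must already be a str; join() does not convert for you.
try:
    "".join([1, 2, 3])
except TypeError:
    print("join() needs str items")
```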

There are a lot of "principles" behind Python: there's the Zen, there are
philosophies like duck typing, there are the less-easy-to-grasp "Pythonic" and
idiomatic methodologies, and so on. None of them are rules or absolute edicts.
All get bent in certain places.

Okay, you think it's a wart that sum() is not a generic aggregation function
capable of aggregating anything which can be added. I get that. Then again,
it doesn't claim to be (it claims to aggregate only numbers, though it will
actually do more, ssh).
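To illustrate the "will do more": sum() adds anything whose + works with the start value you pass, apart from the string types it refuses. Python's own documentation for sum() recommends ''.join(sequence) for strings and itertools.chain() for concatenating iterables. A sketch using only the standard library:

```python
import itertools
from datetime import timedelta

# Non-numeric start values are accepted as long as + works.
durations = [timedelta(minutes=30)] * 3
print(sum(durations, timedelta()))  # 1:30:00

lists = [[1, 2], [3], [4, 5]]
flat_via_sum = sum(lists, [])  # works, but each + builds a new list
flat_via_chain = list(itertools.chain.from_iterable(lists))  # walks each list once
print(flat_via_sum)                    # [1, 2, 3, 4, 5]
print(flat_via_chain == flat_via_sum)  # True
```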

In this case though, "consenting adults" and "duck typing" lost out to
"explicit is better than implicit" and "practicality beats purity", because
sum() (+ based aggregation) is never-the-right-answer for strings and
str.join is always-the-right-answer, so making it an error helps push people
into learning the difference. Well, that's my guess as to why the devs did
that, at least.

--S