[Python-ideas] Fast sum() for non-numbers - why so much worries?

Sergey sergemp at mail.ru
Fri Jul 12 22:52:41 CEST 2013


On Jul 11, 2013 Ron Adam wrote:

>> It's just that instead of discussing the best way to fix the slowness,
>> I'm spending most of my time trying to convince people that the
>> slowness should be fixed.
>> — sum is slow for lists, let's fix that!
>> — you shouldn't use sum...
>> — why can't I use sum?
>> — because it's slow
>> — then let's fix that!
>> — you shouldn't use sum...
>> I hadn't thought that anybody could truly believe that something should
>> be slow, and keep finding one excuse after another instead of just
>> fixing the slowness.
> 
> My advice is to not try so hard to change an individual's mind.

I'm not trying to change someone's mind (well, maybe I am, but that's
just a side effect). I'm trying to understand their minds.

What I'm really trying to do is find a solution that takes into
account as many opinions as possible. But I need to understand those
opinions to do that.

I understood when Steven said "I am uncomfortable about changing the
semantics to use __iadd__ instead of __add__". I was unsure about
that too, but since it's not officially documented that sum() uses
__add__, I was hoping that nobody was relying on it and that nothing
would break if it were changed.
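To make the __add__ vs __iadd__ question concrete, here is a minimal sketch. `sum_with_add` and `sum_with_iadd` are hypothetical helpers that only approximate the two strategies; neither is CPython's actual sum() code nor the code from any of the patches:

```python
import itertools

# Illustrative sketch only: two hypothetical helpers approximating the
# strategies discussed in the thread; neither is CPython's real sum().

def sum_with_add(items, start):
    """Generic loop using __add__: result = result + item.

    For lists, each step builds a brand-new list and copies everything
    accumulated so far, so total work grows quadratically.
    """
    result = start
    for item in items:
        result = result + item
    return result


def sum_with_iadd(items, start):
    """Hypothetical in-place loop using __iadd__: result += item.

    For lists this extends a single list, so work grows roughly
    linearly, but it calls a different method with different semantics.
    """
    result = start
    for item in items:
        result += item
    return result


chunks = [[1, 2], [3], [4, 5]]

start = []
print(sum_with_add(chunks, start), start)   # [1, 2, 3, 4, 5] []

start = []
print(sum_with_iadd(chunks, start), start)  # [1, 2, 3, 4, 5] [1, 2, 3, 4, 5]

# The usual recommendation for flattening, for comparison:
print(list(itertools.chain.from_iterable(chunks)))  # [1, 2, 3, 4, 5]
```

The second call shows the semantic risk: `+=` on a list mutates the caller's `start` object in place, which `+` never does. An in-place version would have to avoid that (for example by working on a copy), and any type whose `__iadd__` behaves differently from its `__add__` could still see different results.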

I understood Joshua when it turned out that such a change could break
numpy-based code, and he said "We can't rush a semantic change for
code that's in popular usage... it seems I've left for the dark
side." (I agree with him; that's why I suggested [1] recording that,
so others would not be tempted to try it again. So we're still on the
same side about that patch :)).

I even understood when Stefan said that using "+" and sum() for
concatenation makes no (obvious) sense. IMO it does not matter in
our case; it's just a feature that you should be aware of. Using "+"
to add lists is like using 2**20 (rather than 2^20) to get a power:
it's neither good nor bad, it's just how it is spelled here.
I mean, I do not agree with that point, but I understand it.
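A quick check of the operators in that analogy, in plain Python (nothing here comes from the patches under discussion):

```python
# In Python, ** is exponentiation and ^ is bitwise XOR, so the
# "obvious" spelling 2^20 quietly means something else.
assert 2 ** 20 == 1048576
assert 2 ^ 20 == 22  # 0b00010 XOR 0b10100 == 0b10110

# Likewise, + on lists is defined as concatenation, which is why
# sum(list_of_lists, []) works at all.
assert [1, 2] + [3] == [1, 2, 3]
assert sum([[1, 2], [3]], []) == [1, 2, 3]
```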

But I cannot understand Andrew. From the very beginning it seemed
that he was mainly concerned about speed: he kept asking me how to
speed up different types (or rather insisting that I could not speed
them up). When I explicitly asked him whether he thinks sum() must
not be optimized JUST because of possible speed issues with other
types, he said "Yes". But in the next email he says that speed is
not the main reason...

So I should either stop trying to understand him (and I don't want
to, because discussing the problem with him has inspired new ideas)
or I'm doomed to repeat the same questions over and over in
different forms until I finally understand what he really wants.

> Are you familiar with the informal voting system we use? Basically
> take a look though the discussion and look for [...]

Thank you for the detailed explanation. Unfortunately I can't just
tally the opinions once and forget about them, because I adapt my
suggestion to them. For example, initially only one patch was
suggested; now there are three patches and two more ideas waiting
to be discussed and, maybe, modified again.

> So.. make the numbers case faster, but probably don't bother changing the 
> non numbers case.  (It seems like this is the preferred view so far.)
>
> There might be some support for deprecating the non-numbers
> case. I'm not suggesting that *you* do that btw... see below. :-)

Sum is not just for numbers. It's a rather good choice for adding many
kinds of things, including timedeltas and various numpy types. That's
why it was never restricted to numbers only. Its only restriction is
on strings, which exists for historical reasons and is more a note to
newbies than a real restriction, because it can easily be bypassed if
needed.

So we can't just deprecate the non-numbers case; well, we could, but
I don't think it's a good idea.
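A short sketch of the points above about non-numeric types and the string restriction, assuming CPython 3 behaviour. `PassThrough` is a hypothetical helper made up for illustration; it is not part of any library or patch:

```python
from datetime import timedelta

# Non-numeric types that define __add__ work with sum(), given a
# suitable start value (the default start of 0 cannot be added to a
# timedelta).
durations = [timedelta(minutes=45), timedelta(minutes=30), timedelta(hours=1)]
total = sum(durations, timedelta())
print(total)  # 2:15:00

# Passing a str as the start value is rejected outright...
try:
    sum(["a", "b", "c"], "")
except TypeError as exc:
    print("TypeError:", exc)


# ...but the check only looks at the start value, so a start object
# whose __add__ simply returns its right operand (hypothetical helper,
# for illustration only) slips past it.
class PassThrough:
    def __add__(self, other):
        return other


print(sum(["a", "b", "c"], PassThrough()))  # abc
```

For real code, `"".join(...)` remains the idiomatic way to concatenate strings; the trick only shows how thin the restriction is.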

-- 
[1] http://bugs.python.org/issue18305#msg192956
    http://bugs.python.org/file30904/fastsum-iadd_warning.patch

