[Python-ideas] Fast sum() for non-numbers - why so much worries?
Sergey
sergemp at mail.ru
Fri Jul 12 05:45:51 CEST 2013
On Jul 10 2013, Steven D'Aprano wrote:
>>> No, please. Using sum() on lists is not more than a hack that
>>> seems to be a cool idea but isn't. Seriously - what's the sum of
>>> lists? Intuitively, it makes no sense at all to say sum(lists).
>>
>> It's the same as it is now. What else can you think about when you
>> see: [1, 2, 3] + [4, 5] ?
>
> Some of us think that using + for concatenation is an abuse of
> terminology, or at least an unfortunate choice of operator, and are
> wary of anything that encourages that terminology.
>
> Nevertheless, you are right, in Python 3 both + and sum of lists
> is well-defined. At the moment sum is defined in terms of __add__.
> You want to change it to be defined in terms of __iadd__. That is a
> semantic change that needs to be considered carefully, it is not just
> an optimization.
Yes, I understand that. On the other hand, is this documented
anywhere? Does anything say that sum() actually uses __add__,
not __iadd__ or something else?
I'm not trying to say that we can change that freely. I'm just
trying to find out how hard such a change would be.
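To make the semantic question concrete, here is a small illustration (mine, not from the patch) of why switching sum() from __add__ to __iadd__ is observable to callers: with a mutable start value, in-place addition would modify the caller's object.

```python
# Illustration: __add__ vs __iadd__ for sum() over lists.
# Today's sum() uses `acc = acc + item`, which builds a new list
# each time and never touches the caller's start value.
start = [0]
total = sum([[1], [2]], start)
print(total)   # [0, 1, 2]
print(start)   # [0] -- unchanged, since a + b creates a new list

# An __iadd__-based sum() would do the equivalent of:
acc = start
for item in [[1], [2]]:
    acc += item        # list.__iadd__ extends the list in place
print(acc)     # [0, 1, 2]
print(start)   # [0, 1, 2] -- the caller's start list was mutated
```

So a naive __iadd__ version is faster but changes behaviour; any real patch would have to copy the start value first.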
> I have been very careful to say that I am only a little bit
> against this idea, -0 not -1. I am uncomfortable about changing the
> semantics to use __iadd__ instead of __add__, because I expect that
> this will change the behaviour of sum() for non-builtins.
> I worry about increased complexity making maintenance harder for
> no good reason. It's the "for no good reason" that concerns me:
> you could answer some of my objections if you showed:
>
> - real code written by people who sum() large (more than 100)
> numbers of lists;
That would be hard. You're asking me to find someone using a bad
function in exactly the case where it's bad. :) And even if nobody
uses a bad function, that doesn't mean it should stay bad.
> - real code with comments like "this is a work-around for sum()
> being slow on lists";
Even I probably wouldn't do that; I would just silently use another
function.
> - bug reports or other public complaints by people (other than
> you) complaining that sum(lists) is slow;
This is easier. sum() is constantly suggested as an option for list
concatenation in the questions I could find, and often someone comes
along later and says "be careful, it may be slow". So this is a
fairly common attempt. Examples: [1] [2] [3] [4] [5] [6] [7] [8] [9] [10]
> or similar. That would prove that people do call sum on lists.
I can't say they do, but they definitely try.
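The quadratic behaviour those answers warn about is easy to reproduce. A quick timing sketch (mine, not from the thread) comparing sum(x, []) with the linear itertools.chain.from_iterable:

```python
# sum(lists, []) copies the growing accumulator on every step,
# so flattening n lists costs O(n^2); chain.from_iterable is O(n).
import timeit
from itertools import chain

x = [[1, 2, 3]] * 1000

t_sum = timeit.timeit(lambda: sum(x, []), number=20)
t_chain = timeit.timeit(lambda: list(chain.from_iterable(x)), number=20)
print(f"sum:   {t_sum:.3f}s")
print(f"chain: {t_chain:.3f}s")

# Both produce the same flattened list:
assert sum(x, []) == list(chain.from_iterable(x))
```

On my assumptions the gap widens quickly as the number of sublists grows, which is exactly what the Stack Overflow commenters observed.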
> Earlier in this discussion, you posted benchmarks for the patched
> sum using Python 2.7. Would you be willing to do it again for 3.3?
Well, I posted them for Python 2.7 because I expected that the patch
would not introduce any semantic changes, so it could be applied to
2.7 too. The patch itself works for both 2.7 and 3.3.
Python 3.3.2 + fastsum-special-tuplesandlists.patch [11]
(times in msec per loop)
Before patch:
  list compr:          14.5
  chain:               10.5
  chain.from_iterable:  7.44
  aug. assignment:      5.8
  regular extend:       9.34
  optimized extend:     5.59
  sum:                  infinite
After patch:
  list compr:          14.5
  chain:               10.5
  chain.from_iterable:  7.44
  aug. assignment:      5.79
  regular extend:       9.47
  optimized extend:     5.53
  sum:                  2.58
I used the same tests as yours [12], except for one minor bug fix:
you did "l = []" only at start, while I do it on every iteration.
I hope this patch does not introduce any semantic changes. :)
> And confirm that the Python test suite continues to pass?
Yes, it does. But it doesn't test many cases, and it definitely
tests nothing like: http://bugs.python.org/issue18305#msg192919
--
[1]
http://stackoverflow.com/questions/406121/flattening-a-shallow-list-in-python
[2]
http://stackoverflow.com/questions/952914/making-a-flat-list-out-of-list-of-lists-in-python
[3]
http://stackoverflow.com/questions/3021641/concatenation-of-many-lists-in-python
[4]
http://stackoverflow.com/questions/7895449/merging-a-list-of-lists
[5]
http://stackoverflow.com/questions/10424219/combining-lists-into-one
[6]
http://stackoverflow.com/questions/11331908/how-to-use-reduce-with-list-of-lists
[7]
http://stackoverflow.com/questions/17142101/concatenating-sublists-python
[8]
http://stackoverflow.com/questions/716477/join-list-of-lists-in-python
2009-04-04 CMS
x = [["a","b"], ["c"]]
result = sum(x, [])
2010-09-04 habnabit
O(n^2) complexity yaaaaaay.
[9]
http://article.gmane.org/gmane.comp.python.general/658537
The mildly surprising part of sum() is that it does add vs. add-in-
place, which leads to O(N) vs. O(1) for the inner loop calls, for
certain data structures, notably lists, even though none of the
intermediate results get used by the caller. For lists, you could
make a more efficient variant of sum() that clones the start value and
does add-in-place.
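The variant described above can be sketched in pure Python. This is my own rough sketch (the name `fast_sum` is mine, not from the patch): clone the start value, then accumulate with in-place add.

```python
def fast_sum(iterable, start=None):
    """Sum lists efficiently: clone start, then add in place."""
    # Clone the start value so the caller's list is never mutated
    acc = list(start) if start is not None else []
    for item in iterable:
        acc += item   # in-place extend: O(len(item)), not O(len(acc))
    return acc

start = [0]
result = fast_sum([[1], [2, 3]], start)
print(result)  # [0, 1, 2, 3]
print(start)   # [0] -- the caller's start list is unchanged
```

This keeps the observable semantics of today's sum() for lists (the start value is untouched) while avoiding the quadratic copying.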
[10]
http://article.gmane.org/gmane.comp.python.general/441831
> A fast implementation would probably allocate the output list just
> once and then stream the values into place with a simple index.
That's what I hoped "sum" would do, but instead it barfs with a type
error.
[11]
http://bugs.python.org/file30897/fastsum-special-tuplesandlists.patch
[12] The two timings for each test are before and after the patch:
$ ./python -mtimeit --setup="x=[[1,2,3]]*100000" \
"[i for l in x for i in l]"
100 loops, best of 3: 14.5 msec per loop
100 loops, best of 3: 14.5 msec per loop
$ ./python -mtimeit --setup="x=[[1,2,3]]*100000" \
--setup="from itertools import chain" \
"list(chain(*x))"
100 loops, best of 3: 10.5 msec per loop
100 loops, best of 3: 10.5 msec per loop
$ ./python -mtimeit --setup="x=[[1,2,3]]*100000" \
--setup="from itertools import chain" \
"list(chain.from_iterable(x))"
100 loops, best of 3: 7.44 msec per loop
100 loops, best of 3: 7.44 msec per loop
$ ./python -mtimeit --setup="x=[[1,2,3]]*100000" \
"l = []" \
"for i in x: l += i"
100 loops, best of 3: 5.8 msec per loop
100 loops, best of 3: 5.79 msec per loop
$ ./python -mtimeit --setup="x=[[1,2,3]]*100000" \
"l = []" \
"for i in x: l.extend(i)"
100 loops, best of 3: 9.34 msec per loop
100 loops, best of 3: 9.47 msec per loop
$ ./python -mtimeit --setup="x=[[1,2,3]]*100000" \
"l = []" \
"extend=l.extend" \
"for i in x: extend(i)"
100 loops, best of 3: 5.59 msec per loop
100 loops, best of 3: 5.53 msec per loop
$ ./python -mtimeit --setup="x=[[1,2,3]]*100000" \
"sum(x,[])"
infinite
100 loops, best of 3: 2.58 msec per loop