[New-bugs-announce] [issue39440] Use PyNumber_InPlaceAdd in sum() for the second iteration onward
edk
report at bugs.python.org
Thu Jan 23 18:00:12 EST 2020
New submission from edk <e at kellett.im>:
The C implementation of sum() contains this comment:
/* It's tempting to use PyNumber_InPlaceAdd instead of
PyNumber_Add here, to avoid quadratic running time
when doing 'sum(list_of_lists, [])'. However, this
would produce a change in behaviour: a snippet like
empty = []
sum([[x] for x in range(10)], empty)
would change the value of empty. */
But that doesn't hold after PyNumber_Add has been called once,
because from that point onward the accumulator value is something
we got from an __add__, and the caller can't know if we reuse it.
The in-place version is substantially faster for some workloads--
the example in the comment is an obvious one, but the context in
which I ran into this was using a sum of Counters in a one-liner
a bit like this:
sum((Counter({line.split("|", 3): len(line)}) for line in sys.stdin), Counter())
in which significant time seems to be spent adding the contents
of the previous accumulator value to the new one.
# before
; ./python -m timeit 'sum(([x] for x in range(1000)), [])'
500 loops, best of 5: 888 usec per loop
# after
; ./python -m timeit 'sum(([x] for x in range(1000)), [])'
5000 loops, best of 5: 65.3 usec per loop
; cat test_.py
from collections import Counter
import timeit
import random
data = [Counter({random.choice(['foo', 'bar', 'baz', 'qux']): random.randint(1,1000000)}) for _ in range(10000)]
print(min(timeit.repeat('sum(data, Counter())', 'from collections import Counter', number=100, globals={'data': data})))
print(min(timeit.repeat('reduce(Counter.__iadd__, data, Counter())', 'from collections import Counter; from functools import reduce', number=100, globals={'data': data})))
# before
; ./python test_.py
1.8981186050223187
0.7094596439856105
# after
; ./python test_.py
0.715508968976792
0.7050370009965263
----------
messages: 360583
nosy: edk
priority: normal
severity: normal
status: open
title: Use PyNumber_InPlaceAdd in sum() for the second iteration onward
type: performance
_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue39440>
_______________________________________
More information about the New-bugs-announce
mailing list