[Python-ideas] Intermediate Summary: Fast sum() for non-numbers
Sergey
sergemp at mail.ru
Sun Jul 14 21:26:06 CEST 2013
Hello, python-ideas.
I don't want to call it a "summary" yet, because I hope there are
other ideas out there that will be suggested later. So let's call it
an "Intermediate Summary".

I also shifted the emphasis in this summary a little, to show that
a faster sum() may be just a side-effect of some of the ideas.
Introduction
============
sum() is a great function because it keeps code simple. It was never
restricted to numbers, as it's also useful for adding types like
timedelta or various numpy types.
Unfortunately sum() performance is INCONSISTENT: it's O(N) for
numbers, but O(N*N) for many containers. As a result:
sum( [123]*1000000, 0 )
is instant, but:
sum( [[1,2,3]]*1000000, [] )
takes forever to complete.
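
The reason is visible once you write down what sum() effectively
does (a simplified Python equivalent of its C loop; the real sum()
also rejects strings and special-cases ints and floats):

    def simple_sum(iterable, start=0):
        result = start
        for item in iterable:
            # For containers, "+" builds a new object and copies
            # everything accumulated so far, hence O(N*N) overall.
            result = result + item
        return result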
It's worth noting that sum() is one of the most commonly suggested
options for adding lists [1], although usually someone also comes
along and says that it may be slow. This means that people do often
try to use sum() for lists. That case was also explicitly mentioned
in the comments in sum()'s source. So the problem is not hypothetical.
This thread was not the first time people discussed how this problem
could be solved [2], but it is probably the deepest and most detailed
discussion of it.
Alternatives
============
During the thread many alternatives were suggested. Among them:
* sum() is not an obvious (to everyone) way to add lists, so people
  should not use it, as there are alternatives; i.e. instead of
  - sum(list_of_lists, [])
  one can use:
  - reduce(operator.iadd, list_of_lists, [])
  - list(itertools.chain.from_iterable(list_of_lists))
  - result = []
    for x in list_of_lists:
        result.extend(x)
* Alternatives should be more visible, so that people would spot
  them earlier. For example, there was a suggestion to move chain
  (or chain.from_iterable) from itertools into builtins, optionally
  changing its name.
* Another alternative was to introduce "reduce" in builtins under
  a new name, "fold", optionally with adjusted parameters to make it
  easier to understand (a possible signature is sketched after this
  list).
* And one more suggested alternative was to do nothing.
  I can't understand it, so I'll just quote:
> You're saying what IS NOT your preferred way. But I'm asking what
> IS your preferred way. Do you prefer to have a slow sum in python
> and people asking why it's slow forever? Do you see that as a best
> possible case for python?
YES. Godamnit YES. 100% true-to-form YES.
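
For reference, here is a minimal sketch of what such a builtin
"fold" might look like (the exact signature is my assumption; the
thread did not settle on one):

    def fold(function, initial, iterable):
        # Like functools.reduce(), but the initial value is required
        # and comes before the iterable.
        result = initial
        for item in iterable:
            result = function(result, item)
        return result

e.g. fold(operator.iadd, [], list_of_lists) flattens a list of lists
in O(N).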
But I believe that all of these are workarounds (better workarounds,
maybe, but still workarounds) that do not solve the original problem.
It's like having a similar bug [3] in httplib: you could answer that
Python is not the best tool for downloading 110Mb files, or you could
suggest some custom http client instead, but that would not fix the
bug in httplib.
I'm trying to address this particular inconsistency of sum() being
O(N) for numbers but O(N*N) for many containers, and to find options
that could give sum() a chance to be O(N) for most use cases.
Ideas
=====
0. Original patch.
The original patch [4] suggested using "+" for the first item and
"+=" for everything else. That could make sum() faster not only for
lists but also for all types having __iadd__ faster than __add__.
But a clever example from Oscar [5] showed that "+" can be different
from "+=", e.g. due to coercion:
>>> from numpy import array
>>> a = array([1, 2, 3], dtype=int)
>>> b = array([1.4, 2.5, 3.6], dtype=float)
>>> a + b
array([ 2.4, 4.5, 6.6])
>>> a += b
>>> a
array([2, 4, 6])
I suggest (patch [6]) mentioning this example explicitly in the
comments for sum(), so that future developers are not tempted to
follow this approach.
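
In Python terms the rejected approach was essentially the following
(the helper name is mine; the real patch changed the C loop inside
builtin sum()):

    def iadd_sum(iterable, start):
        it = iter(iterable)
        try:
            # "+" only once, so the caller's start is never mutated
            result = start + next(it)
        except StopIteration:
            return start
        for item in it:
            # "+=" for the rest: faster for lists, but this is what
            # changes the semantics for numpy arrays as shown above
            result += item
        return result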
1. Fast custom types.
Is it fine to have types that are O(N) summable? This question looks
weird, since sum() is already O(N) for some types and O(N*N) for
others, but I was told several times things like:
> Today, sum is not the best way to concatenate sequences. Making
> it work better for some sequences but not others would mean it's
> still not the best way to concatenate sequences, but it would
> _appear_ to be. That's the very definition of an attractive
> nuisance.
So I want to clear this up: is it normal that sum() is fast for some
types (e.g. numbers, timedeltas or FastTuples from [7]) and slow for
others? Or MUST sum() just be slow, and is the very existence of
faster containers an "attractive nuisance", so those should be
expelled or slowed down too?
2. Fast built-in types.
If the answer is "yes", i.e. there's nothing bad in some types being
faster than others, and nothing bad in sum() being faster for some
types, then what do you think about optimizing built-in types, like
lists, tuples and strings? The optimization itself could have
different benefits:
* lower memory usage
* instant creation of one instance from another
* faster add operation (i.e. "c = a + b")
* having it common to lists and tuples could give instant
conversion of lists to tuples and back
But as a side-effect of such an optimization sum() would be O(N) for
those types.
So the question is: if it's fine for some types to be faster than
others, then is it fine if those "some types" are lists or tuples?
I.e. what if lists and tuples were O(N) to sum, while some other
types are not?
To settle the question of whether this is possible, I wrote [7]
a very limited version of tuple using that approach, so that:
    from fasttuple import ft
    x = sum( [ft([1,2,3])] * 1000000, ft([]) )
runs in a few seconds under a usual, unpatched Python.
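
The core trick can be shown in a few lines (a simplified sketch, not
the actual fasttuple.py from [7]): instances share one underlying
list, and "+" appends in place whenever the left operand is the most
recent view of that storage:

    class FastTuple:
        def __init__(self, iterable=()):
            self._data = list(iterable)  # storage, possibly shared
            self._len = len(self._data)  # how much of it is "ours"

        def __add__(self, other):
            result = FastTuple()
            if self._len == len(self._data):
                result._data = self._data  # we own the tail: share it
            else:
                # someone already grew past our view: copy our part
                result._data = self._data[:self._len]
            result._data.extend(other)
            result._len = len(result._data)
            return result

        def __len__(self):
            return self._len

        def __iter__(self):
            return iter(self._data[:self._len])

In the sum() loop each "+" then just extends the shared list, so the
whole sum is amortized O(N).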
3. Fast built-in types just for sum() [8].
If it's not bad to have lists/tuples faster than other types, then
how about implementing a small special case just for them?
Advantage: the patch [8] is much shorter than an optimization of
lists and tuples would be, and it would probably cover most (if not
all) uses of sum() for containers.
Disadvantage: the patch is limited to sum(), i.e. it gives the same
performance for sum() as #2, but it does not optimize general
"c = a + b" code as #2 would.
Side-note: sum() was designed with special cases in mind from the
beginning [9]. The initial sum() patch had a special case for
strings (''.join() was used for them), but due to the complexity of
the code for mixed lists it was replaced by another special case
rejecting strings. Currently sum() has 5 special cases: three to
reject strings/bytes and two optimizations for ints and floats.
4. If that is considered a bad idea, i.e. if we decide that other
custom types should have a fair chance to implement such an
optimization, then how about declaring a unified way to initialize
a container type from an iterable? Something like an optional method
__init_concatenable_container_from_iterable__. Most containers
already have such a method as their default constructor, so it
should be trivial to implement it for any existing type.
As a side-effect of this unification, sum() could have a general
optimization for such types, something like:
    if hasattr(type(start), "__init_concatenable_container_from_iterable__"):
        optimized_for = type(start)
        l_result = list()
        l_result.extend(start)
        try:
            while True:
                item = next(it)
                if type(item) is not optimized_for:
                    start = optimized_for.__init_concatenable_container_from_iterable__(l_result)
                    start = start + item
                    break
                l_result.extend(item)
        except StopIteration:
            return optimized_for.__init_concatenable_container_from_iterable__(l_result)
(I don't like that long __ name; it was just the first one I thought
of. Better names are welcome. __iccfi__?)
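
For example, a container whose constructor already accepts an
iterable could opt in trivially (hypothetical code, assuming the
method name above):

    class MyList(list):
        @classmethod
        def __init_concatenable_container_from_iterable__(cls, iterable):
            return cls(iterable)

and sum( [MyList([1]), MyList([2])] * 500000, MyList() ) would then
be O(N) under the optimization sketched above.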
5. An alternative unification idea was suggested by MRAB:
- Get the first item, or the start value.
- Ask the item for an 'accumulator' (item.__accum__()).
- Add the first item to the accumulator.
- Add the remaining items to the accumulator.
- Ask the accumulator for the result.
If there's no accumulator available (the "__accum__" method isn't
implemented), then either fall back to the current behaviour or
raise a TypeError like sum() currently does for strings.
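
That could look roughly like this (the add()/result() method names
and the ListAccumulator class are my assumptions; MRAB's outline did
not include code):

    def accum_sum(iterable, start):
        it = iter(iterable)
        try:
            make_accum = type(start).__accum__
        except AttributeError:
            for item in it:            # no accumulator available:
                start = start + item   # fall back to current behaviour
            return start
        acc = make_accum(start)        # ask for an accumulator
        acc.add(start)                 # add the first item/start value
        for item in it:
            acc.add(item)              # add the remaining items
        return acc.result()            # ask it for the result

    class ListAccumulator:             # what list.__accum__() might return
        def __init__(self):
            self._items = []
        def add(self, item):
            self._items.extend(item)
        def result(self):
            return self._items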
I *guess* sum() was never discussed so deeply before because many
people either hate sum() or believe that nothing can be done, so
they don't even try to find any options. Or maybe they just prefer
to use some easily available workaround and don't spend time looking
for other solutions.
PS: If you can, please explain what exactly you [don't] like in the
ideas above, so that I have a chance to modify the ideas according
to your opinions.
PPS: I'm sorry if I missed some of the suggestions; please add them
too.
PPPS: I'm also sorry for not answering some emails during the
thread. I read all of them, but there were already too many messages
in the thread, so I was trying to limit the number of emails I
write. If I missed something important, please forward your email to
me privately and I'll answer you either directly or on the list, if
you wish.
--
[1] Some stackoverflow questions where sum() is suggested for lists:
http://stackoverflow.com/questions/406121/flattening-a-shallow-list-in-python
http://stackoverflow.com/questions/716477/join-list-of-lists-in-python
http://stackoverflow.com/questions/952914/making-a-flat-list-out-of-list-of-lists-in-python
http://stackoverflow.com/questions/3021641/concatenation-of-many-lists-in-python
http://stackoverflow.com/questions/7895449/merging-a-list-of-lists
http://stackoverflow.com/questions/10424219/combining-lists-into-one
http://stackoverflow.com/questions/11331908/how-to-use-reduce-with-list-of-lists
http://stackoverflow.com/questions/17142101/concatenating-sublists-python
[2] Some earlier attempts to discuss solutions to sum() slowness:
2006-01-12 http://article.gmane.org/gmane.comp.python.general/441831
> A fast implementation would probably allocate the output list just
> once and then stream the values into place with a simple index.
That's what I hoped "sum" would do, but instead it barfs with a type
error.
2010-03-27 http://article.gmane.org/gmane.comp.python.general/658537
The mildly surprising part of sum() is that it does add vs. add-in-
place, which leads to O(N) vs. O(1) for the inner loop calls, for
certain data structures, notably lists, even though none of the
intermediate results get used by the caller. For lists, you could
make a more efficient variant of sum() that clones the start value and
does add-in-place.
[3] http://bugs.python.org/issue6838
*** httplib's _read_chunked extremely slow for lots of chunks ***
As the comment in this method suggests, accumulating the value by
repeated string concatenation is slow. Appending to a list speeds
this up dramatically.
[4] http://bugs.python.org/file30705/fastsum.patch
Original patch using "+" for first item and "+=" for the rest.
[5] http://bugs.python.org/issue18305#msg192873
Oscar Benjamin:
This "optimisation" is a semantic change. It breaks backward
compatibility in cases where a = a + b and a += b do not result in
the name a having the same value. In particular this breaks backward
compatibility for numpy users: [...]
[6] http://bugs.python.org/issue18305#msg192956
http://bugs.python.org/file30904/fastsum-iadd_warning.patch
Extended warning in sum() comments with more important numpy cases.
[7] http://bugs.python.org/issue18305#msg193048
http://bugs.python.org/file30917/fasttuple.py
fasttuple.py is a Proof-of-Concept implementation of tuple that
reuses the same data storage when possible. Its possible usage looks
similar to built-in tuples [...]. An interesting side-effect of this
implementation is a faster __add__ operator:
Adding 100000 of fasttuples
took 0.23242688179 seconds
Adding 100000 of built-in tuples
took 25.2749021053 seconds
[8] http://bugs.python.org/issue18305#msg192919
http://bugs.python.org/file30897/fastsum-special-tuplesandlists.patch
Patch implementing a special case for tuples and lists.
Should work for both Python 2.7 and Python 3.3, and should
introduce no behavior change.
[9] http://mail.python.org/pipermail/python-dev/2003-April/034767.html
Alex Martelli (author of sum())
> for the simple reason that I special-case this -- when the first
> item is a PyBaseString_Type, I delegate to ''.join