[Python-ideas] Intermediate Summary: Fast sum() for non-numbers

Sergey sergemp at mail.ru
Sun Jul 14 21:26:06 CEST 2013


Hello, python-ideas.
I don't want to call it a "summary" yet, because I hope there are
other ideas out there that will be suggested later.

So let's call it "Intermediate Summary".

I have also shifted the emphasis in this summary a little, to show
that a faster sum() may be just a side-effect of some of the ideas.

Introduction
============
sum() is a great function because it makes code simple. It was never
restricted to numbers; it is also useful for adding types like
timedelta or various numpy types.

Unfortunately sum() performance is INCONSISTENT: it's O(N) for
numbers, but O(N*N) for many containers. As a result:
  sum( [123]*1000000, 0 )
is instant, but:
  sum( [[1,2,3]]*1000000, [] )
takes forever to complete.
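
The reason is that sum()'s generic path is essentially the loop
below, and for lists each "+" builds a new list and copies everything
accumulated so far, so summing N lists of length k performs roughly
k*(1+2+...+N) element copies. A minimal Python-level sketch of what
sum() effectively does (not the actual C code):

  def naive_sum(items, start):
      result = start
      for item in items:
          # For lists this line alone is O(len(result)): "+" builds
          # a brand new list and copies both operands into it.
          result = result + item
      return result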

It's worth noting that sum() is one of the most commonly suggested
ways to add lists [1], although usually someone also comes along and
says that it may be slow. This means that people do often try to use
sum() for lists. The case is also explicitly mentioned in the
comments in sum()'s source. So the problem is not hypothetical.

This thread is not the first time people have discussed how this
problem could be solved [2], but it is probably the deepest and most
detailed discussion of it.


Alternatives
============
Many alternatives were suggested during the thread. Among them:

* sum() is not an obvious (to everyone) way to add lists, so people
  should not use it, since there are alternatives; i.e. instead of
  - sum(list_of_lists, [])
  one can use (runnable versions are sketched after this list):
  - reduce(operator.iadd, list_of_lists, [])
  - list(itertools.chain.from_iterable(list_of_lists))
  - result = []
    for x in list_of_lists:
        result.extend(x)

* Alternatives should be more visible, so that people would spot
  them earlier. For example, there was a suggestion to move chain
  (or chain.from_iterable) from itertools into builtins, optionally
  changing its name.

* Another alternative was to introduce "reduce" into builtins under
  a new name, "fold", optionally with adjusted parameters to make
  them easier to understand.

* And one more suggested alternative was to do nothing.
  I can't understand it, so I'll just quote:

  > You're saying what IS NOT your preferred way. But I'm asking what
  > IS your preferred way. Do you prefer to have a slow sum in python
  > and people asking why it's slow forever? Do you see that as a best
  > possible case for python?  
  YES. Godamnit YES. 100% true-to-form YES.
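
Here are the alternatives from the first bullet as a self-contained
snippet (note that on Python 3 reduce lives in functools):

  import functools
  import itertools
  import operator

  list_of_lists = [[1, 2, 3]] * 5

  r1 = functools.reduce(operator.iadd, list_of_lists, [])
  r2 = list(itertools.chain.from_iterable(list_of_lists))
  r3 = []
  for x in list_of_lists:
      r3.extend(x)
  assert r1 == r2 == r3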


But I believe all of them are workarounds (better workarounds,
maybe, but still workarounds) that do not solve the original problem.
It's as if, for a similar bug [3] in httplib, you answered that
Python is not the best tool to download 110MB files, or suggested
some custom HTTP client instead; that would not fix the bug in
httplib.

I'm trying to address this particular inconsistency of sum() being
O(N) for numbers but O(N*N) for many containers, and to find options
that could give sum() a chance to be O(N) for most use cases.


Ideas
=====
0. Original patch.
   The original patch [4] suggested using "+" for the first item and
   "+=" for everything else. That could make sum() faster not only
   for lists but for all types whose __iadd__ is faster than
   __add__. But a clever example from Oscar [5] showed that "+" can
   differ from "+=", e.g. due to type coercion:
       >>> from numpy import array
       >>> a = array([1, 2, 3], dtype=int)
       >>> b = array([1.4, 2.5, 3.6], dtype=float)
       >>> a + b
       array([ 2.4,  4.5,  6.6])
       >>> a += b
       >>> a
       array([2, 4, 6])
   I suggest (patch [6]) mentioning this example explicitly in the
   comments for sum(), so that future developers are not tempted to
   follow this approach.
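
   For reference, the patched loop behaved roughly like this (a
   Python-level sketch inferred from the description above, not the
   actual C code from [4]):

     def fastsum(items, start=0):
         it = iter(items)
         try:
             result = start + next(it)   # "+" for the first item...
         except StopIteration:
             return start
         for item in it:
             result += item              # ..."+=" for everything else
         return result

   With Oscar's arrays above, sum([a, b]) would compute (0 + a) += b,
   silently truncating b's floats to ints, while the current sum()
   computes (0 + a) + b, which coerces the result to float.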

1. Fast custom types.
   Is it fine to have types that are O(N) summable? The question
   looks weird, since sum() is already O(N) for some types and O(N*N)
   for others, but I was told things like this several times:
   > Today, sum is not the best way to concatenate sequences. Making
   > it work better for some sequences but not others would mean it's
   > still not the best way to concatenate sequences, but it would
   > _appear_ to be. That's the very definition of an attractive
   > nuisance.

   So I want to clear this up: is it normal that sum() is fast for
   some types (e.g. numbers, timedeltas or the FastTuples from [7])
   and slow for others? Or must sum() just be slow, and the very
   existence of faster containers is an "attractive nuisance", so
   those should be expelled or slowed down too?

2. Fast built-in types.
   If the answer is "yes", i.e. there is nothing bad about some types
   being faster than others, nor about sum() being faster for some
   types, then what do you think about optimizing built-in types like
   lists, tuples and strings? The optimization itself could have
   several benefits:
   * lower memory usage
   * instant creation of one instance from another
   * a faster add operation (i.e. "c = a + b")
   * being common to lists and tuples, it could give instant
     conversion of lists to tuples and back
   And as a side-effect of such an optimization, sum() would be O(N)
   for those types.

   So the question is: if it's fine for some types to be faster than
   others, is it fine if those "some types" are lists or tuples?
   I.e. what if lists and tuples were O(N) to sum while some other
   types are not?

   To settle the question of whether this is possible, I wrote [7]
   a very limited version of tuple using this approach, so that:
     from fasttuple import ft
     x = sum( [ft([1,2,3])] * 1000000, ft([]) )
   takes just a few seconds under a usual unpatched Python.
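
   The trick in [7] can be sketched roughly like this (a hypothetical
   simplification, not the actual fasttuple.py code): a tuple-like
   object keeps its items in a shared, append-only list, so "a + b"
   can often just extend that list and return a longer view of it:

     class FastTuple:
         def __init__(self, items=(), _store=None, _length=None):
             if _store is None:
                 _store = list(items)
                 _length = len(_store)
             self._store = _store    # possibly shared between tuples
             self._length = _length  # how much of the storage is ours

         def __add__(self, other):
             if self._length == len(self._store):
                 # We own the storage tail: append in place and hand
                 # out a longer view of the same list, O(len(other)).
                 self._store.extend(other)
                 return FastTuple(_store=self._store,
                                  _length=len(self._store))
             # Storage was already extended by someone else:
             # fall back to copying, like a normal tuple.
             return FastTuple(list(self) + list(other))

         def __iter__(self):
             return iter(self._store[:self._length])

         def __len__(self):
             return self._length

   Summing such objects is O(N) overall, because each "+" in the loop
   owns the tail of the shared storage and only appends to it.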

3. Fast built-in types, just for sum() [8].
   If it's not bad to have lists/tuples faster than other types, then
   how about just implementing a small special case for them?
   Advantage: the patch is much shorter [8] than an optimization of
   lists and tuples would be, and it would probably cover most (if
   not all) uses of sum() for containers.
   Disadvantage: the patch is limited to sum(); i.e. it gives the
   same performance for sum() as #2, but it does not optimize general
   "c = a + b" code as #2 would.

   Side-note: sum() was designed with special cases in mind from the
   beginning [9]. The initial sum() patch had a special case for
   strings (''.join() was used for them), but due to the complexity
   of the code for mixed lists it was replaced by another special
   case that rejects strings. Currently sum() has 5 special cases:
   three to reject strings/bytes and two optimizations for ints and
   floats.
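
   In Python terms, the special case amounts to roughly this (an
   assumption based on the patch description; the actual change in
   [8] is C code inside builtin sum()):

     def sum_lists(items, start):
         result = list(start)        # one mutable accumulator
         for item in items:
             result.extend(item)     # amortized O(len(item)), so the
                                     # whole loop is O(total length)
         return type(start)(result)  # back to a list or a tuple

   The real patch also has to fall back to the generic path as soon
   as it meets an item of another type, so that behavior does not
   change.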

4. If that is considered a bad idea, i.e. if we decide that other
   custom types should have a fair chance to implement the same
   optimization, then how about declaring a unified way to initialize
   a container type from an iterable? Something like an optional
   method __init_concatenable_container_from_iterable__. Most
   containers already have such a method as their default
   constructor, so it should be trivial to implement for any existing
   type.

   As a side-effect of this unification, sum() could have a general
   optimization for such types, something like this (sketched inside
   sum(), where "it" is the iterator over the remaining items):
     if hasattr(type(start), "__init_concatenable_container_from_iterable__"):
       optimized_for = type(start)
       l_result = list()
       l_result.extend(start)
       try:
         while True:
           item = next(it)
           if type(item) is not optimized_for:
             # Mixed types: rebuild start and fall back to the
             # generic (unoptimized) loop.
             start = optimized_for.__init_concatenable_container_from_iterable__(l_result)
             start = start + item
             break
           l_result.extend(item)
       except StopIteration:
         return optimized_for.__init_concatenable_container_from_iterable__(l_result)

   (I don't like that long __ name; it was just the first one I
   thought of. Better names are welcome. __iccfi__?)
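
   For most containers, implementing the method would be a one-liner,
   e.g. (a hypothetical example):

     class MyList(list):
         @classmethod
         def __init_concatenable_container_from_iterable__(cls, iterable):
             # For most containers this is just the constructor.
             return cls(iterable)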

5. An alternative unification idea was suggested by MRAB:
   . Get the first item, or the start value
   . Ask the item for an 'accumulator' (item.__accum__()).
   . Add the first item to the accumulator.
   . Add the remaining items to the accumulator.
   . Ask the accumulator for the result.

   If there's no accumulator available (the "__accum__" method isn't
   implemented), then either fall back to the current behaviour or
   raise a TypeError as sum() currently does for strings.
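
   A sketch of the idea (everything here except the __accum__ name is
   my assumption, not MRAB's actual proposal):

     class ListAccumulator:
         # What a list-like type's __accum__() might return.
         def __init__(self):
             self._parts = []
         def add(self, item):
             self._parts.extend(item)
         def result(self):
             return self._parts

     class AccList(list):
         def __accum__(self):
             return ListAccumulator()

     def accum_sum(items, start=None):
         it = iter(items)
         first = start if start is not None else next(it)
         acc = first.__accum__()   # ask the item for an accumulator
         acc.add(first)            # add the first item
         for item in it:           # add the remaining items
             acc.add(item)
         return acc.result()       # ask the accumulator for the result

     # accum_sum([AccList([1, 2]), AccList([3])])  ->  [1, 2, 3]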


I *guess* sum() was never discussed this deeply before because many
people either hate sum() or believe that nothing can be done, so they
don't even try to look for options. Or maybe they just prefer some
easily available workaround and don't spend time searching for other
solutions.

PS: If you can, please explain what exactly you [don't] like about
the ideas above, so that I have a chance to modify them according to
your opinions.

PPS: I'm sorry if I missed some of the suggestions; please add them
too.

PPPS: I'm also sorry for not answering some emails during the
thread. I read all of them, but there were already too many messages
in the thread, so I was trying to limit the number of emails I wrote.
If I missed something important, please forward your email to me
privately and I'll answer either directly or on the list, as you
wish.

-- 
[1] Some stackoverflow questions where sum() is suggested for lists:
http://stackoverflow.com/questions/406121/flattening-a-shallow-list-in-python
http://stackoverflow.com/questions/716477/join-list-of-lists-in-python
http://stackoverflow.com/questions/952914/making-a-flat-list-out-of-list-of-lists-in-python
http://stackoverflow.com/questions/3021641/concatenation-of-many-lists-in-python
http://stackoverflow.com/questions/7895449/merging-a-list-of-lists
http://stackoverflow.com/questions/10424219/combining-lists-into-one
http://stackoverflow.com/questions/11331908/how-to-use-reduce-with-list-of-lists
http://stackoverflow.com/questions/17142101/concatenating-sublists-python

[2] Some earlier attempts to discuss solutions to sum() slowness:
2006-01-12 http://article.gmane.org/gmane.comp.python.general/441831
> A fast implementation would probably allocate the output list just
> once and then stream the values into place with a simple index.    
That's what I hoped "sum" would do, but instead it barfs with a type
error.

2010-03-27 http://article.gmane.org/gmane.comp.python.general/658537
The mildly surprising part of sum() is that it does add vs. add-in-
place, which leads to O(N) vs. O(1) for the inner loop calls, for
certain data structures, notably lists, even though none of the
intermediate results get used by the caller.  For lists, you could
make a more efficient variant of sum() that clones the start value and
does add-in-place.

[3] http://bugs.python.org/issue6838
*** httplib's _read_chunked extremely slow for lots of chunks ***
As the comment in this method suggests, accumulating the value by
repeated string concatenation is slow. Appending to a list speeds
this up dramatically.

[4] http://bugs.python.org/file30705/fastsum.patch
Original patch using "+" for first item and "+=" for the rest.

[5] http://bugs.python.org/issue18305#msg192873
Oscar Benjamin:
This "optimisation" is a semantic change. It breaks backward
compatibility in cases where a = a + b and a += b do not result in
the name a having the same value. In particular this breaks backward
compatibility for numpy users: [...]

[6] http://bugs.python.org/issue18305#msg192956
    http://bugs.python.org/file30904/fastsum-iadd_warning.patch
Extended warning in sum() comments with more important numpy cases.

[7] http://bugs.python.org/issue18305#msg193048
    http://bugs.python.org/file30917/fasttuple.py
fasttuple.py is a proof-of-concept implementation of tuple that
reuses the same data storage when possible. Its usage looks similar
to built-in tuples [...]. An interesting side-effect of this
implementation is a faster __add__ operator:
  Adding 100000 of fasttuples
  took 0.23242688179 seconds
  Adding 100000 of built-in tuples
  took 25.2749021053 seconds

[8] http://bugs.python.org/issue18305#msg192919
    http://bugs.python.org/file30897/fastsum-special-tuplesandlists.patch
Patch implementing a special case for tuples and lists.
Should work for both Python 2.7 and Python 3.3, and should
introduce no behavior change.

[9] http://mail.python.org/pipermail/python-dev/2003-April/034767.html
Alex Martelli (author of sum())
> for the simple reason that I special-case this -- when the first
> item is a PyBaseString_Type, I delegate to ''.join 


