[Python-ideas] Fast sum() for non-numbers

Andrew Barnert abarnert at yahoo.com
Fri Jul 5 00:25:12 CEST 2013


From: Sergey <sergemp at mail.ru>

Sent: Thursday, July 4, 2013 2:54 AM


> On Jul 04, 2013 Steven D'Aprano wrote:

[to avoid confusion: this is Sergey's summary, not Steven's]

> This message is long, so here's its short summary:

Right on some points, but you're making unwarranted assumptions for these two:

> * sum() can *always* be fast! (patch and tests)
> * linked list is O(n) where n is number of lists to add


>>  sum simply cannot *always* be fast. E.g. summing tuples will still
>>  be slow even with your suggestion.
> 
> Yes, it can! That's the point of the original idea!
> 
> The original patch [2] optimizes lists, because it was easy to do.
> But nothing stops you from optimizing other (two?) types.

It's not two other types, it's every type anyone ever has built or will build that doesn't have a fast __iadd__. If I create a new type, I can very easily make it addable, iterable, displayable, pickleable, etc., with simple Python code. But I can't make it summable without patching the builtin C function and recompiling the interpreter. So, as Steven said, sum cannot always be fast.
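
For example, here's a rough sketch of a hypothetical user-defined type (MySeq is my own name, not anything from the thread or the stdlib): it becomes addable with a few lines of Python, but no special case compiled into sum() can know how to concatenate it quickly:

    class MySeq:
        # A toy addable sequence: O(n) __add__, no __iadd__.
        def __init__(self, items=()):
            self._items = list(items)
        def __add__(self, other):
            # Builds a brand-new MySeq every time, so summing k of them
            # copies the earlier elements over and over: quadratic overall.
            return MySeq(self._items + list(other))
        def __iter__(self):
            return iter(self._items)
        def __repr__(self):
            return "MySeq(%r)" % (self._items,)

    # Works with + and with sum() given a start value, but sum() has no
    # way to learn a faster path for it without patching the interpreter:
    total = sum([MySeq([1]), MySeq([2]), MySeq([3])], MySeq())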

> As for linked lists, the point of linked
> list is to insert items fast. So any decent implementation of it
> should store a pointer to its head and tail, should implement a O(1)
> __iadd__ using tail pointer, and thus falls under my first patch.
> There's not much sense in a [single] linked list if it has no __iadd__.

Have you never heard of Lisp or any of its successors?

A cons is a single-linked list with no tail pointer and an O(N) __iadd__. The node type is the list type; it's just a head value and a next pointer to another list. This data structure supports various kinds of O(1) operations (including mutating operations) that other types do not. For a trivial example:

    >>> a = linkedlist([0, 1, 2])
    >>> b = a.next
    >>> b.next = linkedlist([10, 20])
    >>> a
    linkedlist([0, 1, 10, 20])

There is no way to make this work if a holds a tail pointer. The trivial design would leave a pointing at the 2, which isn't even a member of a anymore. Making a.next be a copying operation gives the wrong answer (and, more generally, breaks every mutating operation on cons-lists) and makes almost everything O(N) instead of O(1). Storing head and tail pointers at each node makes every mutating operation at least O(N) instead of O(1), and also means that two different lists can't share the same tail, which breaks most mutating tree operations. Storing a stack of lists that refer to each node makes everything O(N^2). Integrating that stack into the list means you have a double-linked list instead of a single-linked list.

Not having a tail pointer is inherent in the cons-list data structure, and all the algorithms built for that type depend on it.
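
To make that concrete, here's a minimal sketch of such a node type (the name Cons and the details are mine, not a real library); it shows both the O(1) structure sharing from the example above and the O(N) append that follows from having no tail pointer:

    class Cons:
        # A cons cell: the node *is* the list (a head value + a next pointer).
        def __init__(self, value, next=None):
            self.value = value
            self.next = next
        def __iadd__(self, other):
            # No tail pointer, so appending means walking to the end: O(N).
            node = self
            while node.next is not None:
                node = node.next
            node.next = other
            return self
        def __iter__(self):
            node = self
            while node is not None:
                yield node.value
                node = node.next

    a = Cons(0, Cons(1, Cons(2)))
    b = a.next                    # O(1); b shares structure with a
    b.next = Cons(10, Cons(20))   # O(1) splice, as in the example above
    list(a)                       # [0, 1, 10, 20]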


>>  Yes. If Python used & for concatenation, we wouldn't have to worry
>>  about sum(lists or strings or tuples) being quadratic, because people
>>  wouldn't call sum on lists, strings or tuples.
> 
> Heh. If Python had no sum() we wouldn't have to worry about people
> using it.

Well, there's always the Java-ish solution of adding a Summable ABC and having it only work on matching values (or just using numbers.Number).
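
Roughly something like this (just a sketch of the idea, not a real proposal; numeric_sum is a made-up name):

    import numbers

    def numeric_sum(iterable, start=0):
        # Refuses anything that isn't a number, so the "people sum lists
        # of lists" problem can't arise in the first place.
        total = start
        for item in iterable:
            if not isinstance(item, numbers.Number):
                raise TypeError("can't sum %s objects" % type(item).__name__)
            total = total + item
        return total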

But I don't think it's a problem that there are types that are addable and aren't (fast-)summable, or that this isn't expressed directly in the language. Python doesn't avoid providing tools that are useful in their obvious uses, even if they can be harmful in silly uses. In a case like this, you don't ban the screwdriver, or pile weight onto it so it's also a good hammer; you just try to make the hammer easier to spot so people don't pick up the screwdriver.

If people really do need to concatenate a bunch of tuples this often, maybe the answer is to move chain to builtins. If it's not important enough to move chain to builtins, it's probably not important enough to do anything else.
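
For what it's worth, the tool already exists in itertools today; the only question is how discoverable it is:

    from itertools import chain

    tuples = [(1, 2), (3, 4), (5, 6)]

    slow = sum(tuples, ())                     # quadratic: each + copies everything so far
    fast = tuple(chain.from_iterable(tuples))  # linear: each element is visited once

    assert slow == fast == (1, 2, 3, 4, 5, 6)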

> If Python had no lists we wouldn't have to worry about
> people concatenating them. If there was no Python we wouldn't have
> to worry at all. But the world would be poor without all these great
> things...
> 
> Seriously, I miss add for set(). I needed it when I had a dictionary
> like {x:set(...), ...} and needed a set of all the values from it.
> I wanted to use sum(dict.values()), that would be easy and obvious,
> but I couldn't, because set() does not support __add__. So I had
> to write a few lines of loop instead of a few characters. :(

So what you really want is to be able to combine things using any arbitrary operation, instead of just addition? 

That's the point I was making in my earlier email: What you want is reduce. It's just like sum, but you specify the operation, and it can start from the first element of the iterable instead of requiring an explicit start value.
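
For the set example above, that looks something like this (d stands in for your {x: set(...), ...} dictionary):

    from functools import reduce
    import operator

    d = {'x': {1, 2}, 'y': {2, 3}, 'z': {4}}

    # reduce lets you pick the combining operation; sets combine with | (union).
    all_values = reduce(operator.or_, d.values(), set())

    # set.union already accepts multiple arguments, which is even more direct:
    all_values = set().union(*d.values())

    assert all_values == {1, 2, 3, 4}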

Why is sum a builtin, while reduce isn't? Sum is usable in a pretty restricted set of cases, and in most of those cases, it's the obvious way to do it. Reduce is usable in a much broader set of cases, but in most of those cases, there's a better way to do it. Broadening the usability of sum to make it more like reduce is not a positive change.

