[Python-ideas] Fast sum() for non-numbers

Andrew Barnert abarnert at yahoo.com
Fri Jul 5 12:34:37 CEST 2013


From: Sergey <sergemp at mail.ru>

Sent: Thursday, July 4, 2013 11:43 PM


> On Jul 4, 2013 Andrew Barnert wrote:
> 
>>  Right on some points, but you're making unwarranted assumptions
>>  for these two:
>>>  * sum() can *always* be fast! (patch and tests)
>>>  * linked list is O(n) where n is number of lists to add
> 
>>  It's not two other types, it's every type anyone ever has built or
>>  will build that doesn't have a fast __iadd__. If I create a new
>>  type, I can very easily make it addable, iterable, displayable,
>>  pickleable, etc., with simple Python code. But I can't make it
>>  summable without patching the builtin C function and recompiling
>>  the interpreter.
> 
> Yes, you can! You just need to implement a fast __iadd__ or __add__
> for your type to make it O(n) summable.

Well, in Python 3.3, or even 2.3, you just need to implement a fast __add__. So, if that's an acceptable answer, then we don't need to do anything.
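To be concrete, any type with an __add__ is summable today, given an appropriate start value. A minimal sketch (the Vec class is mine, purely for illustration; note the per-step copying makes the whole sum quadratic, which is exactly the complaint):

```python
class Vec:
    """Hypothetical toy value type, just to show sum() with __add__."""
    def __init__(self, xs):
        self.xs = list(xs)

    def __add__(self, other):
        # Returns a new Vec; cost is proportional to the combined length,
        # so summing n Vecs this way is O(n**2) overall, not O(n).
        return Vec(self.xs + other.xs)

total = sum([Vec([1]), Vec([2, 3]), Vec([4])], Vec([]))
print(total.xs)  # [1, 2, 3, 4]
```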

The reason it's not acceptable to you is that sometimes, you can't implement a fast __add__. But sometimes, you can't implement a fast __iadd__, either.

> And you can always do that, can't you?


No. You can't implement a fast __iadd__ for tuple.
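A quick way to see this: += on a tuple isn't in-place at all. With no __iadd__ defined, it falls back to __add__ and rebinds the name to a brand-new tuple every time, so repeated += is still quadratic:

```python
t = (1, 2)
before = id(t)
t += (3,)        # tuple has no __iadd__: this calls __add__ and rebinds t
after = id(t)

print(t)               # (1, 2, 3)
print(before == after) # False: a new tuple object was allocated
```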

In fact, there is no single operation you could implement for tuple to make it fast-summable. To do that, you had to modify sum itself so that it converts to a list, extends, and converts back to a tuple at the end. If that's going to be extendable to other types, you need to give them control over some external context, e.g., a way to specify how to set up the start of a sum operation before the first __iadd__ and how to conclude it after the last __iadd__.
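The tuple special case amounts to something like the following sketch (a hypothetical standalone helper, not the actual patch): one conversion in, cheap extends in the middle, one conversion out.

```python
def fast_tuple_sum(tuples, start=()):
    """Hypothetical sketch of the special case described above."""
    acc = list(start)      # set up: convert to a mutable accumulator once
    for t in tuples:
        acc.extend(t)      # each step is amortized O(len(t)), O(total) overall
    return tuple(acc)      # conclude: convert back to tuple once

print(fast_tuple_sum([(1, 2), (3,), (4, 5)]))  # (1, 2, 3, 4, 5)
```

The set-up and conclude steps are precisely the "external context" a general protocol would have to expose.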

> If that is not obvious we can make it explicit in sum() description.
> E.g.:
[…]
>   To take advantage of a faster __iadd__ implementation sum() uses it
>   if possible. sum() has linear time complexity for built-in types
>   and all types providing constant-time `__iadd__` or `__add__`.

Note that integer addition isn't actually constant-time; it's linear in the number of machine words in the integers. Which means sum isn't linear for large integers. So, this is misleading right off the bat. Currently, because we don't try to specify anything except to imply that sum is an appropriately fast way to sum numbers, there's no such problem.
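The underlying reason is that Python integers are arbitrary-precision, so their storage, and hence the cost of adding them, grows with magnitude. A rough way to see the growth (sys.getsizeof reports CPython-specific sizes, so this is an implementation detail, not a language guarantee):

```python
import sys

small = 1
big = 1 << 100000          # an integer of about 100,000 bits

print(sys.getsizeof(small))
print(sys.getsizeof(big))  # many times larger: many more digits to add
print(big.bit_length())    # 100001
```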

>>  A cons is a single-linked list with no tail pointer and an O(N)
>>  __iadd__. The node type is the list type; it's just a head value and
>>  a next pointer to another list. 

> Personally I don't think such implementation would be very useful.

Given that nearly every program ever written in Lisp and its successors makes heavy use of such an implementation, I submit that your intuition may be wrong.
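For concreteness, here is a minimal cons-style list in Python (the names are mine, not from the thread): the node type is the list type, prepending is O(1), and anything that needs the tail is O(N).

```python
class Cons:
    """Minimal singly-linked cons cell: a head value and a next pointer."""
    def __init__(self, head, rest=None):
        self.head = head
        self.rest = rest   # None plays the role of the empty list

    def __iter__(self):
        node = self
        while node is not None:
            yield node.head
            node = node.rest

def prepend(value, lst):
    # O(1): this is the operation cons-lists exist for
    return Cons(value, lst)

lst = prepend(1, prepend(2, prepend(3, None)))
print(list(lst))  # [1, 2, 3]
```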

> (It's certainly unusual for python's “batteries included” philosophy).

What does "batteries included" have to do with excluding data types?

> What I would call a linked-list is a separate type where your nodes
> are just its internal representation.

If you make the lists and nodes separate types, and the nodes private, you have to create yet a third type, like the list::iterator type in C++, which either holds a reference to a node plus a reference to a list, or can be passed to a list in the same way indices are passed to an array. Without that, you can't do anything useful with lists at all, because any operation is O(N). Just as arrays use indices and hash tables use keys, linked lists use node references.
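In other words, the node reference *is* the "index". With a handle to a node, insertion after it is O(1); with only positional indices, every operation degrades to O(N). A sketch using a bare node type (hypothetical names):

```python
class Node:
    """Bare linked-list node; a reference to one acts like C++ list::iterator."""
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def insert_after(node, value):
    # O(1) given the node reference; no traversal needed
    node.next = Node(value, node.next)

a = Node(1, Node(2, Node(3)))
handle = a.next          # keep a reference to the node holding 2
insert_after(handle, 99) # constant time, regardless of list length

out = []
n = a
while n is not None:
    out.append(n.value)
    n = n.next
print(out)  # [1, 2, 99, 3]
```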

>>  There is no way to make this work if a holds a tail pointer.

> 
> There is. You just need to separate the nodes from the list object,
> and turn nodes into internal details, invisible outside.
> 
> So that your sample would turn into:
>     >>> a = linkedlist([0, 1, 2])
>     >>> a.replacefrom(2, linkedlist([10, 20]))
>     >>> a
>     linkedlist([0, 1, 10, 20])
> 
> Or maybe even:
>     >>> a = linkedlist([0, 1, 2])
>     >>> b = a[1:]
>     >>> b[1:] = linkedlist([10, 20])
>     >>> a
>     linkedlist([0, 1, 2])
>     >>> b
>     linkedlist([1, 10, 20])
> 
> I don't like the idea that `a` implicitly changes when I change `b`.

Then you don't like linked lists. Saying "There is a way to make it work, just make it do something different, which is O(N) instead of O(1) and won't work with the algorithms you want to use" is not an answer.

You don't have to use linked lists—there's a reason they're not built in to Python, after all. But if you want to talk about linked lists, you have to talk about linked lists, not some similar data type you've invented that has most of the disadvantages of linked lists and none of the advantages, and that no one will ever use.


> And yes, b=a[1:] would be O(N) since it would have to copy all the
> elements to make sure that change to one list won't affect another
> one. But that's my vision.

Even b=a[n] is O(N) on linked lists, even without copying. Which is exactly why linked lists provide head and next instead of indexing and slicing. Trying to use linked lists as if they were arrays is as silly as the other way around.
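Indexing a linked list means walking n links one at a time; there is no random access. That's why the idiomatic interface is head/next rather than __getitem__. A sketch with a hypothetical nth helper:

```python
class Node:
    """Bare singly-linked node: a value and a next pointer."""
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def nth(node, n):
    # O(n): must follow n next-pointers to reach the n-th node
    for _ in range(n):
        node = node.next
    return node

lst = Node(0, Node(1, Node(2, Node(3))))
print(nth(lst, 2).value)  # 2
```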

>>  Making a.next be a copying operation gives the wrong answer (and,
>>  more generally, breaks every mutating operation on cons-lists) and
>>  makes almost everything O(N) instead of O(1).
> 
> Yes, you're right, this turns copy into O(N). But you can still
> keep other operations fast.

No, you can make one operation fast, at the expense of making every other operation slow. That's not a good tradeoff. It's like giving a tree O(1) searches by attaching a hash table that maps keys to node pointers, and then claiming that this is an improvement, and who cares that inserts are now O(N) instead of O(log N).

And again, you've also ignored the fact that, performance aside, it _breaks every algorithm people use mutable linked lists for_.

> But you CAN have a fast __iadd__ even for your simple a.next case!
> You only need to initialize `tail` before calling sum() and update
> it inside __iadd__

Initialize tail where exactly? In a global variable?


>>  If people really do need to concatenate a bunch of tuples this
>>  often, maybe the answer is to move chain to builtins.
> 
> Yes, I understand that we can restrict sum(), for example it can
> throw an error for non-numbers as it currently does for strings.
> What I can't understand is why we should restrict it if instead
> we can make it fast?

How is that even a response to my point? I suggest moving chain to builtins, and you ask why we should add restrictions to sum?
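For reference, this is already spelled today with itertools.chain, which concatenates any iterables in O(total length) without touching sum at all:

```python
from itertools import chain

tuples = [(1, 2), (3,), (4, 5)]
# One pass over all elements; no repeated tuple copying
flat = tuple(chain.from_iterable(tuples))
print(flat)  # (1, 2, 3, 4, 5)
```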


