[Python-ideas] Fast sum() for non-numbers

Steven D'Aprano steve at pearwood.info
Tue Jul 9 04:16:43 CEST 2013


On 09/07/13 10:39, Haoyi Li wrote:
>> Python programmers do currently write programs with linked lists.  And
> they use what ever is in the library they think is useful... the whole
> point of "batteries included".
>
> On the other hand, python programmers do currently write programs with
> normal lists, and sum(), and using sum() on lists would be useful. They're
> BOTH in the standard library too! In fact, this seems like a perfect
> argument for having sum() work on lists!

sum() does work on lists.

The fact that sum(lists) has had quadratic performance since sum was first introduced in Python 2.3, and I've *never* seen anyone complain about it being slow, suggests very strongly that this is not a use-case that matters.

I would bet that people simply do not sum large numbers of lists.


> The objection seems like optimizing for a hypothetical, potential use case
> far in the future at the expense of a concrete use case now. "if you can't
> make it do EVERYTHING, then we shouldn't make it do ANYTHING or people will
> get confused!" (paraphrased and exaggerated for theatrical effect).

No no no. The objection is that complicating the implementation of a function in order to optimize a use-case that doesn't come up in real-world use is actually harmful. Maintaining sum will be harder, for the sake of some benefit that very possibly nobody will actually receive.

I don't care that sum() is O(N**2) on strings, linked lists, tuples, lists. I don't think we should care. Sergey thinks we should care, and is willing to change the semantics of sum AND include as many special cases as needed in order to "guarantee" that sum will be "always fast". I don't believe that guarantee can possibly hold, and I'm dubious about the change in semantics and what I see as the standard library making misleading performance guarantees.


> This is all on the assumption that you consider flattening list-of-lists a
> concrete use case. I for one find it annoying that i have to write a
> verbose long thingy every time i need to flatten lists, and I have probably
> needed to flatten lists about a dozen times in the last 3 months and used
> linked lists not-at-all.


Flattening sequences is not sum. You have to consider the question, what counts as an atomic type? Should you recursively flatten sub-sub-lists? What to do with lists that contain a reference to themselves? sum() is correspondingly trivial, you just add the items, end of story. Writing a general purpose flattener is hard, and horribly easy to either under- or over-engineer. I have in my private software collection about half a dozen half-finished flatten() utility functions, because periodically I think it "might be useful", but because I've never actually needed the damn thing, I'm never motivated to complete it.

Non-recursively flattening a list of lists is trivial:

flattened = []
for sublist in lists:
      flattened.extend(sublist)

Three short lines of code. You could write that a dozen times a day, every day for a year, and not come close to the amount of discussion we've had on this sum() thread :-)



-- 
Steven


More information about the Python-ideas mailing list