[Python-ideas] Fast sum() for non-numbers

Andrew Barnert abarnert at yahoo.com
Tue Jul 9 07:04:36 CEST 2013


On Jul 8, 2013, at 18:25, Haoyi Li <haoyi.sg at gmail.com> wrote:

> <rant>I would also really like a nice object-oriented sortedlist to be part of the standard library, instead of that nasty C-style heapq stuff.</rant>

I agree.

heapq isn't really a sorted list because it's neither index-accessible nor key-accessible. And bisect isn't really a sorted list because everything but lookup is O(N). And both of them have C-style APIs to boot.

AFAIK, two different implementations of sorted collections have been offered for the stdlib, and both were rejected because they tried to offer the wrong thing: blist was sold as a replacement for the existing list and tuple, which also happens to give you sorted list/dict/set types for free, while the other one (I forget the name) was offered as a standard binary tree library, which also happens to give you sorted list/dict/set types for free.

I would be very happy to get sorted list/dict/set types added to the stdlib, ideally using the blist interface plus the views and key-slicing stuff from bintrees. And I don't care which implementation we get, as long as it's logarithmic, and there's a fast native accelerator for CPython alongside pure python code that works well in PyPy. 

There are multiple packages on PyPI that are 90% of the way there. I think it's just a matter of picking one, and either rallying the author to make the changes and commit to maintenance, or taking it over and committing to maintenance yourself. Either way, if you pull it off, I'll be your best friend forever and ever. (Your macro library is very cool, but a collections.sortedlist, I would use every day of my life, which makes it even cooler.)

> I'd still rather we make use of the nice new generic dispatch stuff to make our builtins re-usable, rather than having a set of crufty only-really-works-on-builtins functions cluttering the *global* namespaces, and having to manually pick from *another* bunch of non-generic custom functions to do the same (conceptual) thing to different types.

The point of using chain instead of sum to concatenate sequences or iterators is that it automatically works for every kind of iterable you can imagine, so it doesn't need any type-switching or generics or anything of the sort.

Also, generic dispatch wouldn't really help here, as we need to dispatch on the type(s) that the iterable yields, not the type of the iterable itself.

> I don't think that's going to happen anytime before Python 4.0 though. This sort of "yeah, function_x() doesn't work on y, you have to use special_function_z() to concat ys and thingy_z() to concat ws" is workable, but reminds me of my bad-old php days.

That's exactly why I don't like expanding sum. At best, it can be the obvious way to concatenate mutable sequences with fast __iadd__ plus some builtin immutable sequences but not others... that's a mess. A chain/flatten/concat/whatever function can be (and already is, with two wasted lines) the obvious way to concatenate any iterables, no matter what. Meanwhile, sum is the obvious way to sum things that are obviously summable (numbers, matrices, etc.), and nothing else.

> 
> -Haoyi
> 
> 
> 
> On Tue, Jul 9, 2013 at 9:02 AM, Andrew Barnert <abarnert at yahoo.com> wrote:
>> From: Haoyi Li <haoyi.sg at gmail.com>
>> Sent: Monday, July 8, 2013 5:39 PM
>> 
>> 
>> > I for one find it annoying that i have to write a verbose long thingy every time i need to flatten lists
>> 
>> What verbose long thingy? You just need to write:
>> 
>>     flatten = itertools.chain.from_iterable
>> 
>> Or, if you use more-itertools:
>> 
>>     from more_itertools import flatten
>> 
>> Someone suggested moving this to builtins as "chain" or "flatten" or similar. I'm at least +0 on this. 
>> 
>> At any rate, it's about as simple as can be. 
>> 
>> Because it gives you an iterable, you don't pay the cost (in time or space) of building a list unless you need to, which means in many use cases it's actually much faster than sum, at the cost of being a little bit slower for lists when you do need a list.
>> 
>> 
>> It also flattens any iterable of sequences, or even any iterable of iterables, in the same time—linear in the total size. With no custom optimizations, it works with tuples, blist.sortedlists, cons lists, or _anything else you can throw at it_.
>> 
>> And it's obvious what it does. If you sum three sortedlists, do you get the first list's elements, then the second, then the third, or a merge of the three? I don't know. If you chain or flatten three sortedlists, it's obvious which one you get.
> 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130708/1f60a68e/attachment-0001.html>


More information about the Python-ideas mailing list