[Numpy-discussion] Does a `mergesorted` function make sense?

Eelco Hoogendoorn hoogendoorn.eelco at gmail.com
Mon Sep 1 03:49:50 EDT 2014


Sure, I'd like to hash things out, but I would also like some preliminary
feedback as to whether this is going in a direction anyone else sees the
point of, whether it conflicts with other plans, and indeed whether we can
agree that numpy is the right place for it; a point which I would very much
like to defend. If there is some obvious no-go that I'm missing, I can do
without the drudgery of writing proper documentation ;).

As for whether this belongs in numpy: yes, I would say so. There is the
extension of functionality of functions already in numpy, which is a
no-brainer (it need not cost anything performance-wise, and I've needed
unique graph edges many, many times), and there is the grouping
functionality, which is the main novelty.
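
To make the 'unique graph edges' use case concrete, here is a minimal
sketch. It is not the code from the repository discussed in this thread; it
assumes a NumPy recent enough (1.13 or later) that np.unique accepts an axis
argument, and the edge list is made-up example data.

```python
import numpy as np

# Hypothetical edge list of an undirected graph; each row is one edge (u, v).
edges = np.array([[0, 1], [1, 0], [2, 1], [1, 2], [0, 2]])

# Sort within each row so that (u, v) and (v, u) get the same representation.
canonical = np.sort(edges, axis=1)

# With axis=0, np.unique treats each row as a single item (NumPy >= 1.13).
unique_edges = np.unique(canonical, axis=0)
print(unique_edges)
```

The point is that uniqueness is taken over whole rows rather than over
scalar elements, which is the kind of generalization to more complex keys
described above.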

However, note that the grouping functionality itself is a very small
addition, just a few hundred lines of pure Python, given that the indexing
logic has been factored out of the classic arraysetops. At least from a
developer's perspective, it very much feels like a logical extension of the
same 'thing'.

But also from a conceptual numpy perspective, grouping is really more an
'elementary manipulation of an ndarray' than a 'special purpose algorithm'.
It is useful for literally all kinds of programming; hence there is similar
functionality in the Python standard library (itertools.groupby); so why
not have an efficient vectorized equivalent in numpy? It belongs there more
than the linalg module, arguably.
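
As an illustration of what a vectorized grouping operation can look like
using only existing numpy primitives, here is a rough sketch of a grouped
sum. The function name group_sum and the sample data are hypothetical; this
is not the API proposed in this thread, only one common way to build the
operation from a stable argsort and np.add.reduceat.

```python
import numpy as np

def group_sum(keys, values):
    """Sum `values` per distinct key (hypothetical helper, 1-D inputs)."""
    keys = np.asarray(keys)
    values = np.asarray(values)
    if keys.size == 0:
        return keys, values

    # A stable sort brings equal keys together and keeps their input order.
    order = np.argsort(keys, kind="mergesort")
    sorted_keys = keys[order]
    sorted_values = values[order]

    # A group starts at index 0 and wherever the sorted key changes.
    is_start = np.concatenate(([True], sorted_keys[1:] != sorted_keys[:-1]))
    starts = np.flatnonzero(is_start)

    # reduceat sums each slice [starts[i], starts[i + 1]) of the values.
    return sorted_keys[starts], np.add.reduceat(sorted_values, starts)

unique_keys, sums = group_sum([3, 1, 3, 2, 1], [10, 20, 30, 40, 50])
print(unique_keys, sums)
```

The same sort-then-find-boundaries indexing step serves unique, grouping,
and related set operations alike, which is why factoring it out keeps the
grouping layer itself small.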

Also, from a community perspective, a significant fraction of all
Stack Overflow numpy questions are (unknowingly) exactly about 'how to do
grouping in numpy'.


On Mon, Sep 1, 2014 at 4:36 AM, Charles R Harris <charlesr.harris at gmail.com>
wrote:

>
>
>
> On Sun, Aug 31, 2014 at 1:48 PM, Eelco Hoogendoorn <
> hoogendoorn.eelco at gmail.com> wrote:
>
>> I've organized all code I had relating to this subject in a github
>> repository <https://github.com/EelcoHoogendoorn/Numpy_arraysetops_EP>.
>> That should facilitate shooting around ideas. I've also added more
>> documentation and structure to make it easier to see what is going on.
>>
>> Hopefully we can converge on a common vision, and then improve the
>> documentation and testing to make it worthy of including in the numpy
>> master.
>>
>> Note that there is also a complete rewrite of the classic
>> numpy.arraysetops, such that they are also generalized to more complex
>> input, such as finding unique graph edges, and so on.
>>
>> You mentioned getting the numpy core developers involved; are they not
>> subscribed to this mailing list? I wouldn't be surprised; you'd hope there
>> is a channel of discussion concerning development with a higher
>> signal-to-noise ratio...
>>
>>
> There are only about 2.5 of us at the moment. Those for whom this is an
> itch that needs scratching should hash things out and make a PR. The main
> question for me is whether it belongs in numpy, scipy, or somewhere else.
>
> Chuck
>