[Numpy-discussion] Does a `mergesorted` function make sense?

Eelco Hoogendoorn hoogendoorn.eelco at gmail.com
Mon Sep 1 09:58:57 EDT 2014


On Mon, Sep 1, 2014 at 2:05 PM, Charles R Harris <charlesr.harris at gmail.com>
wrote:

>
>
>
> On Mon, Sep 1, 2014 at 1:49 AM, Eelco Hoogendoorn <
> hoogendoorn.eelco at gmail.com> wrote:
>
>> Sure, I'd like to hash things out, but I would also like some
>> preliminary feedback as to whether this is going in a direction anyone else
>> sees the point of, whether it conflicts with other plans, and indeed whether
>> we can agree that numpy is the right place for it; a point which I would
>> very much like to defend. If there is some obvious no-go that I'm missing,
>> I can do without the drudgery of writing proper documentation ;).
>>
>> As for whether this belongs in numpy: yes, I would say so. There is the
>> extension of functionality to functions already in numpy, which is a
>> no-brainer (it need not cost anything performance-wise, and I've needed
>> unique graph edges many, many times), and there is the grouping
>> functionality, which is the main novelty.
>>
>> However, note that the grouping functionality itself is a very small
>> addition, just a few hundred lines of pure Python, given that the indexing
>> logic has been factored out of the classic arraysetops. At least from a
>> developer's perspective, it very much feels like a logical extension of the
>> same 'thing'.
>>
>> But also from a conceptual numpy perspective, grouping is really more an
>> 'elementary manipulation of an ndarray' than a 'special purpose algorithm'.
>> It is useful in all kinds of programming; hence there is similar
>> functionality in the Python standard library (itertools.groupby), so why
>> not have an efficient vectorized equivalent in numpy? It belongs there more
>> than the linalg module does, arguably.
>>
>> Also, from a community perspective, a significant fraction of all
>> Stack Overflow numpy questions are (unknowingly) exactly about 'how to do
>> grouping in numpy'.
>>
>
> What I'm trying to say is that numpy is a community project. We don't have
> a central planning committee, the only difference between "developers" and
> everyone else is activity and commit rights. Which is to say if you develop
> and push this topic it is likely to go in. There certainly seems to be
> interest in this functionality. The reason that I brought up scipy is that
> there are some graph algorithms there that went in a couple of years ago.
>
> Note that the convention on the list is bottom posting.
>
> <snip>
>
> Chuck
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
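For the 'unique graph edges' case mentioned in the quoted message, here is a rough sketch of what this looks like in NumPy today: canonicalise each undirected edge by sorting its two endpoints, then take the unique rows. It relies on the `axis` argument of `np.unique`, which was added in NumPy 1.13, i.e. after this thread, so it is not the code from the proposal; the edge list is made-up data.

```python
import numpy as np

# Made-up edge list: each row is an undirected edge (u, v),
# so (0, 1) and (1, 0) describe the same edge.
edges = np.array([[0, 1], [1, 0], [2, 3], [1, 2], [3, 2], [0, 1]])

# Canonicalise each edge so the smaller vertex comes first.
canonical = np.sort(edges, axis=1)

# Treat each row as a single element (axis=0 needs NumPy >= 1.13).
unique_edges, counts = np.unique(canonical, axis=0, return_counts=True)

print(unique_edges)  # rows: [0 1], [1 2], [2 3]
print(counts)        # [3 1 2]
```

The `return_counts` output is already a small grouped reduction: it counts how many times each edge occurs.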

I understand that numpy is a community project, so that the decision isn't
up to any one particular person; but some early-stage feedback from those
active in the community would be welcome. I am generally confident that
this addition makes sense, but I have not contributed to numpy before,
and you don't know what you don't know. Given that there are
multiple suggestions for changing arraysetops, some coordination would be
useful, I think.
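To make the quoted comparison with itertools.groupby concrete, below is a minimal sort-based sketch of vectorized grouping, built from the same kind of indexing logic (a stable argsort plus run boundaries) that arraysetops uses. The `group_by` helper and its return values are hypothetical and are not the API proposed in this thread; it assumes 1-d, non-empty inputs of equal length.

```python
import itertools

import numpy as np


def group_by(keys, values):
    """Hypothetical helper: group `values` by `keys` without a Python loop.

    Returns (unique_keys, groups, sums). Assumes 1-d, non-empty inputs.
    """
    keys = np.asarray(keys)
    values = np.asarray(values)
    # Stable sort (mergesort) keeps the original order of values within a group.
    order = np.argsort(keys, kind="mergesort")
    sorted_keys = keys[order]
    sorted_values = values[order]
    # Index of the first element of each run of equal keys.
    is_start = np.r_[True, sorted_keys[1:] != sorted_keys[:-1]]
    starts = np.flatnonzero(is_start)
    unique_keys = sorted_keys[starts]
    # One sub-array per key, split at the run boundaries.
    groups = np.split(sorted_values, starts[1:])
    # A grouped reduction over the same boundaries.
    sums = np.add.reduceat(sorted_values, starts)
    return unique_keys, groups, sums


keys = np.array([2, 0, 2, 1, 0])
values = np.array([10, 20, 30, 40, 50])
unique_keys, groups, sums = group_by(keys, values)

# The pure-Python equivalent: itertools.groupby needs pre-sorted input.
pairs = sorted(zip(keys.tolist(), values.tolist()), key=lambda kv: kv[0])
reference = [
    (k, [v for _, v in grp])
    for k, grp in itertools.groupby(pairs, key=lambda kv: kv[0])
]
```

Both produce the groups 0 → [20, 50], 1 → [40], 2 → [10, 30]. The NumPy version keeps the work in a single sort plus array operations, and any ufunc's `reduceat` (sum, maximum, ...) can reuse the same `starts` boundaries.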

Note that I use graph edges merely as an example; the proposed
functionality is much more general than graph algorithms specifically.
The radial reduction example
<https://github.com/EelcoHoogendoorn/Numpy_arraysetops_EP/blob/master/examples.py>
that I included on GitHub is particularly illustrative of the general
utility of grouping functionality, I think. Operations like radial
reductions are rather common, and a custom implementation is quite lengthy,
very bug-prone, and potentially very slow.
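A radial reduction is, in essence, a grouping: pixels are grouped by their binned distance from a centre, and each group is reduced. The sketch below shows that pattern for a mean over unit-width rings using `np.bincount` with weights. The `radial_mean` helper, the unit ring width and the choice of a mean are assumptions for illustration; this is not the code from the linked examples.py.

```python
import numpy as np


def radial_mean(image, center):
    """Hypothetical helper: mean of `image` over unit-width rings around `center`.

    Each pixel gets the group label floor(distance to center); the mean is a
    grouped sum divided by a grouped count over those labels.
    """
    image = np.asarray(image, dtype=float)
    rows, cols = np.indices(image.shape)
    distance = np.hypot(rows - center[0], cols - center[1])
    # Integer group label per pixel (truncation == floor for distances >= 0).
    ring = distance.astype(int).ravel()
    # Grouped sum and grouped count, both via bincount.
    sums = np.bincount(ring, weights=image.ravel())
    counts = np.bincount(ring)
    # A ring with no pixels would give 0/0; leave it as NaN without a warning.
    with np.errstate(invalid="ignore"):
        return sums / counts


image = np.array([[1, 1, 1],
                  [1, 9, 1],
                  [1, 1, 1]])
print(radial_mean(image, center=(1, 1)))  # [9. 1.]
```

`np.bincount` only covers sums and counts; other per-ring reductions such as a maximum or median need a sort-based grouping, for example an argsort followed by `np.maximum.reduceat` or `np.split`.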

Thanks for the heads-up on the posting convention; I've always let Gmail do
my thinking for me, which has worked well enough, but I can see how not
following this convention is annoying to others.