[Numpy-discussion] Does a `mergesorted` function make sense?
Charles R Harris
charlesr.harris at gmail.com
Tue Sep 2 20:40:09 EDT 2014
On Mon, Sep 1, 2014 at 7:58 AM, Eelco Hoogendoorn <
hoogendoorn.eelco at gmail.com> wrote:
> On Mon, Sep 1, 2014 at 2:05 PM, Charles R Harris <
> charlesr.harris at gmail.com> wrote:
>
>>
>>
>>
>> On Mon, Sep 1, 2014 at 1:49 AM, Eelco Hoogendoorn <
>> hoogendoorn.eelco at gmail.com> wrote:
>>
>>> Sure, id like to do the hashing things out, but I would also like some
>>> preliminary feedback as to whether this is going in a direction anyone else
>>> sees the point of, if it conflicts with other plans, and indeed if we can
>>> agree that numpy is the right place for it; a point which I would very much
>>> like to defend. If there is some obvious no-go that im missing, I can do
>>> without the drudgery of writing proper documentation ;).
>>>
>>> As for whether this belongs in numpy: yes, I would say so. There are the
>>> extension of functionality to functions already in numpy, which are a
>>> no-brainer (it need not cost anything performance wise, and ive needed
>>> unique graph edges many many times), and there is the grouping
>>> functionality, which is the main novelty.
>>>
>>> However, note that the grouping functionality itself is a very small
>>> addition, just a few 100 lines of pure python, given that the indexing
>>> logic has been factored out of the classic arraysetops. At least from a
>>> developers perspective, it very much feels like a logical extension of the
>>> same 'thing'.
>>>
>>> But also from a conceptual numpy perspective, grouping is really more an
>>> 'elementary manipulation of an ndarray' than a 'special purpose algorithm'.
>>> It is useful for literally all kinds of programming; hence there is similar
>>> functionality in the python standard library (itertools.groupby); so why
>>> not have an efficient vectorized equivalent in numpy? It belongs there more
>>> than the linalg module, arguably.
>>>
>>> Also, from a community perspective, a significant fraction of all
>>> stackoverflow numpy questions are (unknowingly) exactly about 'how to do
>>> grouping in numpy'.
>>>
>>
>> What I'm trying to say is that numpy is a community project. We don't
>> have a central planning committee, the only difference between "developers"
>> and everyone else is activity and commit rights. Which is to say if you
>> develop and push this topic it is likely to go in. There certainly seems to
>> be interest in this functionality. The reason that I brought up scipy is
>> that there are some graph algorithms there that went in a couple of years
>> ago.
>>
>> Note that the convention on the list is bottom posting.
>>
>> <snip>
>>
>> Chuck
>>
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>
> I understand that numpy is a community project, so that the decision isn't
> up to any one particular person; but some early stage feedback from those
> active in the community would be welcome. I am generally confident that
> this addition makes sense, but I have not contributed to numpy before,
> and you don't know what you don't know and all... given that there are
> multiple suggestions for changing arraysetops, some coordination would be
> useful I think.
>
> Note that I use graph edges merely as an example; the proposed
> functionality is much more general than graphing algorithms specifically.
> The radial reduction
> <https://github.com/EelcoHoogendoorn/Numpy_arraysetops_EP/blob/master/examples.py>example
> I included on github is particularly illustrative of the general utility of
> grouping functionality I think. Operations like radial reductions are
> rather common, and a custom implementation is quite lengthy, very bug
> prone, and potentially very slow.
>
> Thanks for the heads up on posting convention; ive always let gmail do my
> thinking for me, which works well enough for me, but I can see how not
> following this convention is annoying to others.
>
>
What do you think about the suggestion of timsort? One would need to
concatenate the arrays before sorting, but it should be fairly efficient.
Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20140902/2a2d2a2d/attachment.html>
More information about the NumPy-Discussion
mailing list