On Mon, Sep 1, 2014 at 7:58 AM, Eelco Hoogendoorn <hoogendoorn.eelco@gmail.com> wrote:

On Mon, Sep 1, 2014 at 2:05 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:

On Mon, Sep 1, 2014 at 1:49 AM, Eelco Hoogendoorn <hoogendoorn.eelco@gmail.com> wrote:

Sure, I'd like to hash things out, but I would also like some preliminary feedback: does anyone else see the point of this direction, does it conflict with other plans, and can we agree that numpy is the right place for it? That last point I would very much like to defend. If there is some obvious no-go that I'm missing, I can skip the drudgery of writing proper documentation ;).

As for whether this belongs in numpy: yes, I would say so. There is the extension of functionality of functions already in numpy, which is a no-brainer (it need not cost anything performance-wise, and I've needed unique graph edges many, many times), and there is the grouping functionality, which is the main novelty.
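As a concrete illustration of the unique-edges case, here is a minimal sketch using only existing numpy calls: canonicalize each undirected edge by sorting its two endpoints, then take the unique rows. Note that the `axis` argument to `np.unique` only arrived in NumPy 1.13, after this thread; the edge array is made-up sample data.

```python
import numpy as np

# Each row is an undirected edge (i, j); (0, 1) and (1, 0) describe the same edge.
edges = np.array([[0, 1], [1, 0], [1, 2], [2, 1], [0, 2]])

# Canonicalize each edge so the smaller endpoint comes first.
canonical = np.sort(edges, axis=1)

# Keep one copy of each distinct row (requires NumPy >= 1.13 for axis=0).
unique_edges = np.unique(canonical, axis=0)
# unique_edges.tolist() == [[0, 1], [0, 2], [1, 2]]
```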

However, note that the grouping functionality itself is a very small addition: just a few hundred lines of pure Python, given that the indexing logic has been factored out of the classic arraysetops. At least from a developer's perspective, it very much feels like a logical extension of the same 'thing'.

But also from a conceptual numpy perspective, grouping is really more an 'elementary manipulation of an ndarray' than a 'special-purpose algorithm'. It is useful for all kinds of programming, which is why the Python standard library has similar functionality (itertools.groupby); so why not have an efficient vectorized equivalent in numpy? Arguably, it belongs there more than the linalg module does.
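A minimal sketch of what a vectorized counterpart to itertools.groupby can look like, assuming a grouped sum is the reduction of interest: one stable argsort brings equal keys together, a comparison of neighbours marks where each group starts, and np.add.reduceat reduces each contiguous run. The helper name `group_sum` is hypothetical and is not the API proposed in the linked repository.

```python
import itertools
import numpy as np

def group_sum(keys, values):
    """Sum `values` per unique key using a single sort (hypothetical helper).

    Assumes non-empty 1-D inputs of equal length.
    """
    order = np.argsort(keys, kind="stable")
    sorted_keys = keys[order]
    sorted_values = values[order]
    # True at the first element of each run of equal keys.
    flag = np.empty(len(sorted_keys), dtype=bool)
    flag[0] = True
    flag[1:] = sorted_keys[1:] != sorted_keys[:-1]
    starts = np.flatnonzero(flag)
    # reduceat sums each slice [starts[i], starts[i + 1]).
    return sorted_keys[starts], np.add.reduceat(sorted_values, starts)

keys = np.array([3, 1, 2, 1, 3, 3])
values = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])
unique_keys, sums = group_sum(keys, values)

# Pure-Python reference: itertools.groupby needs its input sorted by key.
pairs = sorted(zip(keys.tolist(), values.tolist()))
reference = {k: sum(v for _, v in grp)
             for k, grp in itertools.groupby(pairs, key=lambda p: p[0])}
```

The Python version loops over every element in the interpreter, while the numpy version does a constant number of vectorized passes, which is the efficiency argument made above.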

Also, from a community perspective, a significant fraction of all Stack Overflow numpy questions are (unknowingly) exactly about 'how to do grouping in numpy'.

What I'm trying to say is that numpy is a community project. We don't have a central planning committee; the only difference between "developers" and everyone else is activity and commit rights. That is to say, if you develop and push this topic, it is likely to go in. There certainly seems to be interest in this functionality. The reason I brought up scipy is that some graph algorithms went in there a couple of years ago.

Note that the convention on the list is bottom posting.

<snip>

Chuck

NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

I understand that numpy is a community project, so the decision isn't up to any one particular person; but some early-stage feedback from those active in the community would be welcome. I am generally confident that this addition makes sense, but I have not contributed to numpy before, and you don't know what you don't know... Given that there are multiple suggestions for changing arraysetops, I think some coordination would be useful.

Note that I use graph edges merely as an example; the proposed functionality is much more general than graph algorithms specifically. The radial reduction example (https://github.com/EelcoHoogendoorn/Numpy_arraysetops_EP/blob/master/examples.py) I included on GitHub is, I think, particularly illustrative of the general utility of grouping functionality. Operations like radial reductions are rather common, and a custom implementation is quite lengthy, very bug-prone, and potentially very slow.
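For readers unfamiliar with the term, a radial reduction averages an image over rings of equal distance from a center point. A short sketch, assuming radii are binned by rounding to the nearest integer and using np.bincount as the grouping primitive; `radial_mean` is a hypothetical name, and this is not the code from examples.py.

```python
import numpy as np

def radial_mean(image, center):
    """Average `image` over rings of integer radius around `center` (sketch)."""
    rows, cols = np.indices(image.shape)
    # Distance of every pixel from the center, rounded to an integer ring index.
    radius = np.hypot(rows - center[0], cols - center[1])
    ring = np.rint(radius).astype(np.intp).ravel()
    # bincount acts as a grouped sum: total pixel value and pixel count per ring.
    totals = np.bincount(ring, weights=image.ravel())
    counts = np.bincount(ring)
    # A ring that contains no pixels yields NaN instead of raising.
    with np.errstate(divide="ignore", invalid="ignore"):
        return totals / counts

image = np.ones((5, 5))
profile = radial_mean(image, center=(2, 2))
```

Here the "grouping" is hidden inside bincount, which only works because the group labels are small non-negative integers; general keys need the sort-based approach.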

Thanks for the heads-up on the posting convention; I've always let Gmail do my thinking for me, which works well enough for me, but I can see how not following this convention is annoying to others.

What do you think about the suggestion of timsort? One would need to concatenate the arrays before sorting, but it should be fairly efficient.
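The concatenate-then-sort idea can be sketched for the intersection of two arrays that each hold unique values, which is essentially how np.intersect1d proceeds. In current NumPy, kind="stable" is documented to map to timsort or radix sort depending on the dtype (timsort was not yet available in 2014); timsort detects already-sorted runs, so merging two pre-sorted inputs is cheap. `sorted_intersection` is a hypothetical name.

```python
import numpy as np

def sorted_intersection(a, b):
    """Intersection of two 1-D arrays via one concatenated sort (sketch).

    Assumes a and b each contain no duplicates, like
    np.intersect1d(..., assume_unique=True).
    """
    aux = np.concatenate([a, b])
    # In-place stable sort; timsort can merge the two input runs efficiently.
    aux.sort(kind="stable")
    # A value present in both inputs now appears as an adjacent equal pair.
    return aux[:-1][aux[1:] == aux[:-1]]

a = np.array([1, 3, 5, 7])
b = np.array([2, 3, 4, 7, 9])
common = sorted_intersection(a, b)
```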

Chuck