On Sat, Jun 1, 2019 at 6:12 PM Marten van Kerkwijk <m.h.vankerkwijk@gmail.com> wrote:
Hi Ralf,

Despite sharing Nathaniel's doubts about the ease of defining the numpy API and the likelihood of people actually sticking to a limited subset of what numpy exposes, I quite like the actual things you propose to do!

But my liking it is for reasons that are different from your stated ones: I think the proposed actions are likely to be of great benefit both to users (like Bill above) and to current and prospective developers. To me, it seems almost a side benefit (if a very nice one) that it might help other projects share an API; a larger benefit may come from tapping into the experience of other projects in thinking about what the truly basic functions/methods are that one should have.

Agreed, there is some reverse learning there as well. Projects like Dask and Xtensor already went through making these choices, which can teach us as NumPy developers some lessons.


More concretely, to address Nathaniel's (very reasonable) worry about ending up wasting a lot of time, I think it may be good to identify smaller parts, each of which is useful on its own.

In this respect, I think an excellent place to start might be something you are planning already anyway: updating the user documentation. Doing this will necessarily require thinking about, e.g., which `ndarray` methods and properties are actually fundamental, as you only want to focus on a few. With that in place, one could then, as you suggest, reorganize the reference documentation to put the most important properties up front, and the ones that we really think are mistakes at the bottom, with explanations of why we think so and what the alternative is. Also for the reference documentation, it would help to group functions more logically.

That is perhaps another rationale for doing this. The docs are likely to get a fairly major overhaul this year. If we don't write down a coherent plan, we're going to end up making very similar decisions to the ones we'd make when writing up a "standard", just ad hoc and with much less review.


The above could lead to three next steps, all of which I think would be useful. First, for (prospective) developers as well as for future maintenance, I think it would be quite a large benefit if we (slowly but surely) rewrote code that implements the less basic functionality in terms of more basic functions (e.g., replace use of `array.fill(...)` or `np.copyto(array, ...)` with `array[...] =`).
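Here is a minimal sketch of the kind of rewrite meant here. It is an illustration, not code taken from NumPy itself. The three spellings below all set every element of an existing array to one value. The `where=` case shows that plain boolean-index assignment can also cover the masking that `np.copyto` offers.

One caveat, stated as an assumption rather than checked against every release: `np.copyto` takes an explicit `casting` argument (default `'same_kind'`), while indexing assignment may apply different casting rules. The forms are therefore only interchangeable where casting doesn't come into play.

```python
import numpy as np

# Three ways to set every element of an existing array to the same value.
a = np.empty((2, 3))
b = np.empty((2, 3))
c = np.empty((2, 3))

a.fill(7.0)        # ndarray method
np.copyto(b, 7.0)  # dedicated function (also has `casting` and `where` keywords)
c[...] = 7.0       # generic indexing assignment, the "more basic" operation

assert np.array_equal(a, b) and np.array_equal(b, c)

# np.copyto's `where` mask can likewise be expressed with boolean indexing.
mask = np.array([True, False, True, False, True])
d = np.zeros(5)
np.copyto(d, 1.0, where=mask)
e = np.zeros(5)
e[mask] = 1.0

assert np.array_equal(d, e)
```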

That could indeed be nice. I think Travis referred to this as defining an "RNumPy" (similar to RPython as a subset of Python).


Second, we could update Nathaniel's NEP about distinct properties duck arrays might want to mimic/implement.

I hadn't thought of that, but agreed, it could be helpful.


Third, we could actually implement the logical groupings identified in the code base (and describe them!). Currently, it is a mess: for the C files, I typically have to grep to even find where things are done. For the functions defined in Python files that is not necessary, but many have historical rather than logical groupings (looking at you, `fromnumeric`!), and even more descriptive ones like `shape_base` are split over `lib` and `core`. I think it would help everybody if we went to a Python-like layout, with a true core and libraries such as polynomial, fft, ma, etc.

I'd really like this. It would also give us a sane namespace in numpy, and a basis for deciding whether something goes in numpy.lib, the main namespace, or some other namespace (there are a couple of semi-public ones).


Anyway, re-reading your message, I realize the above is not really what you wrote about, so perhaps this is irrelevant...

Not irrelevant, I think you're making some good points.

Cheers,
Ralf