On Sat, Jun 1, 2019 at 8:46 PM Matti Picus <matti.picus@gmail.com> wrote:
On 1/6/19 7:31 pm, Charles R Harris wrote:
> I generally agree with this. The most useful aspect of this exercise
> is likely to be clarifying NumPy for its own developers, and maybe
> offering a guide to future simplification. Trying to put something
> together that everyone agrees to as an official standard would be a
> big project and, as Nathaniel points out, would involve an enormous
> amount of work, much time, and doubtless many arguments.  What might
> be a less ambitious exercise would be identifying commonalities in the
> current numpy-like languages. That would have the advantage of
> feedback from actual user experience, and would be more like a lessons
> learned document that would be helpful to others.
>
>
>     More concretely, to address Nathaniel's (very reasonable) worry
>     about ending up wasting a lot of time, I think it may be good to
>     identify smaller parts, each of which are useful on their own.
>
>     In this respect, I think an excellent place to start might be
>     something you are planning already anyway: update the user
>     documentation
>

I would include tests as well. Rather than hammer out a full standard
based on extensive discussions and negotiations, I would suggest NumPy
might be able to set a de-facto "standard" based on pieces of the
current numpy user documentation and test suite.

I think this is potentially useful, but *far* more prescriptive and detailed than I had in mind. Both you and Nathaniel seem not to have understood what I mean by "out of scope", so I think that's my fault for not being explicit enough. I *do not* want to prescribe behavior. Instead, I want a simple yes/no for each function in numpy and each method on ndarray.

Our API is huge. A simple count:
main namespace: 600
fft: 30
linalg: 30
random: 60
ndarray: 70
lib: 20
lib.npyio: 35
etc. (many more ill-thought-out but not clearly private submodules)
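
A rough sketch of how a count like this could be reproduced. It assumes that "public" simply means "name does not start with an underscore"; the helper names count_public_names and api_size are made up for illustration. Because dir() also lists submodules and constants, the numbers it prints will not match the figures above exactly and will vary between NumPy versions.

```python
import importlib


def count_public_names(obj):
    """Count attributes of a module or class that do not start with an underscore."""
    return sum(1 for name in dir(obj) if not name.startswith("_"))


def api_size(module_names):
    """Map each importable module name to its public-name count."""
    sizes = {}
    for mod_name in module_names:
        try:
            module = importlib.import_module(mod_name)
        except ImportError:
            continue  # skip modules that are missing in this environment
        sizes[mod_name] = count_public_names(module)
    return sizes


if __name__ == "__main__":
    # Assumed selection of namespaces; adjust to taste.
    names = ["numpy", "numpy.fft", "numpy.linalg", "numpy.random", "numpy.lib"]
    for mod_name, size in api_size(names).items():
        print(f"{mod_name}: {size}")
```

The same count_public_names helper also accepts a class, so passing numpy.ndarray gives a comparable figure for the ndarray methods and attributes.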

Just the main namespace plus ndarray methods is close to 700 objects. If you want to build a NumPy-like thing, that's 700 decisions to make. I'm suggesting something as simple as a list of functions that constitute a sensible "core of NumPy". That list would not include anything in fft/linalg/random, since those can easily be separated out (indeed, if we could disappear fft and linalg and just rely on scipy, pyfftw etc., that would be great). It would not include financial functions. And so on. I guess we'd end up with most ndarray methods plus <150 functions.

That list could be used for many purposes: improve the docs, serve as the set of functions to implement in xnd.array, unumpy & co, Marten's suggestion of implementing other functions in terms of basic functions, etc.
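
To make the yes/no idea concrete, here is a minimal sketch of how such a list could be consumed. The contents of CORE_FUNCTIONS are a placeholder chosen for illustration, not a proposal for what belongs in the core, and is_core / missing_core are hypothetical helpers.

```python
from types import SimpleNamespace

# Placeholder "core" list for illustration only; the real list would be
# the outcome of the yes/no decisions discussed above.
CORE_FUNCTIONS = frozenset({"sum", "mean", "reshape", "concatenate", "where"})


def is_core(name):
    """Yes/no answer for a single function name."""
    return name in CORE_FUNCTIONS


def missing_core(namespace):
    """Return the sorted core names that `namespace` does not implement."""
    return sorted(
        name
        for name in CORE_FUNCTIONS
        if not callable(getattr(namespace, name, None))
    )


# A toy stand-in for a NumPy-like library that implements two core functions.
toy = SimpleNamespace(sum=sum, where=lambda cond, a, b: a if cond else b)
print(missing_core(toy))  # ['concatenate', 'mean', 'reshape']
```

A NumPy-like project could run missing_core against its own top-level module to see how much of the core it still has to cover.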

Two other thoughts:
1. NumPy is not done. Our thinking on how to evolve the NumPy API is fairly muddled. When new functions are proposed, the decision is made on a case-by-case basis, usually without a guiding principle. We need to improve that. A "core of NumPy" list could be a part of that puzzle.
2. We often argue about deprecations. Deprecations are costly, but so is keeping around functions that are not very useful or have a poor design. A core list may offer a middle ground: it keeps others from repeating our mistakes and signals to users that a function is of questionable value, without breaking already written code.
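
One way the second point could look in practice, as a sketch only: a decorator that flags a function as outside the core in its docstring and via an attribute, without emitting any warning or changing behavior. The decorator name non_core, the is_core attribute, and the example function are all hypothetical; nothing like this exists in NumPy.

```python
import functools


def non_core(reason):
    """Mark a function as outside the core list without deprecating it."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # No warning is raised: existing code keeps working unchanged.
            return func(*args, **kwargs)

        note = f"Note: not part of the core list. {reason}"
        wrapper.__doc__ = f"{func.__doc__ or ''}\n\n{note}"
        wrapper.is_core = False
        return wrapper

    return decorator


@non_core("Better served by a dedicated finance package.")
def simple_interest(principal, rate, periods):
    """Return the interest accrued without compounding."""
    return principal * rate * periods


print(simple_interest(100, 0.5, 2))  # 100.0
print(simple_interest.is_core)       # False
```

The signal shows up in help() output and is available to tooling through the attribute, while callers see no change at runtime.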

Cheers,
Ralf