[Numpy-discussion] defining a NumPy API standard?

Ralf Gommers ralf.gommers at gmail.com
Sat Jun 1 16:04:32 EDT 2019


On Sat, Jun 1, 2019 at 8:46 PM Matti Picus <matti.picus at gmail.com> wrote:

> On 1/6/19 7:31 pm, Charles R Harris wrote:
> > I generally agree with this. The most useful aspect of this exercise
> > is likely to be clarifying NumPy for its own developers, and maybe
> > offering a guide to future simplification. Trying to put something
> > together that everyone agrees to as an official standard would be a
> > big project and, as Nathaniel points out, would involve an enormous
> > amount of work, much time, and doubtless many arguments.  What might
> > be a less ambitious exercise would be identifying commonalities in the
> > current numpy-like languages. That would have the advantage of
> > feedback from actual user experience, and would be more like a lessons
> > learned document that would be helpful to others.
> >
> >
> >     More concretely, to address Nathaniel's (very reasonable) worry
> >     about ending up wasting a lot of time, I think it may be good to
> >     identify smaller parts, each of which are useful on their own.
> >
> >     In this respect, I think an excellent place to start might be
> >     something you are planning already anyway: update the user
> >     documentation
> >
>
> I would include tests as well. Rather than hammer out a full standard
> based on extensive discussions and negotiations, I would suggest NumPy
> might be able to set a de-facto "standard" based on pieces of the
> current numpy user documentation and test suite.


I think this is potentially useful, but *far* more prescriptive and
detailed than I had in mind. Both you and Nathaniel seem not to have
understood what I mean by "out of scope", which is my fault for not being
explicit enough. I *do not* want to prescribe behavior. Instead, I want a
simple yes/no for each function in numpy and each method on ndarray.

Our API is huge. A simple count:
main namespace: 600
fft: 30
linalg: 30
random: 60
ndarray: 70
lib: 20
lib.npyio: 35
etc. (many more ill-thought-out but not clearly private submodules)
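
As a rough illustration of how such a count might be reproduced, the sketch
below counts names that do not start with an underscore in each namespace.
The helper names (count_public, api_counts) are made up for this example, the
"no leading underscore" rule is only an approximation of "public", and the
numbers will differ between NumPy versions, so they need not match the figures
above.

```python
import importlib


def count_public(obj):
    """Count names on a module or class that do not start with an underscore."""
    return sum(1 for name in dir(obj) if not name.startswith("_"))


def api_counts(module_names):
    """Map each importable module name to its public-name count."""
    counts = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue  # module not available in this environment or version
        counts[name] = count_public(module)
    return counts


if __name__ == "__main__":
    # Namespaces taken from the list above; any that fail to import are skipped.
    names = ["numpy", "numpy.fft", "numpy.linalg", "numpy.random", "numpy.lib"]
    for name, n in api_counts(names).items():
        print(f"{name}: {n}")
    try:
        import numpy as np
        print(f"ndarray: {count_public(np.ndarray)}")
    except ImportError:
        pass  # NumPy is not installed; nothing more to report
```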

Just the main namespace plus ndarray methods is close to 700 objects. If
you want to build a NumPy-like thing, that's 700 decisions to make. I'm
suggesting something as simple as a list of functions that constitute a
sensible "core of NumPy". That list would not include anything in
fft/linalg/random, since those can easily be separated out (indeed, if we
could disappear fft and linalg and just rely on scipy, pyfftw etc., that
would be great). It would not include financial functions. And so on. I
guess we'd end up with most ndarray methods plus <150 functions.

That list could be used for many purposes: improve the docs, serve as the
set of functions to implement in xnd.array, unumpy & co, Marten's
suggestion of implementing other functions in terms of basic functions, etc.
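
To make the "set of functions to implement" use concrete, here is a minimal
sketch under stated assumptions: the CORE mapping is a made-up excerpt (no
such list has been agreed on), and core_names and missing_from are
hypothetical helpers. It shows how a plain yes/no list could be checked
against any NumPy-like namespace without saying anything about behavior.

```python
from types import SimpleNamespace

# Hypothetical excerpt of a yes/no list; the real entries would need discussion.
CORE = {
    "sum": True,
    "mean": True,
    "reshape": True,
    "concatenate": True,
    "pmt": False,  # a financial function, so not part of the core
}


def core_names(decisions):
    """Return the sorted names marked as part of the core."""
    return sorted(name for name, in_core in decisions.items() if in_core)


def missing_from(namespace, decisions):
    """List core names that the given namespace does not provide."""
    return [name for name in core_names(decisions) if not hasattr(namespace, name)]


# Stand-in for an array library that implements only part of the core.
toy = SimpleNamespace(sum=sum, reshape=lambda a, shape: a)
print(missing_from(toy, CORE))  # prints ['concatenate', 'mean']
```

The check only tests for the presence of a name, which matches the intent of
a list that does not prescribe behavior.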

Two other thoughts:
1. NumPy is not done. Our thinking on how to evolve the NumPy API is fairly
muddled. When new functions are proposed, it's decided on a case-by-case
basis, usually without a guiding principle. We need to improve that. A
"core of NumPy" list could be a part of that puzzle.
2. We often argue about deprecations. Deprecations are costly, but so is
keeping around functions that are not very useful or have a poor design.
This may offer a middle ground: keep others from repeating our mistakes
and signal to users that a function is of questionable value, without
breaking already-written code.

Cheers,
Ralf