[Numpy-discussion] defining a NumPy API standard?

Sat Jun 1 04:17:30 EDT 2019

Hi all,

I have an idea that I've discussed with a few people in person, and the
feedback has generally been positive. So I'd like to bring it up here, to
get a sense of whether this is going to fly. Note that this is NOT a proposal at
this point.

Idea in five words: define a NumPy API standard

Observations
------------
- Many libraries, both in Python and other languages, have APIs copied from
or inspired by NumPy.
- All of those APIs are incomplete, and many deviate from NumPy either by
accident or on purpose.
- The NumPy API is very large and ill-defined.

Libraries with a NumPy-like API
-------------------------------
In Python:
- GPU: TensorFlow, PyTorch, CuPy, MXNet
- distributed: Dask
- sparse: pydata/sparse
- other: tensorly, uarray/unumpy, ...

In other languages:
- JavaScript: numjs
- Go: Gonum
- Rust: rust-ndarray, rust-numpy
- C++: xtensor
- C: XND
- Java: ND4J
- C#: NumSharp, numpy.net
- Ruby: NArray, xnd-ruby
- R: Rray

This is an incomplete list. Xtensor and XND aim for multi-language support.
These libraries are of varying completeness, size and quality - everything
from one-person efforts that have just started, to large code bases that go
beyond NumPy in features or performance.

Idea
----
Define a standard for "the NumPy API" (or "NumPy core API", or ... - it's
just a name for now) that other libraries can use as a guide on what to
implement and when to say they are NumPy compatible.

In scope:
- Define a NumPy API standard, containing an N-dimensional array object and
a set of functions.
- List of functions and ndarray methods to include.
- Recommendations about where to deviate from NumPy (e.g. leave out array
scalars)

Out of scope, or to be treated separately:
- dtypes and casting
- (g)ufuncs
- function behavior (e.g. returning views vs. copies, which keyword
arguments to include)
- indexing behavior
- submodules (fft, random, linalg)

Who cares and why?
- Library authors: this saves them work and helps them make decisions.
- End users: consistency between libraries/languages, helps transfer
knowledge and understand code
- NumPy developers: gives them a vocabulary for "the NumPy API",
"compatible with NumPy", etc.

Risks:
- If not done well, we just add to the confusion rather than make things
better.
- Opportunity for an endless amount of bikeshedding
- ?

Some more rationale:
We (NumPy devs) mostly have a shared understanding of what is "core NumPy
functionality", what we'd like to remove but are stuck with, what's not
used a whole lot, etc. Examples: financial functions don't belong, and array
creation methods with weird names like np.r_ were a mistake. We are not
communicating this in any way though. Doing so would be helpful. Perhaps
this API standard could even have layers, to indicate what's really core,
what are secondary sets of functionality to include in other libraries, etc.
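
To make the layering idea a bit more concrete, here is a rough sketch of
what "defining in code" could look like. Everything in it is hypothetical:
the layer names, the function lists, the coverage() helper and the toy_lib
stand-in are made up for illustration and are not a proposed selection.

```python
from types import SimpleNamespace

# Hypothetical layers; both the split and the names are illustrative only.
API_LAYERS = {
    "core": ["zeros", "ones", "arange", "reshape", "sum", "mean"],
    "secondary": ["histogram", "unwrap"],
}


def coverage(namespace, layers=API_LAYERS):
    """Return, per layer, the listed names that `namespace` does not provide."""
    return {
        layer: [name for name in names if not hasattr(namespace, name)]
        for layer, names in layers.items()
    }


# Stand-in for a NumPy-like library that implements only part of the list.
toy_lib = SimpleNamespace(zeros=None, ones=None, arange=None, reshape=None, sum=None)

missing = coverage(toy_lib)
print(missing)
```

A library could then say it is compatible with a given layer once nothing
in that layer is reported missing. Note that this only checks names, not
behavior, which matches the scope sketched above.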

Discussion and next steps
-------------------------
What I'd like to get a sense of is:
- Is this a good idea to begin with?
- What should the scope be?
- What should the format be (a NEP, some other doc, defining in code)?

If this idea is well-received, I can try to draft a proposal during the
next month (help/volunteers welcome!). It can then be discussed at SciPy'19;
high-bandwidth communication may help to get a set of people on the same
page and hash out a lot of details.

Thoughts?

Ralf