[Numpy-discussion] defining a NumPy API standard?

Nathaniel Smith njs at pobox.com
Sat Jun 1 05:32:10 EDT 2019


It's possible I'm not getting what you're thinking, but from what you
describe in your email I think it's a bad idea.

Standards take a tremendous amount of work (no really, an absurdly
massively huge amount of work, more than you can imagine if you
haven't done it). And they don't do what people usually hope they do.
Many, many standards are written all the time that have zero effect on
reality, and the effort is wasted. They're really only useful when you
have to solve a coordination problem: lots of people want to do the
same thing as each other, whatever that is, but no-one knows what the
thing should be. That's not a problem at all for us, because numpy
already exists.

If you want to improve compatibility between Python libraries, then I
don't think a standard will be relevant. Users aren't writing code against
"the numpy standard", they're not testing their libraries against "the
numpy standard", they're using/testing against numpy. If library
authors want to be compatible with numpy, they need to match what
numpy does, not what some document says. OTOH if they think they have
a better idea and it's worth breaking compatibility, they're going to
do it regardless of what some document somewhere says.

If you want to share the lessons learned from numpy in the hopes of
improving future libraries that don't care about numpy compatibility
per se, in python or other languages, then that seems like a great
idea! But that's not a standard, that's a journal article called
something like "NumPy: A retrospective". Other languages aren't going
to match numpy one-to-one anyway, because they'll be adapting things
to their language's idioms; they certainly don't care about whether
you decided 'newaxis MUST be defined to None' or merely 'SHOULD' be
defined to None.
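For readers unfamiliar with the detail referenced here: in NumPy as it exists, `np.newaxis` is simply an alias for `None`, so the two are interchangeable when indexing. A minimal illustration, assuming NumPy is installed:

```python
import numpy as np

# np.newaxis is literally the None object, not a separate sentinel
print(np.newaxis is None)  # True

a = np.arange(3)         # 1-d array, shape (3,)
col = a[:, np.newaxis]   # insert a trailing axis -> shape (3, 1)
row = a[None, :]         # None does exactly the same thing -> shape (1, 3)
print(col.shape, row.shape)  # (3, 1) (1, 3)
```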

IMO if you try, the most likely outcome is that it will suck up a
lot of energy to write, and then the only effect will be that everyone
keeps doing what they would have done anyway, but now with extra
self-righteousness and yelling depending on whether that turns out to
match the standard or not.

-n

On Sat, Jun 1, 2019 at 1:18 AM Ralf Gommers <ralf.gommers at gmail.com> wrote:
>
> Hi all,
>
> I have an idea that I've discussed with a few people in person, and the feedback has generally been positive. So I'd like to bring it up here, to get a sense of whether this is going to fly. Note that this is NOT a proposal at this point.
>
> Idea in five words: define a NumPy API standard
>
> Observations
> ------------
> - Many libraries, both in Python and other languages, have APIs copied from or inspired by NumPy.
> - All of those APIs are incomplete, and many deviate from NumPy either by accident or on purpose.
> - The NumPy API is very large and ill-defined.
>
> Libraries with a NumPy-like API
> -------------------------------
> In Python:
> - GPU: TensorFlow, PyTorch, CuPy, MXNet
> - distributed: Dask
> - sparse: pydata/sparse
> - other: tensorly, uarray/unumpy, ...
>
> In other languages:
> - JavaScript: numjs
> - Go: Gonum
> - Rust: rust-ndarray, rust-numpy
> - C++: xtensor
> - C: XND
> - Java: ND4J
> - C#: NumSharp, numpy.net
> - Ruby: Narray, xnd-ruby
> - R: Rray
>
> This is an incomplete list. Xtensor and XND aim for multi-language support. These libraries are of varying completeness, size and quality - everything from one-person efforts that have just started, to large code bases that go beyond NumPy in features or performance.
>
> Idea
> ----
> Define a standard for "the NumPy API" (or "NumPy core API", or .... - it's just a name for now) that other libraries can use as a guide on what to implement and when to say they are NumPy compatible.
>
> In scope:
> - Define a NumPy API standard, containing an N-dimensional array object and a set of functions.
> - List of functions and ndarray methods to include.
> - Recommendations about where to deviate from NumPy (e.g. leave out array scalars)
>
> Out of scope, or to be treated separately:
> - dtypes and casting
> - (g)ufuncs
> - function behavior (e.g. returning views vs. copies, which keyword arguments to include)
> - indexing behavior
> - submodules (fft, random, linalg)
>
> Who cares and why?
> - Library authors: this saves them work and helps them make decisions.
> - End users: consistency between libraries/languages, which helps them transfer knowledge and understand code.
> - NumPy developers: gives them a vocabulary for "the NumPy API", "compatible with NumPy", etc.
>
> Risks:
> - If not done well, we just add to the confusion rather than make things better.
> - Opportunity for an endless amount of bikeshedding
> - ?
>
> Some more rationale:
> We (NumPy devs) mostly have a shared understanding of what is "core NumPy functionality", what we'd like to remove but are stuck with, what's not used a whole lot, etc. Examples: financial functions don't belong, and array creation methods with weird names like np.r_ were a mistake. We are not communicating this in any way, though. Doing so would be helpful. Perhaps this API standard could even have layers, to indicate what's really core, what are secondary sets of functionality to include in other libraries, etc.
>
> Discussion and next steps
> -------------------------
> What I'd like to get a sense of is:
> - Is this a good idea to begin with?
> - What should the scope be?
> - What should the format be (a NEP, some other doc, defining in code)?
>
> If this idea is well-received, I can try to draft a proposal during the next month (help/volunteers welcome!). It can then be discussed at SciPy'19 - high-bandwidth communication may help to get a set of people on the same page and hash out a lot of details.
>
> Thoughts?
>
> Ralf



-- 
Nathaniel J. Smith -- https://vorpus.org

