
On Jun 1, 2019, at 4:17 AM, Ralf Gommers <ralf.gommers@gmail.com> wrote:
Hi all,
I have an idea that I've discussed with a few people in person, and the feedback has generally been positive. So I'd like to bring it up here, to get a sense of whether this is going to fly. Note that this is NOT a proposal at this point.
Idea in five words: define a NumPy API standard
As an amateur user of NumPy (hobby programming), and at the opposite end of the spectrum from the NumPy development team, I’d like to raise my hand and applaud this idea. I think it would make my use of NumPy significantly easier if an API standard not only specified the basic API structure, but also regularized it to the extent possible.

Thanks,
Bill Wing
Observations
------------

- Many libraries, both in Python and other languages, have APIs copied from or inspired by NumPy.
- All of those APIs are incomplete, and many deviate from NumPy either by accident or on purpose.
- The NumPy API is very large and ill-defined.
Libraries with a NumPy-like API
-------------------------------

In Python:
- GPU: Tensorflow, PyTorch, CuPy, MXNet
- distributed: Dask
- sparse: pydata/sparse
- other: tensorly, uarray/unumpy, ...

In other languages:
- JavaScript: numjs
- Go: Gonum
- Rust: rust-ndarray, rust-numpy
- C++: xtensor
- C: XND
- Java: ND4J
- C#: NumSharp, numpy.net <http://numpy.net/>
- Ruby: Narray, xnd-ruby
- R: Rray
This is an incomplete list. Xtensor and XND aim for multi-language support. These libraries are of varying completeness, size and quality - everything from one-person efforts that have just started, to large code bases that go beyond NumPy in features or performance.
Idea
----

Define a standard for "the NumPy API" (or "NumPy core API", or .... - it's just a name for now), that other libraries can use as a guide on what to implement and when to say they are NumPy compatible.
In scope:
- Define a NumPy API standard, containing an N-dimensional array object and a set of functions.
- List of functions and ndarray methods to include.
- Recommendations about where to deviate from NumPy (e.g. leave out array scalars).
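To make the "list of functions" item concrete, here is a minimal sketch of how such a list could be used to check what a library implements. The names in `CORE_FUNCTIONS`, the `api_coverage` helper, and the `toy_lib` stand-in are all hypothetical and chosen only for illustration; deciding what actually belongs on the list is the work the standard itself would have to do.

```python
from types import SimpleNamespace

# Hypothetical excerpt of a standard's function list. The real contents
# are exactly what a proposal would need to decide.
CORE_FUNCTIONS = ["zeros", "ones", "reshape", "sum", "mean"]


def api_coverage(namespace, required=CORE_FUNCTIONS):
    """Split `required` into names the namespace provides as callables
    and names it lacks. Only existence is checked, not behavior."""
    present = [name for name in required
               if callable(getattr(namespace, name, None))]
    missing = [name for name in required if name not in present]
    return present, missing


# Stand-in for a partial NumPy-like library (not a real package).
toy_lib = SimpleNamespace(
    zeros=lambda n: [0.0] * n,
    ones=lambda n: [1.0] * n,
    sum=sum,
)

present, missing = api_coverage(toy_lib)
print("present:", present)
print("missing:", missing)
```

A name-only check like this says nothing about function behavior, keyword arguments, or views vs. copies, so it only covers the narrow scope described in the list above.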
Out of scope, or to be treated separately:
- dtypes and casting
- (g)ufuncs
- function behavior (e.g. returning views vs. copies, which keyword arguments to include)
- indexing behavior
- submodules (fft, random, linalg)
Who cares and why?
- Library authors: this saves them work and helps them make decisions.
- End users: consistency between libraries/languages, helps transfer knowledge and understand code.
- NumPy developers: gives them a vocabulary for "the NumPy API", "compatible with NumPy", etc.
Risks:
- If not done well, we just add to the confusion rather than make things better.
- Opportunity for an endless amount of bikeshedding.
- ?
Some more rationale: We (NumPy devs) mostly have a shared understanding of what is "core NumPy functionality", what we'd like to remove but are stuck with, what's not used a whole lot, etc. Examples: financial functions don't belong, and array creation methods with weird names like np.r_ were a mistake. We are not communicating this in any way, though. Doing so would be helpful. Perhaps this API standard could even have layers, to indicate what's really core, what are secondary sets of functionality to include in other libraries, etc.
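One way the "layers" idea could look if written down in code is sketched below. The layer names (`core`, `secondary`, `extended`), the function names assigned to each layer, and the `compliance_level` helper are assumptions made up for this example and are not part of any existing standard.

```python
# Hypothetical layers, ordered from most to least essential. Both the
# layer names and the functions assigned to them are illustrative only.
LAYERS = [
    ("core", {"zeros", "ones", "reshape", "sum"}),
    ("secondary", {"mean", "std", "cumsum"}),
    ("extended", {"histogram", "einsum"}),
]


def compliance_level(implemented):
    """Return the highest layer for which that layer and every layer
    before it are fully implemented, or None if core is incomplete."""
    implemented = set(implemented)
    level = None
    for name, required in LAYERS:
        if not required <= implemented:
            break  # a gap in this layer stops the climb
        level = name
    return level


partial = {"zeros", "ones", "reshape", "sum", "mean"}
print(compliance_level(partial))
```

Under this scheme a library could state "implements the core layer" instead of a vague "NumPy compatible", which is the kind of vocabulary the paragraph above asks for.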
Discussion and next steps
-------------------------

What I'd like to get a sense of is:
- Is this a good idea to begin with?
- What should the scope be?
- What should the format be (a NEP, some other doc, defining in code)?
If this idea is well-received, I can try to draft a proposal during the next month (help/volunteers welcome!). It can then be discussed at SciPy'19 - high-bandwidth communication may help to get a set of people on the same page and hash out a lot of details.
Thoughts?
Ralf
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion