defining a NumPy API standard?
Hi all,

I have an idea that I've discussed with a few people in person, and the feedback has generally been positive. So I'd like to bring it up here, to get a sense of whether this is going to fly. Note that this is NOT a proposal at this point.

Idea in five words: define a NumPy API standard

Observations
------------
- Many libraries, both in Python and other languages, have APIs copied from or inspired by NumPy.
- All of those APIs are incomplete, and many deviate from NumPy either by accident or on purpose.
- The NumPy API is very large and ill-defined.

Libraries with a NumPy-like API
-------------------------------
In Python:
- GPU: Tensorflow, PyTorch, CuPy, MXNet
- distributed: Dask
- sparse: pydata/sparse
- other: tensorly, uarray/unumpy, ...

In other languages:
- JavaScript: numjs
- Go: Gonum
- Rust: rust-ndarray, rust-numpy
- C++: xtensor
- C: XND
- Java: ND4J
- C#: NumSharp, numpy.net
- Ruby: Narray, xnd-ruby
- R: Rray

This is an incomplete list. Xtensor and XND aim for multi-language support. These libraries are of varying completeness, size and quality - everything from one-person efforts that have just started, to large code bases that go beyond NumPy in features or performance.

Idea
----
Define a standard for "the NumPy API" (or "NumPy core API", or ... - it's just a name for now) that other libraries can use as a guide on what to implement and when to say they are NumPy compatible.

In scope:
- Define a NumPy API standard, containing an N-dimensional array object and a set of functions.
- List of functions and ndarray methods to include.
- Recommendations about where to deviate from NumPy (e.g. leave out array scalars)

Out of scope, or to be treated separately:
- dtypes and casting
- (g)ufuncs
- function behavior (e.g. returning views vs. copies, which keyword arguments to include)
- indexing behavior
- submodules (fft, random, linalg)

Who cares and why?
- Library authors: this saves them work and helps them make decisions.
- End users: consistency between libraries/languages helps transfer knowledge and understand code.
- NumPy developers: gives them a vocabulary for "the NumPy API", "compatible with NumPy", etc.

Risks:
- If not done well, we just add to the confusion rather than make things better.
- Opportunity for an endless amount of bikeshedding.
- ?

Some more rationale: we (NumPy devs) mostly have a shared understanding of what is "core NumPy functionality", what we'd like to remove but are stuck with, what's not used a whole lot, etc. Examples: financial functions don't belong; array creation methods with weird names like np.r_ were a mistake. We are not communicating this in any way though. Doing so would be helpful. Perhaps this API standard could even have layers, to indicate what's really core, what are secondary sets of functionality to include in other libraries, etc.

Discussion and next steps
-------------------------
What I'd like to get a sense of is:
- Is this a good idea to begin with?
- What should the scope be?
- What should the format be (a NEP, some other doc, defining in code)?

If this idea is well-received, I can try to draft a proposal during the next month (help/volunteers welcome!). It can then be discussed at SciPy'19 - high-bandwidth communication may help to get a set of people on the same page and hash out a lot of details.

Thoughts?

Ralf
It's possible I'm not getting what you're thinking, but from what you describe in your email I think it's a bad idea.

Standards take a tremendous amount of work (no really, an absurdly massively huge amount of work, more than you can imagine if you haven't done it). And they don't do what people usually hope they do. Many many standards are written all the time that have zero effect on reality, and the effort is wasted. They're really only useful when you have to solve a coordination problem: lots of people want to do the same thing as each other, whatever that is, but no-one knows what the thing should be. That's not a problem at all for us, because numpy already exists.

If you want to improve compatibility between Python libraries, then I don't think it will be relevant. Users aren't writing code against "the numpy standard", they're not testing their libraries against "the numpy standard", they're using/testing against numpy. If library authors want to be compatible with numpy, they need to match what numpy does, not what some document says. OTOH if they think they have a better idea and it's worth breaking compatibility, they're going to do it regardless of what some document somewhere says.

If you want to share the lessons learned from numpy in the hopes of improving future libraries that don't care about numpy compatibility per se, in python or other languages, then that seems like a great idea! But that's not a standard, that's a journal article called something like "NumPy: A retrospective". Other languages aren't going to match numpy one-to-one anyway, because they'll be adapting things to their language's idioms; they certainly don't care about whether you decided 'newaxis MUST be defined to None' or merely 'SHOULD be defined to None'.
IMO if you try, the most likely outcome will be that it will suck up a lot of energy writing it, and then the only effect is that everyone will keep doing what they would have done anyway, but now with extra self-righteousness and yelling depending on whether that turns out to match the standard or not. -n On Sat, Jun 1, 2019 at 1:18 AM Ralf Gommers <ralf.gommers@gmail.com> wrote:
-- Nathaniel J. Smith -- https://vorpus.org
On Sat, Jun 1, 2019 at 11:32 AM Nathaniel Smith <njs@pobox.com> wrote:
It's possible I'm not getting what you're thinking, but from what you describe in your email I think it's a bad idea.
Hi Nathaniel,

I think you are indeed not getting what I meant and are just responding to the word "standard". I'll give a concrete example. Here is the xtensor to numpy comparison: https://xtensor.readthedocs.io/en/latest/numpy.html. The xtensor authors clearly have made sane choices, but they did have to spend a lot of effort making those choices - what to include and what not.

Now, the XND team is just starting to build out their Python API. Hameer is building out unumpy. There's all the other array libraries I mentioned. We can say "sort it out yourself, make your own choices", or we can provide some guidance. So far the authors of those libraries I have asked say they would appreciate the guidance.

Cheers,
Ralf
On Sat, Jun 1, 2019, 09:13 Ralf Gommers <ralf.gommers@gmail.com> wrote:
Well, that's the word you chose :-) I think it's very possible that what you're thinking is a good idea, but it's actually something else, like better high-level documentation, or a NEP documenting things we wish we did differently but are stuck with, or a generic duck array test suite to improve compatibility and make it easier to bootstrap new libraries, etc.

The word "standard" is tricky:

- it has a pretty precise technical meaning that is different from all of those things, so if those are what you want then it's a bad word to use.
- it's a somewhat arcane niche of engineering practice that a lot of people don't have direct experience with, so there are a ton of people with vague and magical ideas about how standards work, and if you use the word then they'll start expecting all kinds of things. (See the response up thread where someone thinks that you just proposed to make a bunch of incompatible changes to numpy.) This makes it difficult to have a productive discussion, because everyone is misinterpreting each other.

I bet if we can articulate more precisely what exactly you're hoping to accomplish, then we'll also be able to figure out specific concrete actions that will help, and they won't involve the word "standard". For example:
That sounds great. Maybe you want... a mailing list or a forum for array library implementors to compare notes? ("So we ran into this unexpected problem implementing einsum, how did you handle it? And btw numpy devs, why is it like that in the first place?") Maybe you want someone to write up a review of existing APIs like xtensor, dask, xarray, sparse, ... to see where they deviated from numpy and if there are any commonalities? Or someone could do an analysis of existing code and publish tables of how often different features are used, so array implementors can make better choices about what to implement first? Or maybe just encouraging Hameer to be really proactive about sharing drafts and gathering feedback here? -n
On Sat, Jun 1, 2019 at 10:32 PM Nathaniel Smith <njs@pobox.com> wrote:
It's just one word out of a 100-line email. I'm happy to retract it. Please pretend it wasn't there and re-read the rest. Replace it with the list of functions that I propose in my previous email.
Please see my email of 1 hour ago.
No.

> ("So we ran into this unexpected problem implementing einsum, how did you
> handle it? And btw numpy devs, why is it like that in the first place?")

can be done on this list.

> Maybe you want someone to write up a review of existing APIs like xtensor,
> dask, xarray, sparse, ... to see where they deviated from numpy and if
> there are any commonalities?

That will be useful in verifying that the list of functions for "core of NumPy" I proposed is sensible. We're not going to make up things out of thin air.
That's done :)

NumPy table: https://github.com/Quansight-Labs/python-api-inspect/blob/master/data/csv/nu...
Blog post with explanation: https://labs.quansight.org/blog/2019/05/python-package-function-usage/

And yes, it's another useful data point in verifying our choices.

> Or maybe just encouraging Hameer to be really proactive about sharing
> drafts and gathering feedback here?

No. (Well, it's always good to be proactive, but that's beside the point here.)

Cheers,
Ralf
As an amateur user of Numpy (hobby programming), and at the opposite end of the spectrum from the Numpy development team, I'd like to raise my hand and applaud this idea. I think it would make my use of Numpy significantly easier if an API standard not only specified the basic API structure, but also regularized it to the extent possible. Thanks, Bill Wing
Hi Ralf,

Despite sharing Nathaniel's doubts about the ease of defining the numpy API and the likelihood of people actually sticking to a limited subset of what numpy exposes, I quite like the actual things you propose to do! But my liking it is for reasons that are different from your stated ones: I think the proposed actions are likely to greatly benefit both users (like Bill above) and current and prospective developers. To me, it seems almost a side benefit (if a very nice one) that it might help other projects to share an API; a larger benefit may come from tapping into the experience of other projects in thinking about what are the true basic functions/methods that one should have.

More concretely, to address Nathaniel's (very reasonable) worry about ending up wasting a lot of time, I think it may be good to identify smaller parts, each of which is useful on its own. In this respect, I think an excellent place to start might be something you are planning already anyway: update the user documentation. Doing this will necessarily require thinking about, e.g., what `ndarray` methods and properties are actually fundamental, as you only want to focus on a few. With that in place, one could then, as you suggest, reorganize the reference documentation to put those most important properties up front, and ones that we really think are mistakes at the bottom, with explanations of why we think so and what the alternative is. Also for the reference documentation, it would help to group functions more logically.

The above could lead to three next steps, all of which I think would be useful. First, for (prospective) developers as well as for future maintenance, I think it would be quite a large benefit if we (slowly but surely) rewrote code that implements the less basic functionality in terms of more basic functions (e.g., replace use of `array.fill(...)` or `np.copyto(array, ...)` with `array[...] =`).
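[Editorial note: a minimal sketch of the kind of rewrite described in the paragraph above, showing that the three spellings produce the same result for a broadcast scalar fill.]

```python
import numpy as np

a = np.empty(5)
a.fill(3.0)           # method-based spelling

b = np.empty(5)
np.copyto(b, 3.0)     # function-based spelling

c = np.empty(5)
c[...] = 3.0          # the more basic primitive: broadcast slice assignment

# All three arrays now hold the same values.
assert (a == b).all() and (b == c).all()
```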
Second, we could update Nathaniel's NEP about distinct properties duck arrays might want to mimic/implement.

Third, we could actually implement the logical groupings identified in the code base (and describe them!). Currently, it is a mess: for the C files, I typically have to grep to even find where things are done, and while for the functions defined in python files that is not necessary, many have historical rather than logical groupings (looking at you, `from_numeric`!), and even more descriptive ones like `shape_base` are split over `lib` and `core`. I think it would help everybody if we went to a python-like layout, with a true core and libraries such as polynomial, fft, ma, etc.

Anyway, re-reading your message, I realize the above is not really what you wrote about, so perhaps this is irrelevant...

All the best,

Marten
On Sat, Jun 1, 2019 at 6:12 PM Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
Agreed, there is some reverse learning there as well. Projects like Dask and Xtensor already went through making these choices, which can teach us as NumPy developers some lessons.
That's perhaps another rationale for doing this. The docs are likely to get a fairly major overhaul this year. If we don't write down a coherent plan, then we're just going to make very similar decisions as when we'd write up a "standard", just ad hoc and with much less review.
That could indeed be nice. I think Travis referred to this as defining an "RNumPy" (similar to RPython as a subset of Python).
> Second, we could update Nathaniel's NEP about distinct properties duck arrays might want to mimic/implement.
I wasn't thinking about that indeed, but agreed that it could be helpful.
I'd really like this. Also to have a sane namespace in numpy, and a basis for putting something in numpy.lib vs the main namespace vs some other namespace (there are a couple of semi-public ones).
> Anyway, re-reading your message, I realize the above is not really what you wrote about, so perhaps this is irrelevant...
Not irrelevant, I think you're making some good points. Cheers, Ralf
On Sat, Jun 1, 2019 at 10:12 AM Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
I generally agree with this. The most useful aspect of this exercise is likely to be clarifying NumPy for its own developers, and maybe offering a guide to future simplification. Trying to put something together that everyone agrees to as an official standard would be a big project and, as Nathaniel points out, would involve an enormous amount of work, much time, and doubtless many arguments. A less ambitious exercise would be identifying commonalities in the current numpy-like libraries. That would have the advantage of feedback from actual user experience, and would be more like a lessons-learned document that would be helpful to others.
I keep thinking duck type. Or in this case, duck type lite.
I've had similar thoughts.
> Second, we could update Nathaniel's NEP about distinct properties duck arrays might want to mimic/implement.
Yes.
Chuck
On 1/6/19 7:31 pm, Charles R Harris wrote:
I would include tests as well. Rather than hammer out a full standard based on extensive discussions and negotiations, I would suggest NumPy might be able to set a de-facto "standard" based on pieces of the current numpy user documentation and test suite. Then other projects could use "passing the tests" as an indication that they implement the NumPy API, and could refer to the documentation where appropriate.

Once we have a base repo under numpy with tests and documentation for the generally accepted baseline interfaces, we can discuss on a case-by-case basis via pull requests and issues whether other interfaces should be included. If we find general classes of similarity that can be concisely described but that not all duckarray packages support (structured arrays, for instance), these could become test specifiers like `@pytest.mark.skipif(not HAVE_STRUCTURED_ARRAYS, ...)`; the tests and documentation would only apply if that specifier exists.

Matti
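[Editorial note: a hypothetical sketch of the shared test suite Matti describes. The `xp` namespace and the `HAVE_STRUCTURED_ARRAYS` capability flag are illustrative names, not an existing repo; here plain NumPy stands in for the package under test.]

```python
import numpy as xp  # a duck-array package would substitute its own module
import pytest

# Made-up capability flag in the spirit of Matti's suggestion.
HAVE_STRUCTURED_ARRAYS = hasattr(xp, "void")

def test_concatenate():
    # A baseline interface every "NumPy API" package would be expected to pass.
    a = xp.arange(3)
    b = xp.arange(3)
    assert xp.concatenate([a, b]).shape == (6,)

@pytest.mark.skipif(not HAVE_STRUCTURED_ARRAYS,
                    reason="package under test lacks structured arrays")
def test_structured_fields():
    # Only applies if the package declares structured-array support.
    a = xp.zeros(2, dtype=[("x", "f8"), ("y", "f8")])
    assert a["x"].shape == (2,)
```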
On Sat, Jun 1, 2019 at 8:46 PM Matti Picus <matti.picus@gmail.com> wrote:
I think this is potentially useful, but *far* more prescriptive and detailed than I had in mind. Both you and Nathaniel seem to have not understood what I mean by "out of scope", so I think that's my fault in not being explicit enough. I *do not* want to prescribe behavior. Instead, a simple yes/no for each function in numpy and method on ndarray.

Our API is huge. A simple count:
- main namespace: 600
- fft: 30
- linalg: 30
- random: 60
- ndarray: 70
- lib: 20
- lib.npyio: 35
- etc. (many more ill-thought-out but not clearly private submodules)

Just the main namespace plus ndarray methods is close to 700 objects. If you want to build a NumPy-like thing, that's 700 decisions to make. I'm suggesting something as simple as a list of functions that constitute a sensible "core of NumPy". That list would not include anything in fft/linalg/random, since those can easily be separated out (indeed, if we could make fft and linalg disappear and just rely on scipy, pyfftw etc., that would be great). It would not include financial functions. And so on. I guess we'd end up with most ndarray methods plus <150 functions.

That list could be used for many purposes: improve the docs, serve as the set of functions to implement in xnd.array, unumpy & co, Marten's suggestion of implementing other functions in terms of basic functions, etc.

Two other thoughts:

1. NumPy is not done. Our thinking on how to evolve the NumPy API is fairly muddled. When new functions are proposed, it's decided on a case-by-case basis, usually without a guiding principle. We need to improve that. A "core of NumPy" list could be a part of that puzzle.

2. We often argue about deprecations. Deprecations are costly, but so is keeping around functions that are not very useful or have a poor design. This may offer a middle ground: don't let others repeat our mistakes, and signal to users that a function is of questionable value, without breaking already written code.

Cheers,
Ralf
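[Editorial note: the counts quoted above can be reproduced approximately with a few lines; the exact numbers drift between NumPy versions, and counting `dir()` entries slightly overcounts since it includes non-function objects.]

```python
import numpy as np

def public_names(obj):
    # Names not starting with an underscore - a rough proxy for "public API".
    return [name for name in dir(obj) if not name.startswith("_")]

for label, obj in [("main namespace", np), ("fft", np.fft),
                   ("linalg", np.linalg), ("random", np.random),
                   ("ndarray", np.ndarray)]:
    print(f"{label}: {len(public_names(obj))}")
```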
On Sat, Jun 1, 2019 at 10:05 PM Ralf Gommers <ralf.gommers@gmail.com> wrote:
This sounds like a restructuring or factorization of the API, in order to make it smaller, and thus easier to learn and use. It may start with the docs, by paying more attention to the "core" or important functions and methods, and noting the deprecated, or not frequently used, or not important functions. This could also help the satellite projects, which use NumPy API as an example, and may also be influenced by them and their decisions. Regards, Dashamir
On Sun, 2019-06-02 at 08:42 +0200, Ralf Gommers wrote:
<snip>
Trying to follow the discussion, there seem to be various ideas. Do I understand it right that the original proposal was much like doing a list of:

* np.ndarray.cumprod: low importance -> prefer np.multiply.accumulate
* np.ravel_multi_index: low importance, but distinct feature

Maybe with added groups such as "transpose-like" and "reshape-like" functions? This would be based on 1. "experience" and 2. usage statistics. This seems mostly a task for 2-3 people to then throw out there for discussion. There will be some very difficult/impossible calls, since in the end Nathaniel is right: we do not quite know the question we want to answer. But for a huge part of the API it may not be problematic?

Then there is the idea of providing better mixins (and tests). This could be made easier by the first idea, for prioritization, although the first idea is probably not really necessary to kick this off at all. The interesting parts to me seem likely to be how best to solve testing of the mixins and numpy-api-duplicators in general. Implementing a growing set of mixins seems fairly straightforward (although maybe much easier to approach if there is a list from the first project)? And, once we have a start, maybe we can rely on the array-like implementors to be the main developers (limiting us mostly to review).

The last part would probably be for users and consumers of array-likes. This largely overlaps, but comes closer to the problem of a "standard". If we have a list of functions that we tend to see as more or less important, it may be interesting for downstream projects to restrict themselves to it, to simplify interoperability e.g. with dask. Maybe we do not have to draw a strict line though. How plausible would it be to set up a list (best auto-updating) saying nothing but:

`np.concatenate` supported by: dask, jax, cupy

I am not sure if this is helpful, but it feels to me that the first part is what Ralf was thinking of? Just to kick off such a "living document".
I could maybe help with providing the second pair of eyes for a first iteration there, Ralf. The last list I would actually find interesting myself, but not sure how easy it would be to approach it? Best, Sebastian
On Mon, Jun 3, 2019 at 7:56 PM Sebastian Berg <sebastian@sipsolutions.net> wrote:
Indeed. Certainly no more than that was my idea.
Agreed, won't be problematic.
Indeed. I think there are actually 3 levels here (at least):

1. function name: high/low importance or some such simple classification
2. function signature and behavior: is the behavior optimal, what would we change, etc.
3. making duck arrays and subclasses that rely on all those functions and their behavior easier to implement/use

Mixins are a specific answer to (3). And it's unclear if they're the best answer (could be, I don't know - please don't start a discussion on that here). Either way, working on (3) will be helped by having a better sense of (1) and (2). Also think about effort: (2) is at least an order of magnitude more work than (1), and (3) likely even more work than (2).
That's probably not that hard, and I agree it would be quite useful. The namespaces of each of those libraries are probably not the same, but with dir() and some strings and lists you'll get a long way here I think.
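[Editorial note: a minimal version of the auto-updating "supported by" table discussed above, built from `dir()` alone. The function and library lists are illustrative; libraries that are not installed are simply skipped.]

```python
import importlib

FUNCTIONS = ["concatenate", "einsum", "cumprod", "ravel_multi_index"]
LIBRARIES = ["numpy", "dask.array", "cupy", "jax.numpy"]  # illustrative list

def support_table(functions, libraries):
    """Map each function name to the libraries whose namespace provides it."""
    table = {}
    for func in functions:
        table[func] = []
        for lib in libraries:
            try:
                mod = importlib.import_module(lib)
            except ImportError:
                continue  # library not installed here; skip rather than fail
            if hasattr(mod, func):
                table[func].append(lib)
    return table

for func, libs in support_table(FUNCTIONS, LIBRARIES).items():
    print(f"`np.{func}` supported by: {', '.join(libs)}")
```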
Indeed.

> I could maybe help with providing the second pair of eyes
> for a first iteration there, Ralf.

Awesome, thanks Sebastian.

Cheers,
Ralf
One little point here:
> * np.ndarray.cumprod: low importance -> prefer np.multiply.accumulate
I think that's an example of something that *should* be part of the numpy API, but should be implemented as a mixin, based on np.multiply.accumulate. As I'm still a bit confused about the goal here, that means that: users should still use `.cumprod`, but implementers of numpy-like packages should implement `.multiply.accumulate` rather than `cumprod` directly, and use the numpy ABC, or however it is implemented.

-CHB

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker@noaa.gov
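[Editorial note: a rough sketch of the mixin split described above. `NDArrayMethodsMixinSketch` is a made-up name for illustration, not an existing NumPy class; NumPy does ship `np.lib.mixins.NDArrayOperatorsMixin`, which plays a similar role for arithmetic operators.]

```python
import numpy as np

class NDArrayMethodsMixinSketch:
    """Hypothetical mixin deriving convenience methods from primitives."""

    def cumprod(self, axis=None):
        # Users keep calling .cumprod(); the mixin implements it in terms
        # of the more basic primitive np.multiply.accumulate.
        arr = np.asarray(self)
        if axis is None:
            arr = arr.ravel()
            axis = 0
        return np.multiply.accumulate(arr, axis=axis)

class MyDuckArray(NDArrayMethodsMixinSketch):
    """Minimal array-like: only knows how to become an ndarray."""

    def __init__(self, data):
        self._data = np.asarray(data)

    def __array__(self, dtype=None, copy=None):
        return self._data if dtype is None else self._data.astype(dtype)

a = MyDuckArray([1, 2, 3, 4])
print(a.cumprod())  # [ 1  2  6 24]
```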
This slide deck from Matthew Rocklin at SciPy 2019 might be relevant: https://matthewrocklin.com/slides/scipy-2019#/ On Tue, Jun 4, 2019 at 12:06 AM Ralf Gommers <ralf.gommers@gmail.com> wrote:
-- Mark Mikofski, PhD (2005) *Fiat Lux*
On Sat, Jul 13, 2019 at 12:48 AM Mark Mikofski <mikofski@berkeley.edu> wrote:
> This slide deck from Matthew Rocklin at SciPy 2019 might be relevant: https://matthewrocklin.com/slides/scipy-2019#/
That was a very nice talk indeed. It's also up on Youtube, worth watching: https://www.youtube.com/watch?v=Q0DsdiY-jiw

I've also put a 0.0.1 version of RNumPy ("restricted NumPy") up on PyPI (mostly to reserve the name, but it's usable). The README and __init__.py docstring, plus the package itself (https://github.com/Quansight-Labs/rnumpy), should give a better idea of the ideas we were discussing in this thread.

Cheers,
Ralf
On Sat, Jun 1, 2019 at 1:05 PM Ralf Gommers <ralf.gommers@gmail.com> wrote:
> I think this is potentially useful, but *far* more prescriptive and detailed than I had in mind. Both you and Nathaniel seem to have not understood what I mean by "out of scope", so I think that's my fault in not being explicit enough. I *do not* want to prescribe behavior. Instead, a simple yes/no for each function in numpy and method on ndarray.
So yes/no are the answers. But what's the question?

- "If we were redesigning numpy in a fantasy world without external constraints or compatibility issues, would we include this function?"
- "Is this function well designed?"
- "Do we think that supporting this function is necessary to achieve practical duck-array compatibility?"
- "If someone implements this function, should we give them a 'numpy core compliant!' logo to put on their website?"
- "Do we recommend that people use this function in new code?"
- "If we were trying to design a minimal set of primitives and implement the rest of numpy in terms of them, then is this function a good candidate for a primitive?"

These are all really different things, and useful for solving different problems... I feel like you might be lumping them together some? Also, I'm guessing there are a bunch of functions where you think part of the interface is fine and part of the interface is broken. (E.g. dot's behavior on high-dimensional arrays.) Do you think this "one bool per function" structure will be fine-grained enough for what you want to do?
> Two other thoughts: 1. NumPy is not done. Our thinking on how to evolve the NumPy API is fairly muddled. When new functions are proposed, it's decided on a case-by-case basis, usually without a guiding principle. We need to improve that. A "core of NumPy" list could be a part of that puzzle.
I think we do have some rough consensus principles on what's in scope and what isn't in scope for numpy, but yeah, articulating them more clearly could be useful. Stuff like "output types and shape should be predictable from input types and shape", "numpy's core responsibilities are the array/dtype/ufunc interfaces, and providing a lingua franca for python numerical libraries to interoperate" (and therefore: "if it can live outside numpy it probably should"), etc. I'm seeing this as a living document (a NEP?) that tries to capture some rules of thumb and that we update as we go. That seems pretty different to me than a long list of yes/no checkboxes though?
> 2. We often argue about deprecations. Deprecations are costly, but so is keeping around functions that are not very useful or have a poor design. This may offer a middle ground. Don't let others repeat our mistakes, signal to users that a function is of questionable value, without breaking already written code.
The idea has come up a few times of having a "soft deprecation" level, where we put a warning in the docs but not in the code. It seems like a reasonable idea to me. It's inherently a kind of case-by-case thing that can be done incrementally. But, if someone wants to systematically work through all the docs and do the case-by-case analysis, that also seems like a reasonable idea to me. I'm not sure if that's the same as your proposal or not. -n -- Nathaniel J. Smith -- https://vorpus.org
On Sun, Jun 2, 2019 at 12:35 AM Nathaniel Smith <njs@pobox.com> wrote:
No, I feel like you just want to see a real proposal. At this point I've gotten some really useful feedback, in particular from Marten (thanks!), and I have a better idea of what to do. So I'll answer a few of your questions, and propose to leave the rest till I actually have something more solid to discuss. That will likely answer many of your questions.
Indeed, but that's a much harder problem to tackle. Again, there's a reason I put function behavior explicitly out of scope.

> Do you think this "one bool per function" structure will be fine-grained
> enough for what you want to do?

Yes.
Very rough perhaps. I don't think we are on the same wavelength at all about the cost of adding new functions, the cost of deprecations, the use of submodules, and even what's public or private right now. That can't be solved all at once, but I think my idea will help with some of these.

> but yeah, articulating them more clearly could be useful.

All of these are valid questions. Most of that probably needs to be in the scope document (https://www.numpy.org/neps/scope.html). Which also needs to be improved.

> I'm seeing this as a living document (a NEP?)

A NEP would work. Although I'd prefer a way to be able to reference some fixed version of it rather than it being always in flux.

> I'm not sure if that's the same as your proposal or not.

Not the same, but related.

Ralf
On Sat, Jun 1, 2019 at 11:59 PM Ralf Gommers <ralf.gommers@gmail.com> wrote:
Okay, that's fine. You scared me a bit with the initial email, but I really am trying to be helpful :-). I'm not looking for a detailed proposal; I'm just super confused right now about what you're trying to accomplish or how this table of yes/no values will help do it. I look forward to hearing more!
> > I'm seeing this as a living document (a NEP?)
>
> NEP would work. Although I'd prefer a way to be able to reference some
> fixed version of it rather than it being always in flux.
When I say "living" I mean: it would be seen as documenting our consensus and necessarily fuzzy rather than normative and precise like most NEPs. Maybe this is obvious and not worth mentioning. But I wouldn't expect it to change rapidly. Unless our collective opinions change rapidly I guess, but that seems unlikely. (And of course NEPs are in git so we always have the ability to link to a point-in-time snapshot if we need to reference something.) -n -- Nathaniel J. Smith -- https://vorpus.org
On Sun, Jun 2, 2019 at 9:46 AM Nathaniel Smith <njs@pobox.com> wrote:
Thanks! I know this is going to be a little complicated to get everyone on the same page. That's why I'm aiming to get a draft out before SciPy'19 so there's a chance to discuss it in person with everyone who is there. Mailing lists are a poor interface. Will you be at SciPy?
Yeah, I'm going for useful rather than normative :)

> Maybe this is obvious and not worth mentioning. But I wouldn't expect it
> to change rapidly. Unless our collective opinions change rapidly I guess,
> but that seems unlikely. (And of course NEPs are in git so we always have
> the ability to link to a point-in-time snapshot if we need to reference
> something.)
Agreed. One perhaps unintended side effect of separating out the NEPs doc build from the full doc build is that we stopped shipping NEPs with our releases. It would be nicer to say "NEP as of 1.16" rather than "NEP as of commit 1324adf59". Ah well, that's for another time. Ralf
![](https://secure.gravatar.com/avatar/88ba0cd741b199eca4691fabe57ee67f.jpg?s=120&d=mm&r=g)
On Sun, Jun 2, 2019 at 10:01 AM Ralf Gommers <ralf.gommers@gmail.com> wrote:
Would it be useful if we could integrate the documentation system with a discussion forum (like Discourse.org)? Each function can be linked to its own discussion topic, where users and developers can discuss the function, upvote or downvote it, etc. This kind of discussion seems a bit more structured than a mailing list discussion. Dashamir
![](https://secure.gravatar.com/avatar/5f88830d19f9c83e2ddfd913496c5025.jpg?s=120&d=mm&r=g)
On Sun, Jun 2, 2019 at 10:53 AM Dashamir Hoxha <dashohoxha@gmail.com> wrote:
A more modern forum is nice indeed. It is not strictly better than mailing lists though. So what I would like is a Discourse like interface on top of the mailing list, so we get the features you're talking about without a painful migration and breaking all links to threads in the archives. Mailman 3 does provide this (example: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/). I'm keeping an eye on what's going on with Mailman 3 migration of the python.org provided infrastructure. I think we can do this in the near to medium future. I don't want us to be the guinea pig though:) Cheers, Ralf
![](https://secure.gravatar.com/avatar/5f88830d19f9c83e2ddfd913496c5025.jpg?s=120&d=mm&r=g)
On Sun, Jun 2, 2019 at 12:07 PM Ralf Gommers <ralf.gommers@gmail.com> wrote:
To save anyone else the trouble of posting this link, here's Guido's thumbs down on Discourse (and he's not the only one) as a replacement for Python mailing lists: https://discuss.python.org/t/disappointed-and-overwhelmed-by-discourse/982. Tastes vary:) Ralf
![](https://secure.gravatar.com/avatar/88ba0cd741b199eca4691fabe57ee67f.jpg?s=120&d=mm&r=g)
On Sun, Jun 2, 2019 at 12:12 PM Ralf Gommers <ralf.gommers@gmail.com> wrote:
I did not suggest replacing the mailing lists with Discourse. I suggested integrating documentation with Discourse, so that for each function there is a separate discussion topic for this function. For each function on the documentation page there can be a "Feedback" or "Comment" link that goes to the corresponding discussion topic for that function. This way Discourse can be used like a commenting system (similar to Disqus). In the discussion page of the function people can upvote the function (using the "like" feature of Discourse) and can also explain why they think it is important. This may help building a consensus about which are the important or "core" functions of NumPy. Or maybe it doesn't have to be so complex after all, and mailing list discussions, combined with face-to-face discussions on conferences or online meetings can do it better. Dashamir
![](https://secure.gravatar.com/avatar/5f88830d19f9c83e2ddfd913496c5025.jpg?s=120&d=mm&r=g)
On Sun, Jun 2, 2019 at 12:44 PM Dashamir Hoxha <dashohoxha@gmail.com> wrote:
Oh okay, I misunderstood you. I don't think that's desirable; it's too complicated and has too much overhead in setting up and maintaining. Between looking at libraries like Dask and Xtensor, tooling to measure actual API usage ( https://labs.quansight.org/blog/2019/05/python-package-function-usage/), and just using our own knowledge, we have enough information to make choices. Ralf
![](https://secure.gravatar.com/avatar/5dde29b54a3f1b76b2541d0a4a9b232c.jpg?s=120&d=mm&r=g)
On Sun, Jun 2, 2019 at 3:45 AM Dashamir Hoxha <dashohoxha@gmail.com> wrote:
We could make a GitHub repo for a document, and use issues to separately discuss each topic. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
![](https://secure.gravatar.com/avatar/1198e2d145718c841565712312e04227.jpg?s=120&d=mm&r=g)
Re: Successful specifications (I’ll avoid using the word standard): Moving: HTML5/CSS3, C++, Rust, Python, Java. Static: C I’d really like this to be a moving spec... A static one is never much use, and is doomed to miss use cases, either today or some from the future. Best Regards, Hameer Abbasi
![](https://secure.gravatar.com/avatar/851ff10fbb1363b7d6111ac60194cc1c.jpg?s=120&d=mm&r=g)
I quite like the idea of trying to be better at defining the API through tests - the substitution principle in action! Systematically applying tests to both ndarray and MaskedArray might be a start internally (just a pytest fixture away...). But definitely start with more of an overview. -- Marten
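Marten's "just a pytest fixture away" idea could look something like the sketch below. The fixture and test names here are made up for illustration, not existing NumPy test code: the same behavioral tests run against both ndarray and MaskedArray by parametrizing over array constructors.

```python
import numpy as np
import numpy.ma as ma
import pytest

# Hypothetical sketch: exercise the same API contract against both ndarray
# and MaskedArray by parametrizing a fixture over array constructors.
@pytest.fixture(params=[np.asarray, ma.asarray], ids=["ndarray", "MaskedArray"])
def make_array(request):
    return request.param

def test_reshape_roundtrip(make_array):
    a = make_array(np.arange(6))
    assert a.reshape(2, 3).ravel().shape == (6,)

def test_transpose_involution(make_array):
    a = make_array(np.arange(6).reshape(2, 3))
    assert a.T.T.shape == a.shape
```

Extending the `params` list would be all it takes to run the same contract against another duck array.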
![](https://secure.gravatar.com/avatar/851ff10fbb1363b7d6111ac60194cc1c.jpg?s=120&d=mm&r=g)
I would perhaps start with ndarray itself. Quite a lot seems superfluous.

Shapes:
- need: shape, strides, reshape, transpose;
- probably: ndim, size, T;
- less so: nbytes, ravel, flatten, squeeze, and swapaxes.

Getting/setting:
- need: __getitem__, __setitem__;
- less so: fill, put, take, item, itemset, repeat, compress, diagonal.

Datatype/copies/views/conversion:
- need: dtype, copy, view, astype, flags;
- less so: ctypes, dump, dumps, getfield, setfield, itemsize, byteswap, newbyteorder, resize, setflags, tobytes, tofile, tolist, tostring.

Iteration:
- need: __iter__;
- less so: flat.

Numerics:
- need: conj, real, imag;
- maybe also: min, max, mean, sum, std, var, prod, partition, sort, trace;
- less so: the arg* ones, cumsum, cumprod, clip, round, dot, all, any, nonzero, ptp, searchsorted, choose.

All the best, Marten
![](https://secure.gravatar.com/avatar/b4f6d4f8b501cb05fd054944a166a121.jpg?s=120&d=mm&r=g)
On Sun, 2019-06-02 at 08:38 +0200, Ralf Gommers wrote:
It is a bit tricky. I dislike flat for example, but it does have occasional use cases. min/max, etc. are interesting, in that they are just aliases to minimum.reduce, and could be argued to be covered by the ufunc. For other projects, the list of actual usage statistics may almost be more interesting than what we can come up with. Although it depends a bit where the long haul goes (but it seems right now that is not the proposal). For example, we could actually mark all our functions, and then you could test SciPy for being "numpy-core" compatible (i.e. test the users). We could also want to work towards reference tests at some point. It would be a huge amount of work, but if other projects would want to help maintain it, maybe it can save work in the long run? One thing that I think may be interesting would be to attempt to make a graph of what functions can be implemented using other functions. Example:
- transpose <-> swapaxes <-> moveaxis
- indexing: delete, insert (+empty_like)
- reshape: atleast_Nd, ravel (+ensure/copy)

(and then find a minimal set, etc.) Many of these have multiple possible implementations, though. But if we could create even an idea of such a dependency graph, it could be very cool to find "what is important". Much of this is trivial, but maybe it could help to get a picture of where things might go. Anyway, it seems like a good idea to do something, but what that something is and how difficult it would be seems hard to judge. But I guess that should not stop us from moving. Maybe information of usage and groupings/opinions on importance is already a lot. Best, Sebastian
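Two of the graph edges Sebastian lists can be demonstrated concretely; the helper names below are made up, but the sketch shows that swapaxes and moveaxis are both expressible purely in terms of transpose:

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)

# swapaxes via transpose: permutation that exchanges two axes.
def swapaxes_via_transpose(x, ax1, ax2):
    perm = list(range(x.ndim))
    perm[ax1], perm[ax2] = perm[ax2], perm[ax1]
    return x.transpose(perm)

assert np.array_equal(swapaxes_via_transpose(a, 0, 2), np.swapaxes(a, 0, 2))

# moveaxis via transpose: remove the source axis, reinsert at the destination.
def moveaxis_via_transpose(x, src, dst):
    perm = [i for i in range(x.ndim) if i != src]
    perm.insert(dst, src)
    return x.transpose(perm)

assert np.array_equal(moveaxis_via_transpose(a, 0, 2), np.moveaxis(a, 0, 2))
```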
![](https://secure.gravatar.com/avatar/5dde29b54a3f1b76b2541d0a4a9b232c.jpg?s=120&d=mm&r=g)
Exactly. This is great, thanks Marten. I agree with pretty much everything in this list.
For my part, a few things immediately popped out at me that I disagree with. ;-) Which does not mean it isn’t a useful exercise, but it does mean we should expect a fair bit of debate. But I do think we should be clear as to what the point is: I think it could be helpful for clarifying for new and long-standing users of numpy what the “numpythonic” way to use numpy is. I think this is very closely tied to the duck typing discussion. But for guiding implementations of “numpy-like” libraries, not so much: they are going to implement the features their users need — whether it’s “officially” part of the numpy API is a minor concern. Unless there is an official “Standard”, but it doesn’t sound like anyone has that in mind. I’m also a bit confused as to the scope: is this effort about the Python API only? In which case, I’m not sure how it relates to libraries in/for other languages. Or only about those that provide a Python binding? When I first read the topic of this thread, I expected it to be about the C API — it would be nice to clearly define what parts of the C API are considered public and stable. (Though maybe that’s already done — I do get numpy API deprecation warnings at times...) -CHB
![](https://secure.gravatar.com/avatar/209654202cde8ec709dee0a4d23c717d.jpg?s=120&d=mm&r=g)
Some of your categories here sound like they might be suitable for ABCs that provide mixin methods, which is something I think Hameer suggested in the past. Perhaps it's worth re-exploring that avenue. Eric On Sat, Jun 1, 2019, 18:18 Marten van Kerkwijk <m.h.vankerkwijk@gmail.com> wrote:
![](https://secure.gravatar.com/avatar/851ff10fbb1363b7d6111ac60194cc1c.jpg?s=120&d=mm&r=g)
On Sun, Jun 2, 2019 at 2:21 PM Eric Wieser <wieser.eric+numpy@gmail.com> wrote:
Indeed, and of course for __array_ufunc__ we moved there a bit already, with `NDArrayOperatorsMixin` [1]. One could certainly similarly have NDShapingMixin that, e.g., relied on `shape`, `reshape`, and `transpose` to implement `ravel`, `swapaxes`, etc. And indeed use those mixins in `ndarray` itself. For this also having a summary of base functions/methods would be very helpful. -- Marten
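A rough sketch of what such an `NDShapingMixin` could look like; the class is hypothetical (only `NDArrayOperatorsMixin` exists in NumPy today), and the derived helpers are written purely in terms of `shape`, `reshape`, and `transpose`:

```python
import numpy as np

class NDShapingMixin:
    """Hypothetical mixin: derive shape helpers from a minimal core
    (shape, reshape, transpose), as suggested in the thread."""

    @property
    def ndim(self):
        return len(self.shape)

    @property
    def size(self):
        out = 1
        for n in self.shape:
            out *= n
        return out

    def ravel(self):
        return self.reshape(self.size)

    def swapaxes(self, ax1, ax2):
        perm = list(range(self.ndim))
        perm[ax1], perm[ax2] = perm[ax2], perm[ax1]
        return self.transpose(perm)

# Minimal duck array implementing only the core, wrapping ndarray for brevity.
class Duck(NDShapingMixin):
    def __init__(self, data):
        self._data = np.asarray(data)
    @property
    def shape(self):
        return self._data.shape
    def reshape(self, *shape):
        return Duck(self._data.reshape(*shape))
    def transpose(self, perm):
        return Duck(self._data.transpose(perm))

d = Duck(np.arange(6).reshape(2, 3))
assert d.swapaxes(0, 1).shape == (3, 2)
assert d.ravel().shape == (6,)
```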
![](https://secure.gravatar.com/avatar/93a76a800ef6c5919baa8ba91120ee98.jpg?s=120&d=mm&r=g)
On Sun, Jun 2, 2019 at 1:08 PM Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
I would definitely support writing more mixins and helper functions (either in NumPy, or externally) to make it easier to re-implement NumPy's public API. Certainly there is plenty of room to make it easier to leverage __array_ufunc__ and __array_function__. For some recent examples of what these helper functions could look like, see JAX's implementation of NumPy, which is written in terms of a much smaller array library called LAX: https://github.com/google/jax/blob/9dfe27880517d5583048e7a3384b504681968fb4/... Hypothetically, JAX could be written on top of a "restricted NumPy" instead, which in turn could have an implementation written in LAX. This would facilitate reusing JAX's higher level functions for automatic differentiation and vectorization on top of different array backends. I would also be happy to see guidance for NumPy API re-implementers, both for those starting from scratch (e.g., in a new language) and for those who plan to copy NumPy's Python API (e.g., with __array_function__). I would focus on: 1. Describing the tradeoffs of challenging design decisions that NumPy may have gotten wrong, e.g., scalars and indexing. 2. Describing common "gotchas" where it's easy to deviate from NumPy's semantics unintentionally, e.g., with scalar arithmetic dtypes or indexing edge cases. I would *not* try to identify a "core" list of methods/functionality to implement. Everyone uses their own slice of NumPy's API, so the rational approach for anyone trying to reimplement exactly (i.e., with __array_function__) is to start with a minimal subset and add functionality on demand to meet users' needs. Also, many of the choices involved in making an array library don't really have objectively right or wrong answers, and authors are going to make intentional deviations from NumPy's semantics when it makes sense for them. Cheers, Stephan
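Stephan's "start with a minimal subset and add on demand" approach maps directly onto the `__array_function__` protocol. A minimal sketch, with made-up class and registry names (a real implementation should also validate the `types` argument):

```python
import numpy as np

HANDLED = {}

def implements(np_func):
    """Register a MinimalArray implementation of a NumPy function."""
    def decorator(func):
        HANDLED[np_func] = func
        return func
    return decorator

class MinimalArray:
    """Sketch of 'minimal subset, grow on demand': only functions in the
    HANDLED registry work; everything else fails loudly."""
    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        # A real implementation should also check every entry in `types`
        # is a class it knows how to handle.
        if func not in HANDLED:
            return NotImplemented  # NumPy turns this into a TypeError
        return HANDLED[func](*args, **kwargs)

@implements(np.sum)
def _sum(arr, **kwargs):
    return MinimalArray(np.sum(arr.data, **kwargs))

a = MinimalArray([[1, 2], [3, 4]])
assert int(np.sum(a).data) == 10  # dispatched to _sum

try:
    np.concatenate([a, a])  # not registered yet
except TypeError:
    pass
```

Each user request for another function then becomes one more `@implements(...)` entry rather than an up-front decision about the whole 700-object surface.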
![](https://secure.gravatar.com/avatar/1198e2d145718c841565712312e04227.jpg?s=120&d=mm&r=g)
I would agree that the set should be minimal at first, but would comment that we should still have a better taxonomy of functions that should be supported, in terms of the functionality they provide and the functionality that is required for them to work. E.g. __setitem__ needs mutability. Best Regards, Hameer Abbasi
![](https://secure.gravatar.com/avatar/d9ac9213ada4a807322f99081296784b.jpg?s=120&d=mm&r=g)
Hi Marten, On Sat, 01 Jun 2019 12:11:38 -0400, Marten van Kerkwijk wrote:
How hard do you think it would be to address this issue? You seem to have some notion of which pain points should be prioritized, and it might be useful to jot those down somewhere (tracking issue on GitHub?). Stéfan
![](https://secure.gravatar.com/avatar/851ff10fbb1363b7d6111ac60194cc1c.jpg?s=120&d=mm&r=g)
Hi Stefan, On Mon, Jun 3, 2019 at 4:26 PM Stefan van der Walt <stefanv@berkeley.edu> wrote:
The python side would, I think, not be too hard. But I don't really have that much of a notion - it would very much be informed by making a list first. For the C parts, I feel even more at a loss: one really would have to start with a summary of what is actually there (and I think the organization may well be quite logical already; I've not so much felt it was wrong as in need of an overview). Somewhat of an aside, but relevant for the general discussion: updating/rewriting the user documentation may well be the best *first* step. It certainly doesn't hurt to try to make some list now, but my guess is that the best one will emerge only when one tries to summarize what a new user should know/understand. All the best, Marten
![](https://secure.gravatar.com/avatar/97c543aca1ac7bbcfb5279d0300c8330.jpg?s=120&d=mm&r=g)
It's possible I'm not getting what you're thinking, but from what you describe in your email I think it's a bad idea. Standards take a tremendous amount of work (no really, an absurdly massively huge amount of work, more than you can imagine if you haven't done it). And they don't do what people usually hope they do. Many many standards are written all the time that have zero effect on reality, and the effort is wasted. They're really only useful when you have to solve a coordination problem: lots of people want to do the same thing as each other, whatever that is, but no-one knows what the thing should be. That's not a problem at all for us, because numpy already exists. If you want to improve compatibility between Python libraries, then I don't think it will be relevant. Users aren't writing code against "the numpy standard", they're not testing their libraries against "the numpy standard", they're using/testing against numpy. If library authors want to be compatible with numpy, they need to match what numpy does, not what some document says. OTOH if they think they have a better idea and it's worth breaking compatibility, they're going to do it regardless of what some document somewhere says. If you want to share the lessons learned from numpy in the hopes of improving future libraries that don't care about numpy compatibility per se, in python or other languages, then that seems like a great idea! But that's not a standard, that's a journal article called something like "NumPy: A retrospective". Other languages aren't going to match numpy one-to-one anyway, because they'll be adapting things to their language's idioms; they certainly don't care about whether you decided 'newaxis MUST be defined to None' or merely 'SHOULD' be defined to None.
IMO if you try the most likely outcome will be that it will suck up a lot of energy writing it, and then the only effect is that everyone will keep doing what they would have done anyway but now with extra self-righteousness and yelling depending on whether that turns out to match the standard or not. -n On Sat, Jun 1, 2019 at 1:18 AM Ralf Gommers <ralf.gommers@gmail.com> wrote:
-- Nathaniel J. Smith -- https://vorpus.org
![](https://secure.gravatar.com/avatar/5f88830d19f9c83e2ddfd913496c5025.jpg?s=120&d=mm&r=g)
On Sat, Jun 1, 2019 at 11:32 AM Nathaniel Smith <njs@pobox.com> wrote:
It's possible I'm not getting what you're thinking, but from what you describe in your email I think it's a bad idea.
Hi Nathaniel, I think you are indeed not getting what I meant and are just responding to the word "standard". I'll give a concrete example. Here is the xtensor to numpy comparison: https://xtensor.readthedocs.io/en/latest/numpy.html. The xtensor authors clearly have made sane choices, but they did have to spend a lot of effort making those choices - what to include and what not. Now, the XND team is just starting to build out their Python API. Hameer is building out unumpy. There's all the other array libraries I mentioned. We can say "sort it out yourself, make your own choices", or we can provide some guidance. So far the authors of those libraries I have asked say they would appreciate the guidance. Cheers, Ralf
![](https://secure.gravatar.com/avatar/97c543aca1ac7bbcfb5279d0300c8330.jpg?s=120&d=mm&r=g)
On Sat, Jun 1, 2019, 09:13 Ralf Gommers <ralf.gommers@gmail.com> wrote:
Well, that's the word you chose :-) I think it's very possible that what you're thinking is a good idea, but it's actually something else, like better high-level documentation, or a NEP documenting things we wish we did differently but are stuck with, or a generic duck array test suite to improve compatibility and make it easier to bootstrap new libraries, etc. The word "standard" is tricky: - it has a pretty precise technical meaning that is different from all of those things, so if those are what you want then it's a bad word to use. - it's a somewhat arcane niche of engineering practice that a lot of people don't have direct experience with, so there are a ton of people with vague and magical ideas about how standards work, and if you use the word then they'll start expecting all kinds of things. (See the response up thread where someone thinks that you just proposed to make a bunch of incompatible changes to numpy.) This makes it difficult to have a productive discussion, because everyone is misinterpreting each other. I bet if we can articulate more precisely what exactly you're hoping to accomplish, then we'll also be able to figure out specific concrete actions that will help, and they won't involve the word "standard". For example:
That sounds great. Maybe you want... a mailing list or a forum for array library implementors to compare notes? ("So we ran into this unexpected problem implementing einsum, how did you handle it? And btw numpy devs, why is it like that in the first place?") Maybe you want someone to write up a review of existing APIs like xtensor, dask, xarray, sparse, ... to see where they deviated from numpy and if there are any commonalities? Or someone could do an analysis of existing code and publish tables of how often different features are used, so array implementors can make better choices about what to implement first? Or maybe just encouraging Hameer to be really proactive about sharing drafts and gathering feedback here? -n
![](https://secure.gravatar.com/avatar/5f88830d19f9c83e2ddfd913496c5025.jpg?s=120&d=mm&r=g)
On Sat, Jun 1, 2019 at 10:32 PM Nathaniel Smith <njs@pobox.com> wrote:
It's just one word out of 100 line email. I'm happy to retract it. Please pretend it wasn't there and re-read the rest. Replace it with the list of functions that I propose in my previous email.
Please see my email of 1 hour ago.
No. ("So we ran into this unexpected problem implementing einsum, how did you handle it? And btw numpy devs, why is it like that in the first place?") can be done on this list.

Maybe you want someone to write up a review of existing APIs like xtensor, dask, xarray, sparse, ... to see where they deviated from numpy and if there are any commonalities?

That will be useful in verifying that the list of functions for "core of NumPy" I proposed is sensible. We're not going to make up things out of thin air.
That's done :) NumPy table: https://github.com/Quansight-Labs/python-api-inspect/blob/master/data/csv/nu... Blog post with explanation: https://labs.quansight.org/blog/2019/05/python-package-function-usage/ And yes, it's another useful data point in verifying our choices.

Or maybe just encouraging Hameer to be really proactive about sharing drafts and gathering feedback here?

No. (Well, it's always good to be proactive, but that's beside the point here.) Cheers, Ralf
![](https://secure.gravatar.com/avatar/bf8f3c2be96ccb8b7bc0f83f3d6eb316.jpg?s=120&d=mm&r=g)
As an amateur user of Numpy (hobby programming), and at the opposite end of the spectrum from the Numpy development team, I’d like to raise my hand and applaud this idea. I think it would make my use of Numpy significantly easier if an API not only specified the basic API structure, but also regularized it to the extent possible. Thanks, Bill Wing
![](https://secure.gravatar.com/avatar/851ff10fbb1363b7d6111ac60194cc1c.jpg?s=120&d=mm&r=g)
Hi Ralf, Despite sharing Nathaniel's doubts about the ease of defining the numpy API and the likelihood of people actually sticking to a limited subset of what numpy exposes, I quite like the actual things you propose to do! But my liking it is for reasons that are different from your stated ones: I think the proposed actions are likely to greatly benefit both users (like Bill above) and current and prospective developers. To me, it seems almost a side benefit (if a very nice one) that it might help other projects to share an API; a larger benefit may come from tapping into the experience of other projects in thinking about what are the true basic functions/methods that one should have. More concretely, to address Nathaniel's (very reasonable) worry about ending up wasting a lot of time, I think it may be good to identify smaller parts, each of which is useful on its own. In this respect, I think an excellent place to start might be something you are planning already anyway: update the user documentation. Doing this will necessarily require thinking about, e.g., what `ndarray` methods and properties are actually fundamental, as you only want to focus on a few. With that in place, one could then, as you suggest, reorganize the reference documentation to put those most important properties up front, and ones that we really think are mistakes at the bottom, with explanations of why we think so and what the alternative is. Also for the reference documentation, it would help to group functions more logically. The above could lead to three next steps, all of which I think would be useful. First, for (prospective) developers as well as for future maintenance, I think it would be quite a large benefit if we (slowly but surely) rewrote code that implements the less basic functionality in terms of more basic functions (e.g., replace use of `array.fill(...)` or `np.copyto(array, ...)` with `array[...] =`).
Second, we could update Nathaniel's NEP about distinct properties duck arrays might want to mimic/implement. Third, we could actually implement the logical groupings identified in the code base (and describe them!). Currently, it is a mess: for the C files, I typically have to grep to even find where things are done, and while for the functions defined in python files that is not necessary, many have historical rather than logical groupings (looking at you, `fromnumeric`!), and even more descriptive ones like `shape_base` are split over `lib` and `core`. I think it would help everybody if we went to a python-like layout, with a true core and libraries such as polynomial, fft, ma, etc. Anyway, re-reading your message, I realize the above is not really what you wrote about, so perhaps this is irrelevant... All the best, Marten
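Marten's example of rewriting less basic functionality in terms of more basic operations is easy to check directly; the three spellings below are equivalent ways of filling an array:

```python
import numpy as np

# Three equivalent ways to fill an array; the suggestion in the thread is
# that plain indexing assignment is the "basic" building block.
a = np.empty(4)
a.fill(1.5)

b = np.empty(4)
np.copyto(b, 1.5)

c = np.empty(4)
c[...] = 1.5

assert np.array_equal(a, b) and np.array_equal(b, c)
```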
![](https://secure.gravatar.com/avatar/5f88830d19f9c83e2ddfd913496c5025.jpg?s=120&d=mm&r=g)
On Sat, Jun 1, 2019 at 6:12 PM Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
Agreed, there is some reverse learning there as well. Projects like Dask and Xtensor already went through making these choices, which can teach us as NumPy developers some lessons.
That perhaps another rationale for doing this. The docs are likely to get a fairly major overhaul this year. If we don't write down a coherent plan then we're just going to make very similar decisions as when we'd write up a "standard", just ad hoc and with much less review.
That could indeed be nice. I think Travis referred to this as defining an "RNumPy" (similar to RPython as a subset of Python).
Second, we could update Nathaniel's NEP about distinct properties duck arrays might want to mimic/implement.
I wasn't thinking about that indeed, but agreed that it could be helpful.
I'd really like this. Also to have sane namespace in numpy, and a basis for putting something in numpy.lib vs the main namespace vs some other namespace (there are a couple of semi-public ones).
Anyway, re-reading your message, I realize the above is not really what you wrote about, so perhaps this is irrelevant...
Not irrelevant, I think you're making some good points. Cheers, Ralf
![](https://secure.gravatar.com/avatar/96dd777e397ab128fedab46af97a3a4a.jpg?s=120&d=mm&r=g)
On Sat, Jun 1, 2019 at 10:12 AM Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
I generally agree with this. The most useful aspect of this exercise is likely to be clarifying NumPy for its own developers, and maybe offering a guide to future simplification. Trying to put something together that everyone agrees to as an official standard would be a big project and, as Nathaniel points out, would involve an enormous amount of work, much time, and doubtless many arguments. What might be a less ambitious exercise would be identifying commonalities in the current numpy-like languages. That would have the advantage of feedback from actual user experience, and would be more like a lessons learned document that would be helpful to others.
I keep thinking duck type. Or in this case, duck type lite.
I've had similar thoughts.
Second, we could update Nathaniel's NEP about distinct properties duck arrays might want to mimic/implement.
Yes.
Chuck
![](https://secure.gravatar.com/avatar/72f994ca072df3a3d2c3db8a137790fd.jpg?s=120&d=mm&r=g)
On 1/6/19 7:31 pm, Charles R Harris wrote:
I would include tests as well. Rather than hammer out a full standard based on extensive discussions and negotiations, I would suggest NumPy might be able to set a de-facto "standard" based on pieces of the current numpy user documentation and test suite. Then other projects could use "passing the tests" as an indication that they implement the NumPy API, and could refer to the documentation where appropriate. Once we have a base repo under numpy with tests and documentation for the generally accepted baseline interfaces, we can discuss on a case-by-case basis via pull requests and issues whether other interfaces should be included. If we find general classes of similarity that can be concisely described but that not all duckarray packages support (structured arrays, for instance), these could become test specifiers (`@pytest.mark.skipif(not HAVE_STRUCTURED_ARRAYS)`); the tests and documentation would only apply if that specifier exists. Matti
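A sketch of what Matti's compliance suite with feature specifiers might look like; the `candidate` binding, feature probe, and test names are all hypothetical:

```python
import numpy as np
import pytest

# Hypothetical compliance-suite sketch: the library under test is bound to
# `candidate` (swap in e.g. cupy or dask.array), and tests for optional
# feature groups are skipped when the library does not support them.
candidate = np

# Crude feature probe; a real suite would test behavior, not presence.
HAVE_STRUCTURED_ARRAYS = hasattr(candidate, "dtype")

def test_concatenate_basic():
    a = candidate.asarray([1, 2])
    b = candidate.asarray([3])
    assert candidate.concatenate([a, b]).shape == (3,)

@pytest.mark.skipif(not HAVE_STRUCTURED_ARRAYS, reason="no structured arrays")
def test_structured_field_access():
    dt = candidate.dtype([("x", "i4"), ("y", "f8")])
    arr = candidate.zeros(2, dtype=dt)
    assert arr["x"].shape == (2,)
```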
![](https://secure.gravatar.com/avatar/5f88830d19f9c83e2ddfd913496c5025.jpg?s=120&d=mm&r=g)
On Sat, Jun 1, 2019 at 8:46 PM Matti Picus <matti.picus@gmail.com> wrote:
I think this is potentially useful, but *far* more prescriptive and detailed than I had in mind. Both you and Nathaniel seem to have not understood what I mean by "out of scope", so I think that's my fault in not being explicit enough. I *do not* want to prescribe behavior. Instead, a simple yes/no for each function in numpy and method on ndarray. Our API is huge. A simple count:
- main namespace: 600
- fft: 30
- linalg: 30
- random: 60
- ndarray: 70
- lib: 20
- lib.npyio: 35
- etc. (many more ill-thought-out but not clearly private submodules)

Just the main namespace plus ndarray methods is close to 700 objects. If you want to build a NumPy-like thing, that's 700 decisions to make. I'm suggesting something as simple as a list of functions that constitute a sensible "core of NumPy". That list would not include anything in fft/linalg/random, since those can easily be separated out (indeed, if we could make fft and linalg disappear and just rely on scipy, pyfftw etc., that would be great). It would not include financial functions. And so on. I guess we'd end up with most ndarray methods plus <150 functions. That list could be used for many purposes: improve the docs, serve as the set of functions to implement in xnd.array, unumpy & co, Marten's suggestion of implementing other functions in terms of basic functions, etc. Two other thoughts:
1. NumPy is not done. Our thinking on how to evolve the NumPy API is fairly muddled. When new functions are proposed, it's decided on a case-by-case basis, usually without a guiding principle. We need to improve that. A "core of NumPy" list could be a part of that puzzle.
2. We often argue about deprecations. Deprecations are costly, but so is keeping around functions that are not very useful or have a poor design. This may offer a middle ground. Don't let others repeat our mistakes, signal to users that a function is of questionable value, without breaking already written code.

Cheers, Ralf
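Ralf's counts are easy to reproduce; the exact numbers vary by NumPy version, so treat them as approximate:

```python
import numpy as np

def public_names(mod):
    # Names not starting with an underscore: a rough proxy for API size.
    return [name for name in dir(mod) if not name.startswith("_")]

print("main namespace:", len(public_names(np)))
print("fft:", len(public_names(np.fft)))
print("linalg:", len(public_names(np.linalg)))
print("random:", len(public_names(np.random)))
print("ndarray:", len(public_names(np.ndarray)))
```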
![](https://secure.gravatar.com/avatar/88ba0cd741b199eca4691fabe57ee67f.jpg?s=120&d=mm&r=g)
On Sat, Jun 1, 2019 at 10:05 PM Ralf Gommers <ralf.gommers@gmail.com> wrote:
This sounds like a restructuring or factorization of the API, in order to make it smaller, and thus easier to learn and use. It may start with the docs, by paying more attention to the "core" or important functions and methods, and noting the deprecated, or not frequently used, or not important functions. This could also help the satellite projects, which use NumPy API as an example, and may also be influenced by them and their decisions. Regards, Dashamir
![](https://secure.gravatar.com/avatar/b4f6d4f8b501cb05fd054944a166a121.jpg?s=120&d=mm&r=g)
On Sun, 2019-06-02 at 08:42 +0200, Ralf Gommers wrote:
<snip>
Trying to follow the discussion, there seem to be various ideas? Do I understand it right that the original proposal was much like doing a list of:
* np.ndarray.cumprod: low importance -> prefer np.multiply.accumulate
* np.ravel_multi_index: low importance, but distinct feature

Maybe with added groups such as "transpose-like" and "reshape-like" functions? This would be based on 1. "experience" and 2. usage statistics. This seems mostly a task for 2-3 people to then throw out there for discussion. There will be some very difficult/impossible calls, since in the end Nathaniel is right, we do not quite know the question we want to answer. But for a huge part of the API it may not be problematic? Then there is an idea of providing better mixins (and tests). This could be made easier by the first idea, for prioritization. Although, the first idea is probably not really necessary to kick this off at all. The interesting parts to me seem likely how to best solve testing of the mixins and numpy-api-duplicators in general. Implementing a growing set of mixins seems likely fairly straightforward (although maybe much easier to approach if there is a list from the first project)? And, once we have a start, maybe we can rely on the array-like implementors to be the main developers (limiting us mostly to review). The last part would probably be for users and consumers of array-likes. This largely overlaps, but comes closer to the problem of "standard". If we have a list of functions that we tend to see as more or less important, it may be interesting for downstream projects to restrict themselves to it to simplify interoperability e.g. with dask. Maybe we do not have to draw a strict line though? How plausible would it be to set up a list (ideally auto-updating) saying nothing but: `np.concatenate` supported by: dask, jax, cupy? I am not sure if this is helpful, but it feels to me that the first part is what Ralf was thinking of? Just to kick off such a "living document".
I could maybe help with providing the second pair of eyes for a first iteration there, Ralf. The last list I would actually find interesting myself, but not sure how easy it would be to approach it? Best, Sebastian
On Mon, Jun 3, 2019 at 7:56 PM Sebastian Berg <sebastian@sipsolutions.net> wrote:
Indeed. Certainly no more than that was my idea.
Agreed, won't be problematic.
Indeed. I think there are actually 3 levels here (at least):

1. function name: high/low importance or some such simple classification
2. function signature and behavior: is the behavior optimal, what would we change, etc.
3. making duck arrays and subclasses that rely on all those functions and their behavior easier to implement/use

Mixins are a specific answer to (3). And it's unclear if they're the best answer (could be, I don't know - please don't start a discussion on that here). Either way, working on (3) will be helped by having a better sense of (1) and (2). Also think about effort: (2) is at least an order of magnitude more work than (1), and (3) likely even more work than (2).
That's probably not that hard, and I agree it would be quite useful. The namespaces of those libraries are probably not all the same, but with dir() and some strings and lists you'll get a long way here, I think.
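The dir()-based table Ralf mentions could be sketched roughly as follows. This is a hypothetical helper, not existing tooling; only numpy itself is registered here as a stand-in, and in practice one would add `dask.array`, `cupy`, `jax.numpy`, etc. to the dict as available:

```python
import numpy

def api_support(names, libraries):
    """Map each function name to the libraries whose namespace provides it."""
    return {
        name: sorted(
            lib_name
            for lib_name, lib in libraries.items()
            if callable(getattr(lib, name, None))
        )
        for name in names
    }

# Only numpy as a stand-in; one would register e.g.
# {"numpy": numpy, "dask": dask.array, "cupy": cupy} in practice.
libraries = {"numpy": numpy}
table = api_support(
    ["concatenate", "ravel_multi_index", "no_such_function"], libraries
)
print(table["concatenate"])       # ['numpy']
print(table["no_such_function"])  # []
```

An auto-updating version of this, run in CI against the installed libraries, would produce exactly the "`np.concatenate` supported by: dask, jax, cupy" listing from earlier in the thread.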
Indeed. I could maybe help with providing the second pair of eyes
for a first iteration there, Ralf.
Awesome, thanks Sebastian. Cheers, Ralf
One little point here:
* np.ndarray.cumprod: low importance -> prefer np.multiply.accumulate
I think that's an example of something that *should* be part of the numpy API, but should be implemented as a mixin, based on np.multiply.accumulate. As I'm still a bit confused about the goal here: that would mean users should still use `.cumprod`, but implementers of numpy-like packages should implement `.multiply.accumulate` and not `cumprod` directly, getting the latter from the numpy ABC, or however it is implemented. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
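Chris's suggestion could be sketched roughly like this. `CumprodMixin` and `MyArray` are illustrative names, not an actual NumPy ABC; the point is just that a duck array which supports `np.multiply.accumulate` (here via `__array__` for simplicity) gets `cumprod` for free:

```python
import numpy as np

class CumprodMixin:
    """Provide ``cumprod`` in terms of ``np.multiply.accumulate``."""
    def cumprod(self, axis=None):
        arr = np.asarray(self)
        if axis is None:      # NumPy flattens when no axis is given
            arr = arr.ravel()
            axis = 0
        return np.multiply.accumulate(arr, axis=axis)

class MyArray(CumprodMixin):
    """Toy duck array: only needs ``__array__`` for this sketch."""
    def __init__(self, data):
        self._data = np.asarray(data)
    def __array__(self, dtype=None):
        return self._data.astype(dtype) if dtype else self._data

print(MyArray([1, 2, 3, 4]).cumprod())
```

A real mixin would dispatch to the duck array's own ufunc machinery (`__array_ufunc__`) rather than coerce to ndarray, but the division of labor is the same: implement the primitive, inherit the derived method.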
This slide deck from Matthew Rocklin at SciPy 2019 might be relevant: https://matthewrocklin.com/slides/scipy-2019#/ On Tue, Jun 4, 2019 at 12:06 AM Ralf Gommers <ralf.gommers@gmail.com> wrote:
-- Mark Mikofski, PhD (2005) *Fiat Lux*
On Sat, Jul 13, 2019 at 12:48 AM Mark Mikofski <mikofski@berkeley.edu> wrote:
This slide deck from Matthew Rocklin at SciPy 2019 might be relevant: https://matthewrocklin.com/slides/scipy-2019#/
That was a very nice talk indeed. It's also up on YouTube, worth watching: https://www.youtube.com/watch?v=Q0DsdiY-jiw I've also put a 0.0.1 version of RNumPy ("restricted NumPy") up on PyPI (mostly to reserve the name, but it's usable). The README and __init__.py docstring, plus the package itself (https://github.com/Quansight-Labs/rnumpy), should give a better idea of the ideas we were discussing in this thread. Cheers, Ralf
On Sat, Jun 1, 2019 at 1:05 PM Ralf Gommers <ralf.gommers@gmail.com> wrote:
I think this is potentially useful, but *far* more prescriptive and detailed than I had in mind. Both you and Nathaniel seem to have not understood what I mean by "out of scope", so I think that's my fault in not being explicit enough. I *do not* want to prescribe behavior. Instead, a simple yes/no for each function in numpy and method on ndarray.
So yes/no are the answers. But what's the question? "If we were redesigning numpy in a fantasy world without external constraints or compatibility issues, would we include this function?" "Is this function well designed?" "Do we think that supporting this function is necessary to achieve practical duck-array compatibility?" "If someone implements this function, should we give them a 'numpy core compliant!' logo to put on their website?" "Do we recommend that people use this function in new code?" "If we were trying to design a minimal set of primitives and implement the rest of numpy in terms of them, then is this function a good candidate for a primitive?" These are all really different things, and useful for solving different problems... I feel like you might be lumping them together some? Also, I'm guessing there are a bunch of functions where you think part of the interface is fine and part of the interface is broken. (E.g. dot's behavior on high-dimensional arrays.) Do you think this "one bool per function" structure will be fine-grained enough for what you want to do?
Two other thoughts: 1. NumPy is not done. Our thinking on how to evolve the NumPy API is fairly muddled. When new functions are proposed, it's decided on a case-by-case basis, usually without a guiding principle. We need to improve that. A "core of NumPy" list could be a part of that puzzle.
I think we do have some rough consensus principles on what's in scope and what isn't in scope for numpy, but yeah, articulating them more clearly could be useful. Stuff like "output types and shape should be predictable from input types and shape", "numpy's core responsibilities are the array/dtype/ufunc interfaces, and providing a lingua franca for python numerical libraries to interoperate" (and therefore: "if it can live outside numpy it probably should"), etc. I'm seeing this as a living document (a NEP?) that tries to capture some rules of thumb and that we update as we go. That seems pretty different to me than a long list of yes/no checkboxes though?
2. We often argue about deprecations. Deprecations are costly, but so is keeping around functions that are not very useful or have a poor design. This may offer a middle ground. Don't let others repeat our mistakes, signal to users that a function is of questionable value, without breaking already written code.
The idea has come up a few times of having a "soft deprecation" level, where we put a warning in the docs but not in the code. It seems like a reasonable idea to me. It's inherently a kind of case-by-case thing that can be done incrementally. But, if someone wants to systematically work through all the docs and do the case-by-case analysis, that also seems like a reasonable idea to me. I'm not sure if that's the same as your proposal or not. -n -- Nathaniel J. Smith -- https://vorpus.org
On Sun, Jun 2, 2019 at 12:35 AM Nathaniel Smith <njs@pobox.com> wrote:
No, I feel like you just want to see a real proposal. At this point I've gotten some really useful feedback, in particular from Marten (thanks!), and I have a better idea of what to do. So I'll answer a few of your questions, and propose to leave the rest until I actually have something more solid to discuss. That will likely answer many of your questions.
Indeed, but that's a much harder problem to tackle. Again, there's a reason I put function behavior explicitly out of scope.

> Do you think this "one bool per function" structure will be fine-grained enough for what you want to do?

Yes.
Very rough perhaps. I don't think we are on the same wavelength at all about the cost of adding new functions, the cost of deprecations, the use of submodules, and even what's public or private right now. That can't all be solved at once, but I think my idea will help with some of these.
All of these are valid questions. Most of that probably needs to be in the scope document (https://www.numpy.org/neps/scope.html), which also needs to be improved.

> I'm seeing this as a living document (a NEP?)

A NEP would work, although I'd prefer a way to be able to reference some fixed version of it rather than it being always in flux.
not the same, but related. Ralf
On Sat, Jun 1, 2019 at 11:59 PM Ralf Gommers <ralf.gommers@gmail.com> wrote:
Okay, that's fine. You scared me a bit with the initial email, but I really am trying to be helpful :-). I'm not looking for a detailed proposal; I'm just super confused right now about what you're trying to accomplish or how this table of yes/no values will help do it. I look forward to hearing more!
> > I'm seeing this as a living document (a NEP?)
>
> NEP would work. Although I'd prefer a way to be able to reference some fixed version of it rather than it being always in flux.
When I say "living" I mean: it would be seen as documenting our consensus and necessarily fuzzy rather than normative and precise like most NEPs. Maybe this is obvious and not worth mentioning. But I wouldn't expect it to change rapidly. Unless our collective opinions change rapidly I guess, but that seems unlikely. (And of course NEPs are in git so we always have the ability to link to a point-in-time snapshot if we need to reference something.) -n -- Nathaniel J. Smith -- https://vorpus.org
On Sun, Jun 2, 2019 at 9:46 AM Nathaniel Smith <njs@pobox.com> wrote:
Thanks! I know this is going to be a little complicated to get everyone on the same page. That's why I'm aiming to get a draft out before SciPy'19 so there's a chance to discuss it in person with everyone who is there. Mailing lists are a poor interface. Will you be at SciPy?
Yeah, I'm going for useful rather than normative :)
Agreed. One perhaps unintended side effect of separating out the NEPs doc build from the full doc build is that we stopped shipping NEPs with our releases. It would be nicer to say "NEP as of 1.16" rather than "NEP as of commit 1324adf59". Ah well, that's for another time. Ralf
On Sun, Jun 2, 2019 at 10:01 AM Ralf Gommers <ralf.gommers@gmail.com> wrote:
Would it be useful if we could integrate the documentation system with a discussion forum (like Discourse.org)? Each function could be linked to its own discussion topic, where users and developers can discuss the function, upvote or downvote it, etc. This kind of discussion seems a bit more structured than a mailing list discussion. Dashamir
On Sun, Jun 2, 2019 at 10:53 AM Dashamir Hoxha <dashohoxha@gmail.com> wrote:
A more modern forum is nice indeed. It is not strictly better than mailing lists though. So what I would like is a Discourse like interface on top of the mailing list, so we get the features you're talking about without a painful migration and breaking all links to threads in the archives. Mailman 3 does provide this (example: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/). I'm keeping an eye on what's going on with Mailman 3 migration of the python.org provided infrastructure. I think we can do this in the near to medium future. I don't want us to be the guinea pig though:) Cheers, Ralf
On Sun, Jun 2, 2019 at 12:07 PM Ralf Gommers <ralf.gommers@gmail.com> wrote:
To save anyone else the trouble of posting this link, here's Guido's thumbs down on Discourse (and he's not the only one) as a replacement for Python mailing lists: https://discuss.python.org/t/disappointed-and-overwhelmed-by-discourse/982. Tastes vary:) Ralf
On Sun, Jun 2, 2019 at 12:12 PM Ralf Gommers <ralf.gommers@gmail.com> wrote:
I did not suggest replacing the mailing lists with Discourse. I suggested integrating documentation with Discourse, so that for each function there is a separate discussion topic for this function. For each function on the documentation page there can be a "Feedback" or "Comment" link that goes to the corresponding discussion topic for that function. This way Discourse can be used like a commenting system (similar to Disqus). In the discussion page of the function people can upvote the function (using the "like" feature of Discourse) and can also explain why they think it is important. This may help building a consensus about which are the important or "core" functions of NumPy. Or maybe it doesn't have to be so complex after all, and mailing list discussions, combined with face-to-face discussions on conferences or online meetings can do it better. Dashamir
On Sun, Jun 2, 2019 at 12:44 PM Dashamir Hoxha <dashohoxha@gmail.com> wrote:
Oh okay, I misunderstood you. I don't think that's desirable; it's too complicated and has too much overhead in setting up and maintaining. Between looking at libraries like Dask and Xtensor, tooling to measure actual API usage ( https://labs.quansight.org/blog/2019/05/python-package-function-usage/), and just using our own knowledge, we have enough information to make choices. Ralf
On Sun, Jun 2, 2019 at 3:45 AM Dashamir Hoxha <dashohoxha@gmail.com> wrote:
We could make a GitHub repo for a document, and use issues to separately discuss each topic. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
Re: Successful specifications (I’ll avoid using the word standard): Moving: HTML5/CSS3, C++, Rust, Python, Java. Static: C I’d really like this to be a moving spec... A static one is never much use, and is doomed to miss use cases, either today or some from the future. Best Regards, Hameer Abbasi
I quite like the idea of trying to be better at defining the API through tests - the substitution principle in action! Systematically applying tests to both ndarray and MaskedArray might be a start internally (just a pytest fixture away...). But definitely start with more of an overview. -- Marten
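Marten's "just a pytest fixture away" could look roughly like this (the fixture and test names are illustrative, not actual NumPy test code): a single parametrized fixture runs the same behavioural test against both `ndarray` and `MaskedArray`, which is the substitution principle made executable:

```python
import numpy as np
import pytest

@pytest.fixture(params=[np.ndarray, np.ma.MaskedArray])
def array_cls(request):
    return request.param

def make(array_cls, data):
    # view() turns a plain ndarray into an instance of the class under test
    return np.asarray(data).view(array_cls)

def test_reshape_preserves_type(array_cls):
    a = make(array_cls, np.arange(6))
    b = a.reshape(2, 3)
    assert b.shape == (2, 3)
    # substitution principle: the class survives the operation
    assert type(b) is array_cls
```

Adding a third duck-array class to `params` would immediately run the whole suite against it, which is the kind of reference test Sebastian mentions below.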
I would perhaps start with ndarray itself. Quite a lot seems superfluous.

Shapes:
- need: shape, strides, reshape, transpose;
- probably: ndim, size, T;
- less so: nbytes, ravel, flatten, squeeze, and swapaxes.

Getting/setting:
- need: __getitem__, __setitem__;
- less so: fill, put, take, item, itemset, repeat, compress, diagonal.

Datatype/copies/views/conversion:
- need: dtype, copy, view, astype, flags;
- less so: ctypes, dump, dumps, getfield, setfield, itemsize, byteswap, newbyteorder, resize, setflags, tobytes, tofile, tolist, tostring.

Iteration:
- need: __iter__;
- less so: flat.

Numerics:
- need: conj, real, imag;
- maybe also: min, max, mean, sum, std, var, prod, partition, sort, trace;
- less so: the arg* ones, cumsum, cumprod, clip, round, dot, all, any, nonzero, ptp, searchsorted, choose.

All the best, Marten
On Sun, 2019-06-02 at 08:38 +0200, Ralf Gommers wrote:
It is a bit tricky. I dislike flat, for example, but it does have occasional use cases. min/max, etc. are interesting, in that they are just aliases for minimum.reduce, and could be argued to be covered by the ufunc.

For other projects, the list of actual usage statistics may almost be more interesting than what we can come up with. Although it depends a bit where the long haul goes (but it seems right now that is not the proposal). For example, we could actually mark all our functions, and then you could test SciPy for being "numpy-core" compatible (i.e. test the users). We might also want to work towards reference tests at some point. It would be a huge amount of work, but if other projects would want to help maintain it, maybe it can save work in the long run?

One thing that I think may be interesting would be to attempt to make a graph of which functions can be implemented using other functions. Examples:

- transpose <-> swapaxes <-> moveaxis
- indexing: delete, insert (+ empty_like)
- reshape: atleast_Nd, ravel (+ ensure/copy)

(and then find a minimal set, etc.) Many of these have multiple possible implementations, though. But if we could create even an idea of such a dependency graph, it could be very cool for finding "what is important". Much of this is trivial, but maybe it could help to get a picture of where things might go.

Anyway, it seems like a good idea to do something, but what that something is and how difficult it would be seems hard to judge. But I guess that should not stop us from moving. Maybe information on usage and groupings/opinions on importance is already a lot.

Best, Sebastian
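Two edges of the dependency graph Sebastian sketches can be made concrete: swapaxes and moveaxis expressed purely in terms of transpose, making transpose the "primitive" node. The helper names are illustrative, and for brevity the sketch handles non-negative axes only:

```python
import numpy as np

def swapaxes_via_transpose(a, axis1, axis2):
    # swapaxes is a transpose with two entries of the axis order exchanged
    order = list(range(a.ndim))
    order[axis1], order[axis2] = order[axis2], order[axis1]
    return a.transpose(order)

def moveaxis_via_transpose(a, source, destination):
    # moveaxis is a transpose with one axis removed and re-inserted
    order = [ax for ax in range(a.ndim) if ax != source]
    order.insert(destination, source)
    return a.transpose(order)

a = np.arange(24).reshape(2, 3, 4)
print(swapaxes_via_transpose(a, 0, 2).shape)   # (4, 3, 2)
print(moveaxis_via_transpose(a, 0, 2).shape)   # (3, 4, 2)
```

Finding such reductions systematically, and the minimal set of primitives they bottom out in, is essentially the graph analysis proposed above.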
Exactly. This is great, thanks Marten. I agree with pretty much everything in this list.
For my part, a few things immediately popped out at me that I disagree with. ;-) Which does not mean it isn't a useful exercise, but it does mean we should expect a fair bit of debate.

But I do think we should be clear as to what the point is: I think it could be helpful for clarifying, for new and long-standing users of numpy, what the "numpythonic" way to use numpy is. I think this is very closely tied to the duck typing discussion. For guiding implementations of "numpy-like" libraries, though, not so much: they are going to implement the features their users need; whether something is "officially" part of the numpy API is a minor concern. Unless there is an official "Standard", but it doesn't sound like anyone has that in mind.

I'm also a bit confused as to the scope: is this effort about the Python API only? In which case, I'm not sure how it relates to libraries in/for other languages. Or only about those that provide a Python binding? When I first read the topic of this thread, I expected it to be about the C API: it would be nice to clearly define which parts of the C API are considered public and stable. (Though maybe that's already done; I do get numpy API deprecation warnings at times.) -CHB
Some of your categories here sound like they might be suitable for ABCs that provide mixin methods, which is something I think Hameer suggested in the past. Perhaps it's worth re-exploring that avenue. Eric On Sat, Jun 1, 2019, 18:18 Marten van Kerkwijk <m.h.vankerkwijk@gmail.com> wrote:
On Sun, Jun 2, 2019 at 2:21 PM Eric Wieser <wieser.eric+numpy@gmail.com> wrote:
Indeed, and of course for __array_ufunc__ we moved there a bit already, with `NDArrayOperatorsMixin` [1]. One could certainly similarly have an NDShapingMixin that, e.g., relies on `shape`, `reshape`, and `transpose` to implement `ravel`, `swapaxes`, etc., and indeed use those mixins in `ndarray` itself. For this, too, having a summary of base functions/methods would be very helpful. -- Marten
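A minimal sketch of what such an `NDShapingMixin` might look like (the mixin and the toy `SmallArray` below are hypothetical, not actual NumPy classes): only `shape`, `reshape` and `transpose` are treated as primitives, and `ravel`/`swapaxes` are derived from them:

```python
import numpy as np

class NDShapingMixin:
    """Derive shape-manipulation methods from shape/reshape/transpose."""
    def ravel(self):
        size = 1
        for n in self.shape:
            size *= n
        return self.reshape((size,))

    def swapaxes(self, axis1, axis2):
        order = list(range(len(self.shape)))
        order[axis1], order[axis2] = order[axis2], order[axis1]
        return self.transpose(order)

class SmallArray(NDShapingMixin):
    """Toy duck array providing only the three primitives."""
    def __init__(self, data):
        self._data = np.asarray(data)
    shape = property(lambda self: self._data.shape)
    def reshape(self, shape):
        return SmallArray(self._data.reshape(shape))
    def transpose(self, order):
        return SmallArray(self._data.transpose(order))

a = SmallArray(np.arange(6).reshape(2, 3))
print(a.swapaxes(0, 1).shape)  # (3, 2)
```

Using the same mixin inside `ndarray` itself, as suggested above, would keep the derived methods and their duck-array counterparts in sync by construction.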
On Sun, Jun 2, 2019 at 1:08 PM Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
I would definitely support writing more mixins and helper functions (either in NumPy, or externally) to make it easier to re-implement NumPy's public API. Certainly there is plenty of room to make it easier to leverage __array_ufunc__ and __array_function__. For some recent examples of what these helper functions could look like, see JAX's implementation of NumPy, which is written in terms of a much smaller array library called LAX: https://github.com/google/jax/blob/9dfe27880517d5583048e7a3384b504681968fb4/... Hypothetically, JAX could be written on top of a "restricted NumPy" instead, which in turn could have an implementation written in LAX. This would facilitate reusing JAX's higher level functions for automatic differentiation and vectorization on top of different array backends.

I would also be happy to see guidance for NumPy API re-implementers, both for those starting from scratch (e.g., in a new language) and for those who plan to copy NumPy's Python API (e.g., with __array_function__). I would focus on:

1. Describing the tradeoffs of challenging design decisions that NumPy may have gotten wrong, e.g., scalars and indexing.
2. Describing common "gotchas" where it's easy to deviate from NumPy's semantics unintentionally, e.g., with scalar arithmetic dtypes or indexing edge cases.

I would *not* try to identify a "core" list of methods/functionality to implement. Everyone uses their own slice of NumPy's API, so the rational approach for anyone trying to reimplement it exactly (i.e., with __array_function__) is to start with a minimal subset and add functionality on demand to meet users' needs. Also, many of the choices involved in making an array library don't really have objectively right or wrong answers, and authors are going to make intentional deviations from NumPy's semantics when it makes sense for them. Cheers, Stephan
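The "minimal subset, grow on demand" approach with `__array_function__` can be sketched as follows. `TinyArray` and the `implements` registry are illustrative names (only the `__array_function__` protocol itself is NumPy API); any NumPy function not yet registered simply raises TypeError until someone needs it:

```python
import numpy as np

HANDLED = {}

def implements(numpy_func):
    """Register a TinyArray implementation of a NumPy function."""
    def decorator(func):
        HANDLED[numpy_func] = func
        return func
    return decorator

class TinyArray:
    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        if func not in HANDLED:
            return NotImplemented  # -> TypeError for unimplemented functions
        return HANDLED[func](*args, **kwargs)

@implements(np.concatenate)
def _concatenate(arrays, axis=0):
    return TinyArray(np.concatenate([a.data for a in arrays], axis=axis))

c = np.concatenate([TinyArray([1, 2]), TinyArray([3, 4])])
print(c.data)  # [1 2 3 4]
```

This is, in miniature, the pattern several duck-array libraries use: the registry grows function by function as users report gaps, rather than from a predefined "core" list.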
I would agree that the set should be minimal at first, but would comment that we should still have a better taxonomy of the functions to be supported, in terms of the functionality they provide and the functionality required for them to work. E.g. __setitem__ requires mutability. Best Regards, Hameer Abbasi
Hi Marten, On Sat, 01 Jun 2019 12:11:38 -0400, Marten van Kerkwijk wrote:
How hard do you think it would be to address this issue? You seem to have some notion of which pain points should be prioritized, and it might be useful to jot those down somewhere (tracking issue on GitHub?). Stéfan
Hi Stefan, On Mon, Jun 3, 2019 at 4:26 PM Stefan van der Walt <stefanv@berkeley.edu> wrote:
The Python side would, I think, not be too hard. But I don't really have that much of a notion - it would very much be informed by making a list first. For the C parts, I feel even more at a loss: one really would have to start with a summary of what is actually there (and I think the organization may well be quite logical already; I've not felt it was wrong so much as in need of an overview).

Somewhat of an aside, but relevant for the general discussion: updating/rewriting the user documentation may well be the best *first* step. It certainly doesn't hurt to try to make some list now, but my guess is that the best one will emerge only when one tries to summarize what a new user should know/understand. All the best, Marten
participants (15)
- Charles R Harris
- Chris Barker
- Chris Barker - NOAA Federal
- Dashamir Hoxha
- Eric Wieser
- Hameer Abbasi
- Mark Mikofski
- Marten van Kerkwijk
- Matti Picus
- Nathaniel Smith
- Ralf Gommers
- Sebastian Berg
- Stefan van der Walt
- Stephan Hoyer
- William Ray Wing