On Mon, Sep 2, 2019 at 2:15 AM Hameer Abbasi <einstein.edison@gmail.com> wrote:
> Me, Ralf Gommers and Peter Bell (both cc’d) have come up with a proposal on how to solve the array creation and duck array problems. The solution is outlined in NEP-31, currently in the form of a PR, [1]
Thanks for putting this together! It'd be great to have more engagement between uarray and numpy.
> ============================================================
> NEP 31 — Context-local and global overrides of the NumPy API
> ============================================================
Now that I've read this over, my main feedback is that right now it seems too vague and high-level to give it a fair evaluation? The idea of a NEP is to lay out a problem and proposed solution in enough detail that it can be evaluated and critiqued, but this felt to me more like it was pointing at some other documents for all the details and then promising that uarray has solutions for all our problems.
> This NEP takes a more holistic approach: It assumes that there are parts of the API that need to be overridable, and that these will grow over time. It provides a general framework and a mechanism to avoid a design of a new protocol each time this is required.
The idea of a holistic approach makes me nervous, because I'm not sure we have holistic problems. Sometimes a holistic approach is the right thing; other times it means sweeping the actual problems under the rug, so things *look* simple and clean but in fact nothing has been solved, and they just end up biting us later. And from the NEP as currently written, I can't tell whether this is the good kind of holistic or the bad kind of holistic.

Now I'm writing vague handwavey things, so let me follow my own advice and make it more concrete with an example :-). When Stephan and I were writing NEP 22, the single thing we spent the most time discussing was the problem of duck-array coercion, and in particular what to do about existing code that does np.asarray(duck_array_obj). The reason this is challenging is that there's a lot of code written in Cython/C/C++ that calls np.asarray, and then blindly casts the return value to a PyArray struct and starts accessing the raw memory fields. If np.asarray starts returning anything besides a real-actual np.ndarray object, then this code will start corrupting random memory, leading to a segfault at best.

Stephan felt strongly that this meant that existing np.asarray calls *must not* ever return anything besides an np.ndarray object, and therefore we needed to add a new function np.asduckarray(), or maybe an explicit opt-in flag like np.asarray(..., allow_duck_array=True). I agreed that this was a problem, but thought we might be able to get away with an "opt-out" system, where we add an allow_duck_array= flag, but make it *default* to True, and document that the Cython/C/C++ users who want to work with a raw np.ndarray object should modify their code to explicitly call np.asarray(obj, allow_duck_array=False). This would mean that for a while people who tried to pass duck-arrays into legacy libraries would get segfaults, but there would be a clear path for fixing these issues as they were discovered.
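
To make the "opt-out" variant concrete, here is a minimal sketch. Everything in it is hypothetical: np.asarray has no allow_duck_array flag today, so the flag lives on a stand-in wrapper function, and DuckArray is a toy class invented for illustration. It is only meant to show the behaviour being discussed, not how NumPy would implement it.

```python
import numpy as np


class DuckArray:
    """Toy duck array (hypothetical): array-like, but not an np.ndarray."""

    def __init__(self, data):
        self._data = list(data)

    def __array__(self, dtype=None, copy=None):
        # Lets NumPy coerce this object to a real ndarray when asked to.
        return np.array(self._data, dtype=dtype)


def asarray(obj, allow_duck_array=True):
    """Stand-in for the proposed np.asarray(..., allow_duck_array=...).

    The flag is NOT part of NumPy; this wrapper only sketches the
    "opt-out" semantics, where duck arrays pass through by default.
    """
    if allow_duck_array and isinstance(obj, DuckArray):
        # Opt-out default: hand the duck array back unchanged.
        return obj
    # Caller asked for (or the input requires) a real np.ndarray.
    return np.asarray(obj)


duck = DuckArray([1, 2, 3])
passthrough = asarray(duck)                       # still a DuckArray
real = asarray(duck, allow_duck_array=False)      # a genuine np.ndarray
```

Under the opt-in variant the only change would be the default (allow_duck_array=False), which keeps existing callers safe at the cost of requiring every duck-array-aware caller to pass the flag explicitly.
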
Either way, there are also some other details to figure out: how does this affect the C version of asarray? What about np.asfortranarray – probably that should default to allow_duck_array=False, even if we did make np.asarray default to allow_duck_array=True, right?

Now if I understand right, your proposal would be to make it so any code in any package could arbitrarily change the behavior of np.asarray for all inputs, e.g. I could just decide that np.asarray([1, 2, 3]) should return some arbitrary non-np.ndarray object. It seems like this has a much greater potential for breaking existing Cython/C/C++ code, and the NEP doesn't currently describe why this extra power is useful, and it doesn't currently describe how it plans to mitigate the downsides. (For example, if a caller needs a real np.ndarray, then is there some way to explicitly request one? The NEP doesn't say.) Maybe this is all fine and there are solutions to these issues, but any proposal to address duck array coercion needs to at least talk about these issues!

And that's just one example... array coercion is a particularly central and tricky problem, but the numpy API is big, and there are probably other problems like this. For another example, I don't understand what the NEP is proposing to do about dtypes at all.

That's why I think the NEP needs to be fleshed out a lot more before it will be possible to evaluate fairly.

-n

--
Nathaniel J. Smith -- https://vorpus.org