[Numpy-discussion] NEP 31 — Context-local and global overrides of the NumPy API

Nathaniel Smith njs at pobox.com
Mon Sep 2 17:09:02 EDT 2019


On Mon, Sep 2, 2019 at 2:15 AM Hameer Abbasi <einstein.edison at gmail.com> wrote:
> Ralf Gommers, Peter Bell (both cc’d) and I have come up with a proposal on how to solve the array creation and duck array problems. The solution is outlined in NEP-31, currently in the form of a PR, [1]

Thanks for putting this together! It'd be great to have more
engagement between uarray and numpy.

> ============================================================
>
> NEP 31 — Context-local and global overrides of the NumPy API
>
> ============================================================

Now that I've read this over, my main feedback is that right now it
seems too vague and high-level to give it a fair evaluation? The idea
of a NEP is to lay out a problem and proposed solution in enough
detail that it can be evaluated and critiqued, but this felt to me
more like it was pointing at some other documents for all the details
and then promising that uarray has solutions for all our problems.

> This NEP takes a more holistic approach: It assumes that there are parts of the API that need to be
> overridable, and that these will grow over time. It provides a general framework and a mechanism to
> avoid a design of a new protocol each time this is required.

The idea of a holistic approach makes me nervous, because I'm not sure
we have holistic problems. Sometimes a holistic approach is the right
thing; other times it means sweeping the actual problems under the
rug, so things *look* simple and clean but in fact nothing has been
solved, and they just end up biting us later. And from the NEP as
currently written, I can't tell whether this is the good kind of
holistic or the bad kind of holistic.

Now I'm writing vague handwavey things, so let me follow my own advice
and make it more concrete with an example :-).

When Stephan and I were writing NEP 22, the single thing we spent the
most time discussing was the problem of duck-array coercion, and in
particular what to do about existing code that does
np.asarray(duck_array_obj).

The reason this is challenging is that there's a lot of code written
in Cython/C/C++ that calls np.asarray, and then blindly casts the
return value to a PyArray struct and starts accessing the raw memory
fields. If np.asarray starts returning anything besides a real-actual
np.ndarray object, then this code will start corrupting random memory,
leading to a segfault at best.
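To make that contract concrete: today np.asarray always returns an exact np.ndarray, even for an object that only implements the __array__ export hook. Compiled code relies on that guarantee before it reads the underlying buffer. The following is a minimal sketch; DuckArray is a hypothetical stand-in, not a class from any real library.

```python
import numpy as np

class DuckArray:
    """Hypothetical minimal duck array that only knows how to export itself."""

    def __init__(self, data):
        self._data = list(data)

    def __array__(self, dtype=None, copy=None):
        # NumPy calls this hook when coercing the object to an ndarray.
        return np.array(self._data, dtype=dtype)

result = np.asarray(DuckArray([1, 2, 3]))

# The current contract: np.asarray hands back an exact np.ndarray...
assert type(result) is np.ndarray

# ...so compiled code feels safe reading the raw buffer address directly
# (a Python-level analogue of touching the PyArrayObject struct fields).
address, read_only = result.__array_interface__["data"]
print(hex(address), read_only)
```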

Stephan felt strongly that this meant that existing np.asarray calls
*must not* ever return anything besides an np.ndarray object, and
therefore we needed to add a new function np.asduckarray(), or maybe
an explicit opt-in flag like np.asarray(..., allow_duck_array=True).

I agreed that this was a problem, but thought we might be able to get
away with an "opt-out" system, where we add an allow_duck_array= flag,
but make it *default* to True, and document that the Cython/C/C++
users who want to work with a raw np.ndarray object should modify
their code to explicitly call np.asarray(obj, allow_duck_array=False).
This would mean that for a while people who tried to pass duck-arrays
into legacy libraries would get segfaults, but there would be a clear
path for fixing these issues as they were discovered.

Either way, there are also some other details to figure out: how does
this affect the C version of asarray? What about np.asfortranarray –
probably that should default to allow_duck_array=False, even if we did
make np.asarray default to allow_duck_array=True, right?
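Here is a rough sketch of the two defaults. It assumes a hypothetical allow_duck_array= keyword, which does not exist in NumPy. For illustration only, it also assumes that "duck array" means any non-ndarray object implementing NEP 18's __array_function__. Stephan's opt-in design is the same code with allow_duck_array=False as the default for asarray_sketch. The asfortranarray variant keeps the conservative default in either design.

```python
import numpy as np

def _is_duck_array(obj):
    # Assumption for this sketch: any non-ndarray object whose type
    # implements NEP 18's __array_function__ counts as a duck array.
    return (not isinstance(obj, np.ndarray)
            and hasattr(type(obj), "__array_function__"))

def asarray_sketch(obj, allow_duck_array=True):
    """Hypothetical opt-out variant of np.asarray (not a real NumPy API)."""
    if allow_duck_array and _is_duck_array(obj):
        return obj  # pass the duck array through untouched
    return np.asarray(obj)

def asfortranarray_sketch(obj, allow_duck_array=False):
    """Hypothetical: a layout-specific coercion keeps an opt-in default."""
    if allow_duck_array and _is_duck_array(obj):
        return obj
    return np.asfortranarray(obj)

class Duck:
    """Hypothetical duck array wrapping an ndarray."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array__(self, dtype=None, copy=None):
        return np.asarray(self.data, dtype=dtype)

    def __array_function__(self, func, types, args, kwargs):
        # Unwrap Duck arguments and defer to NumPy's own implementation.
        args = [a.data if isinstance(a, Duck) else a for a in args]
        return func(*args, **kwargs)

d = Duck([[1, 2], [3, 4]])
assert asarray_sketch(d) is d  # opt-out default: duck array passes through
assert type(asarray_sketch(d, allow_duck_array=False)) is np.ndarray
assert asfortranarray_sketch(d).flags.f_contiguous  # real ndarray, Fortran order
```

Under this sketch, a Cython/C/C++ caller that needs raw memory would switch to asarray_sketch(obj, allow_duck_array=False). Plain lists and other non-duck inputs behave exactly as they do with np.asarray today.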

Now if I understand right, your proposal would be to make it so any
code in any package could arbitrarily change the behavior of
np.asarray for all inputs, e.g. I could just decide that
np.asarray([1, 2, 3]) should return some arbitrary non-np.ndarray
object. It seems like this has a much greater potential for breaking
existing Cython/C/C++ code, and the NEP doesn't currently describe why
this extra power is useful, and it doesn't currently describe how it
plans to mitigate the downsides. (For example, if a caller needs a
real np.ndarray, then is there some way to explicitly request one? The
NEP doesn't say.) Maybe this is all fine and there are solutions to
these issues, but any proposal to address duck array coercion needs to
at least talk about these issues!

And that's just one example... array coercion is a particularly
central and tricky problem, but the numpy API is big, and there are
probably other problems like this. For another example, I don't
understand what the NEP is proposing to do about dtypes at all.

That's why I think the NEP needs to be fleshed out a lot more before
it will be possible to evaluate fairly.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org

