[Numpy-discussion] new NEP: np.AbstractArray and np.asabstractarray

Nathaniel Smith njs at pobox.com
Fri Mar 9 04:29:17 EST 2018


On Thu, Mar 8, 2018 at 7:06 AM, Marten van Kerkwijk
<m.h.vankerkwijk at gmail.com> wrote:
> A larger comment: you state that you think `np.asanyarray` is a
> mistake since `np.matrix` and `np.ma.MaskedArray` would pass through
> and that those do not strictly mimic `NDArray`. Here, I agree with
> `matrix` (but since we're deprecating it, let's remove that from the
> discussion), but I do not see how your proposed interface would not
> let `MaskedArray` pass through, nor really that one would necessarily
> want that.

We can discuss whether MaskedArray should be an AbstractArray.
Conceptually it probably should be; I think that was a goal of the
MaskedArray authors (even if they wouldn't have put it that way). In
practice there are a lot of funny quirks in MaskedArray, so I'd want
to look more carefully in case there are weird incompatibilities that
would cause problems. Note that we can figure this out after the NEP
is finished, too.

I wonder if the matplotlib folks have any thoughts on this? I know
they're one of the more prominent libraries that tries to handle both
regular and masked arrays, so maybe they could comment on how often
they run

> I think it may be good to distinguish two separate cases:
> 1. Everything has exactly the same meaning as for `ndarray` but the
> data is stored differently (i.e., only `view` does not work). One can
> thus expect that for `output = function(inputs)`, at the end all
> `duck_output == ndarray_output`.
> 2. Everything is implemented but operations may give different output
> (depending on masks for masked arrays, units for quantities, etc.), so
> generally `duck_output != ndarray_output`.
>
> Which one of these are you aiming at? By including
> `NDArrayOperatorsMixin`, it would seem option (2), but perhaps not? Is
> there a case for both separately?

Well, (1) is much easier to design around, because it's well-defined
:-). And I'm not sure that there's a principled difference between
regular arrays and masked arrays/quantity arrays; these *could* be
ndarray objects with special dtypes and extra methods, neither of
which would disqualify you from being a "case 1" array.

(I guess one issue is that because MaskedArray reductions skip masked
elements by default, you could get weird results from things like mean
calculations: np.sum(masked_arr) / np.prod(masked_arr.shape) does not
give the right result, since the sum leaves out the masked elements
while the shape still counts them. This isn't an issue for quantities,
though, or for an R-style NA that propagated by default.)
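
To make the mean problem concrete, here is a small sketch. The
values are made up for illustration; only the np.sum / np.prod
expression comes from the paragraph above.

```python
import numpy as np

# Four elements, the last one masked out.
masked_arr = np.ma.MaskedArray(
    [1.0, 2.0, 3.0, 100.0],
    mask=[False, False, False, True],
)

# The sum skips the masked element, but the shape still counts it.
total = np.sum(masked_arr)
n_all = np.prod(masked_arr.shape)
naive_mean = total / n_all

# MaskedArray's own mean divides by the number of unmasked elements.
n_valid = masked_arr.count()
masked_mean = masked_arr.mean()

print(naive_mean, masked_mean, n_valid)
```

Generic code written for plain ndarrays would compute the first
value, while MaskedArray itself reports the second.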

> Smaller general comment: at least in the NEP I would not worry about
> deprecating `NDArrayOperatorsMixin` - this may well be handy in itself
> (for things that implement `__array_ufunc__` but do not have shape,
> etc.; I have been doing some work on creating ufunc chains that would
> use this -- but they definitely are not array-like). Similarly, I
> think there is room for an `NDArrayShapeMixin` which might help with
> `concatenate` and friends.

Fair enough.
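
For reference, a minimal sketch of the kind of object described in
the quoted comment: something that takes part in ufuncs through
__array_ufunc__ and gets its operators from NDArrayOperatorsMixin,
yet has no shape. The Wrapped class is hypothetical and only handles
plain ufunc calls without out= arguments.

```python
import numpy as np
from numpy.lib.mixins import NDArrayOperatorsMixin


class Wrapped(NDArrayOperatorsMixin):
    """Hypothetical wrapper: supports ufuncs, but is not array-like."""

    def __init__(self, value):
        self.value = value

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Only handle direct calls such as np.add(a, b); decline
        # reduce, accumulate, and the other ufunc methods.
        if method != "__call__":
            return NotImplemented
        unwrapped = [x.value if isinstance(x, Wrapped) else x for x in inputs]
        return Wrapped(ufunc(*unwrapped, **kwargs))


# The mixin routes the + operator through np.add, which in turn
# dispatches to __array_ufunc__.
result = Wrapped(2.0) + 3.0
root = np.sqrt(Wrapped(4.0))
print(result.value, root.value, hasattr(result, "shape"))
```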

> Finally, on the name: `asarray` and `asanyarray` are just shims over
> `array`, so one option would be to add an argument in `array` (or
> broaden the scope of `subok`).

We definitely don't want to broaden the scope of 'subok', because one
of the goals here is to have something that projects like sklearn can
use, and they won't use subok :-). (In particular, np.matrix is
definitely not a duck array of any kind.)
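
As a reminder of what subok controls today, here is a short sketch.
It uses MaskedArray as the ndarray subclass, rather than np.matrix,
purely for illustration.

```python
import numpy as np

masked = np.ma.MaskedArray([1, 2, 3], mask=[False, True, False])

# asarray uses subok=False: the result is a plain ndarray, so the
# mask is gone.
base = np.asarray(masked)

# asanyarray uses subok=True: ndarray subclasses pass through.
passed = np.asanyarray(masked)

# The same reduction now gives different answers.
print(type(base).__name__, base.sum())
print(type(passed).__name__, passed.sum())
```

A library that calls asarray gets predictable ndarray semantics; one
that calls asanyarray gets whatever the subclass chooses to do.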

And supporting array() is tricky, because then you have to figure out
what to do with the copy=, order=, subok=, ndmin= arguments. copy= in
particular is tricky given that we don't know the object's type! I
guess we could call obj.copy() or something... but for this first
iteration it seemed simplest to make a new function that just has the
most important stuff for writing generic functions that accept duck
arrays.
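
A rough sketch of what such a function could look like. Everything
here is hypothetical: AbstractArray, asabstractarray, and MyDuckArray
are illustrative names for this message, not existing NumPy API, and
the details are for the NEP to settle.

```python
import abc

import numpy as np


class AbstractArray(abc.ABC):
    """Hypothetical marker ABC; duck arrays opt in by registering."""


# Real ndarrays count as abstract arrays. Note this sketch does
# nothing to exclude ndarray subclasses such as np.matrix.
AbstractArray.register(np.ndarray)


def asabstractarray(obj):
    """Hypothetical: pass duck arrays through, coerce everything else."""
    if isinstance(obj, AbstractArray):
        return obj
    return np.asarray(obj)


class MyDuckArray:
    """Hypothetical duck array that stores its data in an ndarray."""

    def __init__(self, data):
        self._data = np.asarray(data)

    @property
    def shape(self):
        return self._data.shape

    @property
    def dtype(self):
        return self._data.dtype


AbstractArray.register(MyDuckArray)

duck = MyDuckArray([1, 2, 3])
same = asabstractarray(duck)            # passed through unchanged
coerced = asabstractarray([1, 2, 3])    # a list becomes an ndarray
print(same is duck, type(coerced).__name__)
```

Leaving out copy=, order=, subok=, and ndmin= keeps the function
small enough that its behavior does not depend on knowing the
object's type.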

What we could do is, in addition to adding some kind of
asabstractarray() function, *also* make it so asanyarray() starts
accepting abstract/duck arrays, on the theory that anyone who's
willing to put up with asanyarray()'s weak guarantees won't notice if
we weaken them a bit more. Honestly though I'd rather just not touch
asanyarray at all, and maybe even deprecate it someday.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org

