[Numpy-discussion] new NEP: np.AbstractArray and np.asabstractarray

Sebastian Berg sebastian at sipsolutions.net
Fri Mar 9 05:51:21 EST 2018


On Thu, 2018-03-08 at 18:56 +0000, Stephan Hoyer wrote:
> Hi Nathaniel,
> 
> Thanks for starting the discussion!
> 
> Like Marten says, I think it would be useful to more clearly define
> what it means to be an abstract array. ndarray has lots of
> methods/properties that expose internal implementation (e.g., view,
> strides) that presumably we don't want to require as part of this
> interface. On the other hand, dtype and shape are almost assuredly
> part of this interface.
> 
> To help guide the discussion, it would be good to identify concrete
> examples of types that should and should not satisfy this interface,
> e.g.,
> Marten's case 1: works exactly like ndarray, but stores data
> differently: parallel arrays (e.g., dask.array), sparse arrays (e.g.,
> https://github.com/pydata/sparse), hypothetical non-strided arrays
> (e.g., always C ordered).
> Marten's case 2: same methods as ndarray, but gives different
> results: np.ma.MaskedArray, arrays with units (quantities), maybe
> labeled arrays like xarray.DataArray
> 
> I don't think we have a hope of making a single base class for case 2
> work with everything in NumPy, but we can define interfaces with
> different levels of functionality.
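
To make case 2 concrete, here is a small sketch using only existing
NumPy calls. It shows a MaskedArray answering the same method call
differently from a plain ndarray, and how asarray and asanyarray treat
it. The variable names are just for illustration.

```python
import numpy as np

data = [1.0, 2.0, 3.0]
plain = np.array(data)
masked = np.ma.MaskedArray(data, mask=[False, True, False])

# Same method name, different answer: the masked element is skipped.
plain_sum = plain.sum()
masked_sum = masked.sum()

# asarray strips the subclass (and with it the mask);
# asanyarray lets the MaskedArray pass through unchanged.
stripped = np.asarray(masked)
passed = np.asanyarray(masked)
```

So any function that calls asanyarray already has to cope with inputs
for which duck_output != ndarray_output.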


True, but I guess the aim is not to care at all about how things are
implemented (so only case 2)? I agree that we can aim to get as close
as possible, but we should not expect to get all the way there.
My personal opinion:

1. To do this, we should start it "experimentally".

2. We need something like a reference implementation. First, because it
allows testing whether a function, e.g. in numpy, is actually
abstract-safe, and second, because it will be the only way to find out
what our minimal abstract interface actually is (assuming we have
started on point 3).

3. Go ahead with putting it into numpy functions and see how much you
need to make them work. In the end, my guess is, everything that works
for MaskedArrays and xarray is a pretty safe bet.
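
A rough sketch of what I mean by a reference implementation, under
heavy assumptions: MinimalDuckArray and is_abstract_safe are
hypothetical names, not anything in numpy, and the choice to expose
only shape and dtype is mine. The only real NumPy pieces used are
NDArrayOperatorsMixin and the __array_ufunc__ protocol.

```python
import numpy as np
from numpy.lib.mixins import NDArrayOperatorsMixin


class MinimalDuckArray(NDArrayOperatorsMixin):
    """Hypothetical reference duck array: shape and dtype, no strides or view."""

    def __init__(self, data):
        self._data = np.asarray(data)

    @property
    def shape(self):
        return self._data.shape

    @property
    def dtype(self):
        return self._data.dtype

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Keep the sketch small: no support for the out= argument.
        if kwargs.get("out") is not None:
            return NotImplemented
        unwrapped = [x._data if isinstance(x, MinimalDuckArray) else x
                     for x in inputs]
        result = getattr(ufunc, method)(*unwrapped, **kwargs)
        if isinstance(result, tuple):
            return tuple(MinimalDuckArray(r) for r in result)
        return MinimalDuckArray(result)


def is_abstract_safe(func, *args):
    """True if func runs on the duck array and hands a duck array back."""
    try:
        return isinstance(func(*args), MinimalDuckArray)
    except Exception:
        # Any failure counts as "not abstract-safe" in this sketch.
        return False


duck = MinimalDuckArray([1.0, 2.0, 3.0])
safe = is_abstract_safe(np.sin, duck)                        # goes via __array_ufunc__
unsafe = is_abstract_safe(lambda x: x.view(np.uint8), duck)  # relies on .view
```

Running numpy functions through something like this and growing the
class only when a function genuinely needs more is one way to discover
the minimal interface rather than guess it up front.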

I disagree with the statement that we do not need to define the minimal
reference. In practice we do, as soon as we use it for numpy functions.
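
To illustrate why: below is a sketch of how an ABC plus a coercion
function could look. This is an assumption about the shape of the
proposal, not the NEP's actual code; AbstractArray and asabstractarray
here are local stand-ins, and the two abstract properties are simply my
guess at a minimum.

```python
import abc

import numpy as np


class AbstractArray(abc.ABC):
    """Hypothetical stand-in for the proposed np.AbstractArray."""

    @property
    @abc.abstractmethod
    def shape(self):
        ...

    @property
    @abc.abstractmethod
    def dtype(self):
        ...


# ndarray (and so its subclasses, e.g. MaskedArray) counts as abstract.
AbstractArray.register(np.ndarray)


def asabstractarray(obj):
    """Pass AbstractArray instances through unchanged, coerce everything else."""
    if isinstance(obj, AbstractArray):
        return obj
    return np.asarray(obj)


masked = np.ma.MaskedArray([1, 2, 3], mask=[False, True, False])
passed = asabstractarray(masked)       # passes through as-is
coerced = asabstractarray([1, 2, 3])   # becomes a plain ndarray
```

Once a numpy function calls this and then touches, say, .ndim or
.reshape on the result, those attributes are part of the interface
whether we wrote them down or not.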

- Sebastian


> 
> Because there is such a gradation of "duck array" types, I agree with
> Marten that we should not deprecate NDArrayOperatorsMixin. It's
> useful for types like xarray.Dataset that define __array_ufunc__ but
> cannot satisfy the full abstract array interface.
> 
> Finally, for the name, what about `asduckarray`? Though perhaps that
> could be a source of confusion, given the gradation of duck-array-like
> types.
> 
> Cheers,
> Stephan
> 
> On Thu, Mar 8, 2018 at 7:07 AM Marten van Kerkwijk
> <m.h.vankerkwijk at gmail.com> wrote:
> > Hi Nathaniel,
> > 
> > Overall, hugely in favour!  For detailed comments, it would be good to
> > have a link to a PR; could you put that up?
> > 
> > A larger comment: you state that you think `np.asanyarray` is a
> > mistake since `np.matrix` and `np.ma.MaskedArray` would pass through
> > and that those do not strictly mimic `ndarray`. Here, I agree with
> > `matrix` (but since we're deprecating it, let's remove that from the
> > discussion), but I do not see how your proposed interface would not
> > let `MaskedArray` pass through, nor really that one would necessarily
> > want that.
> > 
> > I think it may be good to distinguish two separate cases:
> > 1. Everything has exactly the same meaning as for `ndarray` but the
> > data is stored differently (i.e., only `view` does not work). One can
> > thus expect that for `output = function(inputs)`, at the end all
> > `duck_output == ndarray_output`.
> > 2. Everything is implemented but operations may give different output
> > (depending on masks for masked arrays, units for quantities, etc.), so
> > generally `duck_output != ndarray_output`.
> > 
> > Which one of these are you aiming at? By including
> > `NDArrayOperatorsMixin`, it would seem option (2), but perhaps not? Is
> > there a case for both separately?
> > 
> > Smaller general comment: at least in the NEP I would not worry about
> > deprecating `NDArrayOperatorsMixin` - this may well be handy in itself
> > for things that implement `__array_ufunc__` but do not have shape,
> > etc. (I have been doing some work on creating ufunc chains that would
> > use this -- but they definitely are not array-like.) Similarly, I
> > think there is room for an `NDArrayShapeMixin` which might help with
> > `concatenate` and friends.
> > 
> > Finally, on the name: `asarray` and `asanyarray` are just shims over
> > `array`, so one option would be to add an argument in `array` (or
> > broaden the scope of `subok`).
> > 
> > As an explicit suggestion, one could introduce a `duck` or `abstract`
> > argument to `array` which is used in `asarray` and `asanyarray` as
> > well (corresponding to options 1 and 2), and eventually default to
> > something sensible (I would think `False` for `asarray` and `True` for
> > `asanyarray`).
> > 
> > All the best,
> > 
> > Marten
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion