[Numpy-discussion] asarray/anyarray; matrix/subclass

Sat Nov 10 15:16:28 EST 2018

On Sat, Nov 10, 2018 at 9:49 AM Marten van Kerkwijk <
m.h.vankerkwijk at gmail.com> wrote:

> Hi Hameer,
>
> I do not think we should change `asanyarray` itself to special-case
> matrix; rather, we could start converting `asarray` to `asanyarray` and
> solve the problems that produces for matrices in `matrix` itself (e.g., by
> overriding the relevant function with `__array_function__`).
>
> I think the idea of providing an `__anyarray__` method (in analogy with
> `__array__`) might work. Indeed, the default in `ndarray` (and thus all its
> subclasses) could be to let it return `self` and to override it for
> `matrix` to return an ndarray view.
>

Yes, if we're already adding a new protocol, we would certainly prefer to
implement a matrix.__anyarray__ method over special-casing np.matrix
explicitly.
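
To make the idea concrete, here is a rough sketch of how such a protocol
could be consulted. Nothing below exists in NumPy: `__anyarray__`, the
helper `asanyarray_via_protocol`, and the two stand-in subclasses are all
hypothetical names used only for illustration (MatrixLike stands in for
np.matrix).

```python
import numpy as np


def asanyarray_via_protocol(obj):
    """Hypothetical coercion helper: use __anyarray__ when the type defines it."""
    method = getattr(type(obj), "__anyarray__", None)
    if method is not None:
        return method(obj)
    # No protocol defined: fall back to the existing behavior.
    return np.asanyarray(obj)


class PassThrough(np.ndarray):
    """Stand-in for a well-behaved subclass: the assumed default returns self."""

    def __anyarray__(self):
        return self


class MatrixLike(np.ndarray):
    """Stand-in for np.matrix: opts out by returning a base ndarray view."""

    def __anyarray__(self):
        return self.view(np.ndarray)


well_behaved = np.arange(4).reshape(2, 2).view(PassThrough)
matrix_like = np.arange(4).reshape(2, 2).view(MatrixLike)

kept = asanyarray_via_protocol(well_behaved)    # subclass is preserved
stripped = asanyarray_via_protocol(matrix_like)  # plain ndarray view, no copy
```

The point of the sketch is that the decision moves into the subclass, so
the coercion function never needs to know about np.matrix by name.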

Unfortunately, per Nathaniel's comments about NA skipping behavior, it
seems like we will also need MaskedArray.__anyarray__ to return something
other than itself. In principle, we should probably write a new version of
MaskedArray that doesn't deviate from ndarray semantics, but that's a
rather large project (we'd also probably want to stop subclassing ndarray).
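
For reference, a small illustration of the NA skipping behavior in
question, using only existing NumPy calls. The sample values are made up.

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0])
masked = np.ma.masked_array(data, mask=[False, True, False])

# A plain ndarray sums every element.
plain_sum = data.sum()

# MaskedArray silently skips the masked element, so code written for
# ndarray semantics gets a different answer from the "same" values.
masked_sum = masked.sum()

# asanyarray passes the MaskedArray through unchanged, while asarray
# returns the underlying data as a base ndarray and drops the mask.
passed_through = np.asanyarray(masked)
base = np.asarray(masked)
```

So a function that switches from asarray to asanyarray internally would
start skipping masked values in its aggregations.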

Changing the default aggregation behavior for the existing MaskedArray is
also an option, but that would be a serious annoyance to users and a
backwards compatibility break. If the only way MaskedArray violates Liskov is in
terms of NA skipping aggregations by default, then this might be viable. In
practice, this would require adding an explicit skipna argument so
FutureWarnings could be silenced. The plus side of this option is that it
would make it easier to use np.asanyarray() or any new coercion function
throughout the internal NumPy code base.
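
A rough sketch of what the deprecation path might look like, written as a
free function rather than a change to MaskedArray itself. The name
`masked_sum`, the `skipna` argument, and the choice to return np.ma.masked
when not skipping are all assumptions for illustration, not an existing or
agreed API.

```python
import warnings

import numpy as np

_UNSET = object()  # sentinel: distinguishes "not passed" from an explicit value


def masked_sum(marr, skipna=_UNSET):
    """Hypothetical sum with an explicit skipna switch and a FutureWarning."""
    if skipna is _UNSET:
        if np.ma.is_masked(marr):
            warnings.warn(
                "sum currently skips masked values; pass skipna=True to keep "
                "this behavior or skipna=False to opt in to the new default",
                FutureWarning,
                stacklevel=2,
            )
        skipna = True  # current behavior during the deprecation period
    if skipna:
        return marr.sum()
    # Assumed non-skipping semantics: any masked input masks the result.
    if np.ma.is_masked(marr):
        return np.ma.masked
    return marr.sum()


m = np.ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])
explicit_skip = masked_sum(m, skipna=True)   # no warning, masked value skipped
no_skip = masked_sum(m, skipna=False)        # no warning, result is masked
```

Passing skipna explicitly in either direction silences the warning, which
is what would let users migrate ahead of the default change.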

To summarize, I think these are our options:
1. Change the behavior of np.asanyarray() to check for an __anyarray__()
protocol. Change np.matrix.__anyarray__() to return a base numpy array
(this is a minor backwards compatibility break, but probably for the best).
Start issuing a FutureWarning for any MaskedArray operations that violate
Liskov and add a skipna argument that in the future will default to
skipna=False.
2. Introduce a new coercion function, e.g., np.duckarray(). This is the
easiest option because we don't need to clean up NumPy's existing ndarray
subclasses.

P.S. I'm just glad pandas stopped subclassing ndarray a while ago --
there's no way pandas.Series() could be fixed up to not violate Liskov :).