[Numpy-discussion] asarray/anyarray; matrix/subclass

Sat Nov 10 16:15:16 EST 2018

> If the only way MaskedArray violates Liskov is in terms of NA skipping
> aggregations by default, then this might be viable

One of the ways to fix these Liskov substitution problems is just to
introduce more base classes. For instance, if we had an `NDContainer` base
class with only slicing support, then masked arrays would be an exact
Liskov substitute for it, but np.matrix would not.
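Here is a minimal sketch of what "exact Liskov substitution" could mean for such a base class. It assumes, hypothetically, that an `NDContainer` promises only one thing: indexing with a single integer drops the leading axis. `honors_slicing_contract` is an illustrative name, not a NumPy API. Masked arrays keep that promise. np.matrix does not, because indexing a matrix always returns a 2-D matrix.

```python
import warnings

import numpy as np


def honors_slicing_contract(arr):
    """Hypothetical NDContainer check: arr[0] should drop the first axis."""
    return arr[0].shape == arr.shape[1:]


base = np.arange(6).reshape(2, 3)
masked = np.ma.masked_array(base, mask=[[False, True, False], [False, False, False]])

with warnings.catch_warnings():
    # np.matrix emits a PendingDeprecationWarning when constructed
    warnings.simplefilter("ignore", PendingDeprecationWarning)
    mat = np.matrix(base)

print(honors_slicing_contract(base))    # True:  base[0].shape == (3,)
print(honors_slicing_contract(masked))  # True:  masked[0].shape == (3,)
print(honors_slicing_contract(mat))     # False: mat[0].shape == (1, 3)
```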

Eric

On Sat, 10 Nov 2018 at 12:17 Stephan Hoyer <shoyer at gmail.com> wrote:

> On Sat, Nov 10, 2018 at 9:49 AM Marten van Kerkwijk <
> m.h.vankerkwijk at gmail.com> wrote:
>
>> Hi Hameer,
>>
>> I do not think we should change `asanyarray` itself to special-case
>> matrix; rather, we could start converting `asarray` to `asanyarray` and
>> solve the problems that produces for matrices in `matrix` itself (e.g., by
>> overriding the relevant function with `__array_function__`).
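The sketch below shows roughly how `matrix` could fix its own problems through `__array_function__` (NEP 18, enabled by default since NumPy 1.17). `MatrixLike`, `HANDLED_FUNCTIONS` and `implements` are hypothetical names. The ravel override only mimics np.matrix.ravel, which returns a 1 x N matrix. Any function without an override falls back to the default ndarray implementation through `super().__array_function__`.

```python
import numpy as np

# Registry of NumPy functions this subclass handles itself (illustrative only).
HANDLED_FUNCTIONS = {}


def implements(numpy_function):
    """Register an override for a NumPy function on MatrixLike."""
    def decorator(func):
        HANDLED_FUNCTIONS[numpy_function] = func
        return func
    return decorator


class MatrixLike(np.ndarray):
    """Hypothetical stand-in for np.matrix that fixes behavior in the subclass."""

    def __array_function__(self, func, types, args, kwargs):
        if func in HANDLED_FUNCTIONS:
            return HANDLED_FUNCTIONS[func](*args, **kwargs)
        # Everything else uses the default ndarray implementation.
        return super().__array_function__(func, types, args, kwargs)


@implements(np.ravel)
def _ravel(a, order="C"):
    # Keep the result 2-D (1 x N), mirroring what np.matrix.ravel returns.
    flat = np.asarray(a).ravel(order)
    return flat.reshape(1, -1).view(MatrixLike)


m = np.arange(6).reshape(2, 3).view(MatrixLike)
print(np.ravel(m).shape)                           # (1, 6): handled by the override
print(np.ravel(np.arange(6).reshape(2, 3)).shape)  # (6,): plain ndarray is unaffected
```

The override lives entirely in the subclass, so NumPy's own code could call `asanyarray` and still get matrix-specific behavior where the subclass asks for it.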
>>
>> I think the idea of providing an `__anyarray__` method (in analogy with
>> `__array__`) might work. Indeed, the default in `ndarray` (and thus all its
>> subclasses) could be to let it return `self` and to override it for
>> `matrix` to return an ndarray view.
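This is a minimal sketch of the proposed `__anyarray__` protocol. It assumes the protocol works like `__array__`: `anyarray()` calls the method if the type defines it, ndarray subclasses default to returning `self`, and a matrix-like class overrides the method to return an ndarray view. Neither `np.anyarray` nor `__anyarray__` exists in NumPy. Because we cannot add a method to ndarray here, the sketch emulates the default with an `isinstance` check.

```python
import numpy as np


def anyarray(obj):
    """Hypothetical np.anyarray(): defer to an __anyarray__() method if present."""
    method = getattr(type(obj), "__anyarray__", None)
    if method is not None:
        return method(obj)
    if isinstance(obj, np.ndarray):
        # Proposed default for ndarray and its subclasses: return self.
        return obj
    return np.asarray(obj)


class WellBehaved(np.ndarray):
    """A subclass that keeps the default, so it passes through unchanged."""


class MatrixLike(np.ndarray):
    """Stand-in for np.matrix, which would opt out by returning an ndarray view."""

    def __anyarray__(self):
        return self.view(np.ndarray)


base = np.arange(4).reshape(2, 2)
good = base.view(WellBehaved)
mat = base.view(MatrixLike)

print(anyarray(good) is good)                # True
print(type(anyarray(mat)) is np.ndarray)     # True
print(np.shares_memory(anyarray(mat), mat))  # True: a view, not a copy
```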
>>
>
> Yes, we would certainly rather implement a matrix.__anyarray__ method (if
> we're already adding a new protocol) than special-case np.matrix
> explicitly.
>
> Unfortunately, per Nathaniel's comments about NA skipping behavior, it
> seems like we will also need MaskedArray.__anyarray__ to return something
> other than itself. In principle, we should probably write a new version of
> MaskedArray that doesn't deviate from ndarray semantics, but that's a
> rather large project (we'd also probably want to stop subclassing ndarray).
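The example below illustrates the MaskedArray problem. Generic code written against ndarray semantics silently changes meaning when it receives a masked array, because aggregations skip masked values by default. `fraction_of_total` is a hypothetical helper, not part of NumPy.

```python
import numpy as np


def fraction_of_total(x):
    """Generic code written for ndarray: divide by the sum of all elements."""
    return x / np.sum(x)


data = np.array([1.0, 2.0, 6.0])
masked = np.ma.masked_array(data, mask=[False, True, False])

print(np.sum(data))    # 9.0
print(np.sum(masked))  # 7.0: the masked element is skipped
print(fraction_of_total(data)[0])    # 1/9
print(fraction_of_total(masked)[0])  # 1/7: a different denominator
```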
>
> Changing the default aggregation behavior for the existing MaskedArray is
> also an option, but that would be a serious annoyance to users and a
> backwards compatibility break. If the only way MaskedArray violates Liskov
> is in terms of NA skipping aggregations by default, then this might be
> viable. In practice, this would require adding a skipna argument so that
> users could silence the FutureWarnings by passing it explicitly. The plus
> side of this option is that it would make it easier to use np.anyarray()
> or any new coercion function throughout the internal NumPy code base.
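The transition could look something like the sketch below. It is written as a standalone wrapper rather than as a change to MaskedArray itself. `masked_sum` is hypothetical. Treating skipna=False as "any masked input masks the result" is an assumption about the intended semantics, by analogy with NaN propagating through np.sum.

```python
import warnings

import numpy as np


def masked_sum(a, skipna=None):
    """Hypothetical transition wrapper for MaskedArray.sum with explicit skipna.

    skipna=None keeps today's behavior (skip masked values) but warns.
    skipna=False gives ndarray-like semantics: any masked input masks the result.
    """
    if skipna is None:
        warnings.warn(
            "the default will change to skipna=False; pass skipna explicitly "
            "to silence this warning",
            FutureWarning,
            stacklevel=2,
        )
        skipna = True
    if not skipna and np.ma.getmaskarray(a).any():
        return np.ma.masked
    return a.sum()


m = np.ma.masked_array([1.0, 2.0, 6.0], mask=[False, True, False])
print(masked_sum(m, skipna=True))   # 7.0, no warning
print(masked_sum(m, skipna=False))  # -- (the result is masked)
```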
>
> To summarize, I think these are our options:
> 1. Change the behavior of np.anyarray() to check for an __anyarray__()
> protocol. Change np.matrix.__anyarray__() to return a base numpy array
> (this is a minor backwards compatibility break, but probably for the best).
> Start issuing a FutureWarning for any MaskedArray operations that violate
> Liskov and add a skipna argument that in the future will default to
> skipna=False.
> 2. Introduce a new coercion function, e.g., np.duckarray(). This is the
> easiest option because we don't need to clean up NumPy's existing ndarray
> subclasses.
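Option 2 could be sketched under one possible design. This is an assumption, since the thread does not specify np.duckarray's rules. Existing ndarray subclasses keep today's np.asarray() behavior and need no cleanup. Non-ndarray objects that implement `__array_function__` pass through untouched. `duckarray` and `DuckArray` are hypothetical names.

```python
import numpy as np


def duckarray(obj):
    """Hypothetical np.duckarray(): a new function with new semantics.

    ndarray subclasses are coerced exactly as np.asarray() does today;
    only non-ndarray objects implementing __array_function__ pass through.
    """
    if not isinstance(obj, np.ndarray) and hasattr(type(obj), "__array_function__"):
        return obj
    return np.asarray(obj)


class DuckArray:
    """Minimal hypothetical duck array that opts in via __array_function__."""

    shape = (3,)
    dtype = np.dtype(float)

    def __array_function__(self, func, types, args, kwargs):
        return NotImplemented


duck = DuckArray()
masked = np.ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])

print(duckarray(duck) is duck)                 # True: passed through
print(type(duckarray(masked)) is np.ndarray)   # True: same as np.asarray today
```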
>
> P.S. I'm just glad pandas stopped subclassing ndarray a while ago --
> there's no way pandas.Series() could be fixed up to not violate Liskov :).
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>