[Numpy-discussion] asarray/anyarray; matrix/subclass

Nathaniel Smith njs at pobox.com
Fri Nov 9 18:45:46 EST 2018


But matrix isn't the only problem with asanyarray. np.ma also violates
Liskov. No doubt there are other problematic ndarray subclasses out
there too...
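
To make the np.ma point concrete, here is a minimal sketch. The message
above does not say which np.ma behaviours are meant, so this picks one
well-known difference as an assumption: reductions on a MaskedArray skip
masked entries, so code written against plain ndarray semantics gets a
different answer when asanyarray lets the subclass through. The helper
name total is made up for the illustration.

```python
import numpy as np

def total(a):
    # Hypothetical library function written with plain ndarrays in mind.
    a = np.asanyarray(a)  # passes ndarray subclasses through unchanged
    return a.sum()

data = np.array([1.0, 2.0, 3.0])
masked = np.ma.masked_array(data, mask=[False, True, False])

print(total(data))               # sums all three elements
print(total(masked))             # the masked element is skipped
print(np.asarray(masked).sum())  # asarray drops the mask, sums everything
```

Whether this counts as a substitution failure depends on what the caller
expects; the sketch only shows that the two entry points disagree.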

If we were going to try to reuse asanyarray through some deprecation
mechanism, I think we'd need to deprecate allowing asanyarray to
return *any* ndarray subclass, unless they explicitly provided an
__asanyarray__ dunder. But at that point I'm not sure what the point
would be of reusing it.
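
As a rough sketch of what such an opt-in rule could look like: only
subclasses that define the dunder are passed through, and everything else
is downgraded to a base ndarray. Both asanyarray_opt_in and
__asanyarray__ are hypothetical names used for illustration; NumPy
defines neither.

```python
import numpy as np

def asanyarray_opt_in(obj):
    """Hypothetical helper, not a NumPy API."""
    arr = np.asanyarray(obj)
    if type(arr) is np.ndarray:
        return arr
    if hasattr(type(arr), "__asanyarray__"):
        # The subclass explicitly opted in, so pass it through.
        return arr
    # No opt-in: return a base-class ndarray instead of the subclass.
    return np.asarray(arr)

class OptedIn(np.ndarray):
    def __asanyarray__(self):  # hypothetical dunder from the text above
        return self

m = np.ma.masked_array([1, 2, 3], mask=[0, 1, 0])
o = np.arange(3).view(OptedIn)

print(type(asanyarray_opt_in(m)).__name__)  # ndarray
print(type(asanyarray_opt_in(o)).__name__)  # OptedIn
```

Under this rule the function's behaviour for existing subclasses changes
completely, which is the reason for doubting the value of reusing the
name.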

On Fri, Nov 9, 2018 at 7:15 AM, Hameer Abbasi <einstein.edison at gmail.com> wrote:
> Begin forwarded message:
>
> From: Stephan Hoyer
> Date: Friday, Nov 09, 2018 at 3:19 PM
> To: Hameer Abbasi
> Cc: Stefan van der Walt , Marten van Kerkwijk
> Subject: asarray/anyarray; matrix/subclass
>
> This is a great discussion, but let's try to have it in public (e.g., on the
> NumPy mailing list).
> On Fri, Nov 9, 2018 at 8:42 AM Hameer Abbasi <einstein.edison at gmail.com>
> wrote:
>>
>> Hi Stephan,
>>
>> The issue I have with writing another function is that asarray/asanyarray
>> are so widely used that it’d be a huge maintenance task to update them
>> throughout NumPy, let alone in other codebases, which would also have to
>> depend on newer NumPy versions to use it. In short, it would dramatically
>> reduce the adoptability of this function.
>>
>> One path we can take is to allow asarray/asanyarray to be overridable via
>> __array_function__ (the former is debatable). This solves most of our
>> duck-array related issues without introducing another protocol.
>>
>> Regardless of what path we choose, I would recommend changing asanyarray
>> so that it does not pass np.matrix through, and instead returns
>> mat.view(type=np.ndarray), which has O(1) cost in time and memory. In the
>> vast majority of contexts, it’s used to ensure an array-ish structure for
>> another operation, and usually there’s no guarantee that what comes out will
>> be a matrix anyway. I suggest we raise a FutureWarning and then change this
>> behaviour.
>>
>> There have been a number of discussions about deprecating np.matrix (and a
>> few about MaskedArray as well, though there are less compelling reasons for
>> that one). I suggest we start down that path as soon as possible. The
>> biggest (only?) user I know of blocking that is scipy.sparse, and we’re on
>> our way to replacing that with PyData/Sparse.
>>
>> Best Regards,
>> Hameer Abbasi
>>
>> On Friday, Nov 09, 2018 at 1:26 AM, Stephan Hoyer <shoyer at gmail.com>
>> wrote:
>> Hi Hameer,
>>
>> I'd love to talk about this in more detail. I agree that something like
>> this is needed.
>>
>> The challenge with reusing an existing function like asanyarray() is that
>> there is at least one (somewhat?) widely used ndarray subclass that badly
>> violates the Liskov Substitution Principle: np.matrix.
>>
>> NumPy can't really use np.asanyarray() widely for internal purposes until
>> we no longer have to worry about np.matrix. We might special case np.matrix in
>> some way, but then asanyarray() would do totally opposite things on
>> different versions of NumPy. It's almost certainly a better idea to just
>> write a new function with the desired semantics, and "soft deprecate"
>> asanyarray(). The new function can explicitly black list np.matrix, as well
>> as any other subclasses we know of that badly violate LSP.
>>
>> Cheers,
>> Stephan
>> On Thu, Nov 8, 2018 at 5:06 PM Hameer Abbasi <einstein.edison at gmail.com>
>> wrote:
>>>
>>> No, Stefan, I’ll do that now. Putting you in the cc.
>>>
>>> It slipped my mind among the million other things I had in mind — Namely:
>>> My job visa. It was only done this Monday.
>>>
>>> Hi, Marten, Stephan:
>>>
>>> Stefan wants me to write up a NEP that allows a given object to specify
>>> that it is a duck array — Namely, that it follows duck-array semantics.
>>>
>>> We were thinking of changing asanyarray to pass through
>>> anything that implements the duck-array protocol along with ndarray
>>> subclasses. I’m sure this would help XArray and Quantity work better with
>>> existing codebases, along with PyData/Sparse arrays.
>>>
>>> Would you be interested?
>>>
>>> Best Regards,
>>> Hameer Abbasi
>>>
>>> On Thursday, Nov 08, 2018 at 9:09 PM, Stefan van der Walt
>>> <stefanv at berkeley.edu> wrote:
>>> Hi Hameer,
>>>
>>> In last week's meeting, we had the following in the notes:
>>>
>>> Hameer is contacting Marten & Stephan to write up a draft NEP for
>>> clarifying the asarray/asanyarray and matrix/subclass path forward.
>>>
>>>
>>> Did any of that happen that you could share?
>>>
>>> Thanks and best regards,
>>> Stéfan
>
>
> Hello, everyone,
>
> Stefan van der Walt, Stephan Hoyer, Marten van Kerkwijk and I were having a
> discussion about the state of matrix, asarray and asanyarray. Our thoughts
> are summarised above (in the quoted text that I’m forwarding).
>
> Basically, this grew out of a discussion relating to asanyarray/asarray
> inconsistencies in NumPy about which to use where. Historically, asarray was
> used in many libraries/places instead of asanyarray usually because
> np.matrix caused problems due to its special behaviour with regard to
> indexing (it always returns a 2-D object when eliminating one dimension, but
> a 0-D one when eliminating both), its behaviour regarding __mul__ (the
> multiplication operator represents matrix multiplication rather than
> element-wise multiplication) and its fixed dimensionality (matrix is 2D
> only). Because of these three things, as Stephan accurately pointed out, it
> violates the Liskov Substitution Principle.
>
> Because of this behaviour, many libraries switched from using asanyarray to
> asarray, as np.matrix wouldn’t work with their code. This shut out other
> ndarray subclasses from being used as well, such as MaskedArray and
> astropy.Quantity. Even if asanyarray is used, there is usually no guarantee
> that a matrix will be returned instead of an array.
>
> The changes I’m proposing are twofold, but simple:
>
> 1. asanyarray should return mat.view(type=np.ndarray) instead of matrices,
>    after a FutureWarning and the usual grace period. This preserves the
>    performance (creating a view is O(1) in both memory and time) and the
>    mutability of the original matrix.
> 2. In the spirit of allowing duck arrays to work with existing NumPy code,
>    asanyarray should be overridable via __array_function__, so that duck
>    arrays can decide whether to pass themselves through. If subclasses are
>    allowed through, duck arrays should be as well.
>
> This is a part of a larger effort to deprecate np.matrix. As far as I’m
> aware, it has one big customer (scipy.sparse). The effort to replace that is
> already underway at PyData/Sparse.
>
> Best Regards,
> Hameer Abbasi
>
>
>



-- 
Nathaniel J. Smith -- https://vorpus.org

