[Numpy-discussion] asarray/anyarray; matrix/subclass

Fri Nov 9 18:39:13 EST 2018

I’m still not convinced of the advantages of reusing asanyarray(),
even if matrix did not exist. Yes, asanyarray exists in old NumPy
versions, but you can’t use it with sparse arrays there anyway because it
will have the wrong semantics. I expect this would be a bug magnet, with
sparse arrays inadvertently loaded into memory if you’re accidentally
using an old NumPy.
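
A minimal sketch of that failure mode, assuming a made-up duck array class
(LazyDiagonal is hypothetical and stands in for a real sparse container; it
only stores the diagonal and builds a dense copy when NumPy asks for one via
__array__):

```python
import numpy as np


class LazyDiagonal:
    """Hypothetical duck array: keeps only the diagonal of an n x n matrix."""

    def __init__(self, diag):
        self.diag = np.asarray(diag)
        self.densify_calls = 0  # how often a dense copy was requested

    def __array__(self, dtype=None, copy=None):
        # NumPy's coercion functions call this hook, which forces
        # the full n x n array into memory.
        self.densify_calls += 1
        dense = np.diag(self.diag)
        return dense if dtype is None else dense.astype(dtype)


lazy = LazyDiagonal([1.0, 2.0, 3.0])
dense = np.asanyarray(lazy)  # no pass-through: the result is a dense ndarray
print(type(dense).__name__, dense.shape, lazy.densify_calls)
```

Nothing warns the caller here: the compact object silently becomes a dense
ndarray, which is the behaviour old code would keep on an old NumPy.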

With regard to the protocol, I would suggest a dedicated method, e.g.,
__asanyarray__ (or something similar based on the final chosen name of the
function). Coercing to arrays is special enough to have its own dedicated
protocol, and it could be useful for libraries like xarray to check for
__asanyarray__ attributes before deciding which coercion mechanism to use.
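
As a rough sketch of how such a check could look (everything here is
hypothetical: __asanyarray__ is only the name suggested above, and
as_duck_array and Tagged are made up for illustration; nothing like this
exists in NumPy):

```python
import numpy as np


def as_duck_array(obj):
    """Hypothetical coercion helper built on a dedicated protocol method."""
    # Look the method up on the type, as Python does for special methods.
    method = getattr(type(obj), "__asanyarray__", None)
    if method is not None:
        return method(obj)
    # No protocol support: fall back to plain ndarray coercion.
    return np.asarray(obj)


class Tagged:
    """Hypothetical duck array that opts in to being passed through."""

    def __init__(self, data, unit):
        self.data = np.asarray(data)
        self.unit = unit

    def __asanyarray__(self):
        return self


q = Tagged([1, 2, 3], "m")
print(as_duck_array(q) is q)                   # the duck array passes through
print(type(as_duck_array([1, 2, 3])).__name__)  # other inputs become ndarray
```

A library could use the same getattr check to decide between this protocol
and its existing coercion path.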

On Fri, Nov 9, 2018 at 10:17 AM Hameer Abbasi <einstein.edison at gmail.com>
wrote:

> Begin forwarded message:
>
> From: Stephan Hoyer
> Date: Friday, Nov 09, 2018 at 3:19 PM
> To: Hameer Abbasi
> Cc: Stefan van der Walt , Marten van Kerkwijk
> Subject: asarray/anyarray; matrix/subclass
>
> This is a great discussion, but let's try to have it in public (e.g., on
> the NumPy mailing list).
> On Fri, Nov 9, 2018 at 8:42 AM Hameer Abbasi <einstein.edison at gmail.com>
> wrote:
>
>> Hi Stephan,
>>
>> The issue I have with writing another function is that asarray/asanyarray
>> are so widely used that it’d be a huge maintenance task to update them
>> throughout NumPy, not to mention in other codebases, which would also have
>> to rely on newer NumPy versions for this. In short, it would dramatically
>> reduce the adoptability of this function.
>>
>> One path we can take is to allow asarray/asanyarray to be overridable via
>> __array_function__ (the former is debatable). This solves most of our
>> duck-array related issues without introducing another protocol.
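
For readers unfamiliar with the protocol, here is a minimal sketch of an
__array_function__ override, assuming a NumPy version where that dispatch is
enabled. DuckArray is a hypothetical class, and np.sum is used only because
it is a function NumPy does dispatch; making asarray/asanyarray overridable
this way is the proposal under discussion, not something the sketch shows.

```python
import numpy as np


class DuckArray:
    """Hypothetical duck array wrapping an ndarray."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        if func is np.sum:
            # Unwrap our own type, call the NumPy implementation, re-wrap.
            unwrapped = [a.data if isinstance(a, DuckArray) else a
                         for a in args]
            return DuckArray(func(*unwrapped, **kwargs))
        # Tell NumPy we do not handle any other function.
        return NotImplemented


result = np.sum(DuckArray([1, 2, 3]))
print(type(result).__name__, int(result.data))
```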
>>
>> Regardless of which path we choose, I would recommend changing asanyarray
>> so that it no longer passes np.matrix through and instead returns
>> mat.view(type=np.ndarray), which has O(1) cost in time and memory. In the
>> vast majority of contexts, it’s used to ensure an array-ish structure for
>> another operation, and usually there’s no guarantee that what comes out
>> will be a matrix anyway. I suggest we raise a FutureWarning and then change
>> this behaviour.
>>
>> There have been a number of discussions about deprecating np.matrix (and
>> a few about MaskedArray as well, though there are less compelling reasons
>> for that one). I suggest we start down that path as soon as possible. The
>> biggest (only?) user I know of blocking that is scipy.sparse, and we’re on
>> our way to replacing that with PyData/Sparse.
>>
>> Best Regards,
>> Hameer Abbasi
>>
>> On Friday, Nov 09, 2018 at 1:26 AM, Stephan Hoyer <shoyer at gmail.com>
>> wrote:
>> Hi Hameer,
>>
>> I'd love to talk about this in more detail. I agree that something like
>> this is needed.
>>
>> The challenge with reusing an existing function like asanyarray() is that
>> there is at least one (somewhat?) widely used ndarray subclass that badly
>> violates the Liskov Substitution Principle: np.matrix.
>>
>> NumPy can't really use np.asanyarray() widely for internal purposes until
>> we don't have to worry about np.matrix. We might special case np.matrix in
>> some way, but then asanyarray() would do totally opposite things on
>> different versions of NumPy. It's almost certainly a better idea to just
>> write a new function with the desired semantics, and "soft deprecate"
>> asanyarray(). The new function can explicitly black list np.matrix, as well
>> as any other subclasses we know of that badly violate LSP.
>>
>> Cheers,
>> Stephan
>> On Thu, Nov 8, 2018 at 5:06 PM Hameer Abbasi <einstein.edison at gmail.com>
>> wrote:
>>
>>> No, Stefan, I’ll do that now. Putting you in the cc.
>>>
>>> It slipped my mind among the million other things I had in mind —
>>> Namely: My job visa. It was only done this Monday.
>>>
>>> Hi, Marten, Stephan:
>>>
>>> Stefan wants me to write up a NEP that allows a given object to specify
>>> that it is a duck array — Namely, that it follows duck-array semantics.
>>>
>>> We were thinking of changing asanyarray to pass through
>>> anything that implements the duck-array protocol along with ndarray
>>> subclasses. I’m sure this would help XArray and Quantity work better with
>>> existing codebases, along with PyData/Sparse arrays.
>>>
>>> Would you be interested?
>>>
>>> Best Regards,
>>> Hameer Abbasi
>>>
>>> On Thursday, Nov 08, 2018 at 9:09 PM, Stefan van der Walt <
>>> stefanv at berkeley.edu> wrote:
>>> Hi Hameer,
>>>
>>> In last week's meeting, we had the following in the notes:
>>>
>>> Hameer is contacting Marten & Stephan and writing up a draft NEP for
>>> clarifying the asarray/asanyarray and matrix/subclass path forward.
>>>
>>>
>>> Did any of that happen that you could share?
>>>
>>> Thanks and best regards,
>>> Stéfan
>>>
>>>
> Hello, everyone,
>
> Stefan van der Walt, Stephan Hoyer, Marten van Kerkwijk and I were having
> a discussion about the state of matrix, asarray and asanyarray. Our
> thoughts are summarised above (in the quoted text that I’m forwarding).
>
> Basically, this grew out of a discussion about inconsistencies in NumPy
> regarding where to use asanyarray and where to use asarray. Historically,
> asarray was used in many libraries/places instead of asanyarray, usually because
> np.matrix caused problems due to its special behaviour with regard to
> indexing (it always returns a 2-D object when eliminating one dimension,
> but a 0-D one when eliminating both), its behaviour regarding __mul__ (the
> multiplication operator represents matrix multiplication rather than
> element-wise multiplication) and its fixed dimensionality (matrix is 2D
> only). Because of these three things, as Stephan accurately pointed out, it
> violates the Liskov Substitution Principle.
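
The three behaviours listed above can be checked directly. This is a small
sketch, assuming a NumPy version that still ships np.matrix:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
m = np.matrix(a)

# 1. Indexing: dropping one dimension still gives a 2-D matrix,
#    while dropping both gives a 0-D scalar.
print(a[0].shape, m[0].shape, np.ndim(m[0, 0]))

# 2. __mul__: element-wise for ndarray, matrix product for np.matrix.
sq = np.array([[1, 2], [3, 4]])
elementwise = sq * sq
matmul = np.asarray(np.matrix(sq) * np.matrix(sq))
print(elementwise.tolist(), matmul.tolist())

# 3. Fixed dimensionality: even ravel() returns a 2-D (1, N) matrix.
print(a.ravel().shape, m.ravel().shape)
```

Code written against ndarray semantics, for example code that expects a[0]
to be 1-D, breaks on each of these points when handed a matrix.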
>
> Because of this behaviour, many libraries switched from using asanyarray
> to asarray, as np.matrix wouldn’t work with their code. This shut out other
> ndarray subclasses from being used as well, such as MaskedArray and
> astropy.Quantity. Even if asanyarray is used, there is usually no guarantee
> that a matrix will be returned instead of an array.
>
> The changes I’m proposing are twofold, but simple:
>
>    - asanyarray should return mat.view(type=np.ndarray) instead of
>    matrices. This allows us to preserve the performance (creating a view is
>    O(1) in both memory and time) and the mutability of the original matrix.
>    This change should happen after a FutureWarning and the usual grace
>    period.
>    - In the spirit of allowing duck-arrays to work with existing NumPy
>    code, asanyarray should be overridable via __array_function__, so that duck
>    arrays can decide whether to pass themselves through. If subclasses are
>    allowed, duck arrays should be as well.
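
The first change can be sketched as follows. This is an illustration only:
asanyarray_proposed and its after_grace_period flag are hypothetical names
used to model the two phases of the transition, not part of NumPy.

```python
import warnings

import numpy as np


def asanyarray_proposed(obj, after_grace_period=True):
    """Hypothetical model of the proposed asanyarray behaviour."""
    out = np.asanyarray(obj)
    if isinstance(out, np.matrix):
        if not after_grace_period:
            # Transition phase: keep returning the matrix, but warn.
            warnings.warn(
                "asanyarray will return an ndarray view of np.matrix "
                "in the future",
                FutureWarning,
                stacklevel=2,
            )
            return out
        # End state: an ndarray view, no data copied.
        return out.view(type=np.ndarray)
    # Other subclasses (e.g. MaskedArray) still pass through.
    return out


m = np.matrix([[1, 2], [3, 4]])
v = asanyarray_proposed(m)
v[0, 0] = 99  # writes through to the original matrix
print(type(v).__name__, np.shares_memory(v, m), m[0, 0])
```

Because the result is a view and not a copy, the cost does not depend on the
size of the matrix, and writes to the result remain visible in the original.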
>
> This is a part of a larger effort to deprecate np.matrix. As far as I’m
> aware, it has one big customer (scipy.sparse). The effort to replace that
> is already underway at PyData/Sparse.
>
> Best Regards,
> Hameer Abbasi
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>