[Numpy-discussion] asanyarray vs. asarray

Charles R Harris charlesr.harris at gmail.com
Fri Oct 19 22:00:02 EDT 2018


On Fri, Oct 19, 2018 at 7:50 PM Eric Wieser <wieser.eric+numpy at gmail.com>
wrote:

> > Subclasses such as MaskedArray and, yes, Quantity, are widely used, and if
> > they cause problems perhaps that should be seen as a sign that ndarray
> > subclassing should be made easier and clearer.
>
> Both maskedarray and quantity seem like something that would make more
> sense at the dtype level if our dtype system was easier to extend. It might
> be good to compile a list of subclassing applications, and split them into
> “this ought to be a dtype” and “this ought to be a different type of
> container”.
>

Wes McKinney has been benchmarking masks vs sentinel values for arrow:
http://wesmckinney.com/blog/bitmaps-vs-sentinel-values/. The (bit) masks
are faster. I'm not convinced dtypes are the way to go.
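To make the two representations concrete, here is a small NumPy sketch. It does not reproduce the Arrow benchmarks from the linked post; it only shows a NaN sentinel (missing values encoded in-band) next to a separate boolean mask, and the same mask bit-packed into a bitmap with `np.packbits`. The variable names are illustrative.

```python
import numpy as np

# Sentinel representation: the missing value is encoded in-band as NaN.
with_sentinel = np.array([1.0, 2.0, np.nan, 4.0])
sentinel_mean = np.nanmean(with_sentinel)

# Mask representation: data plus a separate array (True = missing).
data = np.array([1.0, 2.0, 0.0, 4.0])
missing = np.array([False, False, True, False])
masked_mean = data[~missing].mean()

# Bitmap representation: the same mask packed to one bit per element.
bitmap = np.packbits(missing)  # uint8 array; 4 flags fit in a single byte
restored = np.unpackbits(bitmap, count=missing.size).astype(bool)
```

Both means come out as 7/3. A bitmap costs one bit per element whatever the dtype, whereas a sentinel needs a spare value inside the dtype itself (integer dtypes have no NaN).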

Chuck


> On Fri, 19 Oct 2018 at 18:24 Marten van Kerkwijk <
> m.h.vankerkwijk at gmail.com> wrote:
>
>> Hi All,
>>
>> It seems there are two extreme possibilities for general functions:
>> 1. Put `asarray` everywhere. The main benefit that I can see is that even
>> if people pass in lists instead of arrays, one is guaranteed to have shape,
>> dtype, etc. But it seems a bit like calling `int` on everything that might
>> get used as an index, instead of letting the actual indexing do the proper
>> thing and call `__index__`.
>> 2. Do not coerce at all, but rather write code assuming something is an
>> array already. This will often, but not always, just work for array mimics,
>> with coercion done only where necessary (e.g., in lower-lying C code such
>> as that of the ufuncs which has a smaller API surface and can be overridden
>> more easily).
>>
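The two extremes above, plus `asanyarray` as the middle ground this thread is named after, can be sketched with a toy reduction. The function names are hypothetical; `np.asarray`, `np.asanyarray` and `np.ma.masked_array` are the real NumPy calls.

```python
import numpy as np

def mean_coerce(x):
    # Option 1: coerce everything with asarray. Lists work, but a MaskedArray
    # is reduced to its plain data, so the mask is silently ignored.
    return np.asarray(x).mean()

def mean_duck(x):
    # Option 2: no coercion; assume x already has an ndarray-like .mean().
    # Subclasses and duck arrays keep their behaviour, but lists fail.
    return x.mean()

def mean_anyarray(x):
    # Middle ground: asanyarray converts lists but passes ndarray
    # subclasses (e.g. MaskedArray) through unchanged.
    return np.asanyarray(x).mean()

masked = np.ma.masked_array([1.0, 2.0, 3.0, 100.0], mask=[0, 0, 0, 1])
```

For `masked`, `mean_coerce` returns 26.5 because the masked 100.0 is counted, while `mean_duck` and `mean_anyarray` return 2.0. For a plain list, `mean_duck` raises `AttributeError`.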
>> The current __array_function__ work may well provide us with a way to
>> combine both, if we (over time) move the coercion inside
>> `ndarray.__array_function__` so that the actual implementation *can* assume
>> it deals with pure ndarray - then, when relevant, calling that
>> implementation will be what subclasses/duck arrays can happily do (and it
>> is up to them to ensure this works).
>>
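As a rough illustration of the "unwrap, then call the implementation that assumes a pure ndarray" pattern, here is a duck array using the `__array_function__` protocol (enabled by default from NumPy 1.17). It does not show coercion moving into `ndarray.__array_function__` itself. `Tagged`, `HANDLED` and `implements` are hypothetical names following the dispatch-table pattern described in NEP 18.

```python
import numpy as np

# Dispatch table mapping NumPy functions to Tagged-specific implementations.
HANDLED = {}

def implements(np_func):
    """Register an implementation of np_func for Tagged (hypothetical helper)."""
    def decorator(func):
        HANDLED[np_func] = func
        return func
    return decorator

class Tagged:
    """Hypothetical duck array: a plain ndarray plus a text tag."""

    def __init__(self, data, tag):
        self.data = np.asarray(data)
        self.tag = tag

    def __array_function__(self, func, types, args, kwargs):
        # Defer (NotImplemented) for unknown functions or argument types;
        # NumPy then raises TypeError instead of silently coercing.
        if func not in HANDLED:
            return NotImplemented
        if not all(issubclass(t, (Tagged, np.ndarray)) for t in types):
            return NotImplemented
        return HANDLED[func](*args, **kwargs)

@implements(np.mean)
def _tagged_mean(a, axis=None):
    # Unwrap to a pure ndarray, reuse NumPy's own implementation, re-wrap.
    return Tagged(np.mean(a.data, axis=axis), a.tag)

t = Tagged([[1.0, 2.0], [3.0, 4.0]], tag="metres")
```

`np.mean(t)` returns a `Tagged` holding 2.5 with the tag intact, while `np.sum(t)`, which is not registered, raises `TypeError`.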
>> Of course, the above does not really answer what to do in the meantime.
>> But perhaps it helps in thinking of what we are actually aiming for.
>>
>> One last thing: could we please stop bashing subclasses? One can subclass
>> essentially everything in python, often to great advantage. Subclasses such
>> as MaskedArray and, yes, Quantity, are widely used, and if they cause
>> problems perhaps that should be seen as a sign that ndarray subclassing
>> should be made easier and clearer.
>>
>> All the best,
>>
>> Marten
>>
>>
>> On Fri, Oct 19, 2018 at 7:02 PM Ralf Gommers <ralf.gommers at gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Fri, Oct 19, 2018 at 10:28 PM Ralf Gommers <ralf.gommers at gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Fri, Oct 19, 2018 at 4:15 PM Hameer Abbasi <
>>>> einstein.edison at gmail.com> wrote:
>>>>
>>>>> Hi!
>>>>>
>>>>> On Friday, Oct 19, 2018 at 6:09 PM, Stephan Hoyer <shoyer at gmail.com>
>>>>> wrote:
>>>>> I don't think it makes much sense to change NumPy's existing usage of
>>>>> asarray() to asanyarray() unless we add subok=True arguments (which default
>>>>> to False). But this ends up cluttering NumPy's public API, which is also
>>>>> undesirable.
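A minimal sketch of what an opt-in `subok=False` argument could look like on a single function. `double` is hypothetical; `np.asarray` and `np.asanyarray` are the real coercion functions it switches between.

```python
import numpy as np

def double(a, subok=False):
    # Hypothetical NumPy-style function: by default, coerce to a base ndarray;
    # with subok=True, let ndarray subclasses pass through.
    a = np.asanyarray(a) if subok else np.asarray(a)
    return a * 2

m = np.ma.masked_array([1, 2, 3], mask=[False, True, False])
```

`double(m)` returns a plain ndarray (the mask is dropped), while `double(m, subok=True)` returns a MaskedArray with the mask kept. Every public function would need the same extra keyword, which is the API clutter being objected to.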
>>>>>
>>>>> Agreed so far.
>>>>>
>>>>
>>>> I'm not sure I agree. "subok" is very unpythonic; the average numpy
>>>> library function should work fine for a well-behaved subclass (i.e. most
>>>> things out there except np.matrix).
>>>>
>>>>>
>>>>> The preferred way to override NumPy functions going forward should be
>>>>> __array_function__.
>>>>>
>>>>>
>>>>> I think we should “soft support” (i.e., allow but consider unsupported)
>>>>> the case where one of NumPy’s functions is implemented in terms of others
>>>>> and “passing through” an array results in the correct behaviour for that
>>>>> array.
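An example of the "implemented in terms of others" case: a hypothetical `rms` helper does no coercion of its own. It handles a MaskedArray correctly only because `np.square` keeps the mask and `np.mean` defers to the array's own `.mean()` method for non-ndarray types. That incidental behaviour is what would be "soft supported".

```python
import numpy as np

def rms(x):
    # Hypothetical composite function written only in terms of other NumPy
    # functions, with no explicit asarray/asanyarray call of its own.
    return np.sqrt(np.mean(np.square(x)))

m = np.ma.masked_array([3.0, 4.0, 100.0], mask=[False, False, True])
```

Here `rms(m)` equals `sqrt(12.5)`, ignoring the masked 100.0, and `rms([3.0, 4.0])` gives the same value for a plain list.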
>>>>>
>>>>
>>>> I don't think we have or want such a concept as "soft support". We
>>>> intend to not break anything that now has asanyarray, i.e. it's supported
>>>> and ideally we have regression tests for all such functions. For anything
>>>> we transition over from asarray to asanyarray, PRs should come with new
>>>> tests.
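A pytest-style sketch of the kind of regression test meant here. The test name is hypothetical and not taken from NumPy's test suite. It relies on `np.atleast_2d` coercing with `asanyarray`, so a MaskedArray should come back as a MaskedArray.

```python
import numpy as np

def test_atleast_2d_preserves_maskedarray():
    # np.atleast_2d uses asanyarray, so the subclass and its mask should
    # survive the added leading axis.
    m = np.ma.masked_array([1, 2, 3], mask=[False, True, False])
    out = np.atleast_2d(m)
    assert isinstance(out, np.ma.MaskedArray)
    assert out.shape == (1, 3)
    assert out.mask.tolist() == [[False, True, False]]
```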
>>>>
>>>>
>>>>>
>>>>> On Fri, Oct 19, 2018 at 8:13 AM Marten van Kerkwijk <
>>>>> m.h.vankerkwijk at gmail.com> wrote:
>>>>>
>>>>>> There are exceptions for `matrix` in quite a few places, and there
>>>>>> now is a warning for `matrix`; it might not be bad to use `asanyarray` and
>>>>>> add an exception for `matrix`. Indeed, I quite like the suggestion by Eric
>>>>>> Wieser to just add the exception to `asanyarray` itself - that way when
>>>>>> matrix is truly deprecated, it will be a very easy change.
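The suggestion to put the `matrix` exception into `asanyarray` itself can be sketched as a standalone wrapper. `asanyarray_no_matrix` is a hypothetical name; it is not a NumPy function.

```python
import warnings
import numpy as np

def asanyarray_no_matrix(a, dtype=None):
    # Hypothetical variant of np.asanyarray: ndarray subclasses pass through,
    # except np.matrix, which is downcast to a plain ndarray.
    if isinstance(a, np.matrix):
        return np.asarray(a, dtype=dtype)
    return np.asanyarray(a, dtype=dtype)

with warnings.catch_warnings():
    # Constructing np.matrix emits a PendingDeprecationWarning.
    warnings.simplefilter("ignore", PendingDeprecationWarning)
    mat = np.matrix([[1, 2], [3, 4]])

masked = np.ma.masked_array([1, 2], mask=[False, True])
```

Once `matrix` is removed, the `isinstance` branch can simply be deleted, leaving plain `asanyarray`.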
>>>>>>
>>>> I don't quite understand this. Adding exceptions is not deprecation;
>>>> we then may as well just rip np.matrix out straight away.
>>>>
>>>> What I suggested in the call about this issue is that it's not very
>>>> effective to treat functions like percentile/quantile one by one without an
>>>> overarching strategy. A way forward could be for someone to write an
>>>> overview of which sets of functions now have asanyarray (and actually work
>>>> with subclasses), which ones we can and want to change now, and which ones
>>>> we can and want to change after np.matrix is gone. Also, some guidelines
>>>> for new functions that we add to numpy would be handy. I suspect we've been
>>>> adding new functions that use asarray rather than asanyarray, which is
>>>> probably undesired.
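Part of such an overview could be gathered mechanically. This sketch (hypothetical names `Probe` and `survey`) records only whether each function's output keeps the input's subclass. It says nothing about whether the result is actually correct for that subclass, which still needs human review.

```python
import numpy as np

class Probe(np.ndarray):
    """Hypothetical trivial subclass used as a probe; adds no behaviour."""

def survey(named_funcs, sample):
    # For each (name, callable), record whether the output keeps the input's
    # subclass. None means the call raised, which is also worth listing.
    results = {}
    for name, func in named_funcs:
        try:
            out = func(sample)
        except Exception:
            results[name] = None
        else:
            results[name] = type(out) is type(sample)
    return results

probe = np.arange(6.0).view(Probe)
report = survey(
    [
        ("asarray", np.asarray),
        ("asanyarray", np.asanyarray),
        ("atleast_1d", np.atleast_1d),
        ("percentile", lambda a: np.percentile(a, 50)),
    ],
    probe,
)
```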
>>>>
>>>
>>> Thanks Nathaniel and Stephan. Your comments on my other two points are
>>> both clear and correct (and have been made a number of times before). I
>>> think the "write an overview so we can stop making ad-hoc decisions and
>>> having these discussions" is the most important point I was trying to make
>>> though. If we had such a doc and it concluded "hence we don't change
>>> anything, __array_function__ is the only way to go" then we can just close
>>> PRs like https://github.com/numpy/numpy/pull/11162 straight away.
>>>
>>> Cheers,
>>> Ralf
>>>
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion at python.org
>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>>