[Numpy-discussion] Changes to np.digitize since NumPy 1.9?

Jaime Fernández del Río jaime.frio at gmail.com
Fri Aug 14 01:11:36 EDT 2015


On Thu, Aug 13, 2015 at 9:57 AM, Jaime Fernández del Río <
jaime.frio at gmail.com> wrote:

> On Thu, Aug 13, 2015 at 7:59 AM, Nathan Goldbaum <nathan12343 at gmail.com>
> wrote:
>
>>
>>
>> On Thu, Aug 13, 2015 at 9:44 AM, Charles R Harris <
>> charlesr.harris at gmail.com> wrote:
>>
>>>
>>>
>>> On Thu, Aug 13, 2015 at 12:09 AM, Jaime Fernández del Río <
>>> jaime.frio at gmail.com> wrote:
>>>
>>>> On Wed, Aug 12, 2015 at 2:03 PM, Nathan Goldbaum <nathan12343 at gmail.com
>>>> > wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I've been testing the package I spend most of my time on, yt, under
>>>>> numpy 1.10b1 since the announcement went out.
>>>>>
>>>>> I think I've narrowed down and fixed all of the test failures that
>>>>> cropped up except for one last issue. It seems that the behavior of
>>>>> np.digitize with respect to ndarray subclasses has changed since the NumPy
>>>>> 1.9 series. Consider the following test script:
>>>>>
>>>>> ```python
>>>>> import numpy as np
>>>>>
>>>>>
>>>>> class MyArray(np.ndarray):
>>>>>     def __new__(cls, *args, **kwargs):
>>>>>         return np.ndarray.__new__(cls, *args, **kwargs)
>>>>>
>>>>> data = np.arange(100)
>>>>>
>>>>> bins = np.arange(100) + 0.5
>>>>>
>>>>> data = data.view(MyArray)
>>>>>
>>>>> bins = bins.view(MyArray)
>>>>>
>>>>> digits = np.digitize(data, bins)
>>>>>
>>>>> print(type(digits))
>>>>> ```
>>>>>
>>>>> Under NumPy 1.9.2, this prints "<type 'numpy.ndarray'>", but under the
>>>>> 1.10 beta, it prints "<class '__main__.MyArray'>".
>>>>>
>>>>> I'm curious why this change was made. Since digitize outputs index
>>>>> arrays, I don't see why it should return anything but a plain ndarray.
>>>>> I see in the release notes that digitize now uses searchsorted under
>>>>> the hood. Is this related?
>>>>>
>>>>
>>>> It is indeed searchsorted's fault, as it returns an object of the same
>>>> type as the needle (the items to search for):
>>>>
>>>> >>> import numpy as np
>>>> >>> class A(np.ndarray): pass
>>>> >>> class B(np.ndarray): pass
>>>> >>> np.arange(10).view(A).searchsorted(np.arange(5).view(B))
>>>> B([0, 1, 2, 3, 4])
>>>>
>>>> I am all for making index-returning functions always return a base
>>>> ndarray, and will be more than happy to send a PR fixing this if there is
>>>> some agreement.
>>>>
>>>
>>> I think that is the right thing to do.
>>>
>>
>> Awesome, I'd appreciate having a PR to fix this. Arguably the return type
>> *could* be the same type as the inputs, but given that it's a behavior
>> change I agree that it's best to add a patch so the output of searchsorted
>> is "sanitized" to be an ndarray before it's returned by digitize.
>>
>
> It is relatively simple to do: just replace Py_TYPE(ap2) with
> &PyArray_Type in this line:
>
>
> https://github.com/numpy/numpy/blob/maintenance/1.10.x/numpy/core/src/multiarray/item_selection.c#L1725
>
> Then fix all the tests that are expecting searchsorted to return something
> other than a base ndarray. We have already modified nonzero to return base
> ndarrays in this release (see the release notes), so this change follows
> the same theme.
>
> For 1.11 I think we should try to extend this "if it returns an index, it
> will be a base ndarray" to all other functions that don't right now. Then
> sit back and watch AstroPy come down in flames... ;-)))
>
> Seriously, I think this makes a lot of sense, and should be documented as
> the way NumPy handles index arrays.
>
> Anyway, I will try to find time tonight to put this PR together, unless
> someone beats me to it, which I would be totally fine with.
>

PR #6206 it is: https://github.com/numpy/numpy/pull/6206
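
Until that change is in a release, downstream code can get the old behavior
back by stripping the subclass itself. A minimal sketch, assuming a
hypothetical helper named digitize_base (not part of NumPy) and relying on
np.asarray returning a base-class ndarray when given a subclass instance:

```python
import numpy as np


class MyArray(np.ndarray):
    """Trivial ndarray subclass, standing in for something like a yt array."""
    pass


def digitize_base(x, bins, right=False):
    """Hypothetical wrapper: call np.digitize, then drop any subclass.

    np.asarray converts ndarray subclasses to the base class, so the
    result is a plain ndarray regardless of what np.digitize returns.
    """
    return np.asarray(np.digitize(x, bins, right=right))


data = np.arange(100).view(MyArray)
bins = (np.arange(100) + 0.5).view(MyArray)

digits = digitize_base(data, bins)
print(type(digits))
```

The same np.asarray call can be applied to the result of searchsorted, if
that is being called directly with subclass inputs.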

Jaime

-- 
(\__/)
( O.o)
( > <) This is Bunny. Copy Bunny into your signature and help him in his plans
for world domination.