[Numpy-discussion] Changes to np.digitize since NumPy 1.9?

Nathan Goldbaum nathan12343 at gmail.com
Thu Aug 13 10:59:33 EDT 2015


On Thu, Aug 13, 2015 at 9:44 AM, Charles R Harris <charlesr.harris at gmail.com> wrote:

>
>
> On Thu, Aug 13, 2015 at 12:09 AM, Jaime Fernández del Río <
> jaime.frio at gmail.com> wrote:
>
>> On Wed, Aug 12, 2015 at 2:03 PM, Nathan Goldbaum <nathan12343 at gmail.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> I've been testing the package I spend most of my time on, yt, under
>>> numpy 1.10b1 since the announcement went out.
>>>
>>> I think I've narrowed down and fixed all of the test failures that
>>> cropped up except for one last issue. It seems that the behavior of
>>> np.digitize with respect to ndarray subclasses has changed since the NumPy
>>> 1.9 series. Consider the following test script:
>>>
>>> ```python
>>> import numpy as np
>>>
>>>
>>> class MyArray(np.ndarray):
>>>     def __new__(cls, *args, **kwargs):
>>>         return np.ndarray.__new__(cls, *args, **kwargs)
>>>
>>> data = np.arange(100)
>>>
>>> bins = np.arange(100) + 0.5
>>>
>>> data = data.view(MyArray)
>>>
>>> bins = bins.view(MyArray)
>>>
>>> digits = np.digitize(data, bins)
>>>
>>> print(type(digits))
>>> ```
>>>
>>> Under NumPy 1.9.2, this prints "<type 'numpy.ndarray'>", but under the
>>> 1.10 beta, it prints "<class '__main__.MyArray'>"
>>>
>>> I'm curious why this change was made. Since digitize outputs index
>>> arrays, it doesn't make sense to me why it should return anything but a
>>> plain ndarray. I see in the release notes that digitize now uses
>>> searchsorted under the hood. Is this related?
>>>
>>
>> It is indeed searchsorted's fault, as it returns an object of the same
>> type as the needle (the items to search for):
>>
>> >>> import numpy as np
>> >>> class A(np.ndarray): pass
>> >>> class B(np.ndarray): pass
>> >>> np.arange(10).view(A).searchsorted(np.arange(5).view(B))
>> B([0, 1, 2, 3, 4])
>>
>> I am all for making index-returning functions always return a base
>> ndarray, and will be more than happy to send a PR fixing this if there is
>> some agreement.
>>
>
> I think that is the right thing to do.
>

Awesome, I'd appreciate having a PR to fix this. Arguably the return type
*could* be the same type as the inputs, but given that it's a behavior
change I agree that it's best to add a patch so the output of searchsorted
is "sanitized" to be an ndarray before it's returned by digitize.
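
For anyone who needs the old behavior in the meantime, here is a minimal
sketch of that kind of sanitizing done on the caller's side. The wrapper
name digitize_plain is hypothetical and not part of NumPy; it only assumes
that np.asarray converts an ndarray subclass instance to a base ndarray,
which it does without copying the data.

```python
import numpy as np


class MyArray(np.ndarray):
    """Trivial ndarray subclass, standing in for something like a unit-aware array."""
    pass


def digitize_plain(x, bins, right=False):
    """Hypothetical helper: run np.digitize and strip any subclass from the result.

    np.asarray returns a base-class ndarray even when handed a subclass
    instance, so the index array comes back as a plain ndarray regardless
    of what np.digitize itself returns.
    """
    return np.asarray(np.digitize(x, bins, right=right))


data = np.arange(100).view(MyArray)
bins = (np.arange(100) + 0.5).view(MyArray)

digits = digitize_plain(data, bins)
print(type(digits))
```

This should give a plain ndarray on both 1.9 and the 1.10 beta, so code that
relies on the old return type would not need to care which version is
installed.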

To answer Nathaniel's question, I opened an issue on yt's bitbucket page to
record the test failures:

https://bitbucket.org/yt_analysis/yt/issues/1063/new-test-failures-using-numpy-110-beta

I've fixed two of the classes of errors in that bug in yt itself, since it
looks like we were relying on buggy or deprecated behavior in NumPy. Here
are the PRs for those fixes:

https://bitbucket.org/yt_analysis/yt/pull-requests/1697/cast-enzo-grid-start-index-to-int-arrays/diff
https://bitbucket.org/yt_analysis/yt/pull-requests/1696/add-assert_allclose_units-like/diff

>
> Chuck
>
>