Changes to np.digitize since NumPy 1.9?
Hi all,
I've been testing the package I spend most of my time on, yt, under numpy
1.10b1 since the announcement went out.
I think I've narrowed down and fixed all of the test failures that cropped
up except for one last issue. It seems that the behavior of np.digitize
with respect to ndarray subclasses has changed since the NumPy 1.9 series.
Consider the following test script:
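```python
import numpy as np


class MyArray(np.ndarray):
    # Minimal ndarray subclass; __new__ simply defers to ndarray.
    def __new__(cls, *args, **kwargs):
        return np.ndarray.__new__(cls, *args, **kwargs)


data = np.arange(100)
bins = np.arange(100) + 0.5
# View both arrays as the subclass so digitize receives MyArray inputs.
data = data.view(MyArray)
bins = bins.view(MyArray)
digits = np.digitize(data, bins)
# The print() form works on both Python 2 and Python 3.
print(type(digits))
```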
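Under NumPy 1.9.2, this prints "<type 'numpy.ndarray'>", but under the 1.10 beta, it prints "<class '__main__.MyArray'>".

I'm curious why this change was made. Since digitize outputs index arrays, it doesn't make sense to me why it should return anything but a plain ndarray. I see in the release notes that digitize now uses searchsorted under the hood. Is this related?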
On Aug 12, 2015 2:06 PM, "Nathan Goldbaum" wrote:
> I think I've narrowed down and fixed all of the test failures that cropped up except for one last issue.

This doesn't respond to your main question -- sorry! -- but is there a list of the changes you had to make somewhere? We generally do want to know when we break things -- that's why we do pre-releases! -- but it's often hard to know :-).

-n
On Wed, Aug 12, 2015 at 2:03 PM, Nathan Goldbaum <nathan12343@gmail.com> wrote:
> I'm curious why this change was made. Since digitize outputs index arrays, it doesn't make sense to me why it should return anything but a plain ndarray. I see in the release notes that digitize now uses searchsorted under the hood. Is this related?
It is indeed searchsorted's fault, as it returns an object of the same type as the needle (the items to search for):
    >>> import numpy as np
    >>> class A(np.ndarray): pass
    >>> class B(np.ndarray): pass
    >>> np.arange(10).view(A).searchsorted(np.arange(5).view(B))
    B([0, 1, 2, 3, 4])
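The same check can be written as a small script. This is a minimal sketch rather than code from the thread: the class names A and B are reused from the session above, and np.asarray is used here as one way to obtain a base ndarray regardless of what searchsorted returns on a given NumPy version.

```python
import numpy as np


class A(np.ndarray):
    pass


class B(np.ndarray):
    pass


haystack = np.arange(10).view(A)  # sorted array to search in
needle = np.arange(5).view(B)     # values to locate

indices = haystack.searchsorted(needle)

# The reported type depends on the NumPy version: the 1.10 beta discussed
# here returned the needle's type (B); other versions may return ndarray.
print(type(indices).__name__)

# np.asarray drops the subclass, so the result is a base ndarray either way.
plain = np.asarray(indices)
print(type(plain).__name__, plain.tolist())
```

Using np.asarray rather than np.asanyarray matters here, since the latter passes subclasses through unchanged.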
I am all for making index-returning functions always return a base ndarray, and will be more than happy to send a PR fixing this if there is some agreement.

Jaime

--
(\__/)
( O.o)
( > <) This is Conejo. Copy Conejo into your signature and help him with his plans for world domination.
On Aug 12, 2015 11:12 PM, "Jaime Fernández del Río" wrote:
> I am all for making index-returning functions always return a base ndarray, and will be more than happy to send a PR fixing this if there is some agreement.

Makes sense to me. I won't be surprised if someone else then shows up saying that of course they depend on index array return types matching the input, but if that happens then I guess we can let them and Nathan fight it out :-).

-n
On Thu, Aug 13, 2015 at 12:09 AM, Jaime Fernández del Río <jaime.frio@gmail.com> wrote:
> I am all for making index-returning functions always return a base ndarray, and will be more than happy to send a PR fixing this if there is some agreement.

I think that is the right thing to do.

Chuck
On Thu, Aug 13, 2015 at 9:44 AM, Charles R Harris <charlesr.harris@gmail.com> wrote:
> I think that is the right thing to do.

Awesome, I'd appreciate having a PR to fix this. Arguably the return type *could* be the same type as the inputs, but given that it's a behavior change I agree that it's best to add a patch so the output of searchsorted is "sanitized" to be an ndarray before it's returned by digitize.
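Until such a patch lands, the same sanitizing can be done on the caller's side. The sketch below is only an illustration: `digitize_as_ndarray` is a hypothetical helper name, not part of NumPy or yt, and it relies on nothing beyond np.digitize and np.asarray.

```python
import numpy as np


class MyArray(np.ndarray):
    """Trivial ndarray subclass, standing in for the one in the test script."""


def digitize_as_ndarray(x, bins, right=False):
    """Hypothetical wrapper: call np.digitize, then strip any subclass."""
    return np.asarray(np.digitize(x, bins, right=right))


data = np.arange(100).view(MyArray)
bins = (np.arange(100) + 0.5).view(MyArray)

digits = digitize_as_ndarray(data, bins)
print(type(digits).__name__)
```

The wrapper gives the same result type whether or not the installed NumPy already returns a base ndarray from digitize.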

To answer Nathaniel's question, I opened an issue on yt's bitbucket page to record the test failures:

https://bitbucket.org/yt_analysis/yt/issues/1063/new-test-failures-using-num...

I've fixed two of the classes of errors in that bug in yt itself, since it looks like we were relying on buggy or deprecated behavior in NumPy. Here are the PRs for those fixes:

https://bitbucket.org/yt_analysis/yt/pull-requests/1697/cast-enzo-grid-start...
https://bitbucket.org/yt_analysis/yt/pull-requests/1696/add-assert_allclose_...

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
On Thu, Aug 13, 2015 at 7:59 AM, Nathan Goldbaum wrote:
> Awesome, I'd appreciate having a PR to fix this. Arguably the return type *could* be the same type as the inputs, but given that it's a behavior change I agree that it's best to add a patch so the output of searchsorted is "sanitized" to be an ndarray before it's returned by digitize.
It is relatively simple to do, just replace Py_TYPE(ap2) with &PyArray_Type in this line:

https://github.com/numpy/numpy/blob/maintenance/1.10.x/numpy/core/src/multia...

Then fix all the tests that are expecting searchsorted to return something other than a base ndarray. We have already modified nonzero to return base ndarrays in this release (see the release notes), so it will go with the same theme.

For 1.11 I think we should try to extend this "if it returns an index, it will be a base ndarray" rule to all other functions that don't follow it right now. Then sit back and watch AstroPy come down in flames... ;-)))

Seriously, I think this makes a lot of sense, and should be documented as the way NumPy handles index arrays.

Anyway, I will try to find time tonight to put this PR together, unless someone beats me to it, which I would be totally fine with.

Jaime
On Thu, Aug 13, 2015 at 9:57 AM, Jaime Fernández del Río <jaime.frio@gmail.com> wrote:
> Anyway, I will try to find time tonight to put this PR together, unless someone beats me to it, which I would be totally fine with.
PR #6206 it is: https://github.com/numpy/numpy/pull/6206

Jaime
For what it's worth, from my astropy perspective I also think that any index array should be a base ndarray!

-- Marten

On Fri, Aug 14, 2015 at 7:11 AM, Jaime Fernández del Río <jaime.frio@gmail.com> wrote:
> PR #6206 it is: https://github.com/numpy/numpy/pull/6206
Participants (5): Charles R Harris, Jaime Fernández del Río, Marten van Kerkwijk, Nathan Goldbaum, Nathaniel Smith