All,
As you're probably aware, the current test suite for numpy.ma raises some nagging warnings such as "invalid value in ...". These warnings are only issued when a standard numpy ufunc (e.g., np.sqrt) is called on a MaskedArray, instead of its numpy.ma equivalent (e.g., np.ma.sqrt). The reason is that the masked versions of the ufuncs temporarily set the numpy error status to 'ignore' before the operation takes place, then reset the status to its original value.
I thought I could use the new __array_prepare__ method to intercept the call of a standard ufunc. After actual testing, that can't work: __array_prepare__ only helps to prepare the *output* of the operation, not to change the input on the fly, just for this operation. Actually, you can modify the input in place, but that's usually not what you want.
Then, I tried to use __array_prepare__ to store the current error status in the input, force it to ignore divide/invalid errors and send the input to the ufunc. That doesn't work either: np.seterr in __array_prepare__ does change the error status, but as far as I understand, the ufunc is still called with the original error status. That means that if something goes wrong, your error status can stay stuck. Not a good idea either.
I'm running out of ideas at this point. For the test suite, I'd suggest disabling the warnings in test_fix_invalid and test_basic_arithmetic.
An additional issue is that if one of the error statuses is set to 'raise', the numpy ufunc will raise the exception (as expected), while its numpy.ma version will not. I'll also put a warning in the docs to that effect.
Please send me your comments before I commit any changes.
Cheers,
P.
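For readers unfamiliar with the pattern Pierre describes, here is a minimal sketch of the save/ignore/restore idea. It is only an illustration of the behaviour, not the actual numpy.ma source; the name masked_sqrt is made up for the example.

import numpy as np

def masked_sqrt(a):
    # Sketch only: save the error state, silence it, compute on the
    # underlying data, then restore the state and re-attach the mask.
    a = np.ma.asarray(a)
    saved = np.geterr()                       # remember the current error state
    np.seterr(divide='ignore', invalid='ignore')
    try:
        data = np.sqrt(a.filled(1.0))         # operate on the raw data
    finally:
        np.seterr(**saved)                    # always restore the original state
    # mask entries that were already masked or where sqrt is undefined
    mask = np.ma.getmaskarray(a) | (a.filled(1.0) < 0)
    return np.ma.array(data, mask=mask)

With this wrapper, masked_sqrt(np.ma.array([4.0, -1.0], mask=[0, 0])) stays silent and masks the second entry, while calling np.sqrt directly on the same input emits the "invalid value" warning.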
On Wed, Mar 17, 2010 at 2:07 AM, Pierre GM
All, As you're probably aware, the current test suite for numpy.ma raises some nagging warnings such as "invalid value in ...". These warnings are only issued when a standard numpy ufunc (eg., np.sqrt) is called on a MaskedArray, instead of its numpy.ma (eg., np.ma.sqrt) equivalent. The reason is that the masked versions of the ufuncs temporarily set the numpy error status to 'ignore' before the operation takes place, and reset the status to its original value.
I thought I could use the new __array_prepare__ method to intercept the call of a standard ufunc. After actual testing, that can't work: __array_prepare__ only helps to prepare the *output* of the operation, not to change the input on the fly, just for this operation. Actually, you can modify the input in place, but that's usually not what you want.
That is correct, __array_prepare__ is called just after the output array is created, but before the ufunc actually gets down to business. I have the same limitation in quantities you are now seeing with masked array, in my case I want the opportunity to rescale different but compatible quantities for the operation (without changing the original arrays in place, of course).
Then, I tried to use __array_prepare__ to store the current error status in the input, force it to ignore divide/invalid errors and send the input to the ufunc. That doesn't work either: np.seterr in __array_prepare__ does change the error status, but as far as I understand, the ufunc is still called with the original error status. That means that if something goes wrong, your error status can stay stuck. Not a good idea either. I'm running out of ideas at this point. For the test suite, I'd suggest disabling the warnings in test_fix_invalid and test_basic_arithmetic. An additional issue is that if one of the error statuses is set to 'raise', the numpy ufunc will raise the exception (as expected), while its numpy.ma version will not. I'll also put a warning in the docs to that effect. Please send me your comments before I commit any changes.
I started thinking about a third method called __input_prepare__ that would be called on the way into the ufunc, which would allow you to intercept the input and pass a somehow modified copy back to the ufunc. The total flow would be:
1) Call myufunc(x, y[, z])
2) myufunc calls ?.__input_prepare__(myufunc, x, y), which returns x', y' (or simply passes through x, y by default)
3) myufunc creates the output array z (if not specified) and calls ?.__array_prepare__(z, (myufunc, x, y, ...))
4) myufunc finally gets around to performing the calculation
5) myufunc calls ?.__array_wrap__(z, (myufunc, x, y, ...)) and returns the result to the caller
Is this general enough for your use case? I haven't tried to think about how to change some global state at one point and change it back at another; that seems like a bad idea and difficult to support.
Darren
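To make the proposed flow concrete, here is a rough sketch of what a subclass might look like under it. Note that __input_prepare__ is purely hypothetical at this point -- numpy only calls __array_prepare__ and __array_wrap__ -- and its signature below simply follows the step list above.

import numpy as np

class QuietArray(np.ndarray):
    # Hypothetical subclass written against the proposed protocol.
    # Steps 2, 3 and 5 from the flow above; current numpy never calls
    # __input_prepare__.

    def __input_prepare__(self, ufunc, *inputs):
        # Step 2: return (possibly modified) copies of the inputs,
        # leaving the originals untouched.
        return inputs

    def __array_prepare__(self, out_arr, context=None):
        # Step 3: the freshly allocated output, before the calculation.
        return out_arr.view(type(self))

    def __array_wrap__(self, out_arr, context=None):
        # Step 5: finalize the result on its way back to the caller.
        return out_arr.view(type(self))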
On Wed, Mar 17, 2010 at 7:19 AM, Darren Dale
Is this general enough for your use case? I haven't tried to think about how to change some global state at one point and change it back at another, that seems like a bad idea and difficult to support.
Sounds like the textbook use case for the python 2.5/2.6 context manager. Pity we can't use it yet... (and I'm not sure it'd be easy to wrap around the calls here.)
Ryan
--
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
On Wed, Mar 17, 2010 at 10:11 AM, Ryan May
On Wed, Mar 17, 2010 at 7:19 AM, Darren Dale
wrote: Is this general enough for your use case? I haven't tried to think about how to change some global state at one point and change it back at another, that seems like a bad idea and difficult to support.
Sounds like the textbook use case for the python 2.5/2.6 context manager. Pity we can't use it yet... (and I'm not sure it'd be easy to wrap around the calls here.)
I don't think context managers would work. They would be implemented in one of the subclasses special methods and would thus go out of scope before the ufunc got around to performing the calculation that required the change in state. Darren
On Wed, Mar 17, 2010 at 9:20 AM, Darren Dale
On Wed, Mar 17, 2010 at 10:11 AM, Ryan May
wrote: On Wed, Mar 17, 2010 at 7:19 AM, Darren Dale
wrote: Is this general enough for your use case? I haven't tried to think about how to change some global state at one point and change it back at another, that seems like a bad idea and difficult to support.
Sounds like the textbook use case for the python 2.5/2.6 context manager. Pity we can't use it yet... (and I'm not sure it'd be easy to wrap around the calls here.)
I don't think context managers would work. They would be implemented in one of the subclasses special methods and would thus go out of scope before the ufunc got around to performing the calculation that required the change in state.
Right, that's the part I was referring to in the last part of my post. But the concept of modifying global state and ensuring that, no matter what happens, that state is reset to its initial condition is the textbook use case for context managers. Problem is, I think that limitation applies to any method that tries to be exception-safe. It seems like you basically need to wrap the initial function call.
Ryan
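As a side note on existing tools: numpy already ships np.errstate, which is exactly this kind of save/restore context manager. The limitation discussed above still applies -- it has to wrap the call site, it cannot be installed from inside __array_prepare__ -- and at the time of this thread numpy could not yet require the with statement, but for illustration:

import numpy as np

x = np.ma.array([1.0, -1.0, 4.0], mask=[0, 0, 1])

# Save/ignore/restore around a block of code; the previous error state
# is restored on exit, even if an exception is raised inside the block.
with np.errstate(invalid='ignore', divide='ignore'):
    y = np.sqrt(x)      # no "invalid value" warning inside the block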
On Wed, Mar 17, 2010 at 6:19 AM, Darren Dale
On Wed, Mar 17, 2010 at 2:07 AM, Pierre GM
wrote: All, As you're probably aware, the current test suite for numpy.ma raises some nagging warnings such as "invalid value in ...". These warnings are only issued when a standard numpy ufunc (eg., np.sqrt) is called on a MaskedArray, instead of its numpy.ma (eg., np.ma.sqrt) equivalent. The reason is that the masked versions of the ufuncs temporarily set the numpy error status to 'ignore' before the operation takes place, and reset the status to its original value.
I thought I could use the new __array_prepare__ method to intercept the call of a standard ufunc. After actual testing, that can't work. __array_prepare only help to prepare the *output* of the operation, not to change the input on the fly, just for this operation. Actually, you can modify the input in place, but it's usually not what you want.
That is correct, __array_prepare__ is called just after the output array is created, but before the ufunc actually gets down to business. I have the same limitation in quantities you are now seeing with masked array, in my case I want the opportunity to rescale different but compatible quantities for the operation (without changing the original arrays in place, of course).
Then, I tried to use __array_prepare__ to store the current error status in the input, force it to ignore divide/invalid errors and send the input to the ufunc. That doesn't work either: np.seterr in __array_prepare__ does change the error status, but as far as I understand, the ufunc is still called with the original error status. That means that if something goes wrong, your error status can stay stuck. Not a good idea either. I'm running out of ideas at this point. For the test suite, I'd suggest disabling the warnings in test_fix_invalid and test_basic_arithmetic. An additional issue is that if one of the error statuses is set to 'raise', the numpy ufunc will raise the exception (as expected), while its numpy.ma version will not. I'll also put a warning in the docs to that effect. Please send me your comments before I commit any changes.
I started thinking about a third method called __input_prepare__ that would be called on the way into the ufunc, which would allow you to intercept the input and pass a somehow modified copy back to the ufunc. The total flow would be:
1) Call myufunc(x, y[, z]) 2) myufunc calls ?.__input_prepare__(myufunc, x, y), which returns x', y' (or simply passes through x,y by default) 3) myufunc creates the output array z (if not specified) and calls ?.__array_prepare__(z, (myufunc, x, y, ...)) 4) myufunc finally gets around to performing the calculation 5) myufunc calls ?.__array_wrap__(z, (myufunc, x, y, ...)) and returns the result to the caller
Is this general enough for your use case? I haven't tried to think about how to change some global state at one point and change it back at another, that seems like a bad idea and difficult to support.
I'm not a masked array user and not familiar with the specific problems here, but as an outsider it's beginning to look like one little fix after another. Is there some larger framework that would help here? Changes to the ufuncs themselves? There was some code for masked ufuncs at the C level posted a while back that I thought was interesting; would it help to have masked versions of the ufuncs? So on and so forth. It just looks like a larger design issue needs to be addressed here. Chuck
On Wed, Mar 17, 2010 at 10:45 AM, Charles R Harris
On Wed, Mar 17, 2010 at 6:19 AM, Darren Dale
wrote: On Wed, Mar 17, 2010 at 2:07 AM, Pierre GM
wrote: All, As you're probably aware, the current test suite for numpy.ma raises some nagging warnings such as "invalid value in ...". These warnings are only issued when a standard numpy ufunc (eg., np.sqrt) is called on a MaskedArray, instead of its numpy.ma (eg., np.ma.sqrt) equivalent. The reason is that the masked versions of the ufuncs temporarily set the numpy error status to 'ignore' before the operation takes place, and reset the status to its original value.
I thought I could use the new __array_prepare__ method to intercept the call of a standard ufunc. After actual testing, that can't work. __array_prepare only help to prepare the *output* of the operation, not to change the input on the fly, just for this operation. Actually, you can modify the input in place, but it's usually not what you want.
That is correct, __array_prepare__ is called just after the output array is created, but before the ufunc actually gets down to business. I have the same limitation in quantities you are now seeing with masked array, in my case I want the opportunity to rescale different but compatible quantities for the operation (without changing the original arrays in place, of course).
Then, I tried to use __array_prepare__ to store the current error status in the input, force it to ignore divide/invalid errors and send the input to the ufunc. Doesn't work either: np.seterr in __array_prepare__ does change the error status, but as far as I understand, the ufunc is called is still called with the original error status. That means that if something goes wrong, your error status can stay stuck. Not a good idea either. I'm running out of ideas at this point. For the test suite, I'd suggest to disable the warnings in test_fix_invalid and test_basic_arithmetic. An additional issue is that if one of the error status is set to 'raise', the numpy ufunc will raise the exception (as expected), while its numpy.ma version will not. I'll put also a warning in the docs to that effect. Please send me your comments before I commit any changes.
I started thinking about a third method called __input_prepare__ that would be called on the way into the ufunc, which would allow you to intercept the input and pass a somehow modified copy back to the ufunc. The total flow would be:
1) Call myufunc(x, y[, z]) 2) myufunc calls ?.__input_prepare__(myufunc, x, y), which returns x', y' (or simply passes through x,y by default) 3) myufunc creates the output array z (if not specified) and calls ?.__array_prepare__(z, (myufunc, x, y, ...)) 4) myufunc finally gets around to performing the calculation 5) myufunc calls ?.__array_wrap__(z, (myufunc, x, y, ...)) and returns the result to the caller
Is this general enough for your use case? I haven't tried to think about how to change some global state at one point and change it back at another, that seems like a bad idea and difficult to support.
I'm not a masked array user and not familiar with the specific problems here, but as an outsider it's beginning to look like one little fix after another.
Yeah, I was concerned that criticism would come up.
Is there some larger framework that would help here?
I think there is: http://www.python.org/dev/peps/pep-3124/
Changes to the ufuncs themselves?
Perhaps, if ufuncs were instances of a class that implemented __call__, it would be easier to include context management. Maybe this approach could be coupled with input_prepare, array_prepare and array_wrap to provide everything we need.
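As a very rough illustration of the "ufuncs as callable instances" idea (not a proposal for how numpy ufuncs are actually implemented), a thin wrapper class can already fold the error-state handling into __call__. The name errstate_ufunc is invented for the example, and it relies on the with statement discussed earlier in the thread:

import numpy as np

class errstate_ufunc(object):
    # Wrap an existing ufunc in a callable object whose __call__
    # manages the error state around the actual computation.
    def __init__(self, ufunc, **errkw):
        self.ufunc = ufunc
        self.errkw = errkw              # e.g. invalid='ignore'

    def __call__(self, *args, **kwargs):
        with np.errstate(**self.errkw):
            return self.ufunc(*args, **kwargs)

ma_sqrt = errstate_ufunc(np.sqrt, invalid='ignore')
result = ma_sqrt(np.ma.array([4.0, -1.0], mask=[0, 0]))  # no warning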
There was some code for masked ufuncs at the C level posted a while back that I thought was interesting; would it help to have masked versions of the ufuncs?
I think we need a solution that avoids implementing an entirely new set of ufuncs for specific subclasses.
So on and so forth. It just looks like a larger design issue needs to be addressed here.
I'm interested to hear other people's perspectives or suggestions. Darren
Charles R Harris wrote:
On Wed, Mar 17, 2010 at 6:19 AM, Darren Dale
wrote:
On Wed, Mar 17, 2010 at 2:07 AM, Pierre GM wrote:
> All,
> As you're probably aware, the current test suite for numpy.ma raises some nagging warnings such as "invalid value in ...". These warnings are only issued when a standard numpy ufunc (eg., np.sqrt) is called on a MaskedArray, instead of its numpy.ma (eg., np.ma.sqrt) equivalent. The reason is that the masked versions of the ufuncs temporarily set the numpy error status to 'ignore' before the operation takes place, and reset the status to its original value.
> I thought I could use the new __array_prepare__ method to intercept the call of a standard ufunc. After actual testing, that can't work. __array_prepare__ only helps to prepare the *output* of the operation, not to change the input on the fly, just for this operation. Actually, you can modify the input in place, but it's usually not what you want.
That is correct, __array_prepare__ is called just after the output array is created, but before the ufunc actually gets down to business. I have the same limitation in quantities you are now seeing with masked array, in my case I want the opportunity to rescale different but compatible quantities for the operation (without changing the original arrays in place, of course).
> Then, I tried to use __array_prepare__ to store the current error status in the input, force it to ignore divide/invalid errors and send the input to the ufunc. Doesn't work either: np.seterr in __array_prepare__ does change the error status, but as far as I understand, the ufunc is still called with the original error status. That means that if something goes wrong, your error status can stay stuck. Not a good idea either.
> I'm running out of ideas at this point. For the test suite, I'd suggest to disable the warnings in test_fix_invalid and test_basic_arithmetic.
> An additional issue is that if one of the error status is set to 'raise', the numpy ufunc will raise the exception (as expected), while its numpy.ma version will not. I'll also put a warning in the docs to that effect.
> Please send me your comments before I commit any changes.
I started thinking about a third method called __input_prepare__ that would be called on the way into the ufunc, which would allow you to intercept the input and pass a somehow modified copy back to the ufunc. The total flow would be:
1) Call myufunc(x, y[, z]) 2) myufunc calls ?.__input_prepare__(myufunc, x, y), which returns x', y' (or simply passes through x,y by default) 3) myufunc creates the output array z (if not specified) and calls ?.__array_prepare__(z, (myufunc, x, y, ...)) 4) myufunc finally gets around to performing the calculation 5) myufunc calls ?.__array_wrap__(z, (myufunc, x, y, ...)) and returns the result to the caller
Is this general enough for your use case? I haven't tried to think about how to change some global state at one point and change it back at another, that seems like a bad idea and difficult to support.
I'm not a masked array user and not familiar with the specific problems here, but as an outsider it's beginning to look like one little fix after another. Is there some larger framework that would help here? Changes to the ufuncs themselves? There was some code for masked ufuncs at the C level posted a while back that I thought was interesting; would it help to have masked versions of the ufuncs? So on and so forth. It just looks like a larger design issue needs to be addressed here.
Chuck,
I'm glad you found it interesting, and I'm sorry I haven't had time to follow up on the work with masked ufuncs in C. My motivation for going to the C level was speed and control; many ma operations are very slow compared to their numpy counterparts, and moving the mask handling to C can erase nearly all of this penalty.
Regarding nan-handling: using masked ufuncs in C means that calculations are simply not done with masked values, so it doesn't matter whether a masked value is invalid or not; consequently, so long as an invalid value is masked, the seterr state doesn't matter. And the seterr state then applies normally to the unmasked values. I'm not sure whether this solves the problem at hand, but it does seem to me to be sensible behavior and a step in the right direction.
The devil is in the details--coming up with some basic masked ufunc functionality in C was fairly easy, but figuring out how to handle all ufuncs, and especially their methods (reduce, etc.) would be quite a bit of work. It might be a good project for a student. Realistically, I don't think I will ever have the time to do it myself. In case anyone is interested, my initial feeble attempt nearly a year ago is still on github: http://github.com/efiring/numpy-work
Eric
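For readers who have not looked at that code, the behaviour Eric describes can be sketched in pure Python (the real work would be in C, and the function below is invented purely for illustration): the operation is simply never performed where an element is masked, so the seterr state only ever applies to the unmasked values.

import numpy as np

def masked_add(a, b):
    # Conceptual sketch only; assumes a and b have the same shape.
    a, b = np.ma.asarray(a), np.ma.asarray(b)
    mask = np.ma.getmaskarray(a) | np.ma.getmaskarray(b)
    valid = ~mask
    out = np.zeros(a.shape)
    out[valid] = a.data[valid] + b.data[valid]   # computed only where unmasked
    return np.ma.array(out, mask=mask)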
Chuck
Eric Firing wrote:
My motivation for going to the C level was speed and control; many ma operations are very slow compared to their numpy counterparts, and moving the mask handling to C can erase nearly all of this penalty.
really? very cool.
I was thinking about this the other day, and thinking that in some grand future vision, all numpy arrays should be masked arrays (or could be). The idea is that missing/invalid data is a really common case, and it is simply wonderful to have the software handle that gracefully.
One of the things I liked about MATLAB was that NaNs were well handled almost all the time. Given all the limitations of NaN, having a masked array is a better way to go, but I'd love it if they were "just there", and therefore EVERY numpy function and package built on numpy would handle them gracefully. I had thought that there would be a significant performance penalty, and thus there would be a boatload of "if_mask" code all over the place, but maybe not.
Anyway, just a fantasy, but C-level ufuncs that support masks would be great.
-Chris
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R
7600 Sand Point Way NE
Seattle, WA 98115
(206) 526-6959 voice
(206) 526-6329 fax
(206) 526-6317 main reception
Chris.Barker@noaa.gov
On Wed, Mar 17, 2010 at 3:12 PM, Christopher Barker
Eric Firing wrote:
My motivation for going to the C level was speed and control; many ma operations are very slow compared to their numpy counterparts, and moving the mask handling to C can erase nearly all of this penalty.
really? very cool. I was thinking about this the other day, and thinking that in some grand future vision, all numpy arrays should be masked arrays (or could be). The idea is that missing/invalid data is a really common case, and it is simply wonderful to have the software handle that gracefully.
One of the things I liked about MATLAB was that NaNs were well handled almost all the time. Given all the limitations of NaN, having a masked array is a better way to go, but I'd love it if they were "just there", and therefore EVERY numpy function and package built on numpy would handle them gracefully. I had thought that there would be a significant performance penalty, and thus there would be a boatload of "if_mask" code all over the place, but maybe not.
Many functions are defined differently for missing values; in stats, regression or time series analysis with the assumption of equally spaced time periods, you always need special methods to handle missing values. Plus, you have to operate on two arrays and keep both in memory, so the penalty is pretty high even in C. (On the statsmodels mailing list, Wes did a comparison for different implementations of moving average, although the difference wouldn't be as huge as it currently is.)
Josef
Anyway, just a fantasy, but C-level ufuncs that support masks would be great.
-Chris
On Mar 17, 2010, at 3:18 PM, josef.pktd@gmail.com wrote:
On Wed, Mar 17, 2010 at 3:12 PM, Christopher Barker
wrote: One of the things I liked about MATLAB was that NaNs were well handled almost all the time. Given all the limitations of NaN, having a masked array is a better way to go, but I'd love it if they were "just there", and therefore EVERY numpy function and package built on numpy would handle them gracefully. I had thought that there would be a significant performance penalty, and thus there would be a boatload of "if_mask" code all over the place, but maybe not.
many function are defined differently for missing values, in stats, regression or time series analysis with the assumption of equally spaced time periods always needs to use special methods to handle missing values.
I think Christopher was referring to ufuncs, not necessarily to more complicated functions (like in stats or such).
josef.pktd@gmail.com wrote:
On Wed, Mar 17, 2010 at 3:12 PM, Christopher Barker
Given all the limitations of NaN, having a masked array is a better way to go, but I'd love it if they were "just there", and therefore EVERY numpy function and package built on numpy would handle them gracefully.
many function are defined differently for missing values, in stats, regression or time series analysis with the assumption of equally spaced time periods always needs to use special methods to handle missing values.
sure -- that's kind of my point -- if EVERY numpy array were (potentially) masked, then folks would write code to deal with them appropriately.
Plus, you have to operate on two arrays and keep both in memory. So the penalty is pretty high even in C.
Only if there is actually a mask, which might make this pretty ugly -- lots of "if mask" code branching. If a given routine either didn't make sense with missing values, or was simply too costly with them, it could certainly raise an exception if it got an array with a non-null mask.
Anyway, just a fantasy, but C-level ufuncs that support masks would be great.
On Thu, Mar 18, 2010 at 12:12:10PM -0700, Christopher Barker wrote:
sure -- that's kind of my point -- if EVERY numpy array were (potentially) masked, then folks would write code to deal with them appropriately.
That's pretty much saying: "I have a complicated problem and I want everyone else to have to deal with the full complexity of it, even if they have a simple problem". In my experience, such a choice doesn't fare well, unless it is inside a controlled codebase, and someone working on that codebase is ready to spend heaps of time working on that specific issue. Gaël
On Thu, Mar 18, 2010 at 3:19 PM, Gael Varoquaux
On Thu, Mar 18, 2010 at 12:12:10PM -0700, Christopher Barker wrote:
sure -- that's kind of my point -- if EVERY numpy array were (potentially) masked, then folks would write code to deal with them appropriately.
That's pretty much saying: "I have a complicated problem and I want every one else to have to deal with the full complexity of it, even if they have a simple problem". In my experience, such choice doesn't fair well, unless it is inside a controled codebase, and someone working on that codebase is ready to spend heaps of time working on that specific issue.
If the mask doesn't get quietly added during an operation, we would need to keep out the masked arrays at the front door. I worry about speed penalties for pure number crunching, although, if nomask=True gives a fast path, then it might not be too much of a problem. ufuncs are simple enough, but for reduce operations and other more complicated things (linalg, fft) the user would need to control how missing values are supposed to be handled, which still requires special treatment and "if mask" all over the place. Josef
Gaël
Gael Varoquaux wrote:
On Thu, Mar 18, 2010 at 12:12:10PM -0700, Christopher Barker wrote:
sure -- that's kind of my point -- if EVERY numpy array were (potentially) masked, then folks would write code to deal with them appropriately.
That's pretty much saying: "I have a complicated problem and I want every one else to have to deal with the full complexity of it, even if they have a simple problem".
Well -- I did say it was a fantasy...
But I disagree -- having invalid data is a very common case. What we have now is a situation where we have two parallel systems, masked arrays and regular arrays. Each time someone does something new with masked arrays, they often find another missing feature, and have to solve that. Also, the fact that masked arrays are tacked on means that performance suffers.
Maybe it would simply be too ugly, but if I were to start from the ground up with a scientific computing package, I would want to put in support for missing values from the start. There are some cases where it is simply too complicated or too expensive to handle missing values -- fine, then an exception is raised.
You may be right about how complicated it would be, and what would happen is that everyone would simply put a:
    if a.masked:
        raise ValueError("I can't deal with masked data")
stanza at the top of every new method they wrote, but I suspect that if the core infrastructure was in place, it would be used.
I'm facing this at the moment: not a big deal, but I'm using histogram2d on some data. I just realized that it may have some NaNs in it, and I have no idea how those are being handled. I also want to move to masked arrays and have no idea if histogram2d can deal with those. At the least, I need to do some testing, and I suspect I'll need to do some hacking on histogram2d (or just write my own).
I'm sure I'm not the only one in the world that needs to histogram some data that may have invalid values -- so wouldn't it be nice if that were already handled in a defined way?
-Chris
On Thu, Mar 18, 2010 at 2:46 PM, Christopher Barker
Gael Varoquaux wrote:
On Thu, Mar 18, 2010 at 12:12:10PM -0700, Christopher Barker wrote:
sure -- that's kind of my point -- if EVERY numpy array were (potentially) masked, then folks would write code to deal with them appropriately.
That's pretty much saying: "I have a complicated problem and I want every one else to have to deal with the full complexity of it, even if they have a simple problem".
Well -- I did say it was a fantasy...
But I disagree -- having invalid data is a very common case. What we have now is a situation where we have two parallel systems, masked arrays and regular arrays. Each time someone does something new with masked arrays, they often find another missing feature, and have to solve that. Also, the fact that masked arrays are tacked on means that performance suffers.
Case in point, I just found a bug in np.gradient where it forces the output to be an ndarray (http://projects.scipy.org/numpy/ticket/1435). Easy fix that doesn't actually require any special casing for masked arrays, just making sure to use the proper function to create a new array of the same subclass as the input. However, now for any place that I can't patch I have to use a custom function until a fixed numpy is released.
Maybe universal support for masked arrays (and masking invalid points) is a pipe dream, but every function in numpy should IMO deal properly with subclasses of ndarray.
Ryan
Ryan May wrote:
On Thu, Mar 18, 2010 at 2:46 PM, Christopher Barker
wrote: Gael Varoquaux wrote:
On Thu, Mar 18, 2010 at 12:12:10PM -0700, Christopher Barker wrote:
sure -- that's kind of my point -- if EVERY numpy array were (potentially) masked, then folks would write code to deal with them appropriately. That's pretty much saying: "I have a complicated problem and I want every one else to have to deal with the full complexity of it, even if they have a simple problem". Well -- I did say it was a fantasy...
But I disagree -- having invalid data is a very common case. What we have now is a situation where we have two parallel systems, masked arrays and regular arrays. Each time someone does something new with masked arrays, they often find another missing feature, and have to solve that. Also, the fact that masked arrays are tacked on means that performance suffers.
Case in point, I just found a bug in np.gradient where it forces the output to be an ndarray. (http://projects.scipy.org/numpy/ticket/1435). Easy fix that doesn't actually require any special casing for masked arrays, just making sure to use the proper function to create a new array of the same subclass as the input. However, now for any place that I can't patch I have to use a custom function until a fixed numpy is released.
Maybe universal support for masked arrays (and masking invalid points) is a pipe dream, but every function in numpy should IMO deal properly with subclasses of ndarray.
1) This can't be done in general because subclasses can change things to the point where there is little one can count on. The matrix subclass, for example, redefines multiplication and iteration, making it difficult to write functions that will work for ndarrays or matrices.
2) There is a lot that can be done to improve the handling of masked arrays, and I still believe that much of it should be done at the C level, where it can be done with speed and simplicity. Unfortunately, figuring out how to do it well, and implementing it well, will require a lot of intensive work. I suspect it won't get done unless we can figure out how to get a qualified person dedicated to it.
Eric
Ryan
On Thu, Mar 18, 2010 at 5:12 PM, Eric Firing
Ryan May wrote:
On Thu, Mar 18, 2010 at 2:46 PM, Christopher Barker
wrote: Gael Varoquaux wrote:
On Thu, Mar 18, 2010 at 12:12:10PM -0700, Christopher Barker wrote:
sure -- that's kind of my point -- if EVERY numpy array were (potentially) masked, then folks would write code to deal with them appropriately. That's pretty much saying: "I have a complicated problem and I want every one else to have to deal with the full complexity of it, even if they have a simple problem". Well -- I did say it was a fantasy...
But I disagree -- having invalid data is a very common case. What we have now is a situation where we have two parallel systems, masked arrays and regular arrays. Each time someone does something new with masked arrays, they often find another missing feature, and have to solve that. Also, the fact that masked arrays are tacked on means that performance suffers.
Case in point, I just found a bug in np.gradient where it forces the output to be an ndarray. (http://projects.scipy.org/numpy/ticket/1435). Easy fix that doesn't actually require any special casing for masked arrays, just making sure to use the proper function to create a new array of the same subclass as the input. However, now for any place that I can't patch I have to use a custom function until a fixed numpy is released.
Maybe universal support for masked arrays (and masking invalid points) is a pipe dream, but every function in numpy should IMO deal properly with subclasses of ndarray.
1) This can't be done in general because subclasses can change things to the point where there is little one can count on. The matrix subclass, for example, redefines multiplication and iteration, making it difficult to write functions that will work for ndarrays or matrices.
I'm more optimistic that it can be done in general, if we provide a mechanism where the subclass with highest priority can customize the execution of the function (ufunc or not). In principle, the subclass could even override the buffer operation, like in the case of matrices. It still can put a lot of responsibility on the authors of the subclass, but what is gained is a framework where np.add (for example) could yield the appropriate result for any subclass, as opposed to the current situation of needing to know which add function can be used for a particular type of input. All speculative, of course. I'll start throwing some examples together when I get a chance. Darren
On Mar 18, 2010, at 4:12 PM, Eric Firing wrote:
Ryan May wrote:
On Thu, Mar 18, 2010 at 2:46 PM, Christopher Barker
wrote: Gael Varoquaux wrote:
On Thu, Mar 18, 2010 at 12:12:10PM -0700, Christopher Barker wrote:
sure -- that's kind of my point -- if EVERY numpy array were (potentially) masked, then folks would write code to deal with them appropriately. That's pretty much saying: "I have a complicated problem and I want every one else to have to deal with the full complexity of it, even if they have a simple problem". Well -- I did say it was a fantasy...
But I disagree -- having invalid data is a very common case. What we have now is a situation where we have two parallel systems, masked arrays and regular arrays. Each time someone does something new with masked arrays, they often find another missing feature, and have to solve that. Also, the fact that masked arrays are tacked on means that performance suffers.
Please keep in mind that MaskedArrays were always provided for convenience, that's all. If you need performance, you must implement a solution adapted to your problem (dropping missing values, filling them with some kind of interpolation...) and just use standard ndarrays.
Anyway, the plan was since the beginning to have MaskedArrays implemented in C at one point or another. A few years back I checked how to subclass ndarrays in Cython, but ran into a lot of problems. Travis O advised me to focus on MaskedArrays instead, for good reasons. Now we have something that's pretty close to an ndarray (as opposed to the implementation in numeric), that works most of the time but could be optimized.
Case in point, I just found a bug in np.gradient where it forces the output to be an ndarray. (http://projects.scipy.org/numpy/ticket/1435). Easy fix that doesn't actually require any special casing for masked arrays, just making sure to use the proper function to create a new array of the same subclass as the input. However, now for any place that I can't patch I have to use a custom function until a fixed numpy is released.
Maybe universal support for masked arrays (and masking invalid points) is a pipe dream, but every function in numpy should IMO deal properly with subclasses of ndarray.
1) This can't be done in general because subclasses can change things to the point where there is little one can count on. The matrix subclass, for example, redefines multiplication and iteration, making it difficult to write functions that will work for ndarrays or matrices.
And one can always add a function to numpy.ma.extras...
2) There is a lot that can be done to improve the handling of masked arrays, and I still believe that much of it should be done at the C level, where it can be done with speed and simplicity. Unfortunately, figuring out how to do it well, and implementing it well, will require a lot of intensive work. I suspect it won't get done unless we can figure out how to get a qualified person dedicated to it.
I still can't speak C, but now that I'm unemployed, I should have plenty of free time to learn... Hire me ;)
On Thu, Mar 18, 2010 at 3:46 PM, Christopher Barker
Gael Varoquaux wrote:
On Thu, Mar 18, 2010 at 12:12:10PM -0700, Christopher Barker wrote:
sure -- that's kind of my point -- if EVERY numpy array were (potentially) masked, then folks would write code to deal with them appropriately.
That's pretty much saying: "I have a complicated problem and I want every one else to have to deal with the full complexity of it, even if they have a simple problem".
Well -- I did say it was a fantasy...
But I disagree -- having invalid data is a very common case. What we have now is a situation where we have two parallel systems, masked arrays and regular arrays. Each time someone does something new with masked arrays, they often find another missing feature, and have to solve that. Also, the fact that masked arrays are tacked on means that performance suffers.
Maybe it would simply be too ugly, but If I were to start from the ground up with a scientific computing package, I would want to put in support for missing values from that start.
There are some cases where is it simply too complicated or to expensive to handle missing values -- fine, then an exception is raised.
You may be right about how complicated it would be, and what would happen is that everyone would simply put a:
if a.masked:
    raise ValueError("I can't deal with masked data")
stanza at the top of every new method they wrote, but I suspect that if the core infrastructure was in place, it would be used.
I'm facing this at the moment: not a big deal, but I'm using histogram2d on some data. I just realized that it may have some NaNs in it, and I have no idea how those are being handled. I also want to move to masked arrays and have no idea if histogram2d can deal with those. At the least, I need to do some testing, and I suspect I'll need to do some hacking on histogram2d (or just write my own).
I'm sure I'm not the only one in the world that needs to histogram some data that may have invalid values -- so wouldn't it be nice if that were already handled in a defined way?
histogram2d handles neither masked arrays nor arrays with nans correctly, but assuming you want to drop all columns that have at least one missing value, then it is just one small step. Unless you want to replace the missing value with the mean, or a conditional prediction, or by interpolation. This could be included in the histogram function.
>>> x = np.ma.array([[1, 2, 3], [2, 1, 1]], mask=[[0, 1, 0], [0, 0, 0]])
>>> np.histogram2d(x[0], x[1], bins=3)
(array([[ 0.,  0.,  1.],
        [ 1.,  0.,  0.],
        [ 1.,  0.,  0.]]),
 array([ 1.        ,  1.66666667,  2.33333333,  3.        ]),
 array([ 1.        ,  1.33333333,  1.66666667,  2.        ]))

>>> x2 = x[:, ~x.mask.any(0)]
>>> np.histogram2d(x2[0], x2[1], bins=3)
(array([[ 0.,  0.,  1.],
        [ 0.,  0.,  0.],
        [ 1.,  0.,  0.]]),
 array([ 1.        ,  1.66666667,  2.33333333,  3.        ]),
 array([ 1.        ,  1.33333333,  1.66666667,  2.        ]))

>>> x = np.array([[1., np.nan, 3], [2, 1, 1]])
>>> x
array([[  1.,  NaN,   3.],
       [  2.,   1.,   1.]])
>>> np.histogram2d(x[0], x[1], bins=3)
(array([[ 0.,  0.,  0.],
        [ 0.,  0.,  0.],
        [ 0.,  0.,  0.]]),
 array([ NaN,  NaN,  NaN,  NaN]),
 array([ 1.        ,  1.33333333,  1.66666667,  2.        ]))

>>> x2 = x[:, np.isfinite(x).all(0)]
>>> np.histogram2d(x2[0], x2[1], bins=3)
(array([[ 0.,  0.,  1.],
        [ 0.,  0.,  0.],
        [ 1.,  0.,  0.]]),
 array([ 1.        ,  1.66666667,  2.33333333,  3.        ]),
 array([ 1.        ,  1.33333333,  1.66666667,  2.        ]))
Josef
josef.pktd@gmail.com wrote:
I'm facing this at the moment: not a big deal, but I'm using histogram2d on some data. I just realized that it may have some NaNs in it, and I have no idea how those are being handled.
histogram2d handles neither masked arrays nor arrays with nans correctly,
I really wasn't asking for help (yet) .. but thanks!
>>> x2 = x[:, np.isfinite(x).all(0)]
>>> np.histogram2d(x2[0], x2[1], bins=3)
(array([[ 0.,  0.,  1.],
        [ 0.,  0.,  0.],
        [ 1.,  0.,  0.]]),
 array([ 1.        ,  1.66666667,  2.33333333,  3.        ]),
 array([ 1.        ,  1.33333333,  1.66666667,  2.        ]))
I'll probably do something like that for now. I guess the question is -- should this be built in to histogram2d (and other similar functions)?
-Chris
On Thu, Mar 18, 2010 at 7:26 PM, Christopher Barker
josef.pktd@gmail.com wrote:
I'm facing this at the moment: not a big deal, but I'm using histogram2d on some data. I just realized that it may have some NaNs in it, and I have no idea how those are being handled.
histogram2d handles neither masked arrays nor arrays with nans correctly,
I really wasn't asking for help (yet) .. but thanks!
>>> x2 = x[:, np.isfinite(x).all(0)]
>>> np.histogram2d(x2[0], x2[1], bins=3)
(array([[ 0.,  0.,  1.],
        [ 0.,  0.,  0.],
        [ 1.,  0.,  0.]]),
 array([ 1.        ,  1.66666667,  2.33333333,  3.        ]),
 array([ 1.        ,  1.33333333,  1.66666667,  2.        ]))
I'll probably do something like that for now. I guess the question is -- should this be built in to histogram2d (and other similar functions)?
I think yes, for all functions that are closer to actual data and where there is an obvious way to handle the missing values. But it's work and adds a lot of code to a nice simple function, and if it's just one extra line for the user, then it is not too high on my priority list.
For example, I rewrote stats.zscore a while ago to also handle matrices and masked arrays, and Bruce rewrote geometric mean and others, but these are easy cases; for many of the other functions it's more work. Also, if a function gets too much overhead, I end up rewriting and inlining the core of the function over and over again when I need it inside a loop, for example for optimization, or I keep a copy of the function around that doesn't use the overhead. I actually do little profiling, so I don't really know what the cost would be in a loop.
Josef
On Mar 17, 2010, at 8:19 AM, Darren Dale wrote:
I started thinking about a third method called __input_prepare__ that would be called on the way into the ufunc, which would allow you to intercept the input and pass a somehow modified copy back to the ufunc. The total flow would be:
1) Call myufunc(x, y[, z]) 2) myufunc calls ?.__input_prepare__(myufunc, x, y), which returns x', y' (or simply passes through x,y by default) 3) myufunc creates the output array z (if not specified) and calls ?.__array_prepare__(z, (myufunc, x, y, ...)) 4) myufunc finally gets around to performing the calculation 5) myufunc calls ?.__array_wrap__(z, (myufunc, x, y, ...)) and returns the result to the caller
Is this general enough for your use case? I haven't tried to think about how to change some global state at one point and change it back at another, that seems like a bad idea and difficult to support.
Sounds like a good plan. If we could find a way to merge the first two (__input_prepare__ and __array_prepare__), that'd be ideal.
On Wed, Mar 17, 2010 at 4:48 PM, Pierre GM
On Mar 17, 2010, at 8:19 AM, Darren Dale wrote:
I started thinking about a third method called __input_prepare__ that would be called on the way into the ufunc, which would allow you to intercept the input and pass a somehow modified copy back to the ufunc. The total flow would be:
1) Call myufunc(x, y[, z]) 2) myufunc calls ?.__input_prepare__(myufunc, x, y), which returns x', y' (or simply passes through x,y by default) 3) myufunc creates the output array z (if not specified) and calls ?.__array_prepare__(z, (myufunc, x, y, ...)) 4) myufunc finally gets around to performing the calculation 5) myufunc calls ?.__array_wrap__(z, (myufunc, x, y, ...)) and returns the result to the caller
Is this general enough for your use case? I haven't tried to think about how to change some global state at one point and change it back at another, that seems like a bad idea and difficult to support.
Sounds like a good plan. If we could find a way to merge the first two (__input_prepare__ and __array_prepare__), that'd be ideal.
I think it is better to keep them separate, so we don't have one method that is trying to do too much. It would be easier to explain in the documentation. I may not have much time to look into this until after Monday. Is there a deadline we need to consider? Darren
On Wed, Mar 17, 2010 at 3:13 PM, Darren Dale
On Wed, Mar 17, 2010 at 4:48 PM, Pierre GM
wrote: On Mar 17, 2010, at 8:19 AM, Darren Dale wrote:
I started thinking about a third method called __input_prepare__ that would be called on the way into the ufunc, which would allow you to intercept the input and pass a somehow modified copy back to the ufunc. The total flow would be:
1) Call myufunc(x, y[, z]) 2) myufunc calls ?.__input_prepare__(myufunc, x, y), which returns x', y' (or simply passes through x,y by default) 3) myufunc creates the output array z (if not specified) and calls ?.__array_prepare__(z, (myufunc, x, y, ...)) 4) myufunc finally gets around to performing the calculation 5) myufunc calls ?.__array_wrap__(z, (myufunc, x, y, ...)) and returns the result to the caller
Is this general enough for your use case? I haven't tried to think about how to change some global state at one point and change it back at another, that seems like a bad idea and difficult to support.
Sounds like a good plan. If we could find a way to merge the first two (__input_prepare__ and __array_prepare__), that'd be ideal.
I think it is better to keep them separate, so we don't have one method that is trying to do too much. It would be easier to explain in the documentation.
I may not have much time to look into this until after Monday. Is there a deadline we need to consider?
I don't think this should go into 2.0, I think it needs more thought. And 2.0 already has significant code churn. Is there any reason beyond a big hassle not to set/restore the error state around all the ufunc calls in ma? Beyond that, the PEP that you pointed to looks interesting. Maybe some sort of decorator around ufunc calls could also be made to work. Chuck
On Wed, Mar 17, 2010 at 5:43 PM, Charles R Harris
On Wed, Mar 17, 2010 at 3:13 PM, Darren Dale
wrote: On Wed, Mar 17, 2010 at 4:48 PM, Pierre GM
wrote: On Mar 17, 2010, at 8:19 AM, Darren Dale wrote:
I started thinking about a third method called __input_prepare__ that would be called on the way into the ufunc, which would allow you to intercept the input and pass a somehow modified copy back to the ufunc. The total flow would be:
1) Call myufunc(x, y[, z]) 2) myufunc calls ?.__input_prepare__(myufunc, x, y), which returns x', y' (or simply passes through x,y by default) 3) myufunc creates the output array z (if not specified) and calls ?.__array_prepare__(z, (myufunc, x, y, ...)) 4) myufunc finally gets around to performing the calculation 5) myufunc calls ?.__array_wrap__(z, (myufunc, x, y, ...)) and returns the result to the caller
Is this general enough for your use case? I haven't tried to think about how to change some global state at one point and change it back at another, that seems like a bad idea and difficult to support.
Sounds like a good plan. If we could find a way to merge the first two (__input_prepare__ and __array_prepare__), that'd be ideal.
I think it is better to keep them separate, so we don't have one method that is trying to do too much. It would be easier to explain in the documentation.
I may not have much time to look into this until after Monday. Is there a deadline we need to consider?
I don't think this should go into 2.0, I think it needs more thought.
Now that you mention it, I agree that it would be too rushed to try to get it in for 2.0. Concerning a later release, is there anything in particular that you think needs to be clarified or reconsidered?
And 2.0 already has significant code churn. Is there any reason beyond a big hassle not to set/restore the error state around all the ufunc calls in ma? Beyond that, the PEP that you pointed to looks interesting. Maybe some sort of decorator around ufunc calls could also be made to work.
I think the PEP is interesting, but it is languishing. There were some questions and criticisms on the mailing list that I do not think were satisfactorily addressed, and as far as I know the author of the PEP has not pursued the matter further. There was some interest on the python-dev mailing list in the numpy community's use case, but I think we need to consider what can be done now to meet the needs of ndarray subclasses. I don't see PEP 3124 happening in the near future.
What I am proposing is a simple extension to our existing framework to let subclasses hook into ufuncs and customize their behavior based on the context of the operation (using the __array_priority__ of the inputs and/or outputs, and the identity of the ufunc). The steps I listed allow customization at the critical steps: prepare the input, prepare the output, populate the output (currently no proposal for customization here), and finalize the output. The only additional step proposed is to prepare the input.
In the long run, we could consider whether ufuncs should be instances of a class, perhaps implemented in Cython. This way the ufunc would be able to pass itself to the special array methods as part of the context tuple, as is currently done. Maybe an alternative approach would be for ufuncs to provide methods where subclasses could register routines for the various steps I specified, based on the types of the inputs, similar to the PEP. This way, the ufunc would determine the context based on the input (rather than the current way, where the ufunc determines part of the context by inspecting __array_priority__, and then the input with highest priority determines the context based on the identity of the ufunc and the rest of the input). This new (half baked) approach could be backward-compatible with the old one: if the combination of inputs isn't found in the registry, it would fall back on the existing __input_prepare__/__array_prepare__/__array_wrap__ mechanisms (which in principle could then be deprecated, and at that point __array_priority__ might no longer be necessary). I don't see anything to indicate that we would regret implementing a special __input_prepare__ method down the road.
Darren
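To make the "registry" half of that idea a bit more concrete, here is a purely hypothetical sketch; none of these names exist in numpy, and the dispatch-by-input-types scheme is just the one described in the paragraph above.

import numpy as np

class registered_ufunc(object):
    # Hypothetical: a ufunc object that lets subclasses register
    # per-step routines keyed by the tuple of input types, falling
    # back to plain execution when no entry matches.
    def __init__(self, ufunc):
        self.ufunc = ufunc
        self.input_prepare = {}    # {(type, type, ...): callable}
        self.array_wrap = {}

    def __call__(self, *inputs):
        key = tuple(type(x) for x in inputs)
        if key in self.input_prepare:
            inputs = self.input_prepare[key](self, *inputs)
        result = self.ufunc(*inputs)
        if key in self.array_wrap:
            result = self.array_wrap[key](self, result, inputs)
        return result

add = registered_ufunc(np.add)
add.input_prepare[(np.ndarray, np.ndarray)] = lambda uf, x, y: (x, y)  # no-op example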
On Wed, Mar 17, 2010 at 5:26 PM, Darren Dale
On Wed, Mar 17, 2010 at 5:43 PM, Charles R Harris
wrote: On Wed, Mar 17, 2010 at 3:13 PM, Darren Dale
wrote: On Wed, Mar 17, 2010 at 4:48 PM, Pierre GM
wrote: On Mar 17, 2010, at 8:19 AM, Darren Dale wrote:
I started thinking about a third method called __input_prepare__ that would be called on the way into the ufunc, which would allow you to intercept the input and pass a somehow modified copy back to the ufunc. The total flow would be:
1) Call myufunc(x, y[, z]) 2) myufunc calls ?.__input_prepare__(myufunc, x, y), which returns
x',
y' (or simply passes through x,y by default) 3) myufunc creates the output array z (if not specified) and calls ?.__array_prepare__(z, (myufunc, x, y, ...)) 4) myufunc finally gets around to performing the calculation 5) myufunc calls ?.__array_wrap__(z, (myufunc, x, y, ...)) and returns the result to the caller
Is this general enough for your use case? I haven't tried to think about how to change some global state at one point and change it back at another, that seems like a bad idea and difficult to support.
Sounds like a good plan. If we could find a way to merge the first two (__input_prepare__ and __array_prepare__), that'd be ideal.
I think it is better to keep them separate, so we don't have one method that is trying to do too much. It would be easier to explain in the documentation.
I may not have much time to look into this until after Monday. Is there a deadline we need to consider?
I don't think this should go into 2.0, I think it needs more thought.
Now that you mention it, I agree that it would be too rushed to try to get it in for 2.0. Concerning a later release, is there anything in particular that you think needs to be clarified or reconsidered?
And 2.0 already has significant code churn. Is there any reason beyond a big hassle not to set/restore the error state around all the ufunc calls in ma? Beyond that, the PEP that you pointed to looks interesting. Maybe some sort of decorator around ufunc calls could also be made to work.
I think the PEP is interesting, but it is languishing. There were some questions and criticisms on the mailing list that I do not think were satisfactorily addressed, and as far as I know the author of the PEP has not pursued the matter further. There was some interest on the python-dev mailing list in the numpy community's use case, but I think we need to consider what can be done now to meet the needs of ndarray subclasses. I don't see PEP 3124 happening in the near future.
What I am proposing is a simple extension to our existing framework to let subclasses hook into ufuncs and customize their behavior based on the context of the operation (using the __array_priority__ of the inputs and/or outputs, and the identity of the ufunc). The steps I listed allow customization at the critical steps: prepare the input, prepare the output, populate the output (currently no proposal for customization here), and finalize the output. The only additional step proposed is to prepare the input.
What bothers me here is the opposing desire to separate ufuncs from their ndarray dependency, having them operate on buffer objects instead. As I see it ufuncs would be split into layers, with a lower layer operating on buffer objects, and an upper layer tying them together with ndarrays where the "business" logic -- kinds, casting, etc -- resides. It is in that upper layer that what you are proposing would reside. Mind, I'm not sure that having matrices and masked arrays subclassing ndarray was the way to go, but given that they do, one possible solution is to dump the whole mess onto the subtype with the highest priority. That subtype would then be responsible for casts and all the other stuff needed for the call and wrapping the result. There could be library routines to help with that. It seems to me that that would be the most general way to go. In that sense ndarrays themselves would just be another subtype with especially low priority. In the long run, we could consider if ufuncs should be instances of a class, perhaps implemented in Cython. This way the ufunc would be able to pass itself to the special array methods as part of the context tuple, as is currently done.
Maybe an alternative approach would be for ufuncs to provide methods where subclasses could register routines for the various steps I specified, based on the types of the inputs, similar to the PEP. This way, the ufunc would determine the context based on the input (rather than the current arrangement, where the ufunc determines part of the context by inspecting __array_priority__, and the input with the highest priority then determines the rest of the context from the identity of the ufunc and the other inputs). This new (half-baked) approach could be backward-compatible with the old one: if the combination of inputs isn't found in the registry, it would fall back on the existing __input_prepare__/__array_prepare__/__array_wrap__ mechanisms (which in principle could then be deprecated, and at that point __array_priority__ might no longer be necessary). I don't see anything to indicate that we would regret implementing a special __input_prepare__ method down the road.
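As a very loose illustration of that registration idea (the registry and the helper functions below are hypothetical; nothing like them exists in numpy), it could look something like this:

    import numpy as np

    # Hypothetical per-ufunc registry, keyed by the tuple of input types.
    _input_prepare_registry = {}

    def register_input_prepare(ufunc, input_types, prepare):
        """Register a routine that prepares the inputs of `ufunc` for `input_types`."""
        _input_prepare_registry[(ufunc, input_types)] = prepare

    def call_ufunc(ufunc, *inputs):
        key = (ufunc, tuple(type(x) for x in inputs))
        prepare = _input_prepare_registry.get(key)
        if prepare is not None:
            inputs = prepare(*inputs)  # a registered routine adjusts the inputs
        # Unknown combination: fall back on the plain call, standing in for
        # the existing __array_prepare__/__array_wrap__ machinery.
        return ufunc(*inputs)

    # Example: clip negative values before np.sqrt for plain ndarrays.
    register_input_prepare(np.sqrt, (np.ndarray,),
                           lambda x: (np.clip(x, 0, None),))
    call_ufunc(np.sqrt, np.array([-1.0, 4.0]))   # -> array([ 0.,  2.])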
Chuck
On Wed, Mar 17, 2010 at 8:22 PM, Charles R Harris wrote:
> What bothers me here is the opposing desire to separate ufuncs from their ndarray dependency, having them operate on buffer objects instead. [...] In that sense ndarrays themselves would just be another subtype with especially low priority.
I'm sorry, I didn't understand your point. What you described sounds identical to how things are currently done. What distinction are you making, aside from operating on the buffer object? How would adding a method to modify the input to a ufunc complicate the situation? Darren
On Wed, Mar 17, 2010 at 7:39 PM, Darren Dale wrote:
> I'm sorry, I didn't understand your point. What you described sounds identical to how things are currently done. What distinction are you making, aside from operating on the buffer object? How would adding a method to modify the input to a ufunc complicate the situation?
Just *one* function to rule them all and on the subtype dump it. No __array_wrap__, __input_prepare__, or __array_prepare__, just something like __handle_ufunc__. So it is similar but perhaps more radical. I'm proposing having the ufunc upper layer do nothing but decide which argument type will do all the rest of the work, casting, calling the low level ufunc base, providing buffers, wrapping, etc. Instead of pasting bits and pieces into the existing framework I would like to lay out a line of attack that ends up separating ufuncs into smaller pieces that provide low level routines that work on strided memory while leaving policy implementation to the subtype. There would need to be some default type (ndarray) when the functions are called on nested lists and scalars and I'm not sure of the best way to handle that. I'm just sort of thinking out loud, don't take it too seriously. Chuck
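In rough pseudocode (the __handle_ufunc__ name is hypothetical; none of this exists in numpy), the upper layer would then reduce to something like:

    import numpy as np

    def upper_layer(ufunc, *inputs):
        # Pick the argument whose type claims the highest __array_priority__.
        handler = max(inputs, key=lambda x: getattr(x, '__array_priority__', 0.0))
        if hasattr(handler, '__handle_ufunc__'):
            # The chosen subtype does everything else: casting, calling the
            # low-level ufunc base on strided memory, wrapping the result.
            return handler.__handle_ufunc__(ufunc, inputs)
        # Default for nested lists, scalars and plain ndarrays.
        return ufunc(*inputs)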
On Mar 17, 2010, at 9:16 PM, Charles R Harris wrote:
> Just *one* function to rule them all and on the subtype dump it. No __array_wrap__, __input_prepare__, or __array_prepare__, just something like __handle_ufunc__. [...] I'm just sort of thinking out loud, don't take it too seriously.
Still, I like it. It sounds far cleaner than the current Harlequin's costume approach. In the thinking out loud department:
* the upper layer should allow the user to modify the input on the fly (a current limitation of __array_prepare__), so that we can change values that we know will give invalid results before the lower layer processes them.
* It'd be nice to have the domains defined in the functions. Right now, the domains are defined in numpy.ma.core for unary and binary functions. Maybe we could extend the 'context' of each ufunc?
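For reference, what numpy.ma.core does for a domained function today boils down to something like the following simplified sketch (not the actual implementation, just the idea): mark the entries that violate the domain, compute on safely filled data so no warning is raised, and mask the offending entries in the result.

    import numpy as np

    def domained_sqrt(a):
        # Simplified stand-in for np.ma.sqrt with the domain x >= 0.
        a = np.ma.asarray(a)
        bad = a.filled(1.0) < 0.0                          # domain check
        data = np.sqrt(np.where(bad, 1.0, a.filled(1.0)))  # safe values only
        return np.ma.masked_array(data, mask=np.ma.mask_or(np.ma.getmask(a), bad))

For example, domained_sqrt([-1., 4.]) returns [--, 2.0] without emitting any warning.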
On Wed, Mar 17, 2010 at 10:16 PM, Charles R Harris wrote:
> Just *one* function to rule them all and on the subtype dump it. No __array_wrap__, __input_prepare__, or __array_prepare__, just something like __handle_ufunc__. [...] I'm just sort of thinking out loud, don't take it too seriously.
This is a seemingly simplified approach. I was taken with it last night but then I remembered that it will make subclassing difficult. A simple example can illustrate the problem. We have MaskedArray, which needs to customize some functions that operate on arrays or buffers, so we pass the function and the arguments to __handle_ufunc__ and it takes care of the whole shebang. But now I develop a MaskedQuantity that takes masked arrays and gives them the ability to handle units, and so it needs to customize those functions further. Maybe MaskedQuantity can modify the input passed to its __handle_ufunc__ and then pass everything on to super().__handle_ufunc__, such that MaskedQuantity does not have to reimplement MaskedArray's customizations to that particular function, but that is not enough flexibility for the general case. If my subclass needs to call the low-level ufunc base, it can't rely on the superclass's __handle_ufunc__ because it *also* calls the ufunc base, so my subclass has to reimplement all of the superclass function customizations.
The current scheme (__input_prepare__, ...) is better able to handle subclassing, although I agree that it could be improved. If the subclasses were responsible for calling the ufunc base, alternative bases could be provided (like the c routines for masked arrays). That still seems to require the high-level function to provide three or four entry points: 1) modify the input, 2) initialize the output (chance to deal with metadata), 3) call the function base, 4) finalize the output (deal with metadata that requires the ufunc results). Perhaps 2 and 4 would not both be needed, I'm not sure. Darren
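A couple of entirely hypothetical classes make the difficulty concrete: because the parent's hook both prepares the inputs and performs the call, a child that needs a different low-level base cannot reuse the parent's preparation by delegating to it.

    import numpy as np

    class MaskedBase(np.ndarray):                      # stand-in for MaskedArray
        def __handle_ufunc__(self, ufunc, inputs):     # hypothetical hook
            inputs = [np.asarray(x) for x in inputs]   # masked-array-specific prep
            return ufunc(*inputs)                      # ...and the call itself

    class MaskedQuantity(MaskedBase):
        def __handle_ufunc__(self, ufunc, inputs):
            inputs = [np.asarray(x) for x in inputs]   # unit handling would go here
            # MaskedBase.__handle_ufunc__ both prepares the inputs *and* calls
            # the ufunc base, so to substitute a different base (say, dedicated
            # C routines for masked arrays) this class cannot delegate to it --
            # it has to repeat the parent's preparation before its own call:
            inputs = [np.asarray(x) for x in inputs]   # duplicated parent prep
            return ufunc(*inputs)                      # alternative base would go here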
On Mar 17, 2010, at 5:43 PM, Charles R Harris wrote:
> I don't think this should go into 2.0, I think it needs more thought. And 2.0 already has significant code churn. Is there any reason beyond a big hassle not to set/restore the error state around all the ufunc calls in ma?
Should be done with r8295. Please let me know whether I missed one. Otherwise, I agree with Chuck. Let's take some time to figure something out. It'd be a significant enough change that it shouldn't go into 2.0.
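For the record, the set/restore pattern around each raw ufunc call is only a few lines; a simplified sketch (not the actual numpy.ma code) looks like:

    import numpy as np

    def masked_sqrt(a):
        # Save the current error state, silence the warnings for the call,
        # and restore the previous state afterwards.
        saved = np.seterr(divide='ignore', invalid='ignore')
        try:
            data = np.sqrt(np.ma.getdata(a))
        finally:
            np.seterr(**saved)
        mask = np.ma.getmaskarray(a) | ~np.isfinite(data)
        return np.ma.masked_array(data, mask=mask)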
On 03/17/2010 01:07 AM, Pierre GM wrote:
> As you're probably aware, the current test suite for numpy.ma raises some nagging warnings such as "invalid value in ...". These warnings are only issued when a standard numpy ufunc (eg., np.sqrt) is called on a MaskedArray, instead of its numpy.ma (eg., np.ma.sqrt) equivalent. [...]
Perhaps a naive question: what is really being tested here? That is, it appears that you are testing both the generation of the invalid values and the function itself, so if the generation fails, the function will also fail. The test for the generation of invalid values should live elsewhere, so here you can assume that the generation of values works correctly. I think you should only be testing that the specific function passes. Why not just use 'invalid' values like np.inf directly? For example, in numpy/ma/tests/test_core.py we have this test:

    def test_fix_invalid(self):
        "Checks fix_invalid."
        data = masked_array(np.sqrt([-1., 0., 1.]), mask=[0, 0, 1])
        data_fixed = fix_invalid(data)

If that is meant to test fix_invalid, why not create the data array as:

    data = masked_array([np.inf, 0., 1.], mask=[0, 0, 1])

However, I am not sure what the output should be for the test_ndarray_mask test, because ma automatically masks the value resulting from sqrt(-1):
>>> a = masked_array([-1, 0, 1, 2, 3], mask=[0, 0, 0, 0, 1])
>>> np.sqrt(a)
Warning: invalid value encountered in sqrt
masked_array(data = [-- 0.0 1.0 1.41421356237 --], mask = [ True False False False True], fill_value = 999999)
Note the warning is important because it does indicate that the result might not be as expected. But if the -1 is replaced by np.inf, then it is not automatically masked:
>>> b = masked_array([np.inf, 0, 1, 2, 3], mask=[0, 0, 0, 0, 1])
>>> np.sqrt(b)
masked_array(data = [inf 0.0 1.0 1.41421356237 --], mask = [False False False False True], fill_value = 1e+20)
Bruce
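Spelling the fix_invalid suggestion above out as a complete test (a hypothetical replacement, with the values it would be expected to assert), it would read roughly:

    import numpy as np
    from numpy.ma import masked_array, fix_invalid
    from numpy.ma.testutils import assert_equal

    def test_fix_invalid():
        # Build the invalid entry directly with np.inf instead of generating
        # it through np.sqrt, so no warning is emitted by the test itself.
        data = masked_array([np.inf, 0., 1.], mask=[0, 0, 1])
        data_fixed = fix_invalid(data)
        assert_equal(data_fixed._data, [data.fill_value, 0., 1.])
        assert_equal(data_fixed._mask, [1, 0, 1])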
On Mar 17, 2010, at 11:09 AM, Bruce Southey wrote:
Perhaps naive question, what is really being tested here?
That is, it appears that you are testing both the generation of the invalid values and the function, so if the generation fails, the function will also fail. However, the test for the generation of invalid values should live elsewhere, so here you have to assume that the generation of values works correctly.
That's not really the point here. The issue is that when numpy ufuncs are called on a MaskedArray, a warning or an exception is raised when an invalid value is met. With the numpy.ma version of those functions, the error is trapped and processed. Of course, using the numpy.ma version of the ufuncs is the right way to go.
I think that you should only be testing that the specific function passes. Why not just use 'invalid' values like np.inf directly? For example, in numpy/ma/tests/test_core.py we have this test:

    def test_fix_invalid(self):
        "Checks fix_invalid."
        data = masked_array(np.sqrt([-1., 0., 1.]), mask=[0, 0, 1])
        data_fixed = fix_invalid(data)

If that is meant to test fix_invalid, why not create the data array as:

    data = masked_array([np.inf, 0., 1.], mask=[0, 0, 1])
Sure, that's nicer. But once again, that's not really the core of the issue.
On 03/17/2010 04:20 PM, Pierre GM wrote:
> Sure, that's nicer. But once again, that's not really the core of the issue.
I needed to point out that your statement was not completely correct: 'These warnings are only issued when a standard numpy ufunc (eg., np.sqrt) is called on a MaskedArray...'. There are valid warnings for some of the tests because these do not operate on masked arrays. As in the above example, the masked array only occurs *after* the square root has been taken, hence the warning. So some of the warnings in the tests should be eliminated just by changing the test input. Furthermore, this warning is technically valid for non-masked values:

>>> np.sqrt(np.ma.array([-1], mask=[0]))
Warning: invalid value encountered in sqrt
masked_array(data = [--], mask = [ True], fill_value = 1e+20)

In contrast, no warning is emitted by the ma function:

>>> np.ma.sqrt(np.ma.array([-1], mask=[0]))
masked_array(data = [--], mask = [ True], fill_value = 1e+20)

But I fully agree that there is a problem when the value is masked, because the operation should have been ignored for that entry:

>>> np.sqrt(np.ma.array([-1], mask=[1]))
Warning: invalid value encountered in sqrt
masked_array(data = [--], mask = [ True], fill_value = 1e+20)

Here I understand your view to be that if the operation is on a masked array, then 'invalid' operations like the square root of a negative number should be silently converted to masked values.
Bruce
On Mar 18, 2010, at 11:03 AM, Bruce Southey wrote:
There are valid warnings for some of the tests because these do not operate on masked arrays. As in the above example, the masked array only occurs *after* the square root has been taken, hence the warning. So some of the warnings in the tests should be eliminated just by changing the test input.
Dang, I knew I was forgetting something... OK, I'll work on that. But anyway, I agree with you, some of the tests are not particularly well-thought....
participants (9):
- Bruce Southey
- Charles R Harris
- Christopher Barker
- Darren Dale
- Eric Firing
- Gael Varoquaux
- josef.pktd@gmail.com
- Pierre GM
- Ryan May