Using Reduce with Multi-dimensional Masked array

I posted the following inquiry to python-list@python.org earlier this week, but got no responses, so I thought I'd try a more focused group. I assume the MA module falls under the NumPy area.

I am using 2 (and more) dimensional masked arrays with some numeric data, and using the reduce functionality on the arrays. I use the masking because some of the values in the arrays are 'missing' and should not be included in the results of the reduction.

For example, assume a 5 x 2 array, with masked values for the 4th entry for both of the 2nd dimension cells. If I want to sum along the 2nd dimension, I would expect to get a 'missing' value for the 4th entry because both of the entries for the sum are 'missing'. Instead, I get 0, which might be a valid number in my data space, and the returned 1 dimensional array has no mask associated with it.

Is this expected behavior for masked arrays, or a bug, or am I misusing the mask concept? Does anyone know how to get the reduction to produce a masked value? Example Code:

[dubois@ldorritt ~]$ pydoc MA.sum
Python Library Documentation: function sum in MA

sum(a, axis=0, fill_value=0)
    Sum of elements along a certain axis using fill_value for missing.

If you use add.reduce, you'll get what you want.
In other words, sum(m, axis, fill_value) = add.reduce(filled(m, fill_value), axis) Surprising in your case. Still, both uses are quite common, so I probably was thinking to myself that since add.reduce already does one of the jobs, I might as well make sum do the other one. One could have just as well argued that one was a synonym for the other and so it is revolting to have them be different. Well, MA users, is this something I should change, or not?

-----Original Message-----
From: numpy-discussion-admin@lists.sourceforge.net [mailto:numpy-discussion-admin@lists.sourceforge.net] On Behalf Of Sue Giller
Sent: Wednesday, November 28, 2001 9:03 AM
To: numpy-discussion@lists.sourceforge.net
Subject: [Numpy-discussion] Using Reduce with Multi-dimensional Masked array
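The difference between the two behaviours can be sketched in plain Python. This is only a model, not the MA module: `None` stands in for a masked cell, and the names `sum_with_fill` and `masked_add_reduce` are hypothetical. It also assumes that a reduce over a partly masked row adds up the valid cells and masks the result only when every cell is masked, which the thread implies but does not spell out for add.reduce.

```python
# Pure-Python model of the two behaviours; this is NOT the MA module.
# None stands in for a masked ("missing") cell.
MASKED = None

def filled(rows, fill_value):
    """Replace every masked cell with fill_value."""
    return [[fill_value if v is MASKED else v for v in row] for row in rows]

def sum_with_fill(rows, fill_value=0):
    """Models sum(m, axis, fill_value): fill first, then add, so the mask is lost."""
    return [sum(row) for row in filled(rows, fill_value)]

def masked_add_reduce(rows):
    """Models a mask-preserving reduce (assumption): add the valid cells,
    and return MASKED when every cell in the row is masked."""
    out = []
    for row in rows:
        valid = [v for v in row if v is not MASKED]
        out.append(sum(valid) if valid else MASKED)
    return out

# 5 x 2 data with both cells of the 4th row masked, as in the question.
data = [[1, 2], [3, 4], [5, 6], [MASKED, MASKED], [9, 10]]

print(sum_with_fill(data))      # [3, 7, 11, 0, 19]
print(masked_add_reduce(data))  # [3, 7, 11, None, 19]
```

In the first result the 4th entry is an ordinary 0, indistinguishable from a real zero in the data; in the second it stays marked as missing.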
_______________________________________________ Numpy-discussion mailing list Numpy-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/numpy-discussion

My answer is yes: the difference between the two behaviors could be confusing for the user. If I can dare to express a "general rule", I would say that the masks in MA arrays should not disappear unless EXPLICITLY required to do so! Of course you can interpret a provided value for the fill_value parameter in the sum function as such a request... but if a value is not provided, then I would say that the correct approach would be to keep the mask on (after all, what is special about the value 0? For instance, if you have to take a logarithm in the next step of the calculation, it is a rather bad choice!)

Giulio.

"Paul F. Dubois" wrote:

Thanks for the pointer. The example I gave using the sum operation is merely an example - I could also be doing other manipulations such as min, max, average, etc. I see that the MA.<op>.reduce functions will do what I want, but to do an average, I will need to do two steps since the MA.average function will have the original 'unexpected' behavior that I don't want. That raises the question of how to determine a count of valid values in a masked array. Can I assume that I can do 'math' on the mask array itself, for example to sum along a given axis and have the masked cells add up? In my original example, I would expect a sum along the second axis to return [0,0,0,2,0]. Can I rely on this? I would suggest that a .count operator would be very useful in working with masked arrays (count valid and count masked).
To add an opinion on the question from Paul about 'expected' behavior, I was working off the documentation for Numerical Python, and there were no caveats in there about MA.<op> working one way, and MA.<op>.reduce working another. The answer is always in the documentation, especially for users like me who don't have time or knowledge to go reading thru all the code modules to try and figure out what is happening. From a purely user standpoint, I would expect a masked array to retain its mask-edness at all times, unless I explicitly tell it not to. In that case, I would still expect it to replace the 'masked' cells with the original masked value, and not just arbitrarily assign some other value, such as 0. Thanks again for the prompt reply.
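The arithmetic behind the expected [0,0,0,2,0] can be shown with a plain nested list standing in for the mask. This is a hypothetical stand-in, with 1 marking a masked cell; whether the mask of a real MA array can be summed this way is exactly the open question above, and the sketch does not answer it.

```python
# Hypothetical stand-in for the mask of the 5 x 2 example:
# 1 marks a masked cell, 0 a valid one. This is not an MA mask object.
mask = [[0, 0], [0, 0], [0, 0], [1, 1], [0, 0]]

# Masked cells per row, i.e. the mask summed along the second axis.
masked_per_row = [sum(row) for row in mask]

# Valid cells per row: row length minus the masked count.
valid_per_row = [len(row) - n for row, n in zip(mask, masked_per_row)]

print(masked_per_row)  # [0, 0, 0, 2, 0]
print(valid_per_row)   # [2, 2, 2, 0, 2]
```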

Actually masked arrays already have a count method that does what you want:

Python 2.2b2 (#26, Nov 16 2001, 11:44:11) [MSC 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.

count(self, axis=None) method of MA.MA.MaskedArray instance
    Count of the non-masked elements in a, or along a certain axis.

x.count()
10

You have misread my reply. It is not true that MA.op works one way and MA.op.reduce is different. sum and add.reduce are different, and the documentation for sum DOES say the right thing for sum. The function sum is a special case in that its native meaning was the same as add.reduce and so the function is redundant.

I believe you are in error wrt average; average works the way you want.

Function count can tell you the number of non-masked values either in the whole array or axis-wise if you give an axis argument. Function size gives you the total number, so #invalid is size(x)-count(x).

maximum and minimum (don't use max and min, they are built-ins that don't know about Numeric) have two forms. When called with one argument they return the overall max or min of the whole array, returning masked only if all entries are masked. For two arguments, you get element-wise extrema, and the mask is on where any one of the arguments was masked.
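The rules just described (count versus size, and the one-argument and two-argument forms of maximum) can be modelled in plain Python. This is a sketch of the stated semantics only, not the MA implementation; `None` stands in for a masked entry and all function names are hypothetical.

```python
MASKED = None  # stand-in for a masked entry; this is not the MA module

def count(values):
    """Number of non-masked entries."""
    return sum(1 for v in values if v is not MASKED)

def overall_maximum(values):
    """One-argument form: maximum of the unmasked entries,
    MASKED only if every entry is masked."""
    valid = [v for v in values if v is not MASKED]
    return max(valid) if valid else MASKED

def elementwise_maximum(a, b):
    """Two-argument form: element-wise maximum, masked wherever
    either argument is masked."""
    return [MASKED if x is MASKED or y is MASKED else max(x, y)
            for x, y in zip(a, b)]

a = [1, MASKED, 5, MASKED]
b = [4, 2, MASKED, MASKED]

print(len(a) - count(a))          # 2, the "#invalid = size - count" rule
print(overall_maximum(a))         # 5
print(elementwise_maximum(a, b))  # [4, None, None, None]
```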

Paul,

Well, you're right. I did misunderstand your reply, as well as what the various functions were supposed to do. I was mis-using the sum, minimum, maximum as tho they were MA.<op>.reduce, and my test case didn't point out the difference. I should always have been doing the .reduce version. I apologize for this!

I found a section on page 45 of the Numerical Python text (PDF form, July 13, 2001) that defines sum as 'The sum function is a synonym for the reduce method of the add ufunc. It returns the sum of all the elements in the sequence given along the specified axis (first axis by default).' This is where I would expect to see a caveat about it not retaining any mask-edness.

I was misusing the MA.minimum and MA.maximum as tho they were the .reduce version. My bad.

The MA.average does produce a masked array, but it has changed the 'missing value' to fill_value=[ 1.00000002e+020,]). I do find this a bit odd, since the other reductions didn't change the fill value.

Anyway, I can now get the stats I want in a format I want, and I understand better the various functions for array/masked array. Thanks for the comments/input.

sue

participants (4)
- Giulio Bottazzi
- Paul F. Dubois
- Reggie Dugard
- sag@hydrosphere.com