Question about numpy.ma masking
Hello,

I have the following arrays read as masked arrays.

    I[10]: basic.data['Air_Temp'].mask
    O[10]: array([ True, False, False, ..., False, False, False], dtype=bool)

    I[12]: basic.data['Press_Alt'].mask
    O[12]: False

    I[13]: len basic.data['Air_Temp']
    -----> len(basic.data['Air_Temp'])
    O[13]: 1758

The first item, data['Air_Temp'], has only its first element masked, and as a result its mask attribute is a bool array with the same length as the data. On the other hand, data['Press_Alt'] has no elements to mask, so its mask is a scalar False. Is this documented behavior, or intentionally designed this way? This is the only case out of 20 that breaks my code, as follows: :)

    IndexError                                Traceback (most recent call last)

        130 for k in range(len(shorter)):
        131     if (serialh.data['dccnTempSF'][k] != 0) \
    --> 132         and (basic.data['Air_Temp'].mask[k+diff] == False):
        133         dccnConAmb[k] = serialc.data['dccnConc'][k] * \
        134         physical.data['STATIC_PR'][k+diff] * \

    IndexError: invalid index to scalar variable.

Since the mask is a scalar in this case, there is nothing to index inside the loop, and it terminates with an IndexError.

--
Gökhan
On May 4, 2010, at 8:38 PM, Gökhan Sever wrote:
> Hello,
>
> I have the following arrays read as masked arrays.
>
>     I[10]: basic.data['Air_Temp'].mask
>     O[10]: array([ True, False, False, ..., False, False, False], dtype=bool)
>
>     I[12]: basic.data['Press_Alt'].mask
>     O[12]: False
>
>     I[13]: len basic.data['Air_Temp']
>     -----> len(basic.data['Air_Temp'])
>     O[13]: 1758
>
> The first item, data['Air_Temp'], has only its first element masked, and as a result its mask attribute is a bool array with the same length as the data. On the other hand, data['Press_Alt'] has no elements to mask, so its mask is a scalar False. Is this documented behavior, or intentionally designed this way? This is the only case out of 20 that breaks my code, as follows: :)
>
>     IndexError                                Traceback (most recent call last)
>
>         130 for k in range(len(shorter)):
>         131     if (serialh.data['dccnTempSF'][k] != 0) \
>     --> 132         and (basic.data['Air_Temp'].mask[k+diff] == False):
>         133         dccnConAmb[k] = serialc.data['dccnConc'][k] * \
>         134         physical.data['STATIC_PR'][k+diff] * \
>
>     IndexError: invalid index to scalar variable.
>
> Since the mask is a scalar in this case, there is nothing to index inside the loop, and it terminates with an IndexError.
Gokhan,
Sorry for not getting back sooner, web connectivity was limited on my side. I must admit I can't really see what you're trying to do here, but I'll throw in some random comments:
* If you're using structured MaskedArrays, it's a really bad idea to call one of the fields "data", as it may interact in a non-obvious way with the actual "data" property (the one that outputs a view of the array as a pure ndarray).
* If you need to test whether an array has some masked elements, try something like

    myarray.mask is nomask

  If True, no item is masked, the mask is a boolean and you can move on. If False, then the mask is an ndarray with as many elements as the array and you can index it.
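The check described above can be sketched as follows. This is only an illustration, not code from the thread: the sample values and the helper name mask_at are made up, and nomask is taken from numpy.ma. np.ma.getmaskarray() returns a full boolean array even when nothing is masked, so it can be indexed in both cases.

```python
import numpy as np

# Hypothetical sample data: one array with a masked element, one without.
air_temp = np.ma.array([-999.0, 12.5, 13.1], mask=[True, False, False])
press_alt = np.ma.array([1500.0, 1510.0, 1520.0])  # no mask given


def mask_at(arr, k):
    """Return the mask value at index k, whether or not a full mask exists."""
    mask = np.ma.getmask(arr)
    if mask is np.ma.nomask:
        # Nothing is masked: the mask is the scalar nomask and cannot be indexed.
        return False
    return bool(mask[k])


print(mask_at(air_temp, 0))   # element 0 of air_temp is masked
print(mask_at(press_alt, 0))  # press_alt has no masked elements

# getmaskarray() always returns an array with the same shape as the data.
full = np.ma.getmaskarray(press_alt)
print(full.shape, full.dtype)
```

Using getmaskarray() once, outside the loop, avoids the need for a per-element helper at all.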
On Fri, May 7, 2010 at 3:28 PM, Pierre GM <pgmdevlist@gmail.com> wrote:
> Gokhan,
> Sorry for not getting back sooner, web connectivity was limited on my side. I must admit I can't really see what you're trying to do here, but I'll throw in some random comments:
> * If you're using structured MaskedArrays, it's a really bad idea to call one of the fields "data", as it may interact in a non-obvious way with the actual "data" property (the one that outputs a view of the array as a pure ndarray).
Hello Pierre,

basic.data is a dictionary containing all the masked array items. When I read the original data into my scripts, my main constructor-reader class automatically converts the data to masked arrays. basic.data['Air_Temp'] is a masked array itself; it is a little confusing, for sure, that it also has a 'data' attribute.

In the above example I check one condition while looping over the mask values. When the mask attribute isn't a bool array (when there are no missing values in the data), the condition fails, raising an IndexError. I was wondering why it doesn't yield a bool array instead of giving me a scalar False.

--
Gökhan
On 05/09/2010 09:01 AM, Gökhan Sever wrote:
> Hello Pierre,
>
> basic.data is a dictionary containing all the masked array items. When I read the original data into my scripts, my main constructor-reader class automatically converts the data to masked arrays. basic.data['Air_Temp'] is a masked array itself; it is a little confusing, for sure, that it also has a 'data' attribute.
>
> In the above example I check one condition while looping over the mask values. When the mask attribute isn't a bool array (when there are no missing values in the data), the condition fails, raising an IndexError. I was wondering why it doesn't yield a bool array instead of giving me a scalar False.
The mask attribute can be a full array, or it can be a scalar to indicate that nothing is masked. This is an optimization in masked arrays; it adds complexity, but it can save space and/or processing time. You can always access a full mask array by using np.ma.getmaskarray(). Or you can ensure the internal mask is an array, not a scalar, by using the shrink=False kwarg when making the masked array with np.ma.array().

Offhand, I suspect your loop can be eliminated by vectorization. Something like this:

    ns = len(shorter)
    slice0 = slice(ns)
    slice1 = slice(diff, diff+ns)
    cond1 = serialh.data['dccnTempSF'][slice0] != 0
    cond2 = np.ma.getmaskarray(basic.data['Air_Temp'][slice1]) == False
    cond = cond1 & cond2
    dccnConAmb[slice0][cond] = (serialc.data['dccnConc'][slice0][cond] *
                                physical.data['STATIC_PR'][slice1][cond])

Eric
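For readers who want to run the vectorized approach, here is a self-contained sketch with made-up data. The arrays temp_sf, conc, air_temp and static_pr are hypothetical stand-ins for the dictionary entries in the thread, diff = 2 is an arbitrary offset, and the product uses only the two factors shown in the snippet above (the original loop is cut off after its second factor). A plain loop is included only to check that both forms agree.

```python
import numpy as np

# Hypothetical stand-ins for the arrays in the thread.
temp_sf = np.array([0.0, 1.0, 1.0, 0.0, 1.0])            # like serialh.data['dccnTempSF']
conc = np.array([10.0, 20.0, 30.0, 40.0, 50.0])          # like serialc.data['dccnConc']
air_temp = np.ma.array([5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0],
                       mask=[False, False, False, True, False, False, False])
static_pr = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])  # like physical.data['STATIC_PR']
diff = 2  # assumed offset between the shorter and the longer record

ns = len(temp_sf)
slice0 = slice(ns)
slice1 = slice(diff, diff + ns)

# Vectorized form: build both conditions as boolean arrays, then combine them.
cond1 = temp_sf[slice0] != 0
cond2 = ~np.ma.getmaskarray(air_temp)[slice1]  # True where Air_Temp is NOT masked
cond = cond1 & cond2

result = np.zeros(ns)
# result[slice0] is a view, so assigning through it updates result in place.
result[slice0][cond] = conc[slice0][cond] * static_pr[slice1][cond]

# Equivalent explicit loop, using a full mask array so indexing never fails.
loop_result = np.zeros(ns)
full_mask = np.ma.getmaskarray(air_temp)
for k in range(ns):
    if temp_sf[k] != 0 and not full_mask[k + diff]:
        loop_result[k] = conc[k] * static_pr[k + diff]

print(result)
print(np.array_equal(result, loop_result))
```

Because getmaskarray() is used in both forms, the code behaves the same whether or not any element of the temperature array is masked.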
On Sun, May 9, 2010 at 2:42 PM, Eric Firing <efiring@hawaii.edu> wrote:
> The mask attribute can be a full array, or it can be a scalar to indicate that nothing is masked. This is an optimization in masked arrays; it adds complexity, but it can save space and/or processing time. You can always access a full mask array by using np.ma.getmaskarray(). Or you can ensure the internal mask is an array, not a scalar, by using the shrink=False kwarg when making the masked array with np.ma.array().
shrink=False fits my use case perfectly. I was guessing that leaving the mask as a scalar had something to do with optimization. Probably not many people write loops and check conditions based on the mask content like I do :) I hope someone at SciPy10 will present a Numpy.MA talk or tutorial describing all the nitty-gritty details of using the module.
> Offhand, I suspect your loop can be eliminated by vectorization. Something like this:
>
>     ns = len(shorter)
>     slice0 = slice(ns)
>     slice1 = slice(diff, diff+ns)
>     cond1 = serialh.data['dccnTempSF'][slice0] != 0
>     cond2 = np.ma.getmaskarray(basic.data['Air_Temp'][slice1]) == False
>     cond = cond1 & cond2
>     dccnConAmb[slice0][cond] = (serialc.data['dccnConc'][slice0][cond] *
>                                 physical.data['STATIC_PR'][slice1][cond])
Bonus help :) My gmail has over 400 Python-tagged e-mails collected over a year. I get responses here (on mailing lists in general) faster, most of the time, than I get them locally around my department. This (especially the no-appointments feature) doubles or triples my learning experience. Just a personal thanks to you and all who make these great mediums possible.

Anyway, back to the topic. The snippet I shared is about a year old, from the time when I didn't know much about vectorization. Your version looks good to my eyes, but it is a little harder to read in general. Also, I don't know how you would debug this code. Sometimes I need to pause the execution of a script, step through the lines, and see how the values change in each iteration.

Lastly, in case someone is curious what I am after: dccnConAmb is my CCN concentration normalized at ambient pressure and temperature, which I use to estimate the C and k parameters of a power-law relationship using scipy's curve_fit().
> Eric
Gökhan
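On the debugging question raised above, one possible approach (a sketch, not something proposed in the thread) is to keep each intermediate condition in a named array and inspect it directly. The boolean values below are hypothetical stand-ins for cond1 and cond2 in the vectorized snippet.

```python
import numpy as np

# Hypothetical conditions, standing in for cond1 and cond2 in the vectorized snippet.
cond1 = np.array([False, True, True, False, True])
cond2 = np.array([True, False, True, True, True])
cond = cond1 & cond2

# Indices that pass the combined test, and indices rejected by each condition.
passed = np.flatnonzero(cond)
failed_cond1 = np.flatnonzero(~cond1)
failed_cond2 = np.flatnonzero(~cond2)

print("passed:", passed)
print("rejected by cond1:", failed_cond1)
print("rejected by cond2:", failed_cond2)

# A per-index table gives the same view a step-by-step loop would.
for k in range(len(cond)):
    print(k, cond1[k], cond2[k], cond[k])
```

Setting a breakpoint after cond is computed gives access to all three arrays at once, which replaces stepping through each iteration.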
participants (3)
- Eric Firing
- Gökhan Sever
- Pierre GM