Are masked arrays slower for processing than ndarrays?
Hi,
I'm using masked arrays to compute largescale standard deviation, multiplication, gaussian, and weighted averages. At first I thought using the masked arrays would be a great way to sidestep looping (which it is), but it's still slower than expected. Here's a snippet of the code that I'm using it for.
# Computing nearest neighbor distances. # Output will be about 270,000 rows long for the index # and 270,000x50 for the dist array. tree = ann.kd_tree(np.column_stack([l,b])) index, dist = tree.search(np.column_stack([l,b]),k=nth)
# Clipping bad values by replacing them acceptable values av[np.where(av<=10)] = 10 av[np.where(av>=50)] = 50
# Distance clipping and creating mask dist_arcsec = np.sqrt(dist)*3600 mask = dist_arcsec <= d_thresh
# Creating masked array av_good = ma.array(av[index],mask=mask) dist_good = ma.array(dist_arcsec,mask=mask)
# Reason why I'm using masked arrays. If these were # ndarrays with nan's, then the output would be nan. Std = np.array(np.std(av_good,axis=1)) Var = Std*Std
Rho = np.zeros( (len(av), nth) ) Rho2 = np.zeros( (len(av), nth) )
dist_std = np.std(dist_good,axis=1)
for j in range(nth): Rho[:,j] = dist_std Rho2[:,j] = Var
# This part takes about 20 seconds to compute for a 270,000x50 masked array. # Using ndarrays of the same size takes about 2 second spatial_weight = 1.0 / (Rho*np.sqrt(2*np.pi)) * np.exp(  dist_good / (2*Rho**2))
# Like the spatial_weight section, this takes about 20 seconds W = spatial_weight / Rho2
# Takes less than one second. Ave = np.average(av_good,axis=1,weights=W)
Any ideas on why it would take such a long time for processing? Especially the spatial_weight and W variables? Would there be a faster way to do this? Or is there a way that numpy.std can process ignore nan's when processing?
Thanks,
Eli Bressert
Eli Bressert wrote:
Hi,
I'm using masked arrays to compute largescale standard deviation, multiplication, gaussian, and weighted averages. At first I thought using the masked arrays would be a great way to sidestep looping (which it is), but it's still slower than expected. Here's a snippet of the code that I'm using it for.
# Computing nearest neighbor distances. # Output will be about 270,000 rows long for the index # and 270,000x50 for the dist array. tree = ann.kd_tree(np.column_stack([l,b])) index, dist = tree.search(np.column_stack([l,b]),k=nth)
# Clipping bad values by replacing them acceptable values av[np.where(av<=10)] = 10 av[np.where(av>=50)] = 50
# Distance clipping and creating mask dist_arcsec = np.sqrt(dist)*3600 mask = dist_arcsec <= d_thresh
# Creating masked array av_good = ma.array(av[index],mask=mask) dist_good = ma.array(dist_arcsec,mask=mask)
# Reason why I'm using masked arrays. If these were # ndarrays with nan's, then the output would be nan. Std = np.array(np.std(av_good,axis=1)) Var = Std*Std
Rho = np.zeros( (len(av), nth) ) Rho2 = np.zeros( (len(av), nth) )
dist_std = np.std(dist_good,axis=1)
for j in range(nth): Rho[:,j] = dist_std Rho2[:,j] = Var
# This part takes about 20 seconds to compute for a 270,000x50 masked array. # Using ndarrays of the same size takes about 2 second spatial_weight = 1.0 / (Rho*np.sqrt(2*np.pi)) * np.exp(  dist_good / (2*Rho**2))
# Like the spatial_weight section, this takes about 20 seconds W = spatial_weight / Rho2
The short answer to your subject line is "yes". A simple illustration of division:
In [11]:x = np.ones((270000,50), float)
In [12]:y = np.ones((270000,50), float)
In [13]:timeit x/y 10 loops, best of 3: 199 ms per loop
In [14]:x = np.ma.ones((270000,50), float)
In [15]:y = np.ma.ones((270000,50), float)
In [16]:x[1,1] = np.ma.masked
In [17]:y[1,2] = np.ma.masked
In [18]:timeit x/y 10 loops, best of 3: 2.45 s per loop
So it is slower by more than a factor of 10. That's much worse than I expected for division (and multiplication is similar). It makes me suspect there is might be a simple way to improve it greatly, but I haven't looked.
# Takes less than one second. Ave = np.average(av_good,axis=1,weights=W)
Any ideas on why it would take such a long time for processing? Especially the spatial_weight and W variables? Would there be a faster way to do this? Or is there a way that numpy.std can process ignore nan's when processing?
There is a numpy.nansum; and see the following thread: http://www.mailarchive.com/numpydiscussion@scipy.org/msg09407.html
Eric
Thanks,
Eli Bressert _______________________________________________ Numpydiscussion mailing list Numpydiscussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpydiscussion
Eli Bressert wrote:
Hi,
I'm using masked arrays to compute largescale standard deviation, multiplication, gaussian, and weighted averages. At first I thought using the masked arrays would be a great way to sidestep looping (which it is), but it's still slower than expected. Here's a snippet of the code that I'm using it for.
[...]
# Like the spatial_weight section, this takes about 20 seconds W = spatial_weight / Rho2
# Takes less than one second. Ave = np.average(av_good,axis=1,weights=W)
Any ideas on why it would take such a long time for processing?
A part of the slowdown is what looks to me like unnecessary copying in _MaskedBinaryOperation.__call__. It is using getdata, which applies numpy.array to its input, forcing a copy. I think the copy is actually unintentional, in at least one sense, and possibly two: first, because the default argument of getattr is always evaluated, even if it is not needed; and second, because the call to np.array is used where np.asarray or equivalent would suffice.
The first file attached below shows the kernprof in the case of multiplying two masked arrays, shape (100000,50), with no masked elements; 2/3 of the time is taken copying the data.
Now, if there are actually masked elements in the arrays, it gets much worse: see the second attachment. The total time has increased by more than a factor of 3, and the culprit is numpy.which(), a very slow function. It looks to me like it is doing nothing useful at all; the numpy binary operation is still being executed for all elements, regardless of mask, contrary to the intention implied by the comment in the code.
The third attached file has a patch that fixes the getdata problem and eliminates the which(). With this patch applied we get the profile in the 4th file, to be compared to the second profile. Much better. I am pretty sure it could still be sped up quite a bit, though. It looks like the masks are essentially being calculated twice for no good reason, but I don't completely understand all the mask considerations, so at this point I am not trying to fix that problem.
Eric
Especially the spatial_weight and W variables? Would there be a faster way to do this? Or is there a way that numpy.std can process ignore nan's when processing?
Thanks,
Eli Bressert _______________________________________________ Numpydiscussion mailing list Numpydiscussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpydiscussion
efiring@manini:~/test$ kernprof.py lv profile_ma_mult.py Wrote profile results to profile_ma_mult.py.lprof Timer unit: 1e06 s
File: /usr/local/lib/python2.6/distpackages/numpy/ma/core.py Function: __call__ at line 689 Total time: 0.189224 s
Line # Hits Time Per Hit % Time Line Contents ============================================================== 689 @profile 690 def __call__ (self, a, b, *args, **kwargs): 691 "Execute the call behavior." 692 1 52 52.0 0.0 m = mask_or(getmask(a), getmask(b), shrink=False) 693 1 122614 122614.0 64.8 (da, db) = (getdata(a), getdata(b)) 694 # Easy case: there's no mask... 695 1 7 7.0 0.0 if m is nomask: 696 1 66282 66282.0 35.0 result = self.f(da, db, *args, **kwargs) 697 # There are some masked elements: run only on the unmasked 698 else: 699 result = np.where(m, da, self.f(da, db, *args, **kwargs)) 700 # Transforms to a (subclass of) MaskedArray if we don't have a scalar 701 1 10 10.0 0.0 if result.shape: 702 1 124 124.0 0.1 result = result.view(get_masked_subclass(a, b)) 703 # If we have a mask, make sure it's broadcasted properly 704 1 57 57.0 0.0 if m.any(): 705 result._mask = mask_or(getmaskarray(a), getmaskarray(b)) 706 # If some initial masks where not shrunk, don't shrink the result 707 1 4 4.0 0.0 elif m.shape: 708 result._mask = make_mask_none(result.shape, result.dtype) 709 1 4 4.0 0.0 if isinstance(a, MaskedArray): 710 1 33 33.0 0.0 result._update_from(a) 711 1 4 4.0 0.0 if isinstance(b, MaskedArray): 712 1 30 30.0 0.0 result._update_from(b) 713 # ... or return masked if we have a scalar and the common mask is True 714 elif m: 715 return masked 716 1 3 3.0 0.0 return result
efiring@manini:~/test$ kernprof.py lv profile_ma_mult.py Wrote profile results to profile_ma_mult.py.lprof Timer unit: 1e06 s
File: /usr/local/lib/python2.6/distpackages/numpy/ma/core.py Function: __call__ at line 689 Total time: 0.649928 s
Line # Hits Time Per Hit % Time Line Contents ============================================================== 689 @profile 690 def __call__ (self, a, b, *args, **kwargs): 691 "Execute the call behavior." 692 1 30012 30012.0 4.6 m = mask_or(getmask(a), getmask(b), shrink=False) 693 1 120784 120784.0 18.6 (da, db) = (getdata(a), getdata(b)) 694 # Easy case: there's no mask... 695 1 7 7.0 0.0 if m is nomask: 696 result = self.f(da, db, *args, **kwargs) 697 # There are some masked elements: run only on the unmasked 698 else: 699 1 418399 418399.0 64.4 result = np.where(m, da, self.f(da, db, *args, **kwargs)) 700 # Transforms to a (subclass of) MaskedArray if we don't have a scalar 701 1 10 10.0 0.0 if result.shape: 702 1 111 111.0 0.0 result = result.view(get_masked_subclass(a, b)) 703 # If we have a mask, make sure it's broadcasted properly 704 1 25419 25419.0 3.9 if m.any(): 705 1 55096 55096.0 8.5 result._mask = mask_or(getmaskarray(a), getmaskarray(b)) 706 # If some initial masks where not shrunk, don't shrink the result 707 elif m.shape: 708 result._mask = make_mask_none(result.shape, result.dtype) 709 1 7 7.0 0.0 if isinstance(a, MaskedArray): 710 1 46 46.0 0.0 result._update_from(a) 711 1 4 4.0 0.0 if isinstance(b, MaskedArray): 712 1 30 30.0 0.0 result._update_from(b) 713 # ... or return masked if we have a scalar and the common mask is True 714 elif m: 715 return masked 716 1 3 3.0 0.0 return result
Wrote profile results to profile_ma_mult.py.lprof Timer unit: 1e06 s
File: /usr/local/lib/python2.6/distpackages/numpy/ma/core.py Function: __call__ at line 692 Total time: 0.176606 s
Line # Hits Time Per Hit % Time Line Contents ============================================================== 692 @profile 693 def __call__ (self, a, b, *args, **kwargs): 694 "Execute the call behavior." 695 1 30007 30007.0 17.0 m = mask_or(getmask(a), getmask(b), shrink=False) 696 1 34 34.0 0.0 (da, db) = (getdata(a), getdata(b)) 697 1 65855 65855.0 37.3 result = self.f(da, db, *args, **kwargs) 698 # Transforms to a (subclass of) MaskedArray if we don't have a scalar 699 1 11 11.0 0.0 if result.shape: 700 1 136 136.0 0.1 result = result.view(get_masked_subclass(a, b)) 701 # If we have a mask, make sure it's broadcasted properly 702 1 25425 25425.0 14.4 if m.any(): 703 1 55048 55048.0 31.2 result._mask = mask_or(getmaskarray(a), getmaskarray(b)) 704 # If some initial masks where not shrunk, don't shrink the result 705 elif m.shape: 706 result._mask = make_mask_none(result.shape, result.dtype) 707 1 7 7.0 0.0 if isinstance(a, MaskedArray): 708 1 45 45.0 0.0 result._update_from(a) 709 1 4 4.0 0.0 if isinstance(b, MaskedArray): 710 1 31 31.0 0.0 result._update_from(b) 711 # ... or return masked if we have a scalar and the common mask is True 712 elif m: 713 return masked 714 1 3 3.0 0.0 return result
Eric Firing wrote:
Pierre,
... I pressed "send" too soon. There are test failures with the patch I attached to my last message. I think the basic ideas are correct, but evidently there are wrinkles to be worked out. Maybe putmask() has to be used instead of where() (putmask is much faster) to maintain the ability to do *= and similar, and maybe there are other adjustments. Somehow, though, it should be possible to get decent speed for simple multiplication and division; a 10x penalty relative to ndarray operations is just too much.
Eric
Eli Bressert wrote:
Hi,
I'm using masked arrays to compute largescale standard deviation, multiplication, gaussian, and weighted averages. At first I thought using the masked arrays would be a great way to sidestep looping (which it is), but it's still slower than expected. Here's a snippet of the code that I'm using it for.
[...]
# Like the spatial_weight section, this takes about 20 seconds W = spatial_weight / Rho2
# Takes less than one second. Ave = np.average(av_good,axis=1,weights=W)
Any ideas on why it would take such a long time for processing?
A part of the slowdown is what looks to me like unnecessary copying in _MaskedBinaryOperation.__call__. It is using getdata, which applies numpy.array to its input, forcing a copy. I think the copy is actually unintentional, in at least one sense, and possibly two: first, because the default argument of getattr is always evaluated, even if it is not needed; and second, because the call to np.array is used where np.asarray or equivalent would suffice.
The first file attached below shows the kernprof in the case of multiplying two masked arrays, shape (100000,50), with no masked elements; 2/3 of the time is taken copying the data.
Now, if there are actually masked elements in the arrays, it gets much worse: see the second attachment. The total time has increased by more than a factor of 3, and the culprit is numpy.which(), a very slow function. It looks to me like it is doing nothing useful at all; the numpy binary operation is still being executed for all elements, regardless of mask, contrary to the intention implied by the comment in the code.
The third attached file has a patch that fixes the getdata problem and eliminates the which(). With this patch applied we get the profile in the 4th file, to be compared to the second profile. Much better. I am pretty sure it could still be sped up quite a bit, though. It looks like the masks are essentially being calculated twice for no good reason, but I don't completely understand all the mask considerations, so at this point I am not trying to fix that problem.
Eric
Especially the spatial_weight and W variables? Would there be a faster way to do this? Or is there a way that numpy.std can process ignore nan's when processing?
Thanks,
Eli Bressert _______________________________________________ Numpydiscussion mailing list Numpydiscussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpydiscussion
Numpydiscussion mailing list Numpydiscussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpydiscussion
On May 9, 2009, at 8:17 PM, Eric Firing wrote:
Eric Firing wrote:
A part of the slowdown is what looks to me like unnecessary copying in _MaskedBinaryOperation.__call__. It is using getdata, which applies numpy.array to its input, forcing a copy. I think the copy is actually unintentional, in at least one sense, and possibly two: first, because the default argument of getattr is always evaluated, even if it is not needed; and second, because the call to np.array is used where np.asarray or equivalent would suffice.
Yep, good call. the try/except should be better, and yes, I forgot to force copy=False (thought it was on by default...). I didn't know that getattr always evaluated the default, the docs are scarce on that subject...
Pierre,
... I pressed "send" too soon. There are test failures with the patch I attached to my last message. I think the basic ideas are correct, but evidently there are wrinkles to be worked out. Maybe putmask() has to be used instead of where() (putmask is much faster) to maintain the ability to do *= and similar, and maybe there are other adjustments. Somehow, though, it should be possible to get decent speed for simple multiplication and division; a 10x penalty relative to ndarray operations is just too much.
Quite agreed. It was a shock to realize that we were that slow. I gonna have to start testing w/ large arrays...
I'm confident we can significantly speed up the _MaskedOperations without losing any of the features. Yes, putmask may be a better option. We could probably use the following MO: * result = a.data/b.data * putmask(result, m, a)
However, I gonna need a good couple of weeks before being able to really look into it...
All, I just committed (r6994) some modifications to numpy.ma.getdata (Eric Firing's patch) and to the ufunc wrappers that were too slow with large arrays. We're roughly 3 times faster than we used to, but still slower than the equivalent classic ufuncs (no surprise here).
Here's the catch: it's basically cheating. I got rid of the pre processing (where a mask was calculated depending on the domain and the input set to a filling value depending on this mask, before the actual computation). Instead, I force np.seterr(divide='ignore',invalid='ignore') before calling the ufunc on the .data part, then mask the invalid values (if any) and reset the corresponding entries in .data to the input. Finally, I reset the error status. All in all, we're still datafriendly, meaning that the value below a masked entry is the same as the input, but we can't say that values initially masked are discarded (they're used in the computation but reset to their initial value)...
This playing around with the error status may (or may not, I don't know) cause some problems down the road. It's still faaar faster than computing the domain (especially _DomainSafeDivide) when the inputs are large... I'd be happy if you could give it a try and send some feedback.
Cheers P.
On May 9, 2009, at 8:17 PM, Eric Firing wrote:
Eric Firing wrote:
Pierre,
... I pressed "send" too soon. There are test failures with the patch I attached to my last message. I think the basic ideas are correct, but evidently there are wrinkles to be worked out. Maybe putmask() has to be used instead of where() (putmask is much faster) to maintain the ability to do *= and similar, and maybe there are other adjustments. Somehow, though, it should be possible to get decent speed for simple multiplication and division; a 10x penalty relative to ndarray operations is just too much.
Eric
Eli Bressert wrote:
Hi,
I'm using masked arrays to compute largescale standard deviation, multiplication, gaussian, and weighted averages. At first I thought using the masked arrays would be a great way to sidestep looping (which it is), but it's still slower than expected. Here's a snippet of the code that I'm using it for.
[...]
# Like the spatial_weight section, this takes about 20 seconds W = spatial_weight / Rho2
# Takes less than one second. Ave = np.average(av_good,axis=1,weights=W)
Any ideas on why it would take such a long time for processing?
A part of the slowdown is what looks to me like unnecessary copying in _MaskedBinaryOperation.__call__. It is using getdata, which applies numpy.array to its input, forcing a copy. I think the copy is actually unintentional, in at least one sense, and possibly two: first, because the default argument of getattr is always evaluated, even if it is not needed; and second, because the call to np.array is used where np.asarray or equivalent would suffice. The first file attached below shows the kernprof in the case of multiplying two masked arrays, shape (100000,50), with no masked elements; 2/3 of the time is taken copying the data. Now, if there are actually masked elements in the arrays, it gets much worse: see the second attachment. The total time has increased by more than a factor of 3, and the culprit is numpy.which(), a very slow function. It looks to me like it is doing nothing useful at all; the numpy binary operation is still being executed for all elements, regardless of mask, contrary to the intention implied by the comment in the code. The third attached file has a patch that fixes the getdata problem and eliminates the which(). With this patch applied we get the profile in the 4th file, to be compared to the second profile. Much better. I am pretty sure it could still be sped up quite a bit, though. It looks like the masks are essentially being calculated twice for no good reason, but I don't completely understand all the mask considerations, so at this point I am not trying to fix that problem. Eric
Especially the spatial_weight and W variables? Would there be a faster way to do this? Or is there a way that numpy.std can process ignore nan's when processing?
Thanks,
Eli Bressert _______________________________________________ Numpydiscussion mailing list Numpydiscussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpydiscussion
Numpydiscussion mailing list Numpydiscussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpydiscussion
Hi Pierre
2009/5/14 Pierre GM pgmdevlist@gmail.com:
This playing around with the error status may (or may not, I don't know) cause some problems down the road.
I see the buildbot is complaining on SPARC. Not sure if it is complaining about your commit, but might be worth checking out nontheless.
Cheers Stéfan
Hi Pierre,
Here's the catch: it's basically cheating. I got rid of the pre processing (where a mask was calculated depending on the domain and the input set to a filling value depending on this mask, before the actual computation). Instead, I force np.seterr(divide='ignore',invalid='ignore') before calling the ufunc
This isn't a thread safe approach and could cause wierd side effects in a multithreaded application. I think modifying global options/variables inside any function where it generally wouldn't be expected by the user is a bad idea.
 Matt
On May 13, 2009, at 7:36 PM, Matt Knox wrote:
Here's the catch: it's basically cheating. I got rid of the pre processing (where a mask was calculated depending on the domain and the input set to a filling value depending on this mask, before the actual computation). Instead, I force np.seterr(divide='ignore',invalid='ignore') before calling the ufunc
This isn't a thread safe approach and could cause wierd side effects in a multithreaded application. I think modifying global options/ variables inside any function where it generally wouldn't be expected by the user is a bad idea.
Whine. I was afraid of something like that... 2 options, then: * We revert to computing a mask beforehand. That looks like the part that takes the most time w/ domained operations (according to Robert K's profiler. Robert, you deserve a statue for this tool). And that doesn't solve the pb of power, anyway: how do you compute the domain of power ? * We reimplement masked versions of the ufuncs in C. Won't happen from me anytime soon (this fall or winter, maybe...) Also, importing numpy.ma currently calls numpy.seterr(all='ignore') anyway...
So that's a 1 from Matt. Anybody else ?
Hi,
Whine. I was afraid of something like that... 2 options, then:
 We revert to computing a mask beforehand. That looks like the part
that takes the most time w/ domained operations (according to Robert K's profiler. Robert, you deserve a statue for this tool). And that doesn't solve the pb of power, anyway: how do you compute the domain of power ?
 We reimplement masked versions of the ufuncs in C. Won't happen from
me anytime soon (this fall or winter, maybe...) Also, importing numpy.ma currently calls numpy.seterr(all='ignore') anyway...
I'm afraid I don't know the code at all, so count this as seems good, but I had the feeling that the change is good for speed but possibly bad for stability / readability?
In that case it seems right not to do that, and wait until someone needs speed enough to write it in C or similar...
Best,
Matthew
Pierre GM wrote:
On May 13, 2009, at 7:36 PM, Matt Knox wrote:
Here's the catch: it's basically cheating. I got rid of the pre processing (where a mask was calculated depending on the domain and the input set to a filling value depending on this mask, before the actual computation). Instead, I force np.seterr(divide='ignore',invalid='ignore') before calling the ufunc
This isn't a thread safe approach and could cause wierd side effects in a multithreaded application. I think modifying global options/ variables inside any function where it generally wouldn't be expected by the user is a bad idea.
Whine. I was afraid of something like that... 2 options, then:
 We revert to computing a mask beforehand. That looks like the part
that takes the most time w/ domained operations (according to Robert K's profiler. Robert, you deserve a statue for this tool). And that doesn't solve the pb of power, anyway: how do you compute the domain of power ?
 We reimplement masked versions of the ufuncs in C. Won't happen from
me anytime soon (this fall or winter, maybe...)
Pierre,
I have implemented masked versions of all binary ufuncs in C, using slight modifications of the numpy code generation machinery. I suspect that the way I have done it will not be the final method, and as of this moment I have just gotten it compiled and minimally checked (numpy imports, multiply_m(x, y, mask, out) puts x*y in out only where mask is False), but it is enough to make me think that we should be able to make it work in numpy.ma.
In the present implementation, the masked versions of the ufuncs take a single mask, and they live in the same namespace as the unmasked versions. Masked versions of the unary ufuncs need to be added. Binary versions taking two masks and returning the resulting mask can also be added, but with considerably more effort, so I view that as something to be done only after all the wrinkles are worked out with the singlemask implementation.
I view these masked versions of ufuncs as perfectly good standalone entities, which will enable a huge speedup in numpy.ma, but which may also be useful independently of masked arrays.
I have made no attempt at this point to address domain checking, but certainly this needs to be moved into the C stage also, with separate ufuncs while we have only the singlemask binary ufuncs, but directly into the doublemask binary ufuncs whenever those are implemented.
Example:
In [1]:import numpy as np
In [2]:x = np.arange(3)
In [3]:y = np.arange(3) + 2
In [4]:x Out[4]:array([0, 1, 2])
In [5]:y Out[5]:array([2, 3, 4])
In [6]:mask = np.array([False, True, False])
In [7]:np.multiply_m(x, y, mask, x) Out[7]:array([0, 1, 8])
In [8]:x = np.arange(1000000, dtype=float)
In [9]:y = np.sin(x)
In [10]:mask = y > 0
In [11]:z = np.zeros_like(x)
In [12]:timeit np.multiply(x,y,z) 100 loops, best of 3: 10.5 ms per loop
In [13]:timeit np.multiply_m(x,y,mask,z) 100 loops, best of 3: 12 ms per loop
Eric
On Wed, May 13, 2009 at 18:36, Matt Knox mattknox.ca@gmail.com wrote:
Hi Pierre,
Here's the catch: it's basically cheating. I got rid of the pre processing (where a mask was calculated depending on the domain and the input set to a filling value depending on this mask, before the actual computation). Instead, I force np.seterr(divide='ignore',invalid='ignore') before calling the ufunc
This isn't a thread safe approach and could cause wierd side effects in a multithreaded application. I think modifying global options/variables inside any function where it generally wouldn't be expected by the user is a bad idea.
seterr() uses threadlocal storage.
Robert Kern <robert.kern <at> gmail.com> writes:
seterr() uses threadlocal storage.
Oh. I stand corrected. Ignore my earlier objections then.
Pierre GM <pgmdevlist <at> gmail.com> writes:
Also, importing numpy.ma currently calls numpy.seterr(all='ignore') anyway...
hmm. While this doesn't affect me personally... I wonder if everyone is aware of this. Importing modules generally shouldn't have side effects either I would think. Has this always been the case for the masked array module?
 Matt
On May 13, 2009, at 8:07 PM, Matt Knox wrote:
hmm. While this doesn't affect me personally... I wonder if everyone is aware of this. Importing modules generally shouldn't have side effects either I would think. Has this always been the case for the masked array module?
Well, can't remember, actually... I was indeed surprised to see it was there. I guess I must have added when working on the power section. I will get of rid on the next commit, this is clearly bad practice from my part. Bad, bad Pierre.
Short answer to the subject: Oh yes. Basically, MaskedArrays in its current implementation is more of a convenience class than anything. Most of the functions manipulating masked arrays create a lot of temporaries. When performance is needed, I must advise you to work directly on the data and the mask.
For example, let's examine the division of 2 MaskedArrays a & b. * We take the 2 ndarrays of data (da and db) and the 2 ndarrays of mask (ma and mb) * we create a new array for db using np.where, putting 1 where db==0 and keeping db otherwise (if we were not doing that, we would get some NaNs down the road) * we create a new mask by combining ma and mb * we create the result array using np.where, using da where m is True, da/db otherwise (if we were not doing that, we would be processing the masked data and we may not want that) * Then, we add the mask to the result array.
I suspect that the np.where functions are suboptimal, and there might be a smarter way to achieve the same result while keeping all the functionalities (no NaNs (even masked) in the result, data kept when it should). I agree that these functionalities might be a bit overkill in simpler cases, such as yours. You may then want to use something like
ma.masked_array(a.data/b.data, mask=(a.mask  b.mask  (b.data==0))
Using Eric's example, I have 229ms/loop when dividing 2 ndarrays, 2.83s/loop when dividing 2 masked arrays, and down to 493ms/loop when using the quickanddirty function above). So anyway, you'll still be slower using MA than ndarrays, but not as slow...
On May 9, 2009, at 5:22 PM, Eli Bressert wrote:
Hi,
I'm using masked arrays to compute largescale standard deviation, multiplication, gaussian, and weighted averages. At first I thought using the masked arrays would be a great way to sidestep looping (which it is), but it's still slower than expected. Here's a snippet of the code that I'm using it for.
# Computing nearest neighbor distances. # Output will be about 270,000 rows long for the index # and 270,000x50 for the dist array. tree = ann.kd_tree(np.column_stack([l,b])) index, dist = tree.search(np.column_stack([l,b]),k=nth)
# Clipping bad values by replacing them acceptable values av[np.where(av<=10)] = 10 av[np.where(av>=50)] = 50
# Distance clipping and creating mask dist_arcsec = np.sqrt(dist)*3600 mask = dist_arcsec <= d_thresh
# Creating masked array av_good = ma.array(av[index],mask=mask) dist_good = ma.array(dist_arcsec,mask=mask)
# Reason why I'm using masked arrays. If these were # ndarrays with nan's, then the output would be nan. Std = np.array(np.std(av_good,axis=1)) Var = Std*Std
Rho = np.zeros( (len(av), nth) ) Rho2 = np.zeros( (len(av), nth) )
dist_std = np.std(dist_good,axis=1)
for j in range(nth): Rho[:,j] = dist_std Rho2[:,j] = Var
# This part takes about 20 seconds to compute for a 270,000x50 masked array. # Using ndarrays of the same size takes about 2 second spatial_weight = 1.0 / (Rho*np.sqrt(2*np.pi)) * np.exp(  dist_good / (2*Rho**2))
# Like the spatial_weight section, this takes about 20 seconds W = spatial_weight / Rho2
# Takes less than one second. Ave = np.average(av_good,axis=1,weights=W)
Any ideas on why it would take such a long time for processing? Especially the spatial_weight and W variables? Would there be a faster way to do this? Or is there a way that numpy.std can process ignore nan's when processing?
Thanks,
Eli Bressert _______________________________________________ Numpydiscussion mailing list Numpydiscussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpydiscussion
participants (7)

Eli Bressert

Eric Firing

Matt Knox

Matthew Brett

Pierre GM

Robert Kern

Stéfan van der Walt