All,

The latest version of maskedarray has just been released on the scipy SVN sandbox. This version fixes the inconsistencies in filling (see below) and introduces some minor modifications for optimization purposes (see below as well). Many thanks to Eric Firing and Matt Knox for the fruitful discussions at the origin of this release!
In addition, a bench.py file has been introduced, to compare the speed of numpy.ma and maskedarray. Once again, thanks to Eric for his first draft.
Please feel free to try it and send me some feedback.
Modifications:

* Consistent filling!
In numpy.ma, the division of array A by array B works in several steps:
- A is filled w/ 0
- B is filled w/ 1
- A/B is computed
- the output mask is updated as the combination of A.mask, B.mask and the domain mask (B==0)
The problems with this approach are that (i) it's not useful to fill A and B beforehand if the values will be masked anyway; (ii) nothing prevents infs from showing up, as the domain is taken into account only at the end.
In this latest version of maskedarray, the same division is decomposed as:
- a copy of B._data is filled with 1 wherever the domain condition (B==0) holds
- the division of A._data by this copy is computed
- the output mask is updated as the combination of A.mask, B.mask and the domain mask (B==0).
Prefilling on the domain avoids the presence of nans/infs. However, this comes at the price of making some functions and methods slower than their numpy.ma counterparts, as you'll be able to observe for sqrt and log with the bench.py file. An alternative would be to avoid filling at all, at the risk of leaving nans and infs in the data.
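The domain-prefilled division described above can be sketched in plain NumPy. This is only an illustration of the three steps, not the maskedarray code itself: the function name domain_filled_divide and its arguments are hypothetical, and the data and masks are passed separately instead of going through a masked array.

```python
import numpy as np

def domain_filled_divide(a_data, a_mask, b_data, b_mask):
    """Hypothetical helper: divide a by b, prefilling b on the domain (b == 0)."""
    a_data = np.asarray(a_data, dtype=float)
    b_data = np.asarray(b_data, dtype=float)
    # Domain mask: positions where the division is undefined.
    domain = (b_data == 0)
    # Step 1: a copy of b's data is filled with 1 on the domain only.
    b_filled = np.where(domain, 1.0, b_data)
    # Step 2: the division is computed on the raw data of a.
    result = a_data / b_filled
    # Step 3: the output mask combines both input masks and the domain mask.
    mask = np.asarray(a_mask, dtype=bool) | np.asarray(b_mask, dtype=bool) | domain
    return result, mask

a = [1.0, 2.0, 3.0, 4.0]
b = [2.0, 0.0, 4.0, 8.0]
quotient, out_mask = domain_filled_divide(
    a, [False, False, True, False], b, [False, False, False, False]
)
```

Here the second entry is masked because of the domain and the third because A was masked there, and no inf appears in the underlying data.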
* masked_invalid / fix_invalid
Two new functions are introduced. masked_invalid(x) masks x where x is nan or inf. fix_invalid(x) returns (a copy of) x, where invalid values (nans & infs) are replaced by fill_value.
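A short usage sketch of the two functions follows. It assumes the functions of the same names in present-day numpy.ma behave like the sandbox versions described here; in current NumPy, fix_invalid also masks the positions it replaces, and the explicit fill_value=0.0 is just a choice made for this example.

```python
import numpy as np

x = np.array([1.0, np.nan, 3.0, np.inf])

# Mask the entries that are nan or inf; the data itself is left alone.
masked = np.ma.masked_invalid(x)

# Work on a copy of x, replacing nan/inf by the chosen fill value.
fixed = np.ma.fix_invalid(x, fill_value=0.0)
```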
* No mask shrinking
Following Paul Dubois and Sasha's example, I eventually had to get rid of the semi-automatic shrinking of the mask in __getitem__, which appeared to be a major bottleneck. In other words, one can end up with an array full of False instead of nomask, which may slow things down a bit. You can force a mask back to nomask with the new shrink_mask method.
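As an illustration, here is how forcing an all-False mask back to nomask might look. The sketch assumes the shrink_mask method of present-day numpy.ma matches the one introduced here; the variable names are only for the example.

```python
import numpy as np

# An explicit all-False mask is kept as a full boolean array.
x = np.ma.array([1, 2, 3, 4], mask=[False, False, False, False])
full_mask_before = np.ma.getmask(x)

# Collapse the all-False mask in place.
x.shrink_mask()
is_nomask_after = np.ma.getmask(x) is np.ma.nomask
```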
* _sharedmask
Here again, I followed Paul and Sasha's ideas and reintroduced the _sharedmask flag to prevent unwanted propagation of the mask. When creating a new array with x=masked_array(data, mask=m), x._mask is initially a reference to m and x._sharedmask is True. When x is modified, x._mask is copied first, which prevents the change from propagating back to m.
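The copy-on-modification idea behind the flag can be sketched with a toy class. SharedMaskSketch and its mask_item method are hypothetical names used only for illustration; this is not the maskedarray implementation, and current numpy.ma may handle shared masks differently.

```python
import numpy as np

class SharedMaskSketch:
    """Hypothetical stand-in for a masked array holding a shared mask."""

    def __init__(self, data, mask):
        self._data = np.asarray(data)
        self._mask = mask            # reference to the caller's mask, no copy
        self._sharedmask = True

    def mask_item(self, index):
        # Copy the mask before the first modification so that the
        # caller's array is left untouched.
        if self._sharedmask:
            self._mask = self._mask.copy()
            self._sharedmask = False
        self._mask[index] = True

m = np.array([False, False, True])
x = SharedMaskSketch([1, 2, 3], m)
shared_before = x._mask is m
x.mask_item(0)
```

After the call, x holds its own mask with the first entry set, while m keeps its original values.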