![](https://secure.gravatar.com/avatar/7fdc9b298a3db0dda9ab44a306959baa.jpg?s=120&d=mm&r=g)
Folks, I just posted on the scipy/developers zone wiki (http://projects.scipy.org/scipy/numpy/wiki/MaskedArray) a reimplementation of the masked_array mopdule, motivated by some problems I ran into while subclassing MaskedArray. The main differences with the initial numpy.core.ma package are that MaskedArray is now a subclass of ndarray and that the _data section can now be any subclass of ndarray (well, it should work in most cases, some tweaking might required here and there). Apart from a couple of issues listed below, the behavior of the new MaskedArray class reproduces the old one. It is quite likely to be significantly slower, though: I was more interested in a clear organization than in performance, so I tended to use wrappers liberally. I'm sure we can improve that rather easily. The new module, along with a test suite and some utilities, are available here: http://projects.scipy.org/scipy/numpy/attachment/wiki/MaskedArray/maskedarra... http://projects.scipy.org/scipy/numpy/attachment/wiki/MaskedArray/masked_tes... http://projects.scipy.org/scipy/numpy/attachment/wiki/MaskedArray/test_maske... Please note that it's still a work in progress (even if it seems to work quite OK when I use it). Suggestions, comments, improvements and general feedback are more than welcome ! ------------------------------------------------------------------------- Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
![](https://secure.gravatar.com/avatar/25899bc1947e1ce40b16e55631c2c94a.jpg?s=120&d=mm&r=g)
Does this new MA class allow masking of rearray like arrays? The numpy (1.0b5) version does not seem to. e.g. from numpy import * desc = [('name','S30'),('age',int8),('weight',float32)] a = array([('Bill',31,260.0),('Fred', 15, 145.0)], dtype=desc) print a[0] print a['name'] a2 = ma.array([('Bill',31,260.0),('Fred', 15, 145.0)], dtype=desc, mask=ma.nomask) print a2[0] print a2['name'] a3 = ma.array([('Bill',31,260.0),('Fred', 15, 145.0)], dtype=desc, mask=[[True,False,False],[False,False, False]]) print a3[0] print a3['name'] -- output ('Bill', 31, 260.0) [Bill Fred] ('Bill', 31, 260.0) [Bill Fred] Traceback (most recent call last): File "D:\eclipse\Table\scripts\testrecarray.py", line 12, in ? a3 = ma.array([('Bill',31,260.0),('Fred', 15, 145.0)], dtype=desc, mask=[[True,False,False],[False,False, False]]) File "C:\Python23\Lib\site-packages\numpy\core\ma.py", line 592, in __init__ raise MAError, "Mask and data not compatible." numpy.core.ma.MAError: Mask and data not compatible. On 10/16/06, Pierre GM <pgmdevlist@mailcan.com> wrote:
------------------------------------------------------------------------- Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
![](https://secure.gravatar.com/avatar/7fdc9b298a3db0dda9ab44a306959baa.jpg?s=120&d=mm&r=g)
On Monday 16 October 2006 22:08, Michael Sorich wrote:
Does this new MA class allow masking of rearray like arrays?
Excellent question! Which is to say, I have no idea... I don't use recordarray, so I didn't think about testing them. So, a first test indicates that it doesn't work either. The mask for a3 is understood as having a size 6, when the data is seen as size 2 only (exactly the same problem as the original ma module). I gonna poke around and check whether I can come with a workaround. Just to make it clear: you want to be able to mask some fields in some records, right ? Not mask all the fields of a record ? Thanks for the feedback, I'll keep you posted. ------------------------------------------------------------------------- Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
![](https://secure.gravatar.com/avatar/25899bc1947e1ce40b16e55631c2c94a.jpg?s=120&d=mm&r=g)
On 10/17/06, Pierre GM <pgmdevlist@mailcan.com> wrote:
Yes, that is correct. It would be preferable to have more control over masking than simply masking an entire record. I view the heterogenouse arrays as 2D arrays in which it should be possible to mask individual values. ------------------------------------------------------------------------- Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
![](https://secure.gravatar.com/avatar/7fdc9b298a3db0dda9ab44a306959baa.jpg?s=120&d=mm&r=g)
Folks, I updated the alternative implementation of MaskedArray on the wiki, mainly to correct a couple of bugs. (http://projects.scipy.org/scipy/numpy/wiki/MaskedArray) In addition, I attached another file, maskedrecordarray, which introduce a new class, MaskedRecord, as a subclass of recarray and MaskedArray. An instance of this class accepts a recarray as data, and uses two masks: the 'recordmask' has as many entries as records in the array, each entry with the same fields as a record, but of boolean types, indicating whether a field is masked or not; an entry is flagged as masked in the 'mask' array if at least one field is masked. The 'mask' object is introduced mostly for compatibilty with MaskedArray, only 'recordmask' is really useful. A few examples in the file should give you an idea of what can be done. In particular, you can define a new maskedrecord array as simply as ; a = masked_record([('Alan',29,200.), ('Bill',31,260.0)], dtype=[('name','S30'),('age',int_),('weight',float_)], mask=[(1,0,0), (0,0,0)]) Note that maskedrecordarray is still quite experimental. As I'm not a regular user of records, I don't really know what should be implemented... The file can be accessed at http://projects.scipy.org/scipy/numpy/attachment/wiki/MaskedArray/maskedreco... Once again, I need your comments and suggestions ! Thanks. Pierre ------------------------------------------------------------------------- Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
![](https://secure.gravatar.com/avatar/25899bc1947e1ce40b16e55631c2c94a.jpg?s=120&d=mm&r=g)
I am currently running numpy rc2 (I haven't tried your reimplementation yet as I am still using python 2.3). I am wondering whether the new maskedarray is able to handle construction of arrays from masked scalar values (not sure if this is the correct term). I ran across a situation recently when I was picking individual values from a masked array, collecting them in a list and then subsequently constructing an array with these values. This does not work if any of the values choosen are masked. See example below On a more general note I am interested to find out whether there are any other languages that handle masked/missing data well and if so how this is done. My only experience is with R, which I have found to be quite good (there is a special value NA this signifies a masked value - this can be mixed in with non-masked values when defining an array). from numpy import * a = ma.array([1,2,3], mask=[True, False, False]) print a[0], type(a[0]) print a[1], type(a[1]) print list(a) a = ma.array(list(a)) -- output -- -- <class 'numpy.core.ma.MaskedArray'> 2 <type 'numpy.int32'> [array(data = 999999, mask = True, fill_value=999999) , 2, 3] C:\Python23\Lib\site-packages\numpy\core\ma.py:604: UserWarning: Cannot automatically convert masked array to numeric because data is masked in one or more locations. warnings.warn("Cannot automatically convert masked array to "\ Traceback (most recent call last): File "D:\eclipse\Table\scripts\testrecarray.py", line 23, in ? a = ma.array(list(a)) File "C:\Python23\Lib\site-packages\numpy\core\ma.py", line 562, in __init__ c = numeric.array(data, dtype=tc, copy=True, order=order) TypeError: an integer is required On 10/16/06, Pierre GM <pgmdevlist@mailcan.com> wrote:
------------------------------------------------------------------------- Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
![](https://secure.gravatar.com/avatar/56b215661867f3b4f4a3b28077de66b3.jpg?s=120&d=mm&r=g)
On Tuesday 24 October 2006 02:50, Michael Sorich wrote:
The answer is no, unfortunately: both the new and old implementations fail at the same point, raising a TypeError: lists are processed through numpy.core.numeric.array, which doesn't handle that. But there's a workaround: creating an empty masked array, and filling it by hand: b = list(a) A = MA.array(N.empty(len(b))) for (k,v) in enumerate(b): A[k] = v should work in most cases. I guess I could plug something along those lines in the new implementation... But there's a problem anyway: if you don't precise a type at the creation of the empty array, the type will be determined automatically, which may be a problem if you mix numbers and strings: the maskedarray is detected as a string in that case. ------------------------------------------------------------------------- Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
![](https://secure.gravatar.com/avatar/25899bc1947e1ce40b16e55631c2c94a.jpg?s=120&d=mm&r=g)
On 10/25/06, Pierre GM <pgmdevlist@gmail.com> wrote:
I have finally gotten around to upgrading to python 2.4 and have had a chance to play with your new version of the MaskedArray. It is great to see that someone is working on this. I have a few thoughts on masked arrays that may or may no warrant discusion 1. It would be nice if the masked_singleton could be passed into a ndarray, as this would allow it to be passed into the MaskedArray e.g. import numpy as N import ma.maskedarray as MA test = N.array([1,2,MA.masked])
ValueError: setting an array element with a sequence
object [1 2 masked]
[False False True]
True
If the masked_singleton was implemented as an object that is not a MakedArray (which is a sequence that numpy.array chokes on), then a valid numpy array with an object dtype could be produced. e.g. class MaskedScalar: def __str__(self): return 'masked' masked = MaskedScalar() test = N.array([1,2,masked]) print test.dtype, test print test == masked print test[2] == masked print test[2] is masked
True
Then it would be possible to alternatively define a masked array as MA.array([1,2,masked]) or MA.array(N.array([1,2,masked])). In the __init__ of the MaskedArray if the ndarray has an object dtype simply calculate the mask from a==masked. 2. What happens if a masked array is passed into a ndarray or passed into a MaskedArray with another mask? test_ma1 = MA.array([1,2,3], mask=[False, False, True]) print test_ma1
[1 2 --]
print N.array(test_ma1)
[1 2 3]
test_ma2 = MA.array(test_ma1, mask=[True, False, False]) print test_ma2
[-- 2 3]
I suppose it depends on whether you are masking useful data, or the masks represent missing data. In the former it may make sense to change or remove the mask. However in the latter case the original data entered is a bogus value which should never be unmasked. In this case, when converting to a ndarray I think it make more sense to make an object ndarray with the missing value containing the masked singleton. Additionally, if the MaskedArray is holding missing data, it does not make much sense to be able to pass in to the MA constructor both an existing ma and a mask. ------------------------------------------------------------------------- Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
![](https://secure.gravatar.com/avatar/56b215661867f3b4f4a3b28077de66b3.jpg?s=120&d=mm&r=g)
Michael, First of all, thanks for your interest in the exercise of style the new implementation of MaskedArray is basically nothing but. On Tuesday 07 November 2006 20:11, Michael Sorich wrote:
I like your idea, but not its implementation. If MA.masked_singleton is defined as an object, as you suggest, then the dtype of the ndarray it is passed to becomes 'object', as you pointed out, and that is not something one would naturally expec, as basic numerical functions don't work well with the 'object' dtype (just try N.sqrt(N.array([1],dtype=N.object)) to see what I mean). Even if we can construct a mask rather easily at the creation of the masked array, following your 'a==masked' suggestion, we still need to get the dtype of the non-masked section, and that doesn't seem trivial... I guess that a simple solution is to use MA.masked_values. Make sure to use a numerical value for the masked data, else you'll end up with yet another object array.
Let me precise that my objective was to get an implementation as close to the original numpy.core.ma as possible, for 'backward compatibility'. I'm not sure it'd be wise to change it at this point, but that could be discussed. As you've noticed, when creating a new masked array from an existing one, the 'mask' argument supersedes the initial mask. That's ideal when you want to focus on a fraction of the initial data: you just mask what you don't need, and are still able to retrieve it when you need it. I agree that this default behavior is a bit strange when you have missing data: in that case, one would expect the new mask to be a combination of the 'mask' argument and the old mask. A possibility would then be to add a 'keep_mask' flag: a default of False would give the current behavior, a value of True would force the new mask to be a combination. I think that feelings are mixed on that list about extra flags, but the users of maskedarray are only a minority anyway (hopefully, only for the moment). About the conversion to ndarray: By default, the result should have the same dtype as the _data section. For this reason, I disagree with your idea of "(returning) an object ndarray with the missing value containing the masked singleton". If you really want an object ndarray, you can use the filled method or the filled function, with your own definition of the filling value (such as your MaskedScalar). ------------------------------------------------------------------------- Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
![](https://secure.gravatar.com/avatar/ab7e74f2443b81e5175638d72be65e07.jpg?s=120&d=mm&r=g)
On 08/11/06, Pierre GM <pgmdevlist@gmail.com> wrote:
A good candidate for "should be masked" marked is NaN. It is supposed to mean, more or less, "no sensible value". Unfortunately, integer types do not have such a special value. It's also conceivable that some user might want to keep NaNs in their array separate from the mask. Finally, on some hardware, operations with NaN are very slow (so leaving them in the array, even masked, might not be a good idea). The reason I suggest this is that in the last major application I had for numpy, one stage of the problem would occasionally result in NaNs for certain values, but the best thing I could do was leave them in place to represent "no data". Switching to a MaskedArray might have been a better idea, but the NaNs were a rare occurrence.
If you've got floating point, you can again fill in NaNs, but you have a good point about wanting to extract the original values that were masked out. Depending on what one is doing, one might want one or the other. A. M. Archibald ------------------------------------------------------------------------- Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
![](https://secure.gravatar.com/avatar/5b2449484c19f8e037c5d9c71e429508.jpg?s=120&d=mm&r=g)
A. M. Archibald wrote:
It has always been my experience (on various flavors or Pentium) that operating on NANs is extremely slow. Does anyone know on what hardware NANs are *not* slow? Of course it's always possible I just never notice NANs on hardware where they aren't slow. [SNIP] -tim ------------------------------------------------------------------------- Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
![](https://secure.gravatar.com/avatar/ab7e74f2443b81e5175638d72be65e07.jpg?s=120&d=mm&r=g)
On 08/11/06, Tim Hochberg <tim.hochberg@ieee.org> wrote:
On an opteron machine I have access to, they appear to be no slower (and even faster for some transcendental functions) than ordinary floats: In [13]: a=zeros(1000000) In [14]: %time for i in xrange(1000): a += 1.1 CPU times: user 6.87 s, sys: 0.00 s, total: 6.87 s Wall time: 6.87 In [15]: a *= NaN In [16]: %time for i in xrange(1000): a += 1.1 CPU times: user 6.86 s, sys: 0.00 s, total: 6.86 s Wall time: 6.87 On my Pentium M, they are indeed significantly slower (three times? I didn't really do enough testing to say how much). I am actually rather offended by this unfair discrimination against a citizen in good standing of the IEEE floating point community. A. M. Archibald ------------------------------------------------------------------------- Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
![](https://secure.gravatar.com/avatar/5b2449484c19f8e037c5d9c71e429508.jpg?s=120&d=mm&r=g)
A. M. Archibald wrote:
If they're only 3x slower you're doing better than I am. On my core duo box they appear to be nearly 20x slower for both addition and multiplication. This more or less matches what I recall from earlier boxes. >>> Timer("a.prod()", "import numpy as np; a = np.ones(4096); a[0] = np.nan").timeit(10000) 5.6495738983585397 >>> Timer("a.prod()", "import numpy as np; a = np.ones(4096)").timeit(10000) 0.34439041833525152 >>> Timer("a.sum()", "import numpy as np; a = np.ones(4096); a[0] = np.nan").timeit(10000) 6.0985655998179027 >>> Timer("a.sum()", "import numpy as np; a = np.ones(4096)").timeit(10000) 0.32354363473564263 I've been told that operations on NANs are slow because they aren't always implemented in the FPU hardware. Instead they are trapped and implemented software or firmware or something or other. That may well be bogus 42nd hand information though. -tim ------------------------------------------------------------------------- Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
![](https://secure.gravatar.com/avatar/5dde29b54a3f1b76b2541d0a4a9b232c.jpg?s=120&d=mm&r=g)
Tim Hochberg wrote:
which still doesn't make sense -- doesn't ANY operation with a NaN return NaN? how hard could that be? I share in the disappointment that NaN's are not first class citizens in common hardware.... -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov ------------------------------------------------------------------------- Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
![](https://secure.gravatar.com/avatar/5b2449484c19f8e037c5d9c71e429508.jpg?s=120&d=mm&r=g)
Christopher Barker wrote:
Well, it's not free. You still have to recognize a particular bit pattern corresponding to QNaN and then copy the result in that case. All that probably burns more silicon for a case that most people don't care about. Do NaNs make Half Life run any faster? No I didn't think so. I suspect that instead they just look for the case where the exponent is all 1s, in which case the result is either +-Inf, SNaN or QNaN and then drop into software to handle those infrequent cases.
I share in the disappointment that NaN's are not first class citizens in common hardware....
It would be nice to see IEEE 754 supported more faithfully, but there seems to be more of a problem on the software side than on the hardware side in practice. I still can't help feeling that using NaNs for mask values would be abuse even if it weren't slow and eventually it would reach around and bite you someplace tender. -tim ------------------------------------------------------------------------- Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
![](https://secure.gravatar.com/avatar/04b319ee3cbd0b128fa68ee85756de27.jpg?s=120&d=mm&r=g)
Disappointed in NaN land? Since the function of old retired persons is to tell youngsters stories around the campfile: A supercomputer hardware designer told me that when forced to add IEEE arithmetic to his designs that it decreased performance substantially, maybe 25-30%; it wasn't that doing the operations took so much longer, it was that the increased physical space needed for that circuitry pushed the memory farther away. Doubtless this inspired doing some of it in software instead. No standard for controlling the behaviors exists, either, so you can find out the hard way that underflow-to-zero is being done in software by default, and that you are doing a lot of it. Or that your code doesn't have the same behavior on different platforms. To my mind, all that was really accomplished was to convince said youngsters that somehow this NaN stuff was the solution to some problem. In reality, computing for 3 days and having it print out 1000 NaNs is not exactly satisfying. I think this stuff was essentially a mistake in the sense that it is a nice ivory-tower idea that costs more in practice than it is worth. I do not think a properly thought-out and coded algorithm ought to do anything that this stuff is supposed to 'help' with, and if it does do it, the code should stop executing. Anyway, if I thought it would do the job I wouldn't have written MA in the first place. Rant off. Nap. Grumble. Stupid kids. (:-> -- Paul ------------------------------------------------------------------------- Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642 _______________________________________________ Numpy-discussion mailing list Numpy-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/numpy-discussion
![](https://secure.gravatar.com/avatar/ab7e74f2443b81e5175638d72be65e07.jpg?s=120&d=mm&r=g)
On 09/11/06, Paul Dubois <pfdubois@gmail.com> wrote:
Since the function of old retired persons is to tell youngsters stories around the campfile:
I'll pull up a log. But since I'm uppity, and not especially young, I hope you won't mind if I heckle.
The goal of IEEE floats is not to be faster but to make doing correct numerical programming easier. (Numerical analysts can write robust numerical code for almost any kind of float, but the vast majority of numerical code is written by scientists who know the bare minimum or less about numerical analysis, so a good system makes such code work as well as possible.) The other reason IEEE floats are good is, no disrespect to your hardware designer friend intended, that they were designed with some care. Reimplementations are liable to get some of the finicky details wrong (round-to-even, denormalized numbers, what have you...). I found Kahan's "How Java's Floating-point Hurts Everyone Everywhere" ( http://www.cs.berkeley.edu/~wkahan/JAVAhurt.pdf ) very educational when it comes to "what can go wrong with platform-dependent floating-point".
Well, if the platform is IEEE-compliant, you can trust it to behave the same way logicially, even if some applications are slower on some systems than others. Or did you mean that there's no standard way to set the various IEEE flags? That seems like a language/compiler issue, to me (which is not to minimize the pain it causes!)
Well, then turn on floating-point exceptions. I find a few NaNs in my calculations relatively benign. If I'm doing a calculation where they'll propagate and destroy everything, I trap them. But it happens just as often that I launch a job, a NaN appears early on, but I come back in the morning to find that instead of aborting, the job has completed except for a few bad data points. If you want an algorithm that takes advantage of NaNs, I have one. I was simulating light rays around a black hole, generating a map of the bending for various view angles. So I naturally did a lot of exploring of grazing incidence, where the calculations would often yield NaNs. So I used the NaNs to fill in the gap between the internal bending of light and the external bending of the light; no extra programming effort required, since that was what came out of the ray tracer anyway. The rendering step just returned NaNs for the image pixels that came from interpolating in the NaN-filled regions, which came out black; just what I wanted. I just wrote the program ignoring what would happen if a NaN appeared, and it came out right. Which is, I think, the goal - to save the programmer having to think too much about numerical-analysis headaches.
Anyway, if I thought it would do the job I wouldn't have written MA in the first place.
It's certainly not always a good solution; in particular it's hard to control the behaviour of things like where(). There's also no NaN for integer types, which makes MaskedArray more useful. Anyway, I am pleased that numpy has both controllable IEEE floats and MaskedArray, and I apologize for a basically off-topic post. A. M. Archibald ------------------------------------------------------------------------- Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
![](https://secure.gravatar.com/avatar/7e9e53dbe9781722d56e308c32387078.jpg?s=120&d=mm&r=g)
It looks like on my Pentium M multiplication with NaNs is slow, but using a masked array ranges from slightly faster (with only one value masked) to twice as slow (with all values masked): In [15]:Timer("a.prod()", "import numpy as np; aa = np.ones(4096); a = np.ma.masked_greater(aa,0)").timeit(10000) Out[15]:9.4012830257415771 In [16]:Timer("a.prod()", "import numpy as np; a = np.ones(4096); a[0]= np.nan").timeit(10000) Out[16]:5.5737850666046143 In [17]:Timer("a.prod()", "import numpy as np; a = np.ones(4096)").timeit(10000) Out[17]:0.40796804428100586 In [18]:Timer("a.prod()", "import numpy as np; aa = np.ones(4096); aa[0] = 2; a = np.ma.masked_greater(aa,1)").timeit(10000) Out[18]:4.1544749736785889 In [19]:Timer("a.prod()", "import numpy as np; a = np.ones(4096); a[:]= np.nan").timeit(10000)Out[19]:5.8589630126953125 For transcendentals, nans or masks don't make much difference, although masks are slightly faster than nans: In [20]:Timer("np.sin(a)", "import numpy as np; a = np.ones(4096); a[:]= np.nan").timeit(10000) Out[20]:4.5575671195983887 In [21]:Timer("np.ma.sin(a)", "import numpy as np; aa = np.ones(4096); a = np.ma.masked_greater(aa,0)").timeit(10000) Out[21]:4.4125270843505859 In [22]:Timer("b=np.sin(a)", "import numpy as np; a = np.ones(4096)").timeit(10000) Out[22]:3.5793929100036621 Eric Tim Hochberg wrote:
------------------------------------------------------------------------- Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
![](https://secure.gravatar.com/avatar/56b215661867f3b4f4a3b28077de66b3.jpg?s=120&d=mm&r=g)
A good candidate for "should be masked" marked is NaN. It is supposed to mean, more or less, "no sensible value".
Which might turn out out to be the best indeed. Michael's application would then look like
In any case, I think we should stick to the numpy.core.ma default behavior for backwards compatibility. If you really wanht to distinguish between several kind of masks (one for missing data, one for data to discard temporarily), that could be done by defining a special subclass. But is it really needed ? A smart use of filled and masked_values should do the trick in most cases. ------------------------------------------------------------------------- Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
![](https://secure.gravatar.com/avatar/04b319ee3cbd0b128fa68ee85756de27.jpg?s=120&d=mm&r=g)
2 cents from the author of the first folio: The intent was to allow creation of masked arrays with copy=no, so that the original data could be retrieved from it if desired. But I was quite, quite rigorous about NEVER assuming the data in a masked slot made any sense whatsoever. The intention was that there are two ways to get a numeric array out of a masked one: 1. Get the data field 2. m.filled() (1) is strictly at your own risk. ------------------------------------------------------------------------- Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642 _______________________________________________ Numpy-discussion mailing list Numpy-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/numpy-discussion
![](https://secure.gravatar.com/avatar/25899bc1947e1ce40b16e55631c2c94a.jpg?s=120&d=mm&r=g)
While playing with the new implementation I noticed that the nonzero method did not seem to work properly. I was not able to trace down why this is the case - I could not find where the nonzero method of the new maskedarray is actually implemented :) import maskedarray as ma test_ma1 = ma.array([True, False, False], mask=[0,0,1]) print test_ma1 print test_ma1.nonzero()[0] print ma.nonzero(test_ma1)[0] --output-- [True False --] [True False --] [0] On 10/16/06, Pierre GM <pgmdevlist@mailcan.com> wrote:
![](https://secure.gravatar.com/avatar/25899bc1947e1ce40b16e55631c2c94a.jpg?s=120&d=mm&r=g)
I think that the new implementation is making a copy of the data with indexing a MA. This is different from both ndarray and the existing numpy ma version. e.g. testma = ma.array([[1,2,3,4,5],[1,2,3,4,5],[1,2,3,4,5]], mask=ma.nomask) testma2 = testma[1] testma2[1] = 20 print testma print testma2 --output-- [[1 2 3 4 5] [1 2 3 4 5] [1 2 3 4 5]] [ 1 20 3 4 5] Having subviews of the mask seems complicated with the mask being nomask. What happens if the view sets a new masked value and hence changes from nomask to an boolean array? How does the parent mask get updated? I think the numpy implementation gets away with this by returning a view of only the _data part if the ma mask is nomask - I don't like this solution as I would expect a ma to be returned. Also I suspect that if the ma is to be a view of another ma, then in __new__ a mask that is a boolean array of all False cannot be converted to nomask. I like the new implementation of maskedarray, especially the focus on simplicity. The only simple solution I see is to have the mask be a boolean array at all times. Or at least as soon as a view is required of a ma if mask is nomask then it should be converted to a boolean array and then both a view of the boolean array mask as well as the data is used in the view ma returned.
![](https://secure.gravatar.com/avatar/56b215661867f3b4f4a3b28077de66b3.jpg?s=120&d=mm&r=g)
On Tuesday 21 November 2006 21:11, Michael Sorich wrote:
Michael, If you check the definition of MaskedArray.__new__, you'll see that the "copy" argument is set to True by default. Setting it to "false" seems to give what you expect. Should I make the default ?
Having subviews of the mask seems complicated with the mask being nomask.
Why ? nomask is just a trick to avoid unnecessary computations on a mask full of False that doesn't need updating.
Both implementations work the same way: the parent mask is not updated.
I think the numpy implementation gets away with this by returning a view of only the _data part if the ma mask is nomask
By numpy implementation, you mean numpy.core.ma, right ? If so, then yes: `self.__getitem__[i]` returns `self._data[i]` if the mask is nomask. In maskedarray, if the mask is nomask, then `self.__getitem__[i]` returns `self._data[i]` only if `self._data[i].size==1`, else it returns a masked array.
I'm not following you here: there's no `__new__` in numpy.core.ma (that's one of the reason why a masked array in numpy.core.ma is basically different from a ndarray...). And in maskedarray, a mask as array of `False` is set to `nomask` by default, but you can use the `flag` option: please check the documentation of `maskedarray.masked_array`: flag=True converts the mask, flags=False keeps an array of boolean. One thing to remember is that masks tend to be copied more often than not. And I don't think it's advisable to modify the mask of the parent: it's no longer the same object, as the mask is now different ! In other terms, you could share data, you shouldn't share a mask. And I keep getting bitten with data sharing, that's why I had set the 'copy' flag to True by default.
You haven't convinced me yet of why a mask of False is better than `nomask`. What don't you like in maskedarray (aka the new implementation) ?
![](https://secure.gravatar.com/avatar/25899bc1947e1ce40b16e55631c2c94a.jpg?s=120&d=mm&r=g)
Perhaps an example will help explain what I mean For the case of an ndarray if you select a row and then alter the new array, the old array is also changed. from numpy import * a = array([[1,2,3,4,5],[1,2,3,4,5],[1,2,3,4,5]]) suba = a[2] suba[1] = 10 print a print suba --output-- [[ 1 2 3 4 5] [ 1 2 3 4 5] [ 1 10 3 4 5]] [ 1 10 3 4 5] In the current version of maskedarray in numpy, changes in the row array affect the parent array. Here whenever you select a single row or column and mask is nomask a ndarray is returned not a masked array from numpy.core.ma import * a = array([[1,2,3,4,5],[1,2,3,4,5],[1,2,3,4,5]], mask=nomask) suba = a[2] suba[1] = 10 print a print suba print type(suba) --output-- [[1 2 3 4 5] [1 2 3 4 5] [1 10 3 4 5]] [ 1 10 3 4 5] <type 'numpy.ndarray'> However if mask is anything other than nomask (even if the mask is an boolean array in which all the values are false- which means the same thing as nomask) a masked array is returned. Once again the data is shared between the arrays from numpy.core.ma import * a = array([[1,2,3,4,5],[1,2,3,4,5],[1,2,3,4,5]], mask=[[0,0,0,0,0],[0,0,0,0,0],[0,0,0,0,0]]) suba = a[2] suba[1] = 10 print a print suba print type(suba) --output-- [[1 2 3 4 5] [1 2 3 4 5] [1 10 3 4 5]] [1 10 3 4 5] <class 'numpy.core.ma.MaskedArray'> Unfortunately if the value is changed to masked, this is not updated in the parent array. This seems very inconsistent. I don't view masked values any different than any other value. a = array([[1,2,3,4,5],[1,2,3,4,5],[1,2,3,4,5]], mask=[[0,0,0,0,0],[0,0,0,0,0],[0,0,0,0,0]]) suba = a[2] suba[1] = masked print a print suba print type(suba) --output-- [[1 2 3 4 5] [1 2 3 4 5] [1 2 3 4 5]] [1 -- 3 4 5] <class 'numpy.core.ma.MaskedArray'> With the new implementation, the data is not shared for any of the 3 variations above. I am happy that the arrays acts consistently, however this is different from the ndarray. It would be nice if the behavior was the same as the ndarray, but it is better than the numpy.core.ma implementation. If this is on purpose, then it should be documented. from numpyext.maskedarray import * a = array([[1,2,3,4,5],[1,2,3,4,5],[1,2,3,4,5]], mask=nomask) suba = a[2] suba[1] = 10 print a print suba print type(suba) --output-- [[1 2 3 4 5] [1 2 3 4 5] [1 2 3 4 5]] [ 1 10 3 4 5] <class 'numpyext.maskedarray.MaskedArray'> On 11/22/06, Pierre GM <pgmdevlist@gmail.com> wrote:
![](https://secure.gravatar.com/avatar/25899bc1947e1ce40b16e55631c2c94a.jpg?s=120&d=mm&r=g)
[[1 -- 3 4 5] [6 7 8 9 10]]
If I make some minor changes (below) to MaskedArray get and setitem from numpyext.maskedarray import * a = array([[1,2,3,4,5],[6,7,8,9,10]], mask=nomask) suba = a[0] suba[1] = masked print a print suba
[1 -- 3 4 5]
[[1 -- 3 4 5] [6 10 8 9 10]]
suba = a[1] suba[1] = 10 print a print suba
[6 10 8 9 10]
[[1 -- 3 4 5] [6 -- 8 9 10]]
a = array([[1,2,3,4,5],[6,7,8,9,10]], mask=[[0,0,0,0,0],[0,1,0,0,0]]) suba = a[0] suba[1] = masked print a print suba
[1 -- 3 4 5]
[[1 -- 3 4 5] [6 10 8 9 10]]
suba = a[1] suba[1] = 10 print a print suba
[6 10 8 9 10]
### in MaskedArray ### def __getitem__(self, i): """x.__getitem__(y) <==> x[y] Returns the item described by i. Not a copy as in previous versions. """ dout = self._data[i] if self._mask is nomask: if numeric.size(dout)==1: return dout else: ### made changes here self._mask = numeric.zeros(self.shape, dtype=MaskType) mout = self._mask[i] return self.__class__(dout, mask=mout, fill_value=self._fill_value, copy=False, flag=False) ### ------------- #.... # m = self._mask.copy() m = self._mask mi = m[i] if mi.size == 1: if mi: return masked else: return dout else: ### made changes here return self.__class__(dout, mask=mi, fill_value=self._fill_value, copy=False, flag=False) def __setitem__(self, index, value): """x.__setitem__(i, y) <==> x[i]=y Sets item described by index. If value is masked, masks those locations. """ d = self._data if self is masked: raise MAError, 'Cannot alter the masked element.' #.... if value is masked: if self._mask is nomask: _mask = make_mask_none(d.shape) else: _mask = self._mask #=============================================================================== # #why does the mask need to be copied? # _mask = self._mask.copy() #=============================================================================== _mask[index] = True self._mask = _mask return #.... m = getmask(value) value = filled(value).astype(d.dtype) d[index] = value if m is nomask: if self._mask is not nomask: #=============================================================================== # #why does the mask need to be copied? # _mask = self._mask.copy() #=============================================================================== _mask = self._mask _mask[index] = False else: _mask = nomask else: if self._mask is nomask: _mask = make_mask_none(d.shape) else: _mask = self._mask #=============================================================================== # #why does the mask need to be copied? # _mask = self._mask.copy() #=============================================================================== _mask[index] = m self._mask = _mask On 11/22/06, Michael Sorich <michael.sorich@gmail.com> wrote:
![](https://secure.gravatar.com/avatar/56b215661867f3b4f4a3b28077de66b3.jpg?s=120&d=mm&r=g)
With the new implementation, the data is not shared for any of the 3 variations above.
The 'copy=True' in MaskedArray.__new__ ensures that data is always copied by default (that's the way I liked it). Most methods that return new masked arrays (__getitem__ and __setitem__, for example) do not specifically use a particular value of this flag, they just rely on the default, "copy=True" , which is "don't share". Once again, it's not so much because I'm selfish than because I regularly forget that data can be shared, and that modifying one array can have repercussions on another one the other side of the code. But there's a quick fix: in the definition of MaskedArray.__new__, just change the default copy flag from True to False, that'll work. Now, Michael, you're right, and it should be the default. I'll fix that. [Thinking about it, we could store the copy flag in the array, and use that value in __getitem__ and __setitem__. That might be overkill, though].
Inconsistent, maybe, useful definitely: Masking a view and getting the original masked accordingly could be useful, but I strongly feel that unmasking a view and getting an unmasked orginal is dangerous. Besides, in order to make the mask updatable, you need to get it in array form, as you suggested in your version of __getitem__. But then, you drag this array all along, and I'm not sure it's worthwhile. All in all, I'd far prefer a status-quo, with unsharable masks. What does the rest of the list think (er, the MA users) ?
![](https://secure.gravatar.com/avatar/25899bc1947e1ce40b16e55631c2c94a.jpg?s=120&d=mm&r=g)
The use of nan in float ndarrays has behaviour I would consider expected (at least coming from a numpy perspective). I think that the closer the maskedarrays mimic the behavior of ndarrays the easier it will be to remember how to use them and what to expect. from numpy import * a = array([[1,2,3,4,5],[1,2,3,4,5],[1,2,3,4,5]], dtype='f') suba = a[2] suba[1] = nan print a print suba
[ 1. -1.#IND 3. 4. 5. ]
a[2,2] = nan print a print suba
[ 1. -1.#IND -1.#IND 4. 5. ]
![](https://secure.gravatar.com/avatar/25899bc1947e1ce40b16e55631c2c94a.jpg?s=120&d=mm&r=g)
Does this new MA class allow masking of rearray like arrays? The numpy (1.0b5) version does not seem to. e.g. from numpy import * desc = [('name','S30'),('age',int8),('weight',float32)] a = array([('Bill',31,260.0),('Fred', 15, 145.0)], dtype=desc) print a[0] print a['name'] a2 = ma.array([('Bill',31,260.0),('Fred', 15, 145.0)], dtype=desc, mask=ma.nomask) print a2[0] print a2['name'] a3 = ma.array([('Bill',31,260.0),('Fred', 15, 145.0)], dtype=desc, mask=[[True,False,False],[False,False, False]]) print a3[0] print a3['name'] -- output ('Bill', 31, 260.0) [Bill Fred] ('Bill', 31, 260.0) [Bill Fred] Traceback (most recent call last): File "D:\eclipse\Table\scripts\testrecarray.py", line 12, in ? a3 = ma.array([('Bill',31,260.0),('Fred', 15, 145.0)], dtype=desc, mask=[[True,False,False],[False,False, False]]) File "C:\Python23\Lib\site-packages\numpy\core\ma.py", line 592, in __init__ raise MAError, "Mask and data not compatible." numpy.core.ma.MAError: Mask and data not compatible. On 10/16/06, Pierre GM <pgmdevlist@mailcan.com> wrote:
------------------------------------------------------------------------- Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
![](https://secure.gravatar.com/avatar/7fdc9b298a3db0dda9ab44a306959baa.jpg?s=120&d=mm&r=g)
On Monday 16 October 2006 22:08, Michael Sorich wrote:
Does this new MA class allow masking of rearray like arrays?
Excellent question! Which is to say, I have no idea... I don't use recordarray, so I didn't think about testing them. So, a first test indicates that it doesn't work either. The mask for a3 is understood as having a size 6, when the data is seen as size 2 only (exactly the same problem as the original ma module). I gonna poke around and check whether I can come with a workaround. Just to make it clear: you want to be able to mask some fields in some records, right ? Not mask all the fields of a record ? Thanks for the feedback, I'll keep you posted. ------------------------------------------------------------------------- Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
![](https://secure.gravatar.com/avatar/25899bc1947e1ce40b16e55631c2c94a.jpg?s=120&d=mm&r=g)
On 10/17/06, Pierre GM <pgmdevlist@mailcan.com> wrote:
Yes, that is correct. It would be preferable to have more control over masking than simply masking an entire record. I view the heterogenouse arrays as 2D arrays in which it should be possible to mask individual values. ------------------------------------------------------------------------- Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
![](https://secure.gravatar.com/avatar/7fdc9b298a3db0dda9ab44a306959baa.jpg?s=120&d=mm&r=g)
Folks, I updated the alternative implementation of MaskedArray on the wiki, mainly to correct a couple of bugs. (http://projects.scipy.org/scipy/numpy/wiki/MaskedArray) In addition, I attached another file, maskedrecordarray, which introduce a new class, MaskedRecord, as a subclass of recarray and MaskedArray. An instance of this class accepts a recarray as data, and uses two masks: the 'recordmask' has as many entries as records in the array, each entry with the same fields as a record, but of boolean types, indicating whether a field is masked or not; an entry is flagged as masked in the 'mask' array if at least one field is masked. The 'mask' object is introduced mostly for compatibilty with MaskedArray, only 'recordmask' is really useful. A few examples in the file should give you an idea of what can be done. In particular, you can define a new maskedrecord array as simply as ; a = masked_record([('Alan',29,200.), ('Bill',31,260.0)], dtype=[('name','S30'),('age',int_),('weight',float_)], mask=[(1,0,0), (0,0,0)]) Note that maskedrecordarray is still quite experimental. As I'm not a regular user of records, I don't really know what should be implemented... The file can be accessed at http://projects.scipy.org/scipy/numpy/attachment/wiki/MaskedArray/maskedreco... Once again, I need your comments and suggestions ! Thanks. Pierre ------------------------------------------------------------------------- Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
![](https://secure.gravatar.com/avatar/25899bc1947e1ce40b16e55631c2c94a.jpg?s=120&d=mm&r=g)
I am currently running numpy rc2 (I haven't tried your reimplementation yet as I am still using python 2.3). I am wondering whether the new maskedarray is able to handle construction of arrays from masked scalar values (not sure if this is the correct term). I ran across a situation recently when I was picking individual values from a masked array, collecting them in a list and then subsequently constructing an array with these values. This does not work if any of the values choosen are masked. See example below On a more general note I am interested to find out whether there are any other languages that handle masked/missing data well and if so how this is done. My only experience is with R, which I have found to be quite good (there is a special value NA this signifies a masked value - this can be mixed in with non-masked values when defining an array). from numpy import * a = ma.array([1,2,3], mask=[True, False, False]) print a[0], type(a[0]) print a[1], type(a[1]) print list(a) a = ma.array(list(a)) -- output -- -- <class 'numpy.core.ma.MaskedArray'> 2 <type 'numpy.int32'> [array(data = 999999, mask = True, fill_value=999999) , 2, 3] C:\Python23\Lib\site-packages\numpy\core\ma.py:604: UserWarning: Cannot automatically convert masked array to numeric because data is masked in one or more locations. warnings.warn("Cannot automatically convert masked array to "\ Traceback (most recent call last): File "D:\eclipse\Table\scripts\testrecarray.py", line 23, in ? a = ma.array(list(a)) File "C:\Python23\Lib\site-packages\numpy\core\ma.py", line 562, in __init__ c = numeric.array(data, dtype=tc, copy=True, order=order) TypeError: an integer is required On 10/16/06, Pierre GM <pgmdevlist@mailcan.com> wrote:
------------------------------------------------------------------------- Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
![](https://secure.gravatar.com/avatar/56b215661867f3b4f4a3b28077de66b3.jpg?s=120&d=mm&r=g)
On Tuesday 24 October 2006 02:50, Michael Sorich wrote:
The answer is no, unfortunately: both the new and old implementations fail at the same point, raising a TypeError: lists are processed through numpy.core.numeric.array, which doesn't handle that. But there's a workaround: creating an empty masked array, and filling it by hand: b = list(a) A = MA.array(N.empty(len(b))) for (k,v) in enumerate(b): A[k] = v should work in most cases. I guess I could plug something along those lines in the new implementation... But there's a problem anyway: if you don't precise a type at the creation of the empty array, the type will be determined automatically, which may be a problem if you mix numbers and strings: the maskedarray is detected as a string in that case. ------------------------------------------------------------------------- Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
![](https://secure.gravatar.com/avatar/25899bc1947e1ce40b16e55631c2c94a.jpg?s=120&d=mm&r=g)
On 10/25/06, Pierre GM <pgmdevlist@gmail.com> wrote:
I have finally gotten around to upgrading to python 2.4 and have had a chance to play with your new version of the MaskedArray. It is great to see that someone is working on this. I have a few thoughts on masked arrays that may or may no warrant discusion 1. It would be nice if the masked_singleton could be passed into a ndarray, as this would allow it to be passed into the MaskedArray e.g. import numpy as N import ma.maskedarray as MA test = N.array([1,2,MA.masked])
ValueError: setting an array element with a sequence
object [1 2 masked]
[False False True]
True
If the masked_singleton was implemented as an object that is not a MakedArray (which is a sequence that numpy.array chokes on), then a valid numpy array with an object dtype could be produced. e.g. class MaskedScalar: def __str__(self): return 'masked' masked = MaskedScalar() test = N.array([1,2,masked]) print test.dtype, test print test == masked print test[2] == masked print test[2] is masked
True
Then it would be possible to alternatively define a masked array as MA.array([1,2,masked]) or MA.array(N.array([1,2,masked])). In the __init__ of the MaskedArray if the ndarray has an object dtype simply calculate the mask from a==masked. 2. What happens if a masked array is passed into a ndarray or passed into a MaskedArray with another mask? test_ma1 = MA.array([1,2,3], mask=[False, False, True]) print test_ma1
[1 2 --]
print N.array(test_ma1)
[1 2 3]
test_ma2 = MA.array(test_ma1, mask=[True, False, False]) print test_ma2
[-- 2 3]
I suppose it depends on whether you are masking useful data, or the masks represent missing data. In the former it may make sense to change or remove the mask. However in the latter case the original data entered is a bogus value which should never be unmasked. In this case, when converting to a ndarray I think it make more sense to make an object ndarray with the missing value containing the masked singleton. Additionally, if the MaskedArray is holding missing data, it does not make much sense to be able to pass in to the MA constructor both an existing ma and a mask. ------------------------------------------------------------------------- Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
![](https://secure.gravatar.com/avatar/56b215661867f3b4f4a3b28077de66b3.jpg?s=120&d=mm&r=g)
Michael, First of all, thanks for your interest in the exercise of style the new implementation of MaskedArray is basically nothing but. On Tuesday 07 November 2006 20:11, Michael Sorich wrote:
I like your idea, but not its implementation. If MA.masked_singleton is defined as an object, as you suggest, then the dtype of the ndarray it is passed to becomes 'object', as you pointed out, and that is not something one would naturally expec, as basic numerical functions don't work well with the 'object' dtype (just try N.sqrt(N.array([1],dtype=N.object)) to see what I mean). Even if we can construct a mask rather easily at the creation of the masked array, following your 'a==masked' suggestion, we still need to get the dtype of the non-masked section, and that doesn't seem trivial... I guess that a simple solution is to use MA.masked_values. Make sure to use a numerical value for the masked data, else you'll end up with yet another object array.
Let me precise that my objective was to get an implementation as close to the original numpy.core.ma as possible, for 'backward compatibility'. I'm not sure it'd be wise to change it at this point, but that could be discussed. As you've noticed, when creating a new masked array from an existing one, the 'mask' argument supersedes the initial mask. That's ideal when you want to focus on a fraction of the initial data: you just mask what you don't need, and are still able to retrieve it when you need it. I agree that this default behavior is a bit strange when you have missing data: in that case, one would expect the new mask to be a combination of the 'mask' argument and the old mask. A possibility would then be to add a 'keep_mask' flag: a default of False would give the current behavior, a value of True would force the new mask to be a combination. I think that feelings are mixed on that list about extra flags, but the users of maskedarray are only a minority anyway (hopefully, only for the moment). About the conversion to ndarray: By default, the result should have the same dtype as the _data section. For this reason, I disagree with your idea of "(returning) an object ndarray with the missing value containing the masked singleton". If you really want an object ndarray, you can use the filled method or the filled function, with your own definition of the filling value (such as your MaskedScalar). ------------------------------------------------------------------------- Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
![](https://secure.gravatar.com/avatar/ab7e74f2443b81e5175638d72be65e07.jpg?s=120&d=mm&r=g)
On 08/11/06, Pierre GM <pgmdevlist@gmail.com> wrote:
A good candidate for "should be masked" marked is NaN. It is supposed to mean, more or less, "no sensible value". Unfortunately, integer types do not have such a special value. It's also conceivable that some user might want to keep NaNs in their array separate from the mask. Finally, on some hardware, operations with NaN are very slow (so leaving them in the array, even masked, might not be a good idea). The reason I suggest this is that in the last major application I had for numpy, one stage of the problem would occasionally result in NaNs for certain values, but the best thing I could do was leave them in place to represent "no data". Switching to a MaskedArray might have been a better idea, but the NaNs were a rare occurrence.
If you've got floating point, you can again fill in NaNs, but you have a good point about wanting to extract the original values that were masked out. Depending on what one is doing, one might want one or the other. A. M. Archibald ------------------------------------------------------------------------- Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
![](https://secure.gravatar.com/avatar/5b2449484c19f8e037c5d9c71e429508.jpg?s=120&d=mm&r=g)
A. M. Archibald wrote:
It has always been my experience (on various flavors or Pentium) that operating on NANs is extremely slow. Does anyone know on what hardware NANs are *not* slow? Of course it's always possible I just never notice NANs on hardware where they aren't slow. [SNIP] -tim ------------------------------------------------------------------------- Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
![](https://secure.gravatar.com/avatar/ab7e74f2443b81e5175638d72be65e07.jpg?s=120&d=mm&r=g)
On 08/11/06, Tim Hochberg <tim.hochberg@ieee.org> wrote:
On an opteron machine I have access to, they appear to be no slower (and even faster for some transcendental functions) than ordinary floats: In [13]: a=zeros(1000000) In [14]: %time for i in xrange(1000): a += 1.1 CPU times: user 6.87 s, sys: 0.00 s, total: 6.87 s Wall time: 6.87 In [15]: a *= NaN In [16]: %time for i in xrange(1000): a += 1.1 CPU times: user 6.86 s, sys: 0.00 s, total: 6.86 s Wall time: 6.87 On my Pentium M, they are indeed significantly slower (three times? I didn't really do enough testing to say how much). I am actually rather offended by this unfair discrimination against a citizen in good standing of the IEEE floating point community. A. M. Archibald ------------------------------------------------------------------------- Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
![](https://secure.gravatar.com/avatar/5b2449484c19f8e037c5d9c71e429508.jpg?s=120&d=mm&r=g)
A. M. Archibald wrote:
If they're only 3x slower you're doing better than I am. On my core duo box they appear to be nearly 20x slower for both addition and multiplication. This more or less matches what I recall from earlier boxes. >>> Timer("a.prod()", "import numpy as np; a = np.ones(4096); a[0] = np.nan").timeit(10000) 5.6495738983585397 >>> Timer("a.prod()", "import numpy as np; a = np.ones(4096)").timeit(10000) 0.34439041833525152 >>> Timer("a.sum()", "import numpy as np; a = np.ones(4096); a[0] = np.nan").timeit(10000) 6.0985655998179027 >>> Timer("a.sum()", "import numpy as np; a = np.ones(4096)").timeit(10000) 0.32354363473564263 I've been told that operations on NANs are slow because they aren't always implemented in the FPU hardware. Instead they are trapped and implemented software or firmware or something or other. That may well be bogus 42nd hand information though. -tim ------------------------------------------------------------------------- Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
![](https://secure.gravatar.com/avatar/5dde29b54a3f1b76b2541d0a4a9b232c.jpg?s=120&d=mm&r=g)
Tim Hochberg wrote:
which still doesn't make sense -- doesn't ANY operation with a NaN return NaN? how hard could that be? I share in the disappointment that NaN's are not first class citizens in common hardware.... -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov ------------------------------------------------------------------------- Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
![](https://secure.gravatar.com/avatar/5b2449484c19f8e037c5d9c71e429508.jpg?s=120&d=mm&r=g)
Christopher Barker wrote:
Well, it's not free. You still have to recognize a particular bit pattern corresponding to QNaN and then copy the result in that case. All that probably burns more silicon for a case that most people don't care about. Do NaNs make Half Life run any faster? No I didn't think so. I suspect that instead they just look for the case where the exponent is all 1s, in which case the result is either +-Inf, SNaN or QNaN and then drop into software to handle those infrequent cases.
I share in the disappointment that NaN's are not first class citizens in common hardware....
It would be nice to see IEEE 754 supported more faithfully, but there seems to be more of a problem on the software side than on the hardware side in practice. I still can't help feeling that using NaNs for mask values would be abuse even if it weren't slow and eventually it would reach around and bite you someplace tender. -tim ------------------------------------------------------------------------- Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
![](https://secure.gravatar.com/avatar/04b319ee3cbd0b128fa68ee85756de27.jpg?s=120&d=mm&r=g)
Disappointed in NaN land? Since the function of old retired persons is to tell youngsters stories around the campfile: A supercomputer hardware designer told me that when forced to add IEEE arithmetic to his designs that it decreased performance substantially, maybe 25-30%; it wasn't that doing the operations took so much longer, it was that the increased physical space needed for that circuitry pushed the memory farther away. Doubtless this inspired doing some of it in software instead. No standard for controlling the behaviors exists, either, so you can find out the hard way that underflow-to-zero is being done in software by default, and that you are doing a lot of it. Or that your code doesn't have the same behavior on different platforms. To my mind, all that was really accomplished was to convince said youngsters that somehow this NaN stuff was the solution to some problem. In reality, computing for 3 days and having it print out 1000 NaNs is not exactly satisfying. I think this stuff was essentially a mistake in the sense that it is a nice ivory-tower idea that costs more in practice than it is worth. I do not think a properly thought-out and coded algorithm ought to do anything that this stuff is supposed to 'help' with, and if it does do it, the code should stop executing. Anyway, if I thought it would do the job I wouldn't have written MA in the first place. Rant off. Nap. Grumble. Stupid kids. (:-> -- Paul ------------------------------------------------------------------------- Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642 _______________________________________________ Numpy-discussion mailing list Numpy-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/numpy-discussion
![](https://secure.gravatar.com/avatar/ab7e74f2443b81e5175638d72be65e07.jpg?s=120&d=mm&r=g)
On 09/11/06, Paul Dubois <pfdubois@gmail.com> wrote:
Since the function of old retired persons is to tell youngsters stories around the campfile:
I'll pull up a log. But since I'm uppity, and not especially young, I hope you won't mind if I heckle.
The goal of IEEE floats is not to be faster but to make doing correct numerical programming easier. (Numerical analysts can write robust numerical code for almost any kind of float, but the vast majority of numerical code is written by scientists who know the bare minimum or less about numerical analysis, so a good system makes such code work as well as possible.) The other reason IEEE floats are good is, no disrespect to your hardware designer friend intended, that they were designed with some care. Reimplementations are liable to get some of the finicky details wrong (round-to-even, denormalized numbers, what have you...). I found Kahan's "How Java's Floating-point Hurts Everyone Everywhere" ( http://www.cs.berkeley.edu/~wkahan/JAVAhurt.pdf ) very educational when it comes to "what can go wrong with platform-dependent floating-point".
Well, if the platform is IEEE-compliant, you can trust it to behave the same way logicially, even if some applications are slower on some systems than others. Or did you mean that there's no standard way to set the various IEEE flags? That seems like a language/compiler issue, to me (which is not to minimize the pain it causes!)
Well, then turn on floating-point exceptions. I find a few NaNs in my calculations relatively benign. If I'm doing a calculation where they'll propagate and destroy everything, I trap them. But it happens just as often that I launch a job, a NaN appears early on, but I come back in the morning to find that instead of aborting, the job has completed except for a few bad data points. If you want an algorithm that takes advantage of NaNs, I have one. I was simulating light rays around a black hole, generating a map of the bending for various view angles. So I naturally did a lot of exploring of grazing incidence, where the calculations would often yield NaNs. So I used the NaNs to fill in the gap between the internal bending of light and the external bending of the light; no extra programming effort required, since that was what came out of the ray tracer anyway. The rendering step just returned NaNs for the image pixels that came from interpolating in the NaN-filled regions, which came out black; just what I wanted. I just wrote the program ignoring what would happen if a NaN appeared, and it came out right. Which is, I think, the goal - to save the programmer having to think too much about numerical-analysis headaches.
Anyway, if I thought it would do the job I wouldn't have written MA in the first place.
It's certainly not always a good solution; in particular it's hard to control the behaviour of things like where(). There's also no NaN for integer types, which makes MaskedArray more useful. Anyway, I am pleased that numpy has both controllable IEEE floats and MaskedArray, and I apologize for a basically off-topic post. A. M. Archibald ------------------------------------------------------------------------- Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
![](https://secure.gravatar.com/avatar/7e9e53dbe9781722d56e308c32387078.jpg?s=120&d=mm&r=g)
It looks like on my Pentium M multiplication with NaNs is slow, but using a masked array ranges from slightly faster (with only one value masked) to twice as slow (with all values masked): In [15]:Timer("a.prod()", "import numpy as np; aa = np.ones(4096); a = np.ma.masked_greater(aa,0)").timeit(10000) Out[15]:9.4012830257415771 In [16]:Timer("a.prod()", "import numpy as np; a = np.ones(4096); a[0]= np.nan").timeit(10000) Out[16]:5.5737850666046143 In [17]:Timer("a.prod()", "import numpy as np; a = np.ones(4096)").timeit(10000) Out[17]:0.40796804428100586 In [18]:Timer("a.prod()", "import numpy as np; aa = np.ones(4096); aa[0] = 2; a = np.ma.masked_greater(aa,1)").timeit(10000) Out[18]:4.1544749736785889 In [19]:Timer("a.prod()", "import numpy as np; a = np.ones(4096); a[:]= np.nan").timeit(10000)Out[19]:5.8589630126953125 For transcendentals, nans or masks don't make much difference, although masks are slightly faster than nans: In [20]:Timer("np.sin(a)", "import numpy as np; a = np.ones(4096); a[:]= np.nan").timeit(10000) Out[20]:4.5575671195983887 In [21]:Timer("np.ma.sin(a)", "import numpy as np; aa = np.ones(4096); a = np.ma.masked_greater(aa,0)").timeit(10000) Out[21]:4.4125270843505859 In [22]:Timer("b=np.sin(a)", "import numpy as np; a = np.ones(4096)").timeit(10000) Out[22]:3.5793929100036621 Eric Tim Hochberg wrote:
------------------------------------------------------------------------- Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
![](https://secure.gravatar.com/avatar/56b215661867f3b4f4a3b28077de66b3.jpg?s=120&d=mm&r=g)
A good candidate for "should be masked" marked is NaN. It is supposed to mean, more or less, "no sensible value".
Which might turn out out to be the best indeed. Michael's application would then look like
In any case, I think we should stick to the numpy.core.ma default behavior for backwards compatibility. If you really wanht to distinguish between several kind of masks (one for missing data, one for data to discard temporarily), that could be done by defining a special subclass. But is it really needed ? A smart use of filled and masked_values should do the trick in most cases. ------------------------------------------------------------------------- Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
![](https://secure.gravatar.com/avatar/04b319ee3cbd0b128fa68ee85756de27.jpg?s=120&d=mm&r=g)
2 cents from the author of the first folio: The intent was to allow creation of masked arrays with copy=no, so that the original data could be retrieved from it if desired. But I was quite, quite rigorous about NEVER assuming the data in a masked slot made any sense whatsoever. The intention was that there are two ways to get a numeric array out of a masked one: 1. Get the data field 2. m.filled() (1) is strictly at your own risk. ------------------------------------------------------------------------- Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642 _______________________________________________ Numpy-discussion mailing list Numpy-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/numpy-discussion
![](https://secure.gravatar.com/avatar/25899bc1947e1ce40b16e55631c2c94a.jpg?s=120&d=mm&r=g)
While playing with the new implementation I noticed that the nonzero method did not seem to work properly. I was not able to trace down why this is the case - I could not find where the nonzero method of the new maskedarray is actually implemented :) import maskedarray as ma test_ma1 = ma.array([True, False, False], mask=[0,0,1]) print test_ma1 print test_ma1.nonzero()[0] print ma.nonzero(test_ma1)[0] --output-- [True False --] [True False --] [0] On 10/16/06, Pierre GM <pgmdevlist@mailcan.com> wrote:
![](https://secure.gravatar.com/avatar/25899bc1947e1ce40b16e55631c2c94a.jpg?s=120&d=mm&r=g)
I think that the new implementation is making a copy of the data with indexing a MA. This is different from both ndarray and the existing numpy ma version. e.g. testma = ma.array([[1,2,3,4,5],[1,2,3,4,5],[1,2,3,4,5]], mask=ma.nomask) testma2 = testma[1] testma2[1] = 20 print testma print testma2 --output-- [[1 2 3 4 5] [1 2 3 4 5] [1 2 3 4 5]] [ 1 20 3 4 5] Having subviews of the mask seems complicated with the mask being nomask. What happens if the view sets a new masked value and hence changes from nomask to an boolean array? How does the parent mask get updated? I think the numpy implementation gets away with this by returning a view of only the _data part if the ma mask is nomask - I don't like this solution as I would expect a ma to be returned. Also I suspect that if the ma is to be a view of another ma, then in __new__ a mask that is a boolean array of all False cannot be converted to nomask. I like the new implementation of maskedarray, especially the focus on simplicity. The only simple solution I see is to have the mask be a boolean array at all times. Or at least as soon as a view is required of a ma if mask is nomask then it should be converted to a boolean array and then both a view of the boolean array mask as well as the data is used in the view ma returned.
![](https://secure.gravatar.com/avatar/56b215661867f3b4f4a3b28077de66b3.jpg?s=120&d=mm&r=g)
On Tuesday 21 November 2006 21:11, Michael Sorich wrote:
Michael, If you check the definition of MaskedArray.__new__, you'll see that the "copy" argument is set to True by default. Setting it to "false" seems to give what you expect. Should I make the default ?
Having subviews of the mask seems complicated with the mask being nomask.
Why ? nomask is just a trick to avoid unnecessary computations on a mask full of False that doesn't need updating.
Both implementations work the same way: the parent mask is not updated.
I think the numpy implementation gets away with this by returning a view of only the _data part if the ma mask is nomask
By numpy implementation, you mean numpy.core.ma, right ? If so, then yes: `self.__getitem__[i]` returns `self._data[i]` if the mask is nomask. In maskedarray, if the mask is nomask, then `self.__getitem__[i]` returns `self._data[i]` only if `self._data[i].size==1`, else it returns a masked array.
I'm not following you here: there's no `__new__` in numpy.core.ma (that's one of the reason why a masked array in numpy.core.ma is basically different from a ndarray...). And in maskedarray, a mask as array of `False` is set to `nomask` by default, but you can use the `flag` option: please check the documentation of `maskedarray.masked_array`: flag=True converts the mask, flags=False keeps an array of boolean. One thing to remember is that masks tend to be copied more often than not. And I don't think it's advisable to modify the mask of the parent: it's no longer the same object, as the mask is now different ! In other terms, you could share data, you shouldn't share a mask. And I keep getting bitten with data sharing, that's why I had set the 'copy' flag to True by default.
You haven't convinced me yet of why a mask of False is better than `nomask`. What don't you like in maskedarray (aka the new implementation) ?
![](https://secure.gravatar.com/avatar/25899bc1947e1ce40b16e55631c2c94a.jpg?s=120&d=mm&r=g)
Perhaps an example will help explain what I mean For the case of an ndarray if you select a row and then alter the new array, the old array is also changed. from numpy import * a = array([[1,2,3,4,5],[1,2,3,4,5],[1,2,3,4,5]]) suba = a[2] suba[1] = 10 print a print suba --output-- [[ 1 2 3 4 5] [ 1 2 3 4 5] [ 1 10 3 4 5]] [ 1 10 3 4 5] In the current version of maskedarray in numpy, changes in the row array affect the parent array. Here whenever you select a single row or column and mask is nomask a ndarray is returned not a masked array from numpy.core.ma import * a = array([[1,2,3,4,5],[1,2,3,4,5],[1,2,3,4,5]], mask=nomask) suba = a[2] suba[1] = 10 print a print suba print type(suba) --output-- [[1 2 3 4 5] [1 2 3 4 5] [1 10 3 4 5]] [ 1 10 3 4 5] <type 'numpy.ndarray'> However if mask is anything other than nomask (even if the mask is an boolean array in which all the values are false- which means the same thing as nomask) a masked array is returned. Once again the data is shared between the arrays from numpy.core.ma import * a = array([[1,2,3,4,5],[1,2,3,4,5],[1,2,3,4,5]], mask=[[0,0,0,0,0],[0,0,0,0,0],[0,0,0,0,0]]) suba = a[2] suba[1] = 10 print a print suba print type(suba) --output-- [[1 2 3 4 5] [1 2 3 4 5] [1 10 3 4 5]] [1 10 3 4 5] <class 'numpy.core.ma.MaskedArray'> Unfortunately if the value is changed to masked, this is not updated in the parent array. This seems very inconsistent. I don't view masked values any different than any other value. a = array([[1,2,3,4,5],[1,2,3,4,5],[1,2,3,4,5]], mask=[[0,0,0,0,0],[0,0,0,0,0],[0,0,0,0,0]]) suba = a[2] suba[1] = masked print a print suba print type(suba) --output-- [[1 2 3 4 5] [1 2 3 4 5] [1 2 3 4 5]] [1 -- 3 4 5] <class 'numpy.core.ma.MaskedArray'> With the new implementation, the data is not shared for any of the 3 variations above. I am happy that the arrays acts consistently, however this is different from the ndarray. It would be nice if the behavior was the same as the ndarray, but it is better than the numpy.core.ma implementation. If this is on purpose, then it should be documented. from numpyext.maskedarray import * a = array([[1,2,3,4,5],[1,2,3,4,5],[1,2,3,4,5]], mask=nomask) suba = a[2] suba[1] = 10 print a print suba print type(suba) --output-- [[1 2 3 4 5] [1 2 3 4 5] [1 2 3 4 5]] [ 1 10 3 4 5] <class 'numpyext.maskedarray.MaskedArray'> On 11/22/06, Pierre GM <pgmdevlist@gmail.com> wrote:
![](https://secure.gravatar.com/avatar/25899bc1947e1ce40b16e55631c2c94a.jpg?s=120&d=mm&r=g)
[[1 -- 3 4 5] [6 7 8 9 10]]
If I make some minor changes (below) to MaskedArray get and setitem from numpyext.maskedarray import * a = array([[1,2,3,4,5],[6,7,8,9,10]], mask=nomask) suba = a[0] suba[1] = masked print a print suba
[1 -- 3 4 5]
[[1 -- 3 4 5] [6 10 8 9 10]]
suba = a[1] suba[1] = 10 print a print suba
[6 10 8 9 10]
[[1 -- 3 4 5] [6 -- 8 9 10]]
a = array([[1,2,3,4,5],[6,7,8,9,10]], mask=[[0,0,0,0,0],[0,1,0,0,0]]) suba = a[0] suba[1] = masked print a print suba
[1 -- 3 4 5]
[[1 -- 3 4 5] [6 10 8 9 10]]
suba = a[1] suba[1] = 10 print a print suba
[6 10 8 9 10]
### in MaskedArray ### def __getitem__(self, i): """x.__getitem__(y) <==> x[y] Returns the item described by i. Not a copy as in previous versions. """ dout = self._data[i] if self._mask is nomask: if numeric.size(dout)==1: return dout else: ### made changes here self._mask = numeric.zeros(self.shape, dtype=MaskType) mout = self._mask[i] return self.__class__(dout, mask=mout, fill_value=self._fill_value, copy=False, flag=False) ### ------------- #.... # m = self._mask.copy() m = self._mask mi = m[i] if mi.size == 1: if mi: return masked else: return dout else: ### made changes here return self.__class__(dout, mask=mi, fill_value=self._fill_value, copy=False, flag=False) def __setitem__(self, index, value): """x.__setitem__(i, y) <==> x[i]=y Sets item described by index. If value is masked, masks those locations. """ d = self._data if self is masked: raise MAError, 'Cannot alter the masked element.' #.... if value is masked: if self._mask is nomask: _mask = make_mask_none(d.shape) else: _mask = self._mask #=============================================================================== # #why does the mask need to be copied? # _mask = self._mask.copy() #=============================================================================== _mask[index] = True self._mask = _mask return #.... m = getmask(value) value = filled(value).astype(d.dtype) d[index] = value if m is nomask: if self._mask is not nomask: #=============================================================================== # #why does the mask need to be copied? # _mask = self._mask.copy() #=============================================================================== _mask = self._mask _mask[index] = False else: _mask = nomask else: if self._mask is nomask: _mask = make_mask_none(d.shape) else: _mask = self._mask #=============================================================================== # #why does the mask need to be copied? # _mask = self._mask.copy() #=============================================================================== _mask[index] = m self._mask = _mask On 11/22/06, Michael Sorich <michael.sorich@gmail.com> wrote:
![](https://secure.gravatar.com/avatar/56b215661867f3b4f4a3b28077de66b3.jpg?s=120&d=mm&r=g)
With the new implementation, the data is not shared for any of the 3 variations above.
The 'copy=True' in MaskedArray.__new__ ensures that data is always copied by default (that's the way I liked it). Most methods that return new masked arrays (__getitem__ and __setitem__, for example) do not specifically use a particular value of this flag, they just rely on the default, "copy=True" , which is "don't share". Once again, it's not so much because I'm selfish than because I regularly forget that data can be shared, and that modifying one array can have repercussions on another one the other side of the code. But there's a quick fix: in the definition of MaskedArray.__new__, just change the default copy flag from True to False, that'll work. Now, Michael, you're right, and it should be the default. I'll fix that. [Thinking about it, we could store the copy flag in the array, and use that value in __getitem__ and __setitem__. That might be overkill, though].
Inconsistent, maybe, useful definitely: Masking a view and getting the original masked accordingly could be useful, but I strongly feel that unmasking a view and getting an unmasked orginal is dangerous. Besides, in order to make the mask updatable, you need to get it in array form, as you suggested in your version of __getitem__. But then, you drag this array all along, and I'm not sure it's worthwhile. All in all, I'd far prefer a status-quo, with unsharable masks. What does the rest of the list think (er, the MA users) ?
![](https://secure.gravatar.com/avatar/25899bc1947e1ce40b16e55631c2c94a.jpg?s=120&d=mm&r=g)
The use of nan in float ndarrays has behaviour I would consider expected (at least coming from a numpy perspective). I think that the closer the maskedarrays mimic the behavior of ndarrays the easier it will be to remember how to use them and what to expect. from numpy import * a = array([[1,2,3,4,5],[1,2,3,4,5],[1,2,3,4,5]], dtype='f') suba = a[2] suba[1] = nan print a print suba
[ 1. -1.#IND 3. 4. 5. ]
a[2,2] = nan print a print suba
[ 1. -1.#IND -1.#IND 4. 5. ]
participants (8)
-
A. M. Archibald
-
Christopher Barker
-
Eric Firing
-
Michael Sorich
-
Paul Dubois
-
Pierre GM
-
Pierre GM
-
Tim Hochberg