documentation for masked arrays? - NumPy-Discussion


Chris Withers

March 19, 2008

7:02 a.m.

Hi All,

Where can I find docs for masked arrays? The "paid for" book doesn't even contain the phrase "masked_where" :-(

cheers,

Chris

-- 
Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk


Chris Withers

March 2008

7:29 a.m.

New subject: bug with with fill_values in masked arrays?

OK, my specific problem with masked arrays is as follows:

>>> a = numpy.array([1,numpy.nan,2])
>>> aa = numpy.ma.masked_where(numpy.isnan(a),a)
>>> aa
array(data = [ 1.00000000e+00 1.00000000e+20 2.00000000e+00],
      mask = [False True False],
      fill_value=1e+020)

>>> numpy.ma.set_fill_value(aa,0)
>>> aa
array(data = [ 1. 0. 2.],
      mask = [False True False],
      fill_value=0)

OK, so this looks like I want it to, however:

>>> [v for v in aa]
[1.0, array(data = 999999, mask = True, fill_value=999999), 2.0]

Two questions:

1. why am I not getting my NaN's back?
2. why is the wrong fill value being used here?

cheers,

Chris

-- 
Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk

Matt Knox

7:47 p.m.

New subject: [Numpy-discussion] bug with with fill_values in masked arrays?

...

OK, my specific problem with masked arrays is as follows:

>>> a = numpy.array([1,numpy.nan,2])
>>> aa = numpy.ma.masked_where(numpy.isnan(a),a)
>>> aa
array(data = [ 1.00000000e+00 1.00000000e+20 2.00000000e+00],
      mask = [False True False],
      fill_value=1e+020)

>>> numpy.ma.set_fill_value(aa,0)
>>> aa
array(data = [ 1. 0. 2.],
      mask = [False True False],
      fill_value=0)

OK, so this looks like I want it to, however:

>>> [v for v in aa]
[1.0, array(data = 999999, mask = True, fill_value=999999), 2.0]

Two questions:

1. why am I not getting my NaN's back?

when iterating over a masked array, you get the "ma.masked" constant for elements that were masked (same as what you would get if you indexed the masked array at that element). If you are referring specifically to the .data portion of the array... it looks like the latest version of the numpy.ma sub-module preserves nan's in the data portion of the masked array, but the old version perhaps doesn't based on the output you are showing.

...

2. why is the wrong fill value being used here?

the second element in the array iteration here is actually the numpy.ma.masked constant, which always has the same fill value (which I guess is 999999). This is independent of the fill value for your specific array.

- Matt
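The distinction described here can be checked directly. The following is a minimal sketch, assuming a recent NumPy release (reprs and defaults differ from the 2008 versions in this thread); the variable names are illustrative only.

```python
import numpy as np

a = np.array([1.0, np.nan, 2.0])
aa = np.ma.masked_where(np.isnan(a), a)

# Per-array fill value; set via the property here (the thread uses
# numpy.ma.set_fill_value for the same purpose).
aa.fill_value = 0

# Iterating (or indexing) yields the shared np.ma.masked constant for
# masked elements; that constant does not carry the array's fill value.
items = [v for v in aa]
second_is_masked = items[1] is np.ma.masked

# filled() is where the array's own fill value is applied.
filled = aa.filled()

print(second_is_masked)
print(filled)
```

In other words, the fill value only comes into play when the whole array is converted with filled(), not when single elements are pulled out.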

Pierre GM

10:17 a.m.

New subject: bug with with fill_values in masked arrays?

Folks,

Sorry for my delayed answers: I'm on the road these days and can't connect to the web as often and as well as I'd like to.

On Wednesday 19 March 2008 19:47:37 Matt Knox wrote:

...

...
1. why am I not getting my NaN's back?

Because they're gone when you create your masked array. The idea here is to get rid of the nan in your data to avoid potential problems while keeping track of where the nans were in the first place. So, the .data part of your masked array should be nan-free, and the mask tells you where the nans were.

...

...
2. why is the wrong fill value being used here?

the second element in the array iteration here is actually the numpy.ma.masked constant, which always has the same fill value...

Couldn't say it better.

Chris Withers

12:55 p.m.

New subject: bug with with fill_values in masked arrays?

Pierre GM wrote:

...

On Wednesday 19 March 2008 19:47:37 Matt Knox wrote:

...
...
1. why am I not getting my NaN's back?

Because they're gone when you create your masked array.

Really? At least one other post has disagreed with that. And it does seem odd that a value, even if it's a nan, would be destroyed...

...

The idea here is to get rid of the nan in your data

No, it's to mask them, otherwise I would have used a normal array, not a ma.

...

to avoid potential problems while keeping track of where the nans were in the first place.

...like plotting them on a graph, which the current behaviour makes unworkable, so that you end up doing a myarray.filled(0) to get around it, with imperfect results.

...

So, the .data part of your masked array should be nan-free,

Why? Surely that should be the source data, of which nan is a valid part?

...

and the mask tells you where the nans were.

Right, but why, when the masked array is cast back to a list of numbers, is the fill_value of the ma not respected?

...

...
...
2. why is the wrong fill value being used here? the second element in the array iteration here is actually the numpy.ma.masked constant, which always has the same fill value...

...and that's a bug. cheers, Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk

Matt Knox

3:08 p.m.

New subject: [Numpy-discussion] bug with with fill_values in masked arrays?

Chris,

The behaviour you are seeing is intentional. Pierre is correct in asserting that it is not a bug. Now, you may disagree with the behaviour, but the behaviour is by design and is not a bug. Perhaps you are misunderstanding how to use masked arrays, which is understandable because the documentation is currently sparse.

Take a look at the following example (using the latest svn version of numpy and matplotlib 0.91.2).

######################################################
import numpy as np
from numpy import ma
import pylab

data = [1., 2., 3., np.nan, 5., 6.]
mask = [0, 0, 0, 1, 0, 0]

marr = ma.array(data, mask=mask)
marr.set_fill_value(55)

print marr.data
print marr.mask

print marr[0] is ma.masked # False
print marr[3] # ma.masked constant
print marr.mask[3] # True
print marr.data[3] # is a nan value with svn numpy, not sure about 1.0.4

print marr[3] is ma.masked # True
print marr.data[3] is ma.masked # False

filled_arr = marr.filled()
print filled_arr # nan value is replaced with fill value of 55

pylab.plot(marr) # masked value shows up as a gap in the plot
pylab.show()
######################################################

All of the behaviour outlined above is (as far as I know) by design, and makes sense to me at least. If you disagree with some of the above behaviour, then I'm sure people would be happy to hear your opinion, but it is incorrect to flatly call this a bug.

- Matt

Chris Withers

2:47 p.m.

New subject: bug with with fill_values in masked arrays?

Matt Knox wrote:

...

data = [1., 2., 3., np.nan, 5., 6.]
mask = [0, 0, 0, 1, 0, 0]

I'm creating the ma with ma.masked_where...

...

marr = ma.array(data, mask=mask)
marr.set_fill_value(55)

...

print marr[0] is ma.masked # False
print marr[3] # ma.masked constant

Yeah, and this is where I have the problem. The masked constant has a fill value of 999999, rather than 55. That is annoying.

...

filled_arr = marr.filled()
print filled_arr # nan value is replaced with fill value of 55

Right, and this is how I currently work around the problem. cheers, Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk

Pierre GM

8:24 p.m.

New subject: bug with with fill_values in masked arrays?

On Friday 21 March 2008 12:55:11 Chris Withers wrote:

...

Pierre GM wrote:

...
On Wednesday 19 March 2008 19:47:37 Matt Knox wrote:

...
...
1. why am I not getting my NaN's back?

Because they're gone when you create your masked array.

Really? At least one other post has disagreed with that.

Well, yeah, my bad, that depends on whether you use masked_invalid or fix_invalid or just build a basic masked array. Example:

>>> import numpy as np
>>> import numpy.ma as ma
>>> x = np.array([1,np.nan,3])
>>> # Basic construction
>>> y=ma.array(x)
masked_array(data = [ 1. NaN 3.],
      mask = False,
      fill_value=1e+20)
>>> y=ma.masked_invalid(x)
masked_array(data = [1.0 -- 3.0],
      mask = [False True False],
      fill_value=1e+20)
>>> y._data
array([ 1., NaN, 3.])
>>> y=ma.fix_invalid(x)
masked_array(data = [1.0 -- 3.0],
      mask = [False True False],
      fill_value=1e+20)
>>> y._data
array([ 1.00000000e+00, 1.00000000e+20, 3.00000000e+00])
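The difference between the two constructors can be reproduced in a few lines. This is a sketch assuming a recent NumPy release; the fill value of 0.0 is an arbitrary choice for the example and is not taken from the thread.

```python
import numpy as np

x = np.array([1.0, np.nan, 3.0])

# masked_invalid masks NaN/Inf but leaves the underlying data untouched.
kept = np.ma.masked_invalid(x)

# fix_invalid masks NaN/Inf and also overwrites them in the data with a
# fill value (0.0 here, chosen only for illustration).
fixed = np.ma.fix_invalid(x, fill_value=0.0)

print(kept.mask, kept.data)
print(fixed.mask, fixed.data)
```

Both results carry the same mask; only the contents of the data buffer differ, which matters if the raw data is later read directly.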

...

And it does seem odd that a value, even if it's a nan, would be destroyed...

Having NaNs in an array usually reduces performance: the option we follow w/ fix_invalid is to clear the masked array of the NaNs, and keep track of where they were by setting the mask to True at the appropriate location. That way, you don't have the drop of performance of having NaNs in your underlying array. Oh, and NaNs will be transformed to 0 if you use ints...

...

...
The idea here is to get rid of the nan in your data

No, it's to mask them, otherwise I would have used a normal array, not a ma.

Nope, the idea really is to make things as efficient as possible. Now, you can still have your nans if you're ready to eat them.

...

...
to avoid potential problems while keeping track of where the nans were in the first place.

...like plotting them on a graph, which the current behaviour makes unworkable, so that you end up doing a myarray.filled(0) to get around it, with imperfect results.

Send an example. I don't seem to have this problem:

x = np.arange(10,dtype=np.float)
x[5]=np.nan
y=ma.masked_invalid(x)
plot(x,'ok-')
plot(y,'sr-')
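If a plotting call ends up drawing a numeric fill value instead of a gap, one possible workaround is to pass it a plain float array with NaN at the masked positions, or to drop the masked points. The sketch below assumes a recent NumPy and deliberately leaves out the plotting call itself, since how a given plotting function treats NaN or masked input depends on the library and version.

```python
import numpy as np

x = np.arange(10, dtype=float)
x[5] = np.nan
y = np.ma.masked_invalid(x)

# Replace masked entries with NaN rather than a numeric fill value, so a
# large default fill value never reaches the axis.
for_plot = y.filled(np.nan)

# Alternatively, keep only the unmasked points.
valid_only = y.compressed()

print(for_plot)
print(valid_only)
```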

...

Right, but why, when the masked array is cast back to a list of numbers, is the fill_value of the ma not respected?

Because in your particular case, you're inspecting elements one by one, and then, your masked data becomes the masked singleton which is a special value. That has nothing to do w/ the filling.

...

...
...
...
2. why is the wrong fill value being used here?

the second element in the array iteration here is actually the numpy.ma.masked constant, which always has the same fill value...

...and that's a bug.

And once again, it's not. numpy.ma.masked is a special value, like numpy.nan or numpy.inf

Chris Withers

10:33 a.m.

New subject: bug with with fill_values in masked arrays?

Pierre GM wrote:

...

Well, yeah, my bad, that depends on whether you use masked_invalid or fix_invalid or just build a basic masked array.

Yeah, well, if there were any docs I'd have a *clue* what you were talking about ;-)

...

...
...
...
y=ma.fix_invalid(x)

I've never done this ;-)

...

Having NaNs in an array usually reduces performance: the option we follow w/ fix_invalid is to clear the masked array of the NaNs, and keep track of where they were by setting the mask to True at the appropriate location.

That's good to know....

...

That way, you don't have the drop of performance of having NaNs in your underlying array. Oh, and NaNs will be transformed to 0 if you use ints...

"use ints" in what context?

...

Nope, the idea really is to make things as efficient as possible.

For you, maybe. And for me, yes, except I wanted the NaNs to stick around...

...

y=ma.masked_invalid(x)

I'm not using masked_invalid. I didn't even know it existed.

...

Because in your particular case, you're inspecting elements one by one, and then, your masked data becomes the masked singleton which is a special value.

I'd argue that the masked singleton having a different fill value to the ma it comes from is a bug.

...

And once again, it's not. numpy.ma.masked is a special value, like numpy.nan or numpy.inf

...which is silly, since that forces it to have a fixed fill value, which it should not. cheers, Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk

Pierre GM

11:12 a.m.

New subject: bug with with fill_values in masked arrays?

On Tuesday 25 March 2008 10:33:58 Chris Withers wrote:

...

Pierre GM wrote:

...
Well, yeah, my bad, that depends on whether you use masked_invalid or fix_invalid or just build a basic masked array.

Yeah, well, if there were any docs I'd have a *clue* what you were talking about ;-)

My bad, I neglected an overall doc for the functions and their docstring. But you know what ? As you're now at an intermediary level, you'll be able to help: just write down the problems you encountered, and the solutions you came up with, so that we could use your experience as the backbone for a proper MaskedArray documentation

...

...
Oh, and NaNs will be transformed to 0 if you use ints...

"use ints" in what context?

Try that:

>>> x = numpy.ma.array([0,1,2,3,])
>>> x[-1] = numpy.nan
>>> print x
[0 1 2 0]

See? No NaNs with an int array.
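This follows from the dtype: an integer array has no representation for NaN. A small sketch, assuming a recent NumPy; what happens on the integer assignment has varied between versions (silent conversion in the session above, an error in others), so the example records the outcome rather than depending on it.

```python
import numpy as np

ints = np.ma.array([0, 1, 2, 3])
floats = np.ma.array([0, 1, 2, 3], dtype=float)

# A float array can store NaN in its data.
floats[-1] = np.nan
nan_stored = bool(np.isnan(floats.data[-1]))

# An integer array cannot; depending on the NumPy version the assignment
# may raise or may store some integer instead.
try:
    ints[-1] = np.nan
    outcome = "stored %r" % (ints.data[-1],)
except (ValueError, TypeError, OverflowError) as exc:
    outcome = "raised %s" % type(exc).__name__

print(ints.dtype.kind, floats.dtype.kind, nan_stored, outcome)
```

Passing dtype=float when the array is created is the simplest way to make sure NaNs can be kept.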

...

...
Nope, the idea really is to make things as efficient as possible.

For you, maybe. And for me, yes, except I wanted the NaNs to stick around...

Well, no problem, they should stick around. Note that if a NaN/Inf should normally show up as the result of some operation (divide by zero for example), it probably won't:

>>> x = numpy.ma.array([0,1,2,numpy.nan],dtype=float)
>>> print 1./x
[-- 1.0 0.5 nan]
>>> print (1./x)._data
[ 1. 1. 0.5 NaN]
>>> print 1./x._data
[ Inf 1. 0.5 NaN]
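The divide-by-zero case can be compared side by side with a plain ndarray. This is a sketch assuming a recent NumPy; the input deliberately contains no NaN, because how NaN results are masked has changed since the versions shown above.

```python
import numpy as np

x = np.ma.array([0.0, 1.0, 2.0, 4.0])

# Masked division masks the entry where the divisor is zero instead of
# producing Inf.
masked_result = 1.0 / x

# The same operation on the plain ndarray yields Inf at that position.
with np.errstate(divide="ignore"):
    plain_result = 1.0 / x.data

print(masked_result)
print(plain_result)
```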

...

I'd argue that the masked singleton having a different fill value to the ma it comes from is a bug.

"It's not a bug, it's a feature"TM

...

...
And once again, it's not. numpy.ma.masked is a special value, like numpy.nan or numpy.inf

...which is silly, since that forces it to have a fixed fill value, which it should not.

The fill_value for the mask singleton is meaningless, correct. However, having numpy.ma.masked as a constant is really helpful to test whether a particular value is masked, or to mask a particular value:

>>> x = numpy.ma.array([0,1,2,3])
>>> x[-1] = masked
>>> x[-1] is masked
True
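Both uses of the constant, masking an element and testing for one, can be sketched as follows, assuming a recent NumPy and spelling the constant as np.ma.masked rather than a bare imported name.

```python
import numpy as np

x = np.ma.array([0, 1, 2, 3])

# Assigning the constant masks the element.
x[-1] = np.ma.masked

# Identity comparison against the constant tests a single element.
last_is_masked = x[-1] is np.ma.masked
first_is_masked = x[0] is np.ma.masked

# For whole arrays, the mask itself is the thing to inspect.
mask = np.ma.getmaskarray(x)

print(last_is_masked, first_is_masked, mask)
```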

Chris Withers

3:42 p.m.

New subject: bug with with fill_values in masked arrays?

Pierre GM wrote:

...

My bad, I neglected an overall doc for the functions and their docstring. But you know what ? As you're now at an intermediary level,

That's pretty unkind to your userbase. I know a lot about python, but I'm a total novice with numpy and even the maths it's based on.

...

help: just write down the problems you encountered, and the solutions you came up with, so that we could use your experience as the backbone for a proper MaskedArray documentation

Blind leading the blind seems like a terrible idea to me...

...

Try that:

>>> x = numpy.ma.array([0,1,2,3,])
>>> x[-1] = numpy.nan
>>> print x
[0 1 2 0]

See? No NaNs with an int array.

Right. "Array types" and whatever a dtype is are things that could be much better documented too :-(

...

Well, no problem, they should stick around. Note that if a NaN/Inf should normally show up as the result of some operation (divide by zero for example), it probably won't:

>>> x = numpy.ma.array([0,1,2,numpy.nan],dtype=float)
>>> print 1./x
[-- 1.0 0.5 nan]

NaN/inf is still NaN in my books, so why would I be surprised by this?

...

...
I'd argue that the masked singleton having a different fill value to the ma it comes from is a bug.

"It's not a bug, it's a feature"TM

One which sucks and is unintuitive.

...

The fill_value for the mask singleton is meaningless, correct. However, having numpy.ma.masked as a constant is really helpful to test whether a particular value is masked, or to mask a particular value:

>>> x = numpy.ma.array([0,1,2,3])
>>> x[-1] = masked
>>> x[-1] is masked
True

I may not know much about maths, but I know about these funny things in python we have called "classes" to solve exactly this problem ;-)

>>> x[-1] = Masked(fill_value=50)
>>> isinstance(x[-1],Masked)
True

...which gives you what you want without forcing me to experience the resultant suck. cheers, Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk

Pierre GM

4:40 p.m.

New subject: bug with with fill_values in masked arrays?

On Wednesday 26 March 2008 15:42:41 Chris Withers wrote:

...

Pierre GM wrote:

...
My bad, I neglected an overall doc for the functions and their docstring. But you know what ? As you're now at an intermediary level,

That's pretty unkind to your userbase. I know a lot about python, but I'm a total novice with numpy and even the maths it's based on.

My bosses have different priorities and keep reminding me that spending time writing Python code is not what I was hired to do, and that I should be writing scientific papers by the dozen. Let's say that I'm just playing middle ground to the best of my capacities. And time.

...

...
help: just write down the problems you encountered, and the solutions you came up with, so that we could use your experience as the backbone for a proper MaskedArray documentation

Blind leading the blind seems like a terrible idea to me...

You're no longer a complete neophyte, so you're not that blind, but are still experiencing the tough part of the learning curve. I take things for granted nowadays (for example, dtypes) that are not obvious for absolute beginners. That's exactly where you can play your role: remind me what it is to be blind so that I can help you more, and start some simple doc pages on the wiki that the community can edit/append.

...

NaN/inf is still NaN in my books, so why would I be surprised by this?

Because with a regular ndarray with no NaNs initially, you could end up with NaNs and Infs with some operations. With MaskedArray, you don't.

...

...
...
I'd argue that the masked singleton having a different fill value to the ma it comes from is a bug.

"It's not a bug, it's a feature"TM

One which sucks and is unintuitive.

I can understand the unintuitive part to a certain extent, I won't comment on the first aspect however, you know, tastes, colors, snails, oysters, that kind of thing. On top of that, I could kick into touch and say that it's needed for backwards compatibility.

...

>>> x[-1] = Masked(fill_value=50)
>>> isinstance(x[-1],Masked)
True

...which gives you what you want without forcing me to experience the resultant suck.

Yeah, that's a possibility. Feel free to implement it so that we can compare the two approaches. I still don't understand why you really need to have a particular fill_value for the masked constant anyway: what are you trying to do exactly ?

Hans Meine

April 2008

4:47 a.m.

New subject: bug with with fill_values in masked arrays?

On Tuesday, 25 March 2008 15:33:58, Chris Withers wrote:

...

...
Because in your particular case, you're inspecting elements one by one, and then, your masked data becomes the masked singleton which is a special value.

I'd argue that the masked singleton having a different fill value to the ma it comes from is a bug.

Note that there's no "ma it comes from". It's a singleton. A special value. And your suggestion with isinstance would surely be less efficient than the current solution, since using the "is" operator for identity checking is as efficient as it gets. Just ignore the fill_value, which is only there for technical reasons; it's unused in any case.

Thanks to this discussion, I finally got an impression of ma.

-- 
Ciao, Hans

Chris Withers

March 2008

7:06 p.m.

New subject: bug with with fill_values in masked arrays?

Matt Knox wrote:

...

...
1. why am I not getting my NaN's back?

when iterating over a masked array, you get the "ma.masked" constant for elements that were masked (same as what you would get if you indexed the masked array at that element). If you are referring specifically to the .data portion of the array... it looks like the latest version of the numpy.ma sub-module preserves nan's in the data portion of the masked array, but the old version perhaps doesn't based on the output you are showing.

OK, when's this going to make it into a release?

...

...
2. why is the wrong fill value being used here?

the second element in the array iteration here is actually the numpy.ma.masked constant, which always has the same fill value (which I guess is 999999).

This sucks to the point of feeling like a bug :-( Why is it desirable for it to behave like this? cheers, Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk

Pierre GM

8:24 p.m.

New subject: bug with with fill_values in masked arrays?

On Thursday 20 March 2008 19:06:32 Chris Withers wrote:

...

...
the second element in the array iteration here is actually the numpy.ma.masked constant, which always has the same fill value (which I guess is 999999).

This sucks to the point of feeling like a bug :-(

It is not.

...

Why is it desirable for it to behave like this?

Because that way, you can compare anything to masked and see whether a value is masked or not. Anyway, in your case, it just means your value is masked. You don't care about the fill_value for this one.

Chris Withers

12:52 p.m.

New subject: bug with with fill_values in masked arrays?

Pierre GM wrote:

...

...
This sucks to the point of feeling like a bug :-(

It is not.

Ignoring the fill value of a masked array feels like a bug to me...

...

...
Why is it desirable for it to behave like this?

Because that way, you can compare anything to masked and see whether a value is masked or not. Anyway, in your case, it just means your value is masked. You don't care about the fill_value for this one.

Where I cared was when trying to do a filled line plot in matplotlib and the nans, rather than being omitted, were being shown on the y-axis at 999999, totally wrecking the plot. I'll buy your argument *iff* the masked arrays used the fill value from the parent ma. cheers, Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk

Pierre GM

7:58 p.m.

New subject: bug with with fill_values in masked arrays?

On Friday 21 March 2008 12:52:45 Chris Withers wrote:

...

Pierre GM wrote:

...
...
This sucks to the point of feeling like a bug :-(

It is not.

Ignoring the fill value of a masked array feels like a bug to me...

You're right with masked arrays, but here we're talking about the masked singleton, a special value.

...

Where I cared was when trying to do a filled line plot in matplotlib and the nans, rather than being omitted, were being shown on the y-axis at 999999, totally wrecking the plot.

You're losing me there. Send a simple example/script so that I can have a better idea of what you're trying to do.

...

I'll buy your argument *iff* the masked arrays used the fill value from the parent ma.

What parent ma ?

Bill Spotz

10:28 a.m.

I have found that any search on that document containing an underscore will turn up zero matches. Substitute a space instead.

On Mar 19, 2008, at 5:02 AM, Chris Withers wrote:

...

Where can I find docs for masked arrays? The "paid for" book doesn't even contain the phrase "masked_where" :-(

** Bill Spotz ** ** Sandia National Laboratories Voice: (505)845-0170 ** ** P.O. Box 5800 Fax: (505)284-0154 ** ** Albuquerque, NM 87185-0370 Email: wfspotz@sandia.gov **

Chris Withers

11:45 a.m.

Bill Spotz wrote:

...

I have found that any search on that document containing an underscore will turn up zero matches. Substitute a space instead.

That's not been my experience. I found the *one* mention of fill_value just fine, the coverage of masked arrays is woeful :-( Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk

Jarrod Millman

3:55 a.m.

On Wed, Mar 19, 2008 at 8:45 AM, Chris Withers <chris@simplistix.co.uk> wrote:

...

That's not been my experience. I found the *one* mention of fill_value just fine, the coverage of masked arrays is woeful :-(

There is a documentation day on Friday. If you have some time, it would be great if you could help out with writing NumPy docstrings. The more people who contribute, the faster this will happen.

Thanks,

-- 
Jarrod Millman
Computational Infrastructure for Research Labs
10 Giannini Hall, UC Berkeley
phone: 510.643.4014
http://cirl.berkeley.edu/

Chris Withers

7:07 p.m.

Jarrod Millman wrote:

...

There is a documentation day on Friday. If you have some time, it would be great if you could help out with writing NumPy docstrings. The more people who contribute, the faster this will happen.

It's a catch 22, I don't have the knowledge to usefully do this :-( Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk
