
Hi all
I am a fairly recent convert to python and I have got a question that's got me stumped. I hope this is the right mailing list: here goes :)
I am reading some time series data out of a netcdf file a single timestep at a time. If the data is NaN, I want to reset it to the minimum of the dataset over all timesteps (which I already know). The data is in a variable of type numpy.ma.core.MaskedArray called modelData.
If I do this:
for i in range(len(modelData)):
    if math.isnan(modelData[i]):
        modelData[i] = dataMin
I get the effect I want. If I do this:
modelData[np.isnan(modelData)] = dataMin
it doesn't seem to work. Of course I could just use the first version, but len(modelData) is about 3.5 million and it takes about 20 seconds to run. This happens inside a rendering loop, so I'd like it to be as fast as possible. I thought the second version might be faster, and maybe it is, but it doesn't seem to work! :)
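On a plain ndarray (no mask involved) the two versions above should produce the same result, with the second replacing the per-element Python loop by one vectorized assignment. A minimal sketch to check that, using made-up toy data and a hypothetical dataMin in place of the real modelData:

```python
import math

import numpy as np

dataMin = -1.0  # hypothetical stand-in for the known minimum over all timesteps
raw = np.array([0.3, np.nan, 2.5, np.nan, 1.1])  # toy data, not from the thread

# Version 1: element-by-element Python loop (one interpreter round trip per point).
loop_result = raw.copy()
for i in range(len(loop_result)):
    if math.isnan(loop_result[i]):
        loop_result[i] = dataMin

# Version 2: a single vectorized boolean-index assignment.
vec_result = raw.copy()
vec_result[np.isnan(vec_result)] = dataMin

print(vec_result.tolist())  # [0.3, -1.0, 2.5, -1.0, 1.1]
```

Since both agree on a plain array, a difference on the real data points at something specific to MaskedArray rather than at the indexing idiom itself.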
Any ideas would be much appreciated.
Thanks Howard

What are the types and shapes of modelData and dataMin? (It works for me with modelData as a (3, 4) numpy array and dataMin as a Python float, using numpy 1.6.1.)
-=- Olivier
2012/1/27 Howard howard@renci.org
--
Howard Lander howard@renci.org
Senior Research Software Developer
Renaissance Computing Institute (RENCI) http://www.renci.org
The University of North Carolina at Chapel Hill
Duke University
North Carolina State University
100 Europa Drive Suite 540
Chapel Hill, NC 27517
919-445-9651
NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

Hi Olivier
I added this to the code:
print "modelData:", type(modelData), modelData.shape, modelData.size
print "dataMin:", type(dataMin)
and got
modelData: <class 'numpy.ma.core.MaskedArray'> (1767734,) 1767734
dataMin: <type 'float'>
What's funny is that I tried the example from
http://docs.scipy.org/doc/numpy-1.6.0/numpy-user.pdf
and it works fine for me:
myarr = np.ma.core.MaskedArray([1., 0., np.nan, 3.])
myarr[np.isnan(myarr)] = 30
myarr
masked_array(data = [  1.   0.  30.   3.],
             mask = False,
       fill_value = 1e+20)
Maybe 1.7 million is over some threshold?
Thanks Howard
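One way to test the size hypothesis, assuming nothing in the array is masked: the sketch below builds an unmasked MaskedArray with the 1767734 elements reported above, scatters NaNs through it (every 1000th element, an arbitrary choice), and applies the same boolean-index assignment. The fill value is hypothetical and chosen so it does not occur in the data.

```python
import numpy as np

n = 1767734  # element count Howard reports for modelData
fill = -1.0  # hypothetical fill value that never occurs in the toy data

# Unmasked MaskedArray (its mask is nomask) with a NaN every 1000 elements.
arr = np.ma.MaskedArray(np.arange(n, dtype=float))
arr[::1000] = np.nan

arr[np.isnan(arr)] = fill

data = np.ma.getdata(arr)
print(np.isnan(data).any())       # False
print(int((data == fill).sum()))  # 1768
```

Every NaN gets replaced here, which suggests array size alone is not the trigger and that the existing mask is the more likely suspect.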
On 1/27/12 4:42 PM, Olivier Delalleau wrote:
What are the types and shapes of modelData and dataMin? (it works for me with modelData a (3, 4) numpy array and dataMin a Python float, with numpy 1.6.1)
-=- Olivier

Oh, one other thing I should mention:
I installed numpy yesterday, and I also have 1.6.1.
Howard

On 01/27/2012 11:18 AM, Howard wrote:
It would help if you would say explicitly what you mean by "doesn't seem to be working", ideally by providing a minimal complete example illustrating the problem.
Does modelData have masked values that you want to keep separate from your NaN values? If not, you can do this:
y = np.ma.masked_invalid(modelData).filled(dataMin)
Then y will be an ordinary ndarray. If this is not satisfactory because you need to keep some initially masked values separate, you may need to save the initial mask and use it to turn y back into a masked array.
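A minimal sketch of the variant described above, for the case where some entries were masked before the NaN handling and must stay masked. The toy values and dataMin are made up; note that filled() also overwrites the data under the originally masked entry, which is harmless only as long as that entry stays masked.

```python
import numpy as np

# Toy stand-in for one timestep: index 1 is NaN; index 3 was masked for
# some other reason and should stay masked afterwards.
modelData = np.ma.array([1.0, np.nan, 2.0, 5.0],
                        mask=[False, False, False, True])
dataMin = 0.5  # hypothetical known minimum over all timesteps

# Save the initial mask as a full boolean array (getmaskarray never returns nomask).
initial_mask = np.ma.getmaskarray(modelData).copy()

# Mask NaN/inf too, then fill every masked slot with dataMin -> plain ndarray.
y = np.ma.masked_invalid(modelData).filled(dataMin)

# Turn y back into a masked array using only the initial mask.
result = np.ma.array(y, mask=initial_mask)

print(y.tolist())            # [1.0, 0.5, 2.0, 0.5]
print(result.mask.tolist())  # [False, False, False, True]
```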
You may be running into trouble with your initial approach because np.isnan on a masked array gives a masked array, and I think indexing with a masked array is not advised.
In [2]: np.isnan(np.ma.array([1.0, np.nan, 2.0], mask=[False, False, True]))
Out[2]:
masked_array(data = [False True --],
             mask = [False False True],
       fill_value = True)
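A sketch of the behavior shown above, plus one way around it, assuming a toy array in place of modelData and a hypothetical dataMin: np.ma.getdata returns the underlying ndarray, so np.isnan on it gives an ordinary boolean index that also flags NaNs hidden under the mask, and writing through that data leaves the existing mask alone. Whether this matches Howard's exact symptom is not established in the thread.

```python
import numpy as np

# Toy stand-in for modelData: index 1 is a visible NaN, index 2 is a NaN
# that has already been masked (like the dry nodes in this thread).
modelData = np.ma.array([1.0, np.nan, np.nan, 3.0],
                        mask=[False, False, True, False])
dataMin = 0.5  # hypothetical known minimum

flags = np.isnan(modelData)
print(type(flags).__name__)  # MaskedArray: the would-be index carries a mask

# A plain boolean index computed from the raw data, masked slots included.
nan_idx = np.isnan(np.ma.getdata(modelData))

# Write through the raw data (a view) so the existing mask is left as it was.
np.ma.getdata(modelData)[nan_idx] = dataMin

print(np.ma.getdata(modelData).tolist())       # [1.0, 0.5, 0.5, 3.0]
print(np.ma.getmaskarray(modelData).tolist())  # [False, False, True, False]
```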
Eric

On 1/27/12 5:21 PM, Eric Firing wrote:
It would help if you would say explicitly what you mean by "doesn't seem to be working", ideally by providing a minimal complete example illustrating the problem.
Hi Eric
Thanks for the reply. Yes, I can be a little more specific about the issue. I am reading data from a storm surge model out of a NetCDF file so I can render it with tricontourf. The model data has a triangulation and a set of lat, lon points that are invariant for the entire model run, as well as data for each time step. As the model runs, triangles in the coastal plain wet and dry: the dry values are indicated by NaN values in the data and should not be rendered. I mask those off before this code runs. I have found, in using tricontourf, that in the mapping from data values to color values, the range of the data seems to include even the data from the masked triangles. This makes the rendering either monochromatic or bi-chromatic (the high and low colors in the map). However, once the triangles are masked, if I set the corresponding data values to the known dataMin (or in fact, any value in the valid data range) the render proceeds correctly. So with the first piece of code I get reasonable images; with the second I do not.
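The thread doesn't show how the dry triangles get masked, so here is one plausible sketch under that caveat: flag any triangle with a NaN vertex, then replace the NaNs with dataMin as the loop above does. In matplotlib, the resulting per-triangle boolean array can be passed as the mask argument of matplotlib.tri.Triangulation (or through its set_mask method) before calling tricontourf. The mesh, values, and dataMin below are made up, and the "any NaN vertex" rule is an assumption.

```python
import numpy as np

# Hypothetical mesh: 4 nodes and 2 triangles, each row listing node indices.
triangles = np.array([[0, 1, 2],
                      [1, 3, 2]])
# One timestep of nodal values; node 3 is dry (NaN).
z = np.array([0.2, 0.4, 0.1, np.nan])
dataMin = -0.5  # hypothetical known minimum over all timesteps

# Assumption: a triangle counts as dry if any of its vertices is NaN.
dry = np.isnan(z)[triangles].any(axis=1)

# Replace the NaNs with a value inside the valid data range, which Howard
# reports keeps the color mapping correct.
z_filled = np.where(np.isnan(z), dataMin, z)

print(dry.tolist())       # [False, True]
print(z_filled.tolist())  # [0.2, 0.4, 0.1, -0.5]
```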
Does modelData have masked values that you want to keep separate from your NaN values? If not, you can do this:
No, I don't think so.
y = np.ma.masked_invalid(modelData).filled(dataMin)
Then y will be an ordinary ndarray. If this is not satisfactory because you need to keep separate some initially masked values, then you may need to save the initial mask and use it to turn y back into a masked array.
You may be running into trouble with your initial approach because using np.isnan on a masked array is giving a masked array, and I think trying to index with a masked array is not advised.
This could certainly be the issue. I will look into this on Monday.
Thanks very much for taking the time to reply. Howard

On Fri, Jan 27, 2012 at 4:37 PM, Howard howard@renci.org wrote:
I have found, in using tricontourf, that in the mapping from data values to color values, the range of the data seems to include even the data from the masked triangles. This causes the data to be either monochromatic or bi-chromatic (the high and low colors in the map). However, once the triangles are masked, if I set the corresponding data values to the known dataMin (or in fact, any value in the valid data range) the render proceeds correctly. So in the case of the first piece of code, I get reasonable images: using the second I do not.
This sounds like a bug in tricontourf. It should not be doing that. If you could report it to the matplotlib-devel list with an example demonstrating your problem, I can see to it that it gets resolved.
Ben Root

Eric's probably right that it's indexing with a masked array that's causing you trouble. Since you seem to say your NaN values correspond to your mask, you should be able to simply do:
modelData[modelData.mask] = dataMin
Note that in further processing it may then make more sense to remove the mask, since your array is now full of valid data:
modelData = modelData.data
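A minimal sketch of this suggestion, assuming toy data whose NaNs are already masked and a hypothetical dataMin. It uses np.ma.getmaskarray rather than modelData.mask so the index is always a full boolean array, even when nothing is masked. With numpy.ma's default soft mask, assigning valid values to masked slots also unmasks them, which fits the remark that the array is then full of valid data.

```python
import numpy as np

# Toy stand-in: the NaNs are already masked, as Howard describes.
modelData = np.ma.masked_invalid(np.array([2.0, np.nan, 4.0, np.nan]))
dataMin = 1.5  # hypothetical known minimum

# getmaskarray returns a full boolean array even when modelData.mask is the
# scalar nomask, so it is always safe to use as an index.
mask = np.ma.getmaskarray(modelData).copy()

# With the default soft mask, assigning valid values also unmasks these slots.
modelData[mask] = dataMin
print(np.ma.getmaskarray(modelData).any())  # False

# Every entry is now valid, so drop down to the plain ndarray.
modelData = modelData.data
print(type(modelData).__name__, modelData.tolist())  # ndarray [2.0, 1.5, 4.0, 1.5]
```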
-=- Olivier
participants (4)
- Benjamin Root
- Eric Firing
- Howard
- Olivier Delalleau