Hi all
I am a fairly recent convert to python and I have got a question that's got me stumped. I hope this is the right mailing list: here goes :)
I am reading some time series data out of a netcdf file a single timestep at a time. If the data is NaN, I want to reset it to the minimum of the dataset over all timesteps (which I already know). The data is in a variable of type numpy.ma.core.MaskedArray called modelData.
If I do this:
    for i in range(len(modelData)):
        if math.isnan(modelData[i]):
            modelData[i] = dataMin
I get the effect I want. If I do this:
modelData[np.isnan(modelData)] = dataMin
it doesn't seem to be working. Of course I could just do the first one, but len(modelData) is about 3.5 million, and it's taking about 20 seconds to run. This is happening inside of a rendering loop, so I'd like it to be as fast as possible, and I thought the second one might be faster, and maybe it is, but it doesn't seem to be working! :)
Any ideas would be much appreciated.
Thanks Howard
What are the types and shapes of modelData and dataMin? (it works for me with modelData a (3, 4) numpy array and dataMin a Python float, with numpy 1.6.1)
-=- Olivier
2012/1/27 Howard howard@renci.org
 Howard Lander howard@renci.org Senior Research Software Developer Renaissance Computing Institute (RENCI) http://www.renci.org The University of North Carolina at Chapel Hill Duke University North Carolina State University 100 Europa Drive Suite 540 Chapel Hill, NC 27517 9194459651
NumPyDiscussion mailing list NumPyDiscussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpydiscussion
Hi Olivier
I added this to the code:
    print "modelData:", type(modelData), modelData.shape, modelData.size
    print "dataMin:", type(dataMin)
and got
    modelData: <class 'numpy.ma.core.MaskedArray'> (1767734,) 1767734
    dataMin: <type 'float'>
What's funny is I tried the example from
http://docs.scipy.org/doc/numpy1.6.0/numpyuser.pdf
and it works fine for me. Maybe 1.7 million is over some threshold?
Thanks Howard
    myarr = np.ma.core.MaskedArray([1., 0., np.nan, 3.])
    myarr[np.isnan(myarr)] = 30
    myarr
    masked_array(data = [ 1. 0. 30. 3.], mask = False, fill_value = 1e+20)
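One possible reason the toy example succeeds while the real data does not: in the toy array nothing is masked, so np.isnan returns an index with no masked entries. The sketch below is an illustration only and is not taken from the thread. The names toy, surge and data_min are made up, and it assumes the real array has its NaN entries masked. It compares the two cases, then replaces the NaNs through the underlying data buffer, which avoids indexing with a masked array altogether.

```python
import numpy as np

data_min = -1.0  # hypothetical stand-in for the known dataset minimum

# Toy case from the message above: the NaN is present but not masked.
toy = np.ma.MaskedArray([1.0, 0.0, np.nan, 3.0])
toy_index = np.isnan(toy)
print(np.ma.is_masked(toy_index))    # does the index contain masked entries?

# Assumed real-data case: the NaN entries are also masked.
surge = np.ma.masked_invalid(np.array([1.0, 0.0, np.nan, 3.0]))
surge_index = np.isnan(surge)
print(np.ma.is_masked(surge_index))  # here the index itself is partly masked

# Work on the plain ndarray underneath, so the index is an ordinary
# boolean array. The mask of `surge` is left untouched.
raw = np.ma.getdata(surge)
raw[np.isnan(raw)] = data_min
print(raw)
```

Because np.ma.getdata returns a view rather than a copy, the assignment changes the values stored inside surge while its mask stays as it was.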
On 1/27/12 4:42 PM, Olivier Delalleau wrote:
What are the types and shapes of modelData and dataMin? (it works for me with modelData a (3, 4) numpy array and dataMin a Python float, with numpy 1.6.1)
-=- Olivier
Oh, one other thing I should mention:
I did the install of numpy yesterday and I also have 1.6.1
Howard
On 01/27/2012 11:18 AM, Howard wrote:
modelData[np.isnan(modelData)] = dataMin
it doesn't seem to be working. Of course I could just do the first one, but len(modelData) is about 3.5 million, and it's taking about 20 seconds to run. This is happening inside of a rendering loop, so I'd like it to be as fast as possible, and I thought the second one might be faster, and maybe it is, but it doesn't seem to be working! :)
It would help if you would say explicitly what you mean by "doesn't seem to be working", ideally by providing a minimal complete example illustrating the problem.
Does modelData have masked values that you want to keep separate from your NaN values? If not, you can do this:
y = np.ma.masked_invalid(modelData).filled(dataMin)
Then y will be an ordinary ndarray. If this is not satisfactory because you need to keep separate some initially masked values, then you may need to save the initial mask and use it to turn y back into a masked array.
You may be running into trouble with your initial approach because using np.isnan on a masked array is giving a masked array, and I think trying to index with a masked array is not advised.
    In [2]: np.isnan(np.ma.array([1.0, np.nan, 2.0], mask=[False, False, True]))
    Out[2]: masked_array(data = [False True --], mask = [False False True], fill_value = True)
Eric
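A minimal sketch of the masked_invalid(...).filled(...) suggestion above, including the optional step of saving the initial mask and reapplying it. The sample values and the names model_data, data_min, initial_mask and restored are made up for illustration; only the NumPy calls come from the message.

```python
import numpy as np

data_min = 1.0  # hypothetical known minimum of the dataset

# One unmasked NaN (index 1) and one value that was already masked (index 2).
model_data = np.ma.array([1.0, np.nan, 2.0, 5.0],
                         mask=[False, False, True, False])

# Save a copy of the initial mask in case it has to be restored later.
initial_mask = np.ma.getmaskarray(model_data).copy()

# Mask every NaN/inf in addition to the existing mask, then replace all
# masked entries with data_min. The result is an ordinary ndarray.
y = np.ma.masked_invalid(model_data).filled(data_min)
print(type(y), y)

# Optional: turn y back into a masked array that carries the initial mask.
restored = np.ma.array(y, mask=initial_mask)
print(restored)
```

Note that filled() also overwrites entries that were masked to begin with, which is why the initial mask has to be saved beforehand if those entries need to stay masked.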
On 1/27/12 5:21 PM, Eric Firing wrote:
It would help if you would say explicitly what you mean by "doesn't seem to be working", ideally by providing a minimal complete example illustrating the problem.
Hi Eric
Thanks for the reply. Yes, I can be a little more specific about the issue. I am reading data from a storm surge model out of a NetCDF file so I can render it with tricontourf. The model data has both a triangulation and a set of lat, lon points that are invariant for the entire model run, as well as data for each time step. As the model runs, triangles in the coastal plain wet and dry: the dry values are indicated by NaN values in the data and should not be rendered. I mask those off before this code runs. I have found, in using tricontourf, that in the mapping from data values to color values, the range of the data seems to include even the data from the masked triangles. This causes the data to be either monochromatic or bichromatic (the high and low colors in the map). However, once the triangles are masked, if I set the corresponding data values to the known dataMin (or in fact, any value in the valid data range) the render proceeds correctly. So in the case of the first piece of code, I get reasonable images: using the second I do not.
Does modelData have masked values that you want to keep separate from your NaN values? If not, you can do this:
No I don't think so.
y = np.ma.masked_invalid(modelData).filled(dataMin)
Then y will be an ordinary ndarray. If this is not satisfactory because you need to keep separate some initially masked values, then you may need to save the initial mask and use it to turn y back into a masked array.
You may be running into trouble with your initial approach because using np.isnan on a masked array is giving a masked array, and I think trying to index with a masked array is not advised.
This could certainly be the issue. I will look into this Monday.
Thanks very much for taking the time to reply. Howard
On Fri, Jan 27, 2012 at 4:37 PM, Howard howard@renci.org wrote:
I have found, in using tricontourf, that in the mapping from data values to color values, the range of the data seems to include even the data from the masked triangles. This causes the data to be either monochromatic or bichromatic (the high and low colors in the map). However, once the triangles are masked, if I set the corresponding data values to the known dataMin (or in fact, any value in the valid data range) the render proceeds correctly. So in the case of the first piece of code, I get reasonable images: using the second I do not.
This sounds like a bug in tricontourf. It should not be doing that. If you could report it to the matplotlibdevel list with an example demonstrating your problem, I can see to it that it gets resolved.
Ben Root
Eric's probably right and it's indexing with a masked array that's causing you trouble. Since you seem to say your NaN values correspond to your mask, you should be able to simply do:
modelData[modelData.mask] = dataMin
Note that in further processing it may then make more sense to remove the mask, since your array is now full of valid data: modelData = modelData.data
-=- Olivier
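A short sketch of the mask-based assignment above, under the assumption stated in the message that the NaN positions coincide with the mask. The sample values and the names model_data, data_min, nan_positions and plain are hypothetical. The mask is copied before it is used as an index so that the index does not alias the array's own mask.

```python
import numpy as np

data_min = 0.5  # hypothetical known minimum of the dataset

# Build an array whose NaN entries are exactly the masked entries.
model_data = np.ma.masked_invalid(np.array([0.5, np.nan, 2.0, np.nan]))

# Plain boolean copy of the mask, used as an ordinary index.
nan_positions = np.ma.getmaskarray(model_data).copy()

# Assigning a plain float writes the value and unmasks those entries
# (the default soft-mask behaviour).
model_data[nan_positions] = data_min

# Every entry is now valid, so the mask can be dropped.
plain = model_data.data
print(type(plain), plain)
```

For this case, model_data.filled(data_min) applied to the original masked array would give the same plain array in a single call.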
participants (4)

Benjamin Root

Eric Firing

Howard

Olivier Delalleau