Why does np.nan{min, max} clobber my array mask?
I'm just starting to work with masked arrays and I've found some behavior that definitely does not follow the Principle of Least Surprise: I've generated a 2-d array from a list of lists, where the elements are floats with a good number of NaNs. Inspection shows the expected numbers for ma.count() and ma.count_masked(). However, as soon as I run np.nanmin() or np.nanmax() over it, all of the mask elements are reset to False.

    (Pdb) flat = flatten(uut)   # my own utility function
    (Pdb) len ( [ x for x in flat if x+0 == x ] )   # only way I could figure to detect
    4086
    (Pdb) len ( [ x for x in flat if x+0 != x ] )   # 1458 NaNs in the set.
    1458
    (Pdb) msk = ma.masked_invalid(uut)
    (Pdb) msk.shape
    (99, 56)
    (Pdb) ma.count(msk)
    4086
    (Pdb) ma.count_masked(msk)
    1458
    (Pdb) msk.hardmask
    False
    (Pdb) msk.harden_mask()   # harden the mask first, for demo
    masked_array(data =....
    (Pdb) msk.hardmask
    True
    (Pdb) rslt_hm = np.nanmin(msk, axis=1)
    (Pdb) rslt_hm.shape
    (99,)
    (Pdb) ma.count_masked(rslt_hm)
    0
    (Pdb) ma.count(rslt_hm)
    99
    # Is my original still OK?
    msk
    masked_array(data = ... ... [False False False ..., True True True]], fill_value = 1e+20)
    (Pdb) msk.soften_mask()   # now re-soften the mask:
    masked_array(data = ....
    (Pdb) rslt_softmask = np.nanmin(msk, axis=1)
    (Pdb) rslt_softmask.shape
    (99,)
    (Pdb) msk.mask.any()
    False
    # BAM! note: 'control' is a hardmasked control copy:
    (Pdb) control.mask.any()
    True

As the above shows, I discovered that I can work around this by setting the hardmask property, but there is no mention of such a side-effect in the docs (including the brand-new reference book). Have I found a bug?

This is 1.4.0 running under 64-bit Windows 7 (Python(x,y) distribution).
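For readers following along, here is a minimal sketch of two points from the post above: counting NaNs with np.isnan instead of the x+0 != x trick, and the documented difference between a soft and a hard mask when a masked element is assigned to. The small array is an illustrative stand-in for the 99x56 data, and the variable names are assumptions, not the poster's code. It does not reproduce the np.nanmin behavior itself, only the mask semantics that make hardening a plausible workaround.

```python
import numpy as np

# Illustrative stand-in for the poster's data (values are an assumption).
uut = np.array([[2.0, 1.0, 3.0, np.nan],
                [5.0, 2.0, 3.0, np.nan]])

# np.isnan detects NaNs directly; no need for the x+0 != x comparison.
n_nan = int(np.isnan(uut).sum())

soft = np.ma.masked_invalid(uut)   # masks are soft by default
hard = np.ma.masked_invalid(uut)
hard.harden_mask()

# Assigning to a masked element unmasks it when the mask is soft...
soft[0, 3] = 99.0
# ...but the assignment is ignored when the mask is hard.
hard[0, 3] = 99.0

soft_masked_after = int(np.ma.count_masked(soft))
hard_masked_after = int(np.ma.count_masked(hard))
print(n_nan, soft_masked_after, hard_masked_after)
```

Any code path that writes into the masked array in place would therefore clear a soft mask but leave a hard one alone, which is consistent with the workaround described above.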
On Feb 13, 2010, at 10:04 PM, David Carmean wrote:
I'm just starting to work with masked arrays and I've found some behavior that definitely does not follow the Principle of Least Surprise:
A fuzzy concept ;)
I've generated a 2-d array from a list of lists, where the elements are floats with a good number of NaNs. Inspections shows the expected numbers for ma.count() and ma.count_masked().
However, as soon as I run np.nanmin() or np.nanmax() over it, all of the mask elements are reset to False.
I'm sorry, I can't follow you. Can you post a simpler self-contained example I can play with? Why use np.nanmin/max? These functions are designed for ndarrays, as a way to avoid using a masked array; can't you just use min/max on the masked array?
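A minimal sketch of the suggestion above, assuming a small illustrative array: the masked array's own min/max methods skip masked entries, while np.nanmin is applied to the plain ndarray only.

```python
import numpy as np

# Illustrative data (an assumption, not the poster's array).
uut = np.array([[2.0, 1.0, 3.0, np.nan],
                [5.0, 2.0, 3.0, np.nan]])
msk = np.ma.masked_invalid(uut)

# Masked-array reductions ignore the masked (NaN) entries.
row_min = msk.min(axis=1)
row_max = msk.max(axis=1)

# The nan-aware function is kept for the plain ndarray.
nan_min = np.nanmin(uut, axis=1)

print(row_min, row_max, nan_min)
```

Both routes give the same per-row minima here, and the mask on msk is left as it was.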
On Sun, Feb 14, 2010 at 03:22:04PM -0500, Pierre GM wrote:
I'm sorry, I can't follow you. Can you post a simpler self-contained example I can play with? Why use np.nanmin/max? These functions are designed for ndarrays, as a way to avoid using a masked array; can't you just use min/max on the masked array?
I was using np.nanmin/max because I did not yet understand how masked arrays worked; perhaps the docs for those functions need a note indicating that "If you can take the (small?) memory hit, use Masked Arrays instead". Now that I know better, I'm going to drop it unless you really want to dig into it.
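To put a rough number on the "(small?) memory hit", here is a sketch that compares the size of the data buffer with the size of the boolean mask for a small illustrative float64 array. The array is an assumption; the ratio is what matters.

```python
import numpy as np

# Illustrative float64 data (an assumption).
uut = np.array([[2.0, 1.0, 3.0, np.nan],
                [5.0, 2.0, 3.0, np.nan]])
msk = np.ma.masked_invalid(uut)

data_bytes = msk.data.nbytes                  # 8 bytes per float64 element
mask_bytes = np.ma.getmaskarray(msk).nbytes   # 1 byte per boolean element

print(data_bytes, mask_bytes)
```

For float64 data the full mask costs one extra byte per eight bytes of data, in addition to whatever copy of the data masked_invalid makes.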
On Feb 15, 2010, at 8:51 PM, David Carmean wrote:
I was using np.nanmin/max because I did not yet understand how masked arrays worked; perhaps the docs for those functions need a note indicating that "If you can take the (small?) memory hit, use Masked Arrays instead". Now that I know better, I'm going to drop it unless you really want to dig into it.
I'm curious. Can you post an excerpt of your array, so that I can check what goes wrong?
On Mon, Feb 15, 2010 at 8:35 PM, Pierre GM <pgmdevlist@gmail.com> wrote:
I'm curious. Can you post an excerpt of your array, so that I can check what goes wrong?
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Hi, David, please file a bug report.

I think it occurs with np.nansum, np.nanmin and np.nanmax. Perhaps it is something to do with the C99 changes, as I think it exists with numpy 1.3.

I think this code shows the problem with Linux and recent numpy svn:

    import numpy as np
    uut = np.array([[2, 1, 3, np.nan], [5, 2, 3, np.nan]])
    msk = np.ma.masked_invalid(uut)
    msk
    np.nanmin(msk, axis=1)
    msk

    $ python
    Python 2.6 (r26:66714, Nov 3 2009, 17:33:18)
    [GCC 4.4.1 20090725 (Red Hat 4.4.1-2)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import numpy as np
    >>> uut = np.array([[2, 1, 3, np.nan], [5, 2, 3, np.nan]])
    >>> msk = np.ma.masked_invalid(uut)
    >>> msk
    masked_array(data =
     [[2.0 1.0 3.0 --]
     [5.0 2.0 3.0 --]],
                 mask =
     [[False False False True]
     [False False False True]],
           fill_value = 1e+20)
    >>> np.nanmin(msk, axis=1)
    masked_array(data = [1.0 2.0],
                 mask = [False False],
           fill_value = 1e+20)
    >>> msk
    masked_array(data =
     [[2.0 1.0 3.0 nan]
     [5.0 2.0 3.0 nan]],
                 mask =
     [[False False False False]
     [False False False False]],
           fill_value = 1e+20)
Bruce
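Until the nan-functions handle masked input safely, one defensive pattern is to hand them a plain ndarray rather than the masked array. The sketch below uses the same small array as the report above and assumes nothing beyond the documented filled() method; mask_before and plain are illustrative names.

```python
import numpy as np

uut = np.array([[2, 1, 3, np.nan], [5, 2, 3, np.nan]])
msk = np.ma.masked_invalid(uut)

# Snapshot the mask so it can be compared afterwards.
mask_before = np.ma.getmaskarray(msk).copy()

# filled() returns a plain ndarray copy with masked entries set to NaN,
# so np.nanmin never sees (or touches) the masked array itself.
plain = msk.filled(np.nan)
row_min = np.nanmin(plain, axis=1)

mask_unchanged = bool(np.array_equal(mask_before, np.ma.getmaskarray(msk)))
print(row_min, mask_unchanged)
```

The comparison against mask_before is the same kind of check a regression test for this report could make.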
On Mon, Feb 15, 2010 at 9:24 PM, Bruce Southey <bsouthey@gmail.com> wrote:
Hi, David, please file a bug report.
Hi,

I filed this ticket and hopefully the provided code is sufficient for a test: http://projects.scipy.org/numpy/ticket/1421

The bug is with the _nanop function because nansum, nanmin, nanmax, nanargmin and nanargmax have the same issue.

Bruce
participants (3)
- Bruce Southey
- David Carmean
- Pierre GM