[Numpy-discussion] numpy.ma.MaskedArray.min() makes a copy?

Fri Sep 7 09:36:40 EDT 2012

An issue just reported on the matplotlib-users list involved a user who ran
out of memory while attempting to do an imshow() on a large array.  While
this wouldn't be totally unexpected, the user's traceback shows that they
ran out of memory before any actual building of the image occurred.  Memory
usage skyrocketed when imshow() attempted to determine the min and max of
the image.  The input data was a masked array, and it appears that the
implementation of min() for masked arrays goes something like this
(paraphrasing here):

obj.filled(inf).min()

The idea is that the array is copied, every masked element in the copy is set
to the largest possible value for the dtype, and min() is then performed on
that copy.  I am assuming that max() does the same thing, filling with the
smallest possible value instead.
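To make the memory cost concrete, here is a minimal sketch of the paraphrased approach. `filled_min` is a hypothetical name, and the sketch assumes a floating-point dtype so that `np.inf` is a valid fill value. It illustrates the idea and is not the actual numpy.ma source.

```python
import numpy as np


def filled_min(arr: np.ma.MaskedArray):
    """Hypothetical helper mirroring the paraphrased min(): fill, then reduce.

    Assumes a floating-point dtype so that +inf is a valid fill value.
    """
    # filled() returns a new ndarray the same size as the data whenever
    # at least one element is masked; this full-size temporary is what
    # makes memory usage jump on a large array.
    filled = arr.filled(np.inf)
    return filled.min()


data = np.ma.masked_array([3.0, -1.0, 2.0], mask=[False, True, False])
print(filled_min(data))
# The filled array does not share memory with the original data: it is a copy.
print(np.shares_memory(data.filled(np.inf), data.data))
```

For an array of N float64 values with anything masked, this needs roughly another 8*N bytes just for the temporary, before the reduction even starts.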

Can this be done differently/more efficiently?  If the "filled" approach
has to be done, maybe it would be a good idea to make the copy in chunks
instead of all at once?  Ideally, it would be nice to avoid the copying
altogether and utilize some of the special iterators that Mark Wiebe
created last year.
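The chunked idea could look roughly like the sketch below. `chunked_masked_min` and its `chunk_size` parameter are hypothetical names, not part of numpy.ma, and the sketch again assumes a floating-point dtype. Each slice is filled and reduced on its own, so the temporary copy is at most `chunk_size` elements instead of the whole array.

```python
import numpy as np


def chunked_masked_min(arr: np.ma.MaskedArray, chunk_size: int = 1_000_000):
    """Hypothetical chunked min(): fill and reduce one slice at a time.

    Assumes a floating-point dtype so that +inf is a valid fill value.
    Returns np.ma.masked if every element is masked, like MaskedArray.min().
    """
    # For a contiguous array, ravel() returns a view (data and mask),
    # so no full-size copy is made here.
    flat = arr.ravel()
    best = np.inf
    found = False
    for start in range(0, flat.size, chunk_size):
        chunk = flat[start:start + chunk_size]
        # count() is the number of unmasked elements in this slice.
        if chunk.count() == 0:
            continue
        # Only this slice is copied by filled(), bounding temporary memory.
        # np.minimum propagates NaN the same way ndarray.min() does.
        best = np.minimum(best, chunk.filled(np.inf).min())
        found = True
    return best if found else np.ma.masked


data = np.ma.masked_array(np.arange(10.0), mask=[True] * 3 + [False] * 7)
print(chunked_masked_min(data, chunk_size=4))
```

The chunk size trades peak memory against per-chunk Python overhead; a copy-free version would instead iterate over data and mask together without materializing a filled array at all.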

Cheers!
Ben Root