numpy.ma.MaskedArray.min() makes a copy?
An issue just reported on the matplotlib-users list involved a user who ran out of memory while attempting to do an imshow() on a large array. While this wouldn't be totally unexpected, the user's traceback shows that they ran out of memory before any actual building of the image occurred. Memory usage skyrocketed when imshow() attempted to determine the min and max of the image. The input data was a masked array, and it appears that the implementation of min() for masked arrays goes something like this (paraphrasing here):
obj.filled(inf).min()
The idea is that every masked element is set to the largest possible value for its dtype in a copy of the array, and then min() is performed on that copy. I am assuming that max() does the same thing.
Can this be done differently or more efficiently? If the "filled" approach has to be used, maybe it would be a good idea to make the copy in chunks instead of all at once? Ideally, it would be nice to avoid the copying altogether and utilize some of the special iterators that Mark Wiebe created last year.
Cheers!
Ben Root
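To make the chunking idea concrete, here is a minimal sketch; it is not how numpy.ma implements min(). The helper name `chunked_masked_min` and its `chunk_size` parameter are hypothetical. The sketch assumes the data and mask can be flattened as views (true for contiguous arrays), so the largest temporary holds at most `chunk_size` elements.

```python
import numpy as np


def chunked_masked_min(marr, chunk_size=1_000_000):
    """Minimum of the unmasked elements, copying one chunk at a time.

    Hypothetical helper for illustration only.
    """
    # ravel() returns a view for contiguous input, so no full copy here.
    data = np.ma.getdata(marr).ravel()
    mask = np.ma.getmask(marr)
    if mask is np.ma.nomask:
        # Nothing is masked: the plain ndarray reduction needs no copy.
        return data.min() if data.size else np.ma.masked
    mask = mask.ravel()

    result = None
    for start in range(0, data.size, chunk_size):
        stop = start + chunk_size
        # Boolean indexing copies only the valid elements of this chunk.
        valid = data[start:stop][~mask[start:stop]]
        if valid.size:
            chunk_min = valid.min()
            if result is None or chunk_min < result:
                result = chunk_min
    # Mirror numpy.ma by returning `masked` when every element is masked.
    return np.ma.masked if result is None else result


a = np.ma.masked_array([5.0, 1.0, 3.0, -2.0, 7.0], mask=[0, 0, 0, 1, 0])
print(chunked_masked_min(a, chunk_size=2))
```

The trade-off is a Python-level loop over chunks in exchange for a bounded temporary, which only pays off for arrays large enough that a full copy is a problem.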
On 7 Sep 2012 14:38, "Benjamin Root" <ben.root@ou.edu> wrote:
> Can this be done differently/more efficiently? If the "filled" approach has to be done, maybe it would be a good idea to make the copy in chunks instead of all at once? Ideally, it would be nice to avoid the copying altogether and utilize some of the special iterators that Mark Wiebe created last year.
I think what you're looking for is where= support for ufunc.reduce. This isn't implemented yet but at least it's straightforward in principle... otherwise I don't know anything better than reimplementing .min() by hand.
-n
On Fri, Sep 7, 2012 at 12:05 PM, Nathaniel Smith <njs@pobox.com> wrote:
> I think what you're looking for is where= support for ufunc.reduce. This isn't implemented yet but at least it's straightforward in principle... otherwise I don't know anything better than reimplementing .min() by hand.
Yes, it was the where= support that I was thinking of. I take it that it was pulled out of the 1.7 branch with the rest of the NA stuff?
Ben Root
On 2012/09/18 7:40 AM, Benjamin Root wrote:
> Yes, it was the where= support that I was thinking of. I take it that it was pulled out of the 1.7 branch with the rest of the NA stuff?
The where= support was left in: http://docs.scipy.org/doc/numpy/reference/ufuncs.html
See also get_ufunc_arguments in ufunc_object.c.
Eric
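For reference, a small sketch of what where= does in a regular elementwise ufunc call, the case this thread says was kept for 1.7. The array values are made up for illustration.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([10.0, 20.0, 30.0, 40.0])
select = np.array([True, False, True, False])

# Positions where `select` is False are not written, so they keep
# whatever `out` already contained (zeros here).
out = np.zeros_like(a)
np.add(a, b, out=out, where=select)
print(out)  # [11.  0. 33.  0.]
```

Passing an explicit `out` matters: without it, the unselected positions of the result are left uninitialized.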
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
On Tue, 2012-09-18 at 08:42 -1000, Eric Firing wrote:
> The where= support was left in: http://docs.scipy.org/doc/numpy/reference/ufuncs.html
It seems, though, that the keyword argument is still missing from the ufunc help (`help(np.add)` and the individual `np.info(np.add)`).
> See also get_ufunc_arguments in ufunc_object.c.
On 18 Sep 2012 18:40, "Benjamin Root" <ben.root@ou.edu> wrote:
> Yes, it was the where= support that I was thinking of. I take it that it was pulled out of the 1.7 branch with the rest of the NA stuff?
where= was left in, but it was only implemented for regular vectorized ufunc operations in the first place. Supporting it in reductions still needs to be written.
-n
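For later readers: as far as I know, NumPy releases from 1.17 onward accept where= (together with initial=) in ufunc.reduce, which is the feature described above as still to be written. The following is a minimal sketch under that assumption. The helper name `masked_min_no_fill` is hypothetical, and the sketch assumes floating-point data so that np.inf is a valid `initial`.

```python
import numpy as np


def masked_min_no_fill(marr):
    """Minimum over unmasked elements without building a filled copy.

    Hypothetical helper; assumes a NumPy version whose ufunc.reduce
    accepts where= and initial=, and floating-point data.
    """
    data = np.ma.getdata(marr)
    # The boolean array costs one byte per element, far less than a
    # filled float64 copy, though still not zero.
    valid = ~np.ma.getmaskarray(marr)
    if not valid.any():
        return np.ma.masked
    # np.minimum has no identity, so `initial` is required with where=.
    return np.minimum.reduce(data, axis=None, where=valid, initial=np.inf)


a = np.ma.masked_array([[5.0, 1.0], [-2.0, 7.0]], mask=[[0, 0], [1, 0]])
print(masked_min_no_fill(a))  # 1.0
```

For integer data, `initial` would need to be something like `np.iinfo(data.dtype).max` instead of np.inf.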
participants (4)
- Benjamin Root
- Eric Firing
- Nathaniel Smith
- Sebastian Berg