[Numpy-discussion] Subclassing ma.masked_array, code broken after version 1.9

Sebastian Berg sebastian at sipsolutions.net
Mon Feb 15 13:31:48 EST 2016


On Mon, 2016-02-15 at 17:06 +0000, Gutenkunst, Ryan N - (rgutenk) wrote:
> Thanks Jonathan,
> 
> Good to confirm this isn't something inappropriate I'm doing. I give
> up transparency here in my application, so I'll just work around it.
> I leave it up to wiser numpy heads as to whether it's worth altering
> these numpy.ma functions to enable subclassing.
> 

Frankly, when it comes to masked array stuff, I at least am puzzled
most of the time, so input is very welcome.
Most of the people currently contributing barely use masked arrays as
far as I know, and sometimes it is hard to make good calls. It is not
the easiest code base, and any feedback or nudging is important. A new
release is about to come out, and if you feel there is a serious
regression, we may want to push for fixing it (or even better, you may
have time to suggest a fix yourself).
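
For anyone reading this thread in search of a stopgap, below is a
minimal sketch of the kind of workaround mentioned above. It does not
depend on what numpy.ma.log passes to _update_from. It copies the extra
attribute from the input onto the result after the call. The helper
name apply_preserving and its attrs parameter are made up for
illustration and are not part of numpy. The Spectrum class is a
cut-down copy of the one in the quoted example.

```python
import numpy


class Spectrum(numpy.ma.masked_array):
    """Cut-down version of the Spectrum subclass quoted in this thread."""

    def __new__(cls, data, mask=numpy.ma.nomask, data_folded=None):
        subarr = numpy.ma.masked_array(data, mask=mask, keep_mask=True,
                                       shrink=True)
        subarr = subarr.view(cls)
        subarr.folded = data_folded
        return subarr

    def __array_finalize__(self, obj):
        if obj is None:
            return
        numpy.ma.masked_array.__array_finalize__(self, obj)
        self.folded = getattr(obj, 'folded', 'unspecified')


def apply_preserving(func, arr, attrs=('folded',)):
    """Hypothetical helper (not a numpy API): call func(arr), then copy
    the named attributes from the input onto the result."""
    result = func(arr)
    for name in attrs:
        if hasattr(arr, name):
            setattr(result, name, getattr(arr, name))
    return result


fs1 = Spectrum([2, 3, 4.], data_folded=True)
fs2 = apply_preserving(numpy.ma.log, fs1)
print('fs2.folded status: {0}'.format(fs2.folded))
```

This gives up the transparency of calling numpy.ma.log directly, since
every call site has to go through the helper, but it should behave the
same way whether or not the installed numpy propagates the attribute.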

- Sebastian


> Best,
> Ryan
> 
> On Feb 13, 2016, at 11:48 AM, Jonathan Helmus <jjhelmus at gmail.com>
> wrote:
> 
> > 
> > 
> > On 2/12/16 6:06 PM, Gutenkunst, Ryan N - (rgutenk) wrote:
> > > Hello all,
> > > 
> > > In 2009 I developed an application that uses a subclass of masked
> > > arrays as a central data object. My subclass Spectrum possesses
> > > additional attributes along with many custom methods. It was very
> > > convenient to be able to use standard numpy functions for doing
> > > arithmetic on these objects. However, my code broke with numpy
> > > 1.10. I've finally had a chance to track down the problem, and I
> > > am hoping someone can suggest a workaround.
> > > 
> > > See below for an example, which is as minimal as I could concoct.
> > > In this case, I have a Spectrum object that I'd like to take the
> > > logarithm of using numpy.ma.log, while preserving the value of
> > > the "folded" attribute. Up to numpy 1.9, this worked as expected,
> > > but in numpy 1.10 and 1.11 the attribute is not preserved.
> > > 
> > > The change in behavior appears to be driven by a commit made on
> > > Jun 16th, 2015 by Marten van Kerkwijk. In particular, the commit
> > > changed _MaskedUnaryOperation.__call__ so that the result array's
> > > update_from method is no longer called with the input array as
> > > the argument, but rather the result of the numpy UnaryOperation
> > > (old line 889, new line 885). Because that UnaryOperation doesn't
> > > carry my new attribute, it's not present for update_from to
> > > access. I notice that similar changes were made to
> > > MaskedBinaryOperation, although I haven't tested those. It's not
> > > clear to me from the commit message why this particular change
> > > was made, so I don't know whether this new behavior is
> > > intentional.
> > > 
> > > I know that subclassing arrays isn't widely encouraged, but it
> > > has been very convenient in my code. Is it still possible to
> > > subclass masked_array in such a way that functions like
> > > numpy.ma.log preserve additional attributes? If so, can someone
> > > point me in the right direction?
> > > 
> > > Thanks!
> > > Ryan
> > > 
> > > *** Begin example
> > > 
> > > import numpy
> > > print('Working with numpy {0}'.format(numpy.__version__))
> > > 
> > > class Spectrum(numpy.ma.masked_array):
> > >     def __new__(cls, data, mask=numpy.ma.nomask, data_folded=None):
> > >         subarr = numpy.ma.masked_array(data, mask=mask, keep_mask=True,
> > >                                        shrink=True)
> > >         subarr = subarr.view(cls)
> > >         subarr.folded = data_folded
> > > 
> > >         return subarr
> > > 
> > >     def __array_finalize__(self, obj):
> > >         if obj is None:
> > >             return
> > >         numpy.ma.masked_array.__array_finalize__(self, obj)
> > >         self.folded = getattr(obj, 'folded', 'unspecified')
> > > 
> > >     def _update_from(self, obj):
> > >         print('Input to update_from: {0}'.format(repr(obj)))
> > >         numpy.ma.masked_array._update_from(self, obj)
> > >         self.folded = getattr(obj, 'folded', 'unspecified')
> > > 
> > >     def __repr__(self):
> > >         return 'Spectrum(%s, folded=%s)'\
> > >                 % (str(self), str(self.folded))
> > > 
> > > fs1 = Spectrum([2,3,4.], data_folded=True)
> > > fs2 = numpy.ma.log(fs1)
> > > print('fs2.folded status: {0}'.format(fs2.folded))
> > > print('Expectation is True, achieved with numpy 1.9')
> > > 
> > > *** End example
> > > 
> > > --
> > > Ryan Gutenkunst
> > > Assistant Professor
> > > Molecular and Cellular Biology
> > > University of Arizona
> > > phone: (520) 626-0569, office LSS 325
> > > http://gutengroup.mcb.arizona.edu
> > > Latest paper: "Computationally efficient composite likelihood
> > > statistics for demographic inference"
> > > Molecular Biology and Evolution; 
> > > http://dx.doi.org/10.1093/molbev/msv255
> > Ryan,
> > 
> > I'm not sure if you will be able to get this to work as in NumPy
> > 1.9, but the __array_wrap__ method is intended to be the mechanism
> > for subclasses to set their return type, adjust metadata, etc. [1].
> > Unfortunately, the numpy.ma.log function does not seem to make a
> > call to __array_wrap__ (at least in NumPy 1.10.2), although
> > numpy.log does:
> > 
> > from __future__ import print_function
> > import numpy
> > print('Working with numpy {0}'.format(numpy.__version__))
> > 
> > 
> > class Spectrum(numpy.ma.masked_array):
> >    def __new__(cls, data, mask=numpy.ma.nomask, data_folded=None):
> >        subarr = numpy.ma.masked_array(data, mask=mask, keep_mask=True,
> >                                       shrink=True)
> >        subarr = subarr.view(cls)
> >        subarr.folded = data_folded
> > 
> >        return subarr
> > 
> >    def __array_finalize__(self, obj):
> >        if obj is None:
> >            return
> >        numpy.ma.masked_array.__array_finalize__(self, obj)
> >        self.folded = getattr(obj, 'folded', 'unspecified')
> > 
> >    def __array_wrap__(self, out_arr, context=None):
> >        print('__array_wrap__ called')
> >        return numpy.ndarray.__array_wrap__(self, out_arr, context)
> > 
> >    def __repr__(self):
> >        return 'Spectrum(%s, folded=%s)'\
> >                % (str(self), str(self.folded))
> > 
> > fs1 = Spectrum([2,3,4.], data_folded=True)
> > 
> > print('numpy.ma.log:')
> > fs2 = numpy.ma.log(fs1)
> > print('fs2 type:', type(fs2))
> > print('fs2.folded status: {0}'.format(fs2.folded))
> > 
> > print('numpy.log:')
> > fs3 = numpy.log(fs1)
> > print('fs3 type:', type(fs3))
> > print('fs3.folded status: {0}'.format(fs3.folded))
> > 
> > ----
> > $ python example.py
> > Working with numpy 1.10.2
> > numpy.ma.log:
> > fs2 type: <class '__main__.Spectrum'>
> > fs2.folded status: unspecified
> > numpy.log:
> > __array_wrap__ called
> > fs3 type: <class '__main__.Spectrum'>
> > fs3.folded status: True
> > 
> > 
> > The change mentioned in the original message was made in pull
> > request 3907 [2] in case anyone wants to have a look.
> > 
> > Cheers,
> > 
> >    - Jonathan Helmus
> > 
> > [1] http://docs.scipy.org/doc/numpy-1.10.1/user/basics.subclassing.html#array-wrap-for-ufuncs
> > [2] https://github.com/numpy/numpy/pull/3907
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at scipy.org
> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> 