[Numpy-discussion] numpy.trapz() doesn't respect subclass

Mon Mar 29 12:39:29 EDT 2010

On 03/29/2010 10:17 AM, Ryan May wrote:
> On Mon, Mar 29, 2010 at 8:00 AM, Bruce Southey <bsouthey at gmail.com> wrote:
>    
>> On 03/27/2010 01:31 PM, Ryan May wrote:
>>      
>>> Because of the call to asarray(), the mask is completely discarded and
>>> you end up with identical results to an unmasked array,
>>> which is not what I'd expect.  Worse, the actual numeric value of the
>>> positions that were masked affects the final answer. My patch allows
>>> this to work as expected too.
>>>
>>>        
>> Actually you should assume that unless it is explicitly addressed
>> (either by code or via a test), any subclass of ndarray (matrix, masked,
>> structured, record and even sparse) may not provide a 'valid' answer.
>> There are probably many numpy functions that only really work with the
>> standard ndarray. Most of the time people do not meet these with the
>> subclasses or have workarounds so there has been little requirement to
>> address this especially due to the added overhead needed for checking.
>>      
> It's not that I'm surprised that masked arrays don't work. It's more
> that the calls to np.asarray within trapz() have been held up as being
> necessary for things like matrices and (at the time) masked arrays to
> work properly; as if calling asarray() is supposed to make all
> subclasses work, though at a base level by dropping to an ndarray. To
> me, the current behavior with masked arrays is worse than if passing
> in a matrix raised an exception.  One is a silently wrong answer, the
> other is a big error that the programmer can see, test, and fix.
>
>    
>> Also, any patch that does not explicitly define the assumed behavior
>> with points that are masked has to be rejected. It is not even clear
>> what the expected behavior for masked arrays should be:
>> Is it even valid for trapz to be integrating across the full range if
>> there are missing points? That implies some assumption about the missing
>> points.
>> If it is valid, then should you just ignore the masked values or try to
>> predict the missing values first? Perhaps you may want to have the
>> option to do both.
>>      
> You're right, it doesn't actually work with MaskedArrays as it stands
> right now, because it calls add.reduce() directly instead of using the
> array.sum() method. Once fixed, by allowing MaskedArray to handle the
> operation, you end up not integrating over the masked region. The
> contributions of any operation involving masked points are ignored.
> I guess it's as if you assumed the function was 0 over the masked
> region.  If you wanted to ignore the masked points, but integrate over
> the region (making a really big trapezoid over that region), you could
> just pass in the .compressed() versions of the arrays.
>
>    
>>> than implicit") It just seems absurd that if I make my own ndarray
>>> subclass that *just* adds some behavior to the array, but doesn't
>>> break *any* operations, I need to do one of the following:
>>>
>>> 1) Have my own copy of trapz that works with my class
>>> 2) Wrap every call to numpy's own trapz() to put the metadata back.
>>>
>>> Does it not seem backwards that the class that breaks conventions
>>> "just works" while those that don't break conventions, and would
>>> work perfectly with the function as written, need help to be treated
>>> properly?
>>>
>>>        
>> You need your own version of trapz or whatever function because it has
>> the behavior that you expect. But a patch should not break numpy so you
>> need at least to have a section that looks for masked array subtypes
>> and performs the desired behavior(s).
>>      
> I'm not trying to be difficult but it seems like there are conflicting
> ideas here: we shouldn't break numpy, which in this case means making
> matrices no longer work with trapz().  On the other hand, subclasses
> can do a lot of things, so there's no real expectation that they
> should ever work with numpy functions in general.  Am I missing
> something here? I'm just trying to understand what I perceive to be
> some inconsistencies in numpy's behavior and, more importantly,
> convention with regard to subclasses.
>
> Ryan
>
>    
You should not confuse class functions with normal Python functions.
Functions that are inherited from the ndarray superclass should be the
same in the subclass unless these class functions have been modified.
However, many functions like trapz are not part of the ndarray
superclass. They have been written to handle the standard array (i.e.
the unmodified ndarray superclass) and may or may not work for all
ndarray subclasses.  Other functions have been written to handle a
specific ndarray subclass such as masked array or matrix and may (but
are not guaranteed to) work for the standard array. Thus, I think your
'inconsistencies' relate to the simple fact that not all numpy functions
are aware of ndarray subclasses.
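
As a rough illustration of that difference (a minimal sketch, not the
code inside trapz itself; the sample values are made up), this is what
happens to a masked array when a function converts its input with
np.asarray() rather than np.asanyarray():

```python
import numpy as np

# Illustrative data only: the third sample is masked, but its underlying
# value (1000.0) is still stored in the array's data buffer.
y = np.ma.masked_array([1.0, 2.0, 1000.0, 4.0],
                       mask=[False, False, True, False])

plain = np.asarray(y)     # always a base ndarray; the mask is dropped
kept = np.asanyarray(y)   # ndarray subclasses pass through unchanged

print(type(plain).__name__)  # ndarray
print(type(kept).__name__)   # MaskedArray
print(plain.sum())           # 1007.0, the masked value leaks into the result
print(kept.sum())            # 7.0, the masked value is skipped
```

The first sum is the "silently wrong answer" case discussed above: no
error is raised, yet the value hidden behind the mask changes the result.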

What is missing are the bug reports and solutions for these functions
when the expected behavior differs between the ndarray superclass and
ndarray subclasses.  In the case of trapz, the bug report needs to
include at least an indication of what the expected behavior is when
there are masked values present.
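
To make the question concrete, here is a hedged sketch of the two
candidate behaviors mentioned earlier in the thread. The function names
trapz_skip_masked and trapz_bridge_masked are hypothetical helpers
written for this illustration; neither is part of numpy, and the sample
data is made up.

```python
import numpy as np

def trapz_skip_masked(y, x):
    """Hypothetical helper: trapezoids touching a masked sample add nothing."""
    y = np.ma.asanyarray(y)
    x = np.ma.asanyarray(x)
    dx = x[1:] - x[:-1]
    avg = (y[1:] + y[:-1]) / 2.0
    # The masked-aware sum() ignores every product that involves a masked point.
    return (dx * avg).sum()

def trapz_bridge_masked(y, x):
    """Hypothetical helper: drop masked samples, integrate across the gap."""
    y = np.ma.asanyarray(y)
    x = np.ma.asanyarray(x)
    # Drop a sample if either its x or its y value is masked.
    mask = np.ma.getmaskarray(y) | np.ma.getmaskarray(x)
    yc = y.data[~mask]
    xc = x.data[~mask]
    return float(((xc[1:] - xc[:-1]) * (yc[1:] + yc[:-1]) / 2.0).sum())

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.ma.masked_array([1.0, 1.0, 99.0, 1.0, 1.0],
                       mask=[False, False, True, False, False])

print(trapz_skip_masked(y, x))    # 2.0: only [0, 1] and [3, 4] contribute
print(trapz_bridge_masked(y, x))  # 4.0: one wide trapezoid spans [1, 3]
```

The two results differ for the same input, which is why the expected
behavior has to be stated before a patch can be judged.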

Bruce