numpy.trapz() doesn't respect subclass
Hi,

I found that trapz() doesn't work with subclasses: http://projects.scipy.org/numpy/ticket/1438

A simple patch (attached) changing asarray() to asanyarray() fixes the problem.

Ryan

-- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma
On Mon, Mar 22, 2010 at 12:49 AM, Ryan May <rmay31@gmail.com> wrote:
Hi,
I found that trapz() doesn't work with subclasses:
http://projects.scipy.org/numpy/ticket/1438
A simple patch (attached) to change asarray() to asanyarray() fixes the problem fine.
Are you sure this function works with matrices and other subclasses? Looking only very briefly at it: the multiplication might be a problem. Josef
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
On Sun, Mar 21, 2010 at 11:57 PM, <josef.pktd@gmail.com> wrote:
Are you sure this function works with matrices and other subclasses?
Looking only very briefly at it: the multiplication might be a problem.
Correct, it probably *is* a problem in some cases with matrices. In this case, I was using quantities (Darren Dale's unit-aware array package), and the result was that units were stripped off.

The patch can't make trapz() work with all subclasses. However, right now you have *no* hope of getting a subclass out of trapz(). With this change, subclasses that don't redefine operators can work fine. If you're passing a Matrix to trapz() and expecting it to work, IMHO you're doing it wrong; you can still pass one in by using asarray() yourself.

Without this patch, I'm left with copying and maintaining the code elsewhere, just so I can loosen the function's input processing. That seems wrong, since there's really no need in my case to drop down to an ndarray. The input I'm giving it supports all the operations it needs, so it should just work with my original input. Or am I just off base here?

Ryan
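[Editorial note] The distinction the patch relies on can be seen with a toy subclass; `Metadata` below is a hypothetical class for illustration, not part of the patch or of quantities:

```python
import numpy as np

class Metadata(np.ndarray):
    """A minimal ndarray subclass that only carries a bit of metadata."""
    def __new__(cls, data, note=None):
        obj = np.asarray(data).view(cls)
        obj.note = note
        return obj

    def __array_finalize__(self, obj):
        # propagate the metadata through views and slices
        self.note = getattr(obj, 'note', None)

m = Metadata([1.0, 2.0, 3.0], note='degC')
print(type(np.asarray(m)))     # plain ndarray: subclass (and metadata) stripped
print(type(np.asanyarray(m)))  # Metadata: subclass passes through untouched
```

Because `asanyarray` returns the input unchanged when it is already an ndarray subclass, every operator inside trapz() then runs through the subclass's own arithmetic.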
On Mon, Mar 22, 2010 at 8:14 AM, Ryan May <rmay31@gmail.com> wrote:
Anyone else care to weigh in here?

Ryan
On Sat, Mar 27, 2010 at 1:00 PM, Ryan May <rmay31@gmail.com> wrote:
With asarray it gives correct results for matrices and all array_like inputs and subclasses; it just doesn't preserve the type. Your patch would break matrices, and possibly other types: masked arrays? ...

One solution would be using arraywrap as in numpy.linalg.

For related discussion: http://mail.scipy.org/pipermail/scipy-dev/2009-June/012061.html

Josef
On Sat, Mar 27, 2010 at 11:12 AM, <josef.pktd@gmail.com> wrote:
With asarray it gives correct results for matrices and all array_like and subclasses, it just doesn't preserve the type. Your patch would break matrices and possibly other types, masked_arrays?, ...
It would break matrices, yes. I would argue that masked arrays are already broken with trapz:

In [1]: x = np.arange(10)
In [2]: y = x * x
In [3]: np.trapz(y, x)
Out[3]: 244.5
In [4]: ym = np.ma.array(y, mask=(x>4)&(x<7))
In [5]: np.trapz(ym, x)
Out[5]: 244.5
In [6]: y[5:7] = 0
In [7]: ym = np.ma.array(y, mask=(x>4)&(x<7))
In [8]: np.trapz(ym, x)
Out[8]: 183.5

Because of the call to asarray(), the mask is completely discarded and you end up with results identical to an unmasked array, which is not what I'd expect. Worse, the actual numeric values at the masked positions affect the final answer. My patch allows this to work as expected too.
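[Editorial note] A minimal sketch of trapz's core with asanyarray substituted (illustrative only; `trapz_any` is not numpy code, and the real trapz also handles dx and axis) shows the mask surviving, because the masked array's own operators do the arithmetic:

```python
import numpy as np

def trapz_any(y, x):
    """Sketch of trapz's core using asanyarray instead of asarray,
    so the subclass's own arithmetic operators do the work."""
    y = np.asanyarray(y)
    x = np.asanyarray(x)
    d = np.diff(x)
    # trapezoidal rule: sum of interval width times mean of the endpoints
    return (d * (y[1:] + y[:-1]) / 2.0).sum()

x = np.arange(10)
y = x * x
ym = np.ma.array(y, mask=(x > 4) & (x < 7))
print(trapz_any(y, x))   # 244.5, matching the plain-array result
print(trapz_any(ym, x))  # smaller: every interval touching a masked point drops out
```

With the mask honored, any trapezoid whose endpoints include a masked sample is excluded from the sum, instead of the masked values' numeric contents silently leaking into the answer.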
One solution would be using arraywrap as in numpy.linalg.
By arraywrap, I'm assuming you mean:

def _makearray(a):
    new = asarray(a)
    wrap = getattr(a, "__array_prepare__", new.__array_wrap__)
    return new, wrap

I'm not sure that's identical to just letting the subclass handle what's needed. To my eyes it doesn't look equivalent, either for masked arrays or for Quantities. For quantities at least, the result of trapz will have different units than either of the inputs.
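[Editorial note] For comparison, the wrap-based pattern would look roughly like the sketch below (`trapz_wrapped` is hypothetical, not numpy code): compute on a plain ndarray, then hand the result back through the wrap callable. This restores the *type*, but, as noted above, not the semantics — the arithmetic happens on the stripped data, so a mask would be ignored and units would not be recomputed:

```python
import numpy as np

def _makearray(a):
    # the numpy.linalg helper quoted above, with the np. prefix added
    new = np.asarray(a)
    wrap = getattr(a, "__array_prepare__", new.__array_wrap__)
    return new, wrap

def trapz_wrapped(y, x):
    # hypothetical wrap-based trapz: compute on plain ndarrays,
    # then pass the result back through the input's wrap hook
    y, wrap = _makearray(y)
    x = np.asarray(x)
    d = np.diff(x)
    res = (d * (y[1:] + y[:-1]) / 2.0).sum()
    return wrap(np.asarray(res))

x = np.arange(10)
y = x * x
print(trapz_wrapped(y, x))  # 244.5, as a wrapped zero-d result
```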
for related discussion: http://mail.scipy.org/pipermail/scipy-dev/2009-June/012061.html
Actually, that discussion kind of makes my point. Matrices are a pain to make work in a general sense because they *break* ndarray conventions. To me it doesn't make sense to help along classes that break convention at the expense of making well-behaved classes a pain to use. You should need an *explicit* cast of a matrix to an ndarray instead of the function quietly doing it for you ("explicit is better than implicit").

It just seems absurd that if I make my own ndarray subclass that *just* adds some behavior to the array, but doesn't break *any* operations, I need to do one of the following:

1) Keep my own copy of trapz that works with my class
2) Wrap every call to numpy's own trapz() to put the metadata back

Does it not seem backwards that the class that breaks conventions "just works", while classes that don't break conventions, and would work perfectly with the function as written, need help to be treated properly?

Ryan
On Sat, Mar 27, 2010 at 2:31 PM, Ryan May <rmay31@gmail.com> wrote:
(Trying again; Firefox or Gmail deleted my earlier response instead of sending it.)

Matrices have been part of numpy for a long time, and your patch would break backwards compatibility in a pretty serious way.

Subclasses of ndarray, like masked arrays and quantities, and classes that delegate to array calculations, like pandas, can redefine anything. So there is not much that can be relied on if any subclass is allowed inside a function. For example, quantities redefines sin, cos, ...: http://packages.python.org/quantities/user/issues.html#umath-functions What happens if you call fft with a quantity array?

For ndarrays I know that (x*y).sum(0) is equivalent to np.dot(x.T, y) for appropriate x and y (e.g. for a faster calculation of means), so it's just an implementation detail which one I use. If a subclass redefines ufuncs but not linalg, then these two can differ.

Just a thought: what happens if an existing function for ndarrays is cythonized? I guess the behavior for subclasses might change if they were not converted to ndarrays.

So it appears to me that the only feasible way *in general* is to have separate functions or wrappers, as in np.ma. Except for simple functions and ufuncs, it would be a lot of work, and fragile, to allow asanyarray. And, as we discussed in a recent thread on masked arrays (and histogram), it would push the work onto the function writer instead of onto those creating new subclasses.

Of course, the behavior in numpy and scipy can be improved, and trapz may be simple enough to change, but I don't think a patch that breaks backwards compatibility this seriously, and is not accompanied by sufficient tests, should go into numpy or scipy.

(On the other hand, I'm very slowly getting used to the pattern that for a simple function, 10% is calculation and 90% is interface code.)

Josef
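[Editorial note] The implementation-detail point can be made concrete with a deliberately contrived, hypothetical subclass that redefines elementwise multiplication but leaves dot alone — for plain ndarrays the two routes agree, for the subclass they silently diverge:

```python
import numpy as np

x = np.arange(1.0, 6.0)   # [1. 2. 3. 4. 5.]
y = np.ones(5)

# For plain ndarrays the ufunc route and the linalg route agree:
print((x * y).sum(), np.dot(x, y))   # 15.0 15.0

# A contrived, hypothetical subclass that redefines __mul__ but not dot:
class Doubling(np.ndarray):
    def __mul__(self, other):
        # subok=False keeps the intermediate a plain ndarray, avoiding recursion
        return 2.0 * np.multiply(self, other, subok=False)

d = x.view(Doubling)
print((d * y).sum(), np.dot(d, y))   # 30.0 15.0 -- the two routes now diverge
```

A function written with asanyarray picks up whichever route its author happened to use, so for such a subclass the answer depends on an implementation detail the caller cannot see.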
On Sat, Mar 27, 2010 at 8:23 PM, <josef.pktd@gmail.com> wrote:
Matrices have been part of numpy for a long time and your patch would break backwards compatibility in a pretty serious way.
Yeah, and I should admit that I realize that makes this particular patch a no-go. However, that to me doesn't put the issue to bed for any future code that gets written (see below).
subclasses of ndarray, like masked_arrays and quantities, and classes that delegate to array calculations, like pandas, can redefine anything. So there is not much that can be relied on if any subclass is allowed to be used inside a function
e.g. quantities redefines sin, cos,... http://packages.python.org/quantities/user/issues.html#umath-functions What happens if you call fft with a quantity array?
Probably ends up casting to an ndarray. But that's a complex operation that I can live with not working. It's coded in C and can't be implemented quickly using array methods. And in this
Except for simple functions and ufuncs, it would be a lot of work and fragile to allow asanyarray. And, as we discussed in a recent thread on masked arrays (and histogram), it would push the work on the function writer instead of the ones that are creating new subclasses.
I disagree in this case. I think the function writer should only be burdened to use array methods rather than numpy functions where possible, and to avoid casts other than asanyarray() at all costs. I think we shouldn't be scared of getting an error when a subclass is passed to a function, because that's an indication to the programmer that the function doesn't work with what you're passing in and you need to *explicitly* cast it to an ndarray. Having the function do the cast for you:

1) is magical and implicit;
2) forces an unnecessary cast on those who would otherwise work fine.

I get errors when I try to pass structured arrays to math functions, but I don't see numpy casting that away.
Of course, the behavior in numpy and scipy can be improved, and trapz may be simple enough to change, but I don't think a patch that breaks backwards compatibility pretty seriously and is not accompanied by sufficient tests should go into numpy or scipy.
If sufficient tests is the only thing holding this back, let me know; I'll get to coding. But I can't argue with the backwards incompatibility.

At this point, I think I'm more trying to see if there's any agreement that casting *everyone* because some class breaks behavior is a bad idea. The programmer can always make it work by explicitly asking for the cast, but there's no way for the programmer to ask the function *not* to cast the data. Hell, I'd be happy if trapz took a flag just telling it subok=True.
(On the other hand, I'm very slowly getting used to the pattern that for a simple function, 10% is calculation and 90% is interface code.)
Yeah, it's kind of annoying, since the 10% is the cool part you want, and that 90% is thorny to design and boring to code.

Ryan
On Sat, Mar 27, 2010 at 11:37 PM, Ryan May <rmay31@gmail.com> wrote:
The problem is quality control and testing. If the cast is done automatically, then when I feed anything array_like into the function, I only have to pay attention that casting to ndarray works as intended (e.g. it doesn't work for masked arrays with inappropriate masked values). If the casting is correct, I know that I get correct numbers back out, even if I have to reattach the meta information or convert the type again. With asanyarray, anything could happen, including getting numbers back that are wrong, maybe obviously so, maybe not. (Structured arrays at least have the advantage that an exception is thrown pretty fast.) And from what I have seen so far, testing is not a high priority for many users.

Overall, I think there are very few functions simple enough that asanyarray would work without wrapping and casting (at least internally). In scipy.stats we converted a few, but for most functions I don't want to spend the time thinking about how they can be written so the internal code can use asanyarray (the main problem is matrices; masked arrays and arrays with nans need special code anyway, and structured dtypes are mostly useless for calculations without creating a view). For example, what is the quantities outcome of a call to np.corrcoef or np.cov?

And as long as not every function returns a consistent result, changes in implementation details can affect the outcome, and then it will be a lot of fun hunting for bugs — e.g. stats.linregress: dot, or an explicit sum of squares, or np.cov, or ... Are you sure you always get the same result with quantities?
I thought a while ago that subclasses, or even classes that implement an array_like interface, should have an attribute to signal this, like iamalgebrasafe, or subok, or dontcast. The freedom to choose not to get cast to ndarray is desirable, but the increase in bugs and bug reports won't be much fun. And the user with the next subclass will argue that numpy/scipy should do the casting, because it's too much wrapping code to build around every function.

(Just as a related aside: in statsmodels I'm also still trying hard to keep the main models using ndarrays only; either the code becomes too specialized if it is based on a specific class, or it requires a lot of wrapping code.)

I don't think your proposal, to just let any array class in, will get very far before raising an exception or producing incorrect numbers (except maybe for array subclasses that don't change any numerical behavior). That's my opinion; maybe others see it differently. But in any case, it should be possible to change individual functions even if the overall policy doesn't change.

Josef
On Sat, Mar 27, 2010 at 10:23 PM, <josef.pktd@gmail.com> wrote:
e.g. quantities redefines sin, cos,... http://packages.python.org/quantities/user/issues.html#umath-functions
Those functions were only intended to be used in the short term, until the ufuncs that ship with numpy included a mechanism that allowed quantity arrays to propagate the units.

It would be nice to have a mechanism (like we have discussed briefly just recently on this list) where there is a single entry point to a given function like add, but subclasses can tweak the execution. We discussed the possibility of simplifying the wrapping scheme with a method like __handle_gfunc__. (I don't think this necessarily has to be limited to ufuncs.) I think a second method like __prepare_input__ is also necessary. Imagine something like:

class GenericFunction:

    @property
    def executable(self):
        return self._executable

    def __init__(self, executable):
        self._executable = executable

    def __call__(self, *args, **kwargs):
        # find the input with highest priority, and then:
        args, kwargs = input.__prepare_input__(self, *args, **kwargs)
        return input.__handle_gfunc__(self, *args, **kwargs)

# this is the core function to be passed to the generic class:
def _add(a, b, out=None):
    # the generic, ndarray implementation.
    ...

# here is the publicly exposed interface:
add = GenericFunction(_add)

# now my subclasses
class MyArray(ndarray):
    # My class tweaks the execution of the function in __handle_gfunc__
    def __prepare_input__(self, gfunc, *args, **kwargs):
        return mod_input[gfunc](*args, **kwargs)

    def __handle_gfunc__(self, gfunc, *args, **kwargs):
        res = gfunc.executable(*args, **kwargs)
        # you could have called a different core function there
        return mod_output[gfunc](res, *args, **kwargs)

class MyNextArray(MyArray):
    def __prepare_input__(self, gfunc, *args, **kwargs):
        # let the superclass do its thing:
        args, kwargs = MyArray.__prepare_input__(self, gfunc, *args, **kwargs)
        # now I can tweak it further:
        return mod_input_further[gfunc](*args, **kwargs)

    def __handle_gfunc__(self, gfunc, *args, **kwargs):
        # let's defer to the superclass to handle calling the core function:
        res = MyArray.__handle_gfunc__(self, gfunc, *args, **kwargs)
        # and now we have one more crack at the result before passing it back:
        return mod_output_further[gfunc](res, *args, **kwargs)

If a gfunc is not recognized, the subclass might raise a NotImplementedError, or it might just pass the original args and kwargs through. I didn't write that part out because the example was already running long. But the point is that a single entry point could be used for any subclass, without having to worry about how to support every subclass.

Darren
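[Editorial note] The sketch above leaves the dispatch (the `input` lookup) and the `mod_*` tables abstract, so it doesn't run as posted. A minimal runnable variant, with hypothetical names and a priority-based dispatch standing in for "find the input with highest priority", might look like:

```python
import numpy as np

class GenericFunction:
    """Single entry point: defers to the highest-priority input that
    implements the (hypothetical) __prepare_input__/__handle_gfunc__ hooks."""

    def __init__(self, executable):
        self._executable = executable

    @property
    def executable(self):
        return self._executable

    def __call__(self, *args, **kwargs):
        # pick the hook-aware argument with the highest __array_priority__
        dispatcher = max(
            (a for a in args if hasattr(a, '__handle_gfunc__')),
            key=lambda a: getattr(a, '__array_priority__', 0.0),
            default=None,
        )
        if dispatcher is None:
            return self._executable(*args, **kwargs)  # plain-ndarray path
        args, kwargs = dispatcher.__prepare_input__(self, *args, **kwargs)
        return dispatcher.__handle_gfunc__(self, *args, **kwargs)

def _add(a, b):
    # the generic, ndarray core
    return np.add(a, b)

add = GenericFunction(_add)

class Tagged(np.ndarray):
    """Toy subclass that re-wraps the core function's result."""
    __array_priority__ = 20.0

    def __new__(cls, data):
        return np.asarray(data).view(cls)

    def __prepare_input__(self, gfunc, *args, **kwargs):
        return args, kwargs  # nothing to tweak in this toy

    def __handle_gfunc__(self, gfunc, *args, **kwargs):
        res = gfunc.executable(*args, **kwargs)
        return np.asarray(res).view(Tagged)

out = add(Tagged([1, 2]), np.array([10, 20]))
print(type(out).__name__, out.tolist())  # Tagged [11, 22]
```

The subclass controls execution through its hooks, while callers only ever see the one public `add`.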
On 03/27/2010 01:31 PM, Ryan May wrote:
Actually, you should assume that unless it is explicitly addressed (either by code or via a test), any subclass of ndarray (matrix, masked, structured, record, and even sparse) may not provide a 'valid' answer. There are probably many numpy functions that only really work with the standard ndarray. Most of the time people do not hit these with the subclasses, or have workarounds, so there has been little pressure to address this, especially given the added overhead needed for checking.

Also, any patch that does not explicitly define the assumed behavior for points that are masked has to be rejected. It is not even clear what the expected behavior for masked arrays should be: is it even valid for trapz to integrate across the full range if there are missing points? That implies some assumption about the missing points. If it is valid, then should you just ignore the masked values, or try to predict the missing values first? Perhaps you may want the option to do both.
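[Editorial note] The two readings can be spelled out numerically; `trap` below is a bare trapezoidal helper for illustration, not numpy code:

```python
import numpy as np

def trap(y, x):
    # bare trapezoidal rule on plain ndarrays
    d = np.diff(x)
    return (d * (y[1:] + y[:-1]) / 2.0).sum()

x = np.arange(10.0)
y = x * x
mask = (x > 4) & (x < 7)    # the points x=5 and x=6 are "missing"

# Reading 1: bridge the gap -- drop the masked samples and integrate the
# remaining points, so one wide trapezoid spans the hole.
print(trap(y[~mask], x[~mask]))   # 248.5

# Reading 2: skip the gap -- only intervals whose endpoints are both
# unmasked contribute anything to the integral.
keep = ~mask[1:] & ~mask[:-1]
d = np.diff(x)
print((d * (y[1:] + y[:-1]) / 2.0)[keep].sum())   # 151.0
```

The two answers differ, which is exactly the point: a patch has to pick (or parameterize) one of these semantics explicitly rather than inherit whichever one falls out of the implementation.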
One solution would be using arraywrap as in numpy.linalg.
By arraywrap, I'm assuming you mean:
def _makearray(a): new = asarray(a) wrap = getattr(a, "__array_prepare__", new.__array_wrap__) return new, wrap
I'm not sure if that's identical to just letting the subclass handle what's needed. To my eyes, that doesn't look as though it'd be equivalent, both for handling masked arrays and Quantities. For quantities at least, the result of trapz will have different units than either of the inputs.
for related discussion: http://mail.scipy.org/pipermail/scipy-dev/2009-June/012061.html
Actually, that discussion kind of makes my point. Matrices are a pain to make work in a general sense because they *break* ndarray conventions--to me it doesn't make sense to help along classes that break convention at the expense of making well-behaved classes a pain to use. You should need an *explicit* cast of a matrix to an ndarray instead of the function quietly doing it for you. ("Explicit is better than implicit") It just seems absurd that if I make my own ndarray subclass that *just* adds some behavior to the array, but doesn't break *any* operations, I need to do one of the following:
1) Have my own copy of trapz that works with my class
2) Wrap every call to numpy's own trapz() to put the metadata back.
Does it not seem backwards that the class that breaks conventions "just works", while classes that don't break conventions, and would work perfectly with the function as written, need help to be treated properly?
Ryan
You need your own version of trapz or whatever function because it has the behavior that you expect. But a patch should not break numpy, so you need at least to have a section that looks for masked array subtypes and performs the desired behavior(s). Bruce
On Mon, Mar 29, 2010 at 8:00 AM, Bruce Southey <bsouthey@gmail.com> wrote:
On 03/27/2010 01:31 PM, Ryan May wrote:
Because of the call to asarray(), the mask is completely discarded and you end up with results identical to an unmasked array, which is not what I'd expect. Worse, the actual numeric values at the masked positions affect the final answer. My patch allows this to work as expected too.
Actually, you should assume that unless it is explicitly addressed (either by code or via a test), any subclass of ndarray (matrix, masked, structured, record and even sparse) may not provide a 'valid' answer. There are probably many numpy functions that only really work with the standard ndarray. Most of the time people do not hit these problems with subclasses, or they have workarounds, so there has been little pressure to address this, especially given the added overhead needed for checking.
It's not that I'm surprised that masked arrays don't work. It's more that the calls to np.asarray within trapz() have been held up as being necessary for things like matrices and (at the time) masked arrays to work properly; as if calling asarray() is supposed to make all subclasses work, albeit at a base level, by dropping to an ndarray. To me, the current behavior with masked arrays is worse than if passing in a matrix raised an exception. One is a silently wrong answer; the other is a big error that the programmer can see, test, and fix.
Also, any patch that does not explicitly define the assumed behavior for masked points has to be rejected. It is not even clear what the expected behavior for masked arrays should be: is it even valid for trapz to integrate across the full range if there are missing points? That implies some assumption about the missing points. If it is valid, should you just ignore the masked values or try to predict the missing values first? Perhaps you may want the option to do both.
You're right, it doesn't actually work with MaskedArrays as it stands right now, because it calls add.reduce() directly instead of using the array.sum() method. Once fixed, by allowing MaskedArray to handle the operation, you end up not integrating over the masked region: the contribution of any operation involving masked points is ignored. I guess it's as if you assumed the function was 0 over the masked region. If you wanted to ignore the masked points but still integrate over the region (making a really big trapezoid over that region), you could just pass in the .compressed() versions of the arrays.
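A minimal sketch of that approach (my own illustrative function, not the actual patch, which works on trapz's general axis machinery): asanyarray() plus the .sum() method lets MaskedArray drop the masked trapezoids on its own.

```python
import numpy as np

def trapz_any(y, x):
    # asanyarray() keeps the subclass; .sum() lets it reduce its own way
    y = np.asanyarray(y)
    x = np.asanyarray(x)
    d = np.diff(x)
    return (d * (y[1:] + y[:-1]) / 2.0).sum()

x = np.arange(5.0)
ym = np.ma.array(x, mask=[False, False, True, False, False])

# For a plain array this matches the usual trapezoidal rule;
# for the masked input, trapezoids touching the masked point
# contribute nothing.
unmasked = trapz_any(x, x)    # 8.0
masked = trapz_any(ym, x)     # 4.0
```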
("Explicit is better than implicit") It just seems absurd that if I make my own ndarray subclass that *just* adds some behavior to the array, but doesn't break *any* operations, I need to do one of the following:
1) Have my own copy of trapz that works with my class
2) Wrap every call to numpy's own trapz() to put the metadata back.
Does it not seem backwards that the class that breaks conventions "just works", while classes that don't break conventions, and would work perfectly with the function as written, need help to be treated properly?
You need your own version of trapz or whatever function because it has the behavior that you expect. But a patch should not break numpy, so you need at least to have a section that looks for masked array subtypes and performs the desired behavior(s).
I'm not trying to be difficult, but it seems like there are conflicting ideas here: we shouldn't break numpy, which in this case means making matrices no longer work with trapz(). On the other hand, subclasses can do a lot of things, so there's no real expectation that they should ever work with numpy functions in general. Am I missing something here? I'm just trying to understand what I perceive to be some inconsistencies in numpy's behavior and, more importantly, its conventions with regard to subclasses. Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma
Hi, I decided that having actual code that does what I want and keeps backwards compatibility (and adds tests) might be better than arguing semantics. I've updated my patch to:

* Use the array.sum() method instead of add.reduce to make subclasses fully work (this was still breaking masked arrays).
* Catch an exception on doing the actual multiply and sum of the arrays, and try again after casting to ndarrays. This allows any subclasses that relied on being cast to still work.
* Add tests that ensure matrices work (this test passes before and after the changes to trapz()), plus a test for masked arrays that checks that masked points are treated as expected. In this case, "expected" is defined to be the same as if you implemented the trapezoidal method by hand using MaskedArray's basic arithmetic operations.

Attached here and at: http://projects.scipy.org/numpy/ticket/1438

I think this addresses the concerns that were raised about the changes for subclasses in this case. Let me know if I've missed something (or if there's no way in hell any such patch will ever be committed). Thanks, Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma
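The try/except fallback described above could look roughly like this (a sketch under names of my own choosing; the real patch handles the general axis machinery of trapz):

```python
import numpy as np

def trapz_compat(y, x):
    y = np.asanyarray(y)
    x = np.asanyarray(x)
    d = np.diff(x)
    try:
        # let the subclass handle the arithmetic and the reduction
        return (d * (y[1:] + y[:-1]) / 2.0).sum()
    except (TypeError, ValueError):
        # subclass arithmetic failed; fall back to plain ndarrays,
        # preserving the old asarray() behavior for such classes
        d = np.asarray(d)
        y = np.asarray(y)
        return np.add.reduce(d * (y[1:] + y[:-1]) / 2.0)
```

Well-behaved subclasses take the first branch; anything whose operators choke on elementwise arithmetic gets the old cast-to-ndarray treatment.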
On 03/29/2010 10:17 AM, Ryan May wrote:
You should not confuse class methods with normal Python functions. Methods inherited from the ndarray superclass should behave the same in a subclass unless the subclass has overridden them. However, many functions like trapz are not part of the ndarray superclass; they were written to handle the standard array (i.e., the unmodified ndarray superclass) and may or may not work for all ndarray subclasses. Other functions were written to handle a specific ndarray subclass such as masked array or Matrix, and may (but are not guaranteed to) work for the standard array. Thus, I think your 'inconsistencies' come down to the simple fact that not all numpy functions are aware of ndarray subclasses. What is missing are the bug reports and solutions for the functions where the expected behavior differs between the ndarray superclass and its subclasses. In the case of trapz, the bug report needs at least an indication of the expected behavior when masked values are present. Bruce
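The method-versus-function distinction above can be shown in two lines (plain numpy, my own example): an inherited or overridden method respects the subclass, while a module-level function that strips to ndarray does not.

```python
import numpy as np

ym = np.ma.array([1.0, 2.0, 3.0], mask=[False, True, False])

# the subclass method knows about the mask...
assert ym.sum() == 4.0

# ...a function that casts to a plain ndarray first does not
assert np.add.reduce(np.asarray(ym)) == 6.0
```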
participants (4)

- Bruce Southey
- Darren Dale
- josef.pktd@gmail.com
- Ryan May