Code review for trapz update
I've proposed a change to trapz, so that bin widths (`x` array deltas) are always considered positive. This allows monotonically decreasing `x` sequences to be used, in addition to monotonically increasing sequences. I am open to the idea of accepting non-monotonic sequences as well, and performing a sort on `y` to make `x` monotonic, regardless of its input form, but I think this may make the code more complex than necessary. Thoughts and suggestions are welcome. https://github.com/fizyxnrd/numpy/compare/master...trapz_binwidth_fix
On 3/10/2015 12:48 AM, K. N. wrote:
I've proposed a change to trapz, so that bin widths (`x` array deltas) are always considered positive. This allows monotonically decreasing `x` sequences to be used, in addition to monotonically increasing sequences.
Can you elaborate please. Monontically decreasing sequences can already be used. Aren't you losing the sign of the area? Alan Isaac
Alan G Isaac <alan.isaac <at> gmail.com> writes:
On 3/10/2015 12:48 AM, K. N. wrote:
I've proposed a change to trapz, so that bin widths (`x` array
deltas) are always considered positive.
This allows monotonically decreasing `x`
sequences to be used, in addition to monotonically increasing sequences.
Can you elaborate please. Monontically decreasing sequences can already be used. Aren't you losing the sign of the area?
Alan Isaac
Trapezoid rule is an approximation of $\int y(x) dx$, replacing integral with sum by $\sum_{i=1}^{N-1} (y_{i+1} + y{i}) * (x_{i+1} - x{i}) / 2$. Written in this form, it might seem that we want to keep track of the sign of $x_{i+1} - x{i}$. However, this construction always assumes that the $x_i$ are an ordered sequence. By interpreting this simply, we find that we may have negative *intervals*. To be more precise, our summation should read $\sum_{i=1}^{N-1} (y_{i+1} + y{i}) * (\Delta x_i) / 2$, where $\Delta x_i = |x_{i+1} - x{i}|$. This is certainly the intent of the usual mathematical construction. Note that negative intervals are not necessary to achieve negative "areas". This data should integrate to negative "area": x = [1, 2, 3, 4, 5] y = [0, -1, -2, -1, 0] (-4) This data should integrate to positive "area", even though the x data is decreasing (and we have negative intervals): x = [5, 4, 3, 2, 1] y = [2, 2, 2, 2, 2] (currently, get -8, but should get +8). It might seem reasonable to require a user to always make their x sequence monotonically increasing. However, trapz is silent if this is not the case, and at the very least should be modified throw an error for non monotonically increasing x sequences. This issue arises, for example, in the conversion of spectral quantities from wavenumber (units of inverse length) to wavelength (units of length). When the spectral quantity is inverted, a monotonically increasing sequence becomes a monotonically decreasing sequence, and vice versa. This gives the unexpected result that trapz(<quantity>, <spectrum>) works as expected before conversion, but returns values with the wrong sign after conversion. This behavior is even more difficult to recognize if the <quantity> (y- value) term is not strictly positive, and a user may utilize trapz incorrectly with no indication that it is not behaving as expected.
On 3/10/2015 8:37 AM, fizyxnrd wrote:
This data should integrate to positive "area", even though the x data is decreasing (and we have negative intervals): x = [5, 4, 3, 2, 1] y = [2, 2, 2, 2, 2] (currently, get -8, but should get +8).
I still don't get it. Why are you claiming the sign is currently wrong? $\int_{5}^{1} 2 dx = |_5^1 2x = 2 - 10 = -8$. Why would we not want `trapz` to embody the core property of integrals that $\int_a^b f(x) dx = - \int_b^a f(x) dx$? Alan Isaac
Alan G Isaac <alan.isaac <at> gmail.com> writes:
On 3/10/2015 8:37 AM, fizyxnrd wrote:
This data should integrate to positive "area", even though the x
data is
decreasing (and we have negative intervals): x = [5, 4, 3, 2, 1] y = [2, 2, 2, 2, 2] (currently, get -8, but should get +8).
I still don't get it. Why are you claiming the sign is currently wrong? $\int_{5}^{1} 2 dx = |_5^1 2x = 2 - 10 = -8$. Why would we not want `trapz` to embody the core property of integrals that $\int_a^b f(x) dx = - \int_b^a f(x) dx$?
Alan Isaac
Trapz finds the area under a one-to-one association of y values with x values. If y(x) > 0, then the area bounded by [a, b] between y(x) and x=0 should always be positive. The core property you have referenced above is the very property that should be used in order to achieve the equivalence with integrating along a negative path. Maintaining this separation preserves the equivalence of np.trapz(y,x) == np.trapz(y[::-1], x[::-1]), which I believe is an equivalence that should hold true. This form requires a user to recognize that $F_b = F_a + \int_a^b f(x) dx$, while $F(a) = F(b) - \int_a^b f(x) dx$ instead of $F(a) = F(b) + \int_b^a f(x) dx$. One way or another, the user must be aware of the ordering in certain cases. However, by treating intervals as non- negative, order matters in a more limited set of cases. Additionally, non-monotonic sequences currently give strange behavior, because they allow non-surjective mappings from x to y. Thus, the meaning of np.trapz([1 3 5 7 3], [1 2 6 1 2]) is ambiguous at best. At a minimum, trapz should error on non-monotonic sequences. Thoughts?
On 3/10/2015 10:50 AM, fizyxnrd wrote:
If y(x) > 0, then the area bounded by [a, b] between y(x) and x=0 should always be positive.
That claim is what you have to justify. This is not true mathematically: as I noted, it is a **core** property of integrals that $\int_a^b f(x) dx = - \int_b^a f(x) dx$ So the sign of the integral *should* change if you reverse the sequences. That is how integration works. So, why do you want to break the usual mathematical relationship? That seems like a terrible idea. Non-monotonic sequences are also currently handled correctly, from a mathematical perspective. This is because (mathematically) integration is sub-interval additive. I admit it is hard to imagine a user making use of this feature, but it is a feature and not a bug. Alan Isaac
Trapz finds the area under a one-to-one association of y values with x values. If y(x) > 0, then the area bounded by [a, b] between y(x) and x=0 should always be positive.
You could write a trapz that does that, however np.trapz finds the integral from a to b of y using the sampled data you provide. The "from" in there is important. Since we are integrating samples, the a and b are essentially the first and last points of the x input. Since a is x[0] and b is x[-1], the x array is defining the path along which to integrate.
The core property you have referenced above is the very property that should be used in order to achieve the equivalence with integrating along a negative path. Maintaining this separation preserves the equivalence of np.trapz(y,x) == np.trapz(y[::-1], x[::-1]), which I believe is an equivalence that should hold true.
This equivalence is false. For instance both of these results are correct. Would they still be with your changes? In [46]: x = np.exp(1j*np.pi*np.linspace(0,1,100)) In [47]: z = 1/x In [48]: np.trapz(z, x) Out[48]: (1.3244509217643717e-18+3.1410654163086975j) In [49]: np.trapz(z[::-1], x[::-1]) Out[49]: (-1.3244509217643594e-18-3.1410654163086971j) Eric
Eric Moore <ewm <at> redtetrahedron.org> writes:
Trapz finds the area under a one-to-one association of y values with x values. If y(x) > 0, then the area bounded by [a, b] between y(x) and x=0 should always be positive.
You could write a trapz that does that, however np.trapz finds the
integral from a to b of y using the sampled data you provide. The "from" in there is important. Since we are integrating samples, the a and b are essentially the first and last points of the x input. Since a is x[0] and b is x[-1], the x array is defining the path along which to integrate.
The core property you have referenced above is the very property that should be used in order to achieve the equivalence with integrating along a negative path. Maintaining this separation preserves the equivalence of np.trapz(y,x) == np.trapz(y[::-1], x[::-1]), which I believe is an equivalence that should hold true.
This equivalence is false. For instance both of these results are
correct. Would they still be with your changes?
In [46]: x = np.exp(1j*np.pi*np.linspace(0,1,100))
In [47]: z = 1/x
In [48]: np.trapz(z, x) Out[48]: (1.3244509217643717e-18+3.1410654163086975j)
In [49]: np.trapz(z[::-1], x[::-1]) Out[49]: (-1.3244509217643594e-18-3.1410654163086971j)
These results would still be correct. In the first case, the user simply specifies that they wish to know $Z(x) - Z(0) = \int_0^x z(x') dx'$, while in the second case, they specify that they are looking for Z(0) - Z(x) = \int_x^0 z(x') dx' = -\int_0^x z(x') dx'$. I'm asserting that the two should be completely equivalent, and that the user recognize that the endpoints in the first instance are 1 and -1 ($\theta \elem [0, \pi]$), while in the latter case the endpoints are -1 and 1 ($\theta \elem [\pi, 0]$). Thus $\int_0^x z(x)$ is given by np.trapz(z, x) and $\int_x^0 z(x)$ is given by -np.trapz(z, x). This modification treats the area under the curve as independent of path, and asks the user to use path endpoints to determine what should be done with that quantity. I can certainly understand the argument for the existing position. I only advocate the change because I have never come across a circumstance in which I need to use it the way it is, and I and others expect that the sign of trapz should only depend on the sign of y, not the order of x.
On 3/10/2015 4:22 PM, fizyxnrd wrote:
I and others expect that the sign of trapz should only depend on the sign of y, not the order of x.
If for some reason (conventions in another sofware, perhaps?) many users do not expect the mathematically correct formulation, that certainly constitutes an argument for adding a sentence and an illustrative example to the docs. Alan Isaac
On Mar 10, 2015 1:22 PM, "fizyxnrd" <fizyxnrd@gmail.com> wrote:
Eric Moore <ewm <at> redtetrahedron.org> writes:
This equivalence is false. For instance both of these results are correct. Would they still be with your changes? In [46]: x = np.exp(1j*np.pi*np.linspace(0,1,100))
In [47]: z = 1/x
In [48]: np.trapz(z, x) Out[48]: (1.3244509217643717e-18+3.1410654163086975j)
In [49]: np.trapz(z[::-1], x[::-1]) Out[49]: (-1.3244509217643594e-18-3.1410654163086971j)
These results would still be correct. In the first case, the user simply specifies that they wish to know $Z(x) - Z(0) = \int_0^x z(x') dx'$, while in the second case, they specify that they are looking for Z(0) - Z(x) = \int_x^0 z(x') dx' = -\int_0^x z(x') dx'$. I'm asserting that the two should be completely equivalent, and that the user recognize that the endpoints in the first instance are 1 and -1 ($\theta \elem [0, \pi]$), while in the latter case the endpoints are -1 and 1 ($\theta \elem [\pi, 0]$). Thus $\int_0^x z(x)$ is given by np.trapz(z, x) and $\int_x^0 z(x)$ is given by -np.trapz(z, x).
This modification treats the area under the curve as independent of path, and asks the user to use path endpoints to determine what should be done with that quantity.
Look again at the example: here the path is a two-dimensional arc in the complex plane. Ian Henricksen's message gives an even more refined version of the same example, where the arc is a circle with the same start and end points. Nonetheless, it matters whether you traverse the circle clockwise or anti-clockwise. These are real cases where your proposed change just cannot be applied, because the integral really does depend on all the points in the path, not just the start and end. I understand where you're coming from, but, well, mathematics has spoken :-). -n
Nathaniel Smith <njs <at> pobox.com> writes:
On Mar 10, 2015 1:22 PM, "fizyxnrd" <fizyxnrd <at> gmail.com> wrote:
Eric Moore <ewm <at> redtetrahedron.org> writes:
This equivalence is false. For instance both of these results are correct. Would they still be with your changes? In [46]: x = np.exp(1j*np.pi*np.linspace(0,1,100))
In [47]: z = 1/x
In [48]: np.trapz(z, x) Out[48]: (1.3244509217643717e-18+3.1410654163086975j)
In [49]: np.trapz(z[::-1], x[::-1]) Out[49]: (-1.3244509217643594e-18-3.1410654163086971j)
These results would still be correct. In the first case, the user simply specifies that they wish to know $Z(x) - Z(0) = \int_0^x
dx'$, while in the second case, they specify that they are looking for Z(0) - Z(x) = \int_x^0 z(x') dx' = -\int_0^x z(x') dx'$. I'm asserting that the two should be completely equivalent, and that
user recognize that the endpoints in the first instance are 1 and -1 ($\theta \elem [0, \pi]$), while in the latter case the endpoints are -1 and 1 ($\theta \elem [\pi, 0]$). Thus $\int_0^x z(x)$ is given by np.trapz(z, x) and $\int_x^0 z(x)$ is given by -np.trapz(z, x).
This modification treats the area under the curve as independent of path, and asks the user to use path endpoints to determine what should be done with that quantity. Look again at the example: here the path is a two-dimensional arc in
These are real cases where your proposed change just cannot be applied, because the integral really does depend on all the points in
z(x') the the complex plane. Ian Henricksen's message gives an even more refined version of the same example, where the arc is a circle with the same start and end points. Nonetheless, it matters whether you traverse the circle clockwise or anti-clockwise. the path, not just the start and end. I understand where you're coming from, but, well, mathematics has spoken .
-n
_______________________________________________ SciPy-Dev mailing list SciPy-Dev <at> scipy.org http://mail.scipy.org/mailman/listinfo/scipy-dev
It appears that there is more use for the current form than I have come across with my adventures in positive-definite quantities. Is there room for a keyword e.g., unsorted ( so that the call is trapz(y, x=None, dx = 1.0, axis=-1, unsorted=False) where if unsorted is set to True, the data is automatically sorted before integration?
On 11/03/15 00:23, fizyxnrd wrote:
It appears that there is more use for the current form than I have come across with my adventures in positive-definite quantities. Is there room for a keyword e.g., unsorted ( so that the call is trapz(y, x=None, dx = 1.0, axis=-1, unsorted=False) where if unsorted is set to True, the data is automatically sorted before integration?
I think a numerical integral routine should only return an approximation to the integral. If you sort the data the trapz function is no longer returning an approximation to the integral. Sturla
Not sure whether to laugh or cry. On Wed, Mar 11, 2015 at 12:11 AM, Sturla Molden <sturla.molden@gmail.com> wrote:
On 11/03/15 00:23, fizyxnrd wrote:
It appears that there is more use for the current form than I have come across with my adventures in positive-definite quantities. Is there room for a keyword e.g., unsorted ( so that the call is trapz(y, x=None, dx = 1.0, axis=-1, unsorted=False) where if unsorted is set to True, the data is automatically sorted before integration?
I think a numerical integral routine should only return an approximation to the integral. If you sort the data the trapz function is no longer returning an approximation to the integral.
Sturla
_______________________________________________ SciPy-Dev mailing list SciPy-Dev@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-dev
-- * * * * * * * * http://www.mssl.ucl.ac.uk/~npmk/ * * * * Dr. N.P.M. Kuin (n.kuin@ucl.ac.uk) phone +44-(0)1483 (prefix) -204927 (work) mobile +44(0)7806985366 skype ID: npkuin Mullard Space Science Laboratory – University College London – Holmbury St Mary – Dorking – Surrey RH5 6NT– U.K.
Feel free to join the discussion if you have a factual comment. Sturla On 11/03/15 02:07, Paul Kuin wrote:
Not sure whether to laugh or cry.
On Wed, Mar 11, 2015 at 12:11 AM, Sturla Molden <sturla.molden@gmail.com <mailto:sturla.molden@gmail.com>> wrote:
On 11/03/15 00:23, fizyxnrd wrote:
> It appears that there is more use for the current form than I have come > across with my adventures in positive-definite quantities. Is there > room for a keyword e.g., unsorted ( so that the call is trapz(y, x=None, > dx = 1.0, axis=-1, unsorted=False) where if unsorted is set to True, the > data is automatically sorted before integration?
I think a numerical integral routine should only return an approximation to the integral. If you sort the data the trapz function is no longer returning an approximation to the integral.
Sturla
_______________________________________________ SciPy-Dev mailing list SciPy-Dev@scipy.org <mailto:SciPy-Dev@scipy.org> http://mail.scipy.org/mailman/listinfo/scipy-dev
--
* * * * * * * * http://www.mssl.ucl.ac.uk/~npmk/ * * * * Dr. N.P.M. Kuin (n.kuin@ucl.ac.uk <mailto:n.kuin@ucl.ac.uk>) phone +44-(0)1483 (prefix) -204927 (work) mobile +44(0)7806985366 skype ID: npkuin Mullard Space Science Laboratory – University College London – Holmbury St Mary – Dorking – Surrey RH5 6NT– U.K.
_______________________________________________ SciPy-Dev mailing list SciPy-Dev@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-dev
On Wed, Mar 11, 2015 at 12:11 AM, Sturla Molden <sturla.molden <at> gmail.com> wrote: On 11/03/15 00:23, fizyxnrd wrote:
It appears that there is more use for the current form than I have come across with my adventures in positive-definite quantities. Is there room for a keyword e.g., unsorted ( so that the call is trapz(y, x=None, dx = 1.0, axis=-1, unsorted=False) where if unsorted is set to True, the data is automatically sorted before integration?
I think a numerical integral routine should only return an approximation to the integral. If you sort the data the trapz function is no longer returning an approximation to the integral.
Sturla
I was not clear in what I meant should be sorted. I am envisioning something like x = np.random.rand(100) y = np.random.rand(100) # Assume that x should be an ordered quantity idx = np.argsort(x) result = np.trapz(y[idx], x[idx]) The intent is not to sort the y values by value, only to assume that there is no meaningful path information in the x values, and so to sort them so that only the total area under y is computed, without regard to direction.
On 11/03/15 13:48, fizyxnrd wrote:
I was not clear in what I meant should be sorted. I am envisioning something like
# Assume that x should be an ordered quantity idx = np.argsort(x)
result = np.trapz(y[idx], x[idx])
I understood what you meant, but sorting x values is not permissible: $\int_a^b f(x) dx = - \int_b^a f(x) dx$ The x values therefore cannot be sorted. Consider a more complex situation like a line integral. What do you suppose sorting x values might do?
The intent is not to sort the y values by value, only to assume that there is no meaningful path information in the x values, and so to sort them so that only the total area under y is computed, without regard to direction.
Right. You want to estimate the integral of y = f(x) from x.min() to x.max() and have random samples of (x,y) pairs. But then you should preprocess your data before you call trapz. trapz provides a numerical integral given two input vectors. It is the user's responsibility to make sure it is the integral the user actually wants. The necessary preprosessing will vary from case to case. The order of x and y must be retained because it is actually required to produce the correct result. Sturla
On 10/03/15 15:03, Alan G Isaac wrote:
Why would we not want `trapz` to embody the core property of integrals that $\int_a^b f(x) dx = - \int_b^a f(x) dx$?
Obviously we want to retain this behavior. Otherwise we could not call it integration. The conclusion would be that the proposed update is wrong and the current trapz is correct.
This data should integrate to negative "area": x = [1, 2, 3, 4, 5] y = [0, -1, -2, -1, 0] (-4)
Yes.
This data should integrate to positive "area", even though the x data is decreasing (and we have negative intervals): x = [5, 4, 3, 2, 1] y = [2, 2, 2, 2, 2] (currently, get -8, but should get +8).
No. In general the path one integrates over matters. If you want to think about intervals they must be signed.
It might seem reasonable to require a user to always make their x sequence monotonically increasing. However, trapz is silent if this is not the case, and at the very least should be modified throw an error for non monotonically increasing x sequences.
Erroring in this case is probably fine.
This issue arises, for example, in the conversion of spectral quantities from wavenumber (units of inverse length) to wavelength (units of length). When the spectral quantity is inverted, a monotonically increasing sequence becomes a monotonically decreasing sequence, and vice versa. This gives the unexpected result that trapz(<quantity>, <spectrum>) works as expected before conversion, but returns values with the wrong sign after conversion.
I understand why this seems wrong, but the fact that you have to reverse the order is actually correct.
This behavior is even more difficult to recognize if the <quantity> (y-
value) term is not strictly positive, and a user may utilize trapz incorrectly with no indication that it is not behaving as expected.
_______________________________________________ SciPy-Dev mailing list SciPy-Dev@scipy.org <javascript:;> http://mail.scipy.org/mailman/listinfo/scipy-dev
Eric Moore <ewm@redtetrahedron.org> wrote:
This issue arises, for example, in the conversion of spectral quantities from wavenumber (units of inverse length) to wavelength (units of length). When the spectral quantity is inverted, a monotonically increasing sequence becomes a monotonically decreasing sequence, and vice versa. This gives the unexpected result that trapz(<quantity>, <spectrum>) works as expected before conversion, but returns values with the wrong sign after conversion.
I understand why this seems wrong, but the fact that you have to reverse the order is actually correct.
Yes. Sturla
On Tue, Mar 10, 2015 at 11:30 AM Eric Moore <ewm@redtetrahedron.org> wrote:
This data should integrate to negative "area": x = [1, 2, 3, 4, 5] y = [0, -1, -2, -1, 0] (-4)
Yes.
This data should integrate to positive "area", even though the x data is decreasing (and we have negative intervals): x = [5, 4, 3, 2, 1] y = [2, 2, 2, 2, 2] (currently, get -8, but should get +8).
No. In general the path one integrates over matters. If you want to think about intervals they must be signed.
It might seem reasonable to require a user to always make their x sequence monotonically increasing. However, trapz is silent if this is not the case, and at the very least should be modified throw an error for non monotonically increasing x sequences.
Erroring in this case is probably fine.
Sorting or checking for sorting isn't a good option. As a toy example, consider evaluating the residue of 1/z at the origin using a contour integral. import numpy as np t = np.linspace(0, 2 * np.pi, 100) circ = np.exp(1.0j * t) reciprocals = 1. / circ np.trapz(reciprocals, circ) / (2.0j * np.pi) As it is right now, this (correctly) returns an answer very close to 1. The output is garbage if the integral is evaluated in sorted form. -Ian Henriksen
This issue arises, for example, in the conversion of spectral quantities from wavenumber (units of inverse length) to wavelength (units of length). When the spectral quantity is inverted, a monotonically increasing sequence becomes a monotonically decreasing sequence, and vice versa. This gives the unexpected result that trapz(<quantity>, <spectrum>) works as expected before conversion, but returns values with the wrong sign after conversion.
I understand why this seems wrong, but the fact that you have to reverse the order is actually correct.
This behavior is even more difficult to recognize if the <quantity> (y-
value) term is not strictly positive, and a user may utilize trapz incorrectly with no indication that it is not behaving as expected.
_______________________________________________ SciPy-Dev mailing list SciPy-Dev@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-dev
_______________________________________________ SciPy-Dev mailing list SciPy-Dev@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-dev
participants (8)
-
Alan G Isaac
-
Eric Moore
-
fizyxnrd
-
Ian Henriksen
-
K. N.
-
Nathaniel Smith
-
Paul Kuin
-
Sturla Molden