Re: [Numpy-discussion] suggested change of behavior for interp

The simplest monotonicity test that I've seen is:
    dx = np.diff(x)
    monotonic = np.all(dx < 0.) or np.all(dx > 0.)
I expect that this is pretty fast, though I haven't tested it yet. If we want to make checking optional, then I think the default should be to check with the option to skip the check.
Jon
On Tue, Jun 4, 2013 at 9:03 PM, numpy-discussion-request@scipy.org wrote:
From: Eric Firing <efiring@hawaii.edu>
To: numpy-discussion@scipy.org
Date: Tue, 04 Jun 2013 15:08:29 -1000
Subject: Re: [Numpy-discussion] suggested change of behavior for interp

On 2013/06/04 2:05 PM, Charles R Harris wrote:
On Tue, Jun 4, 2013 at 12:07 PM, Slavin, Jonathan <jslavin@cfa.harvard.edu> wrote:
Hi, I would like to suggest that the behavior of numpy.interp be changed regarding the treatment of situations in which the x-coordinates are not monotonically increasing. Specifically, it seems to me that interp should work correctly when the x-coordinate is decreasing monotonically. Clearly it cannot work if the x-coordinate is not monotonic at all, but in that case it should raise an exception. Currently, if x is not increasing, it simply fails silently, returning incorrect values. This fix could be as simple as a monotonicity test and an inversion if necessary (plus a raise statement for non-monotonic cases).
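A minimal sketch of the proposed behavior (a hypothetical wrapper for illustration, not a patch to np.interp itself):

    import numpy as np

    def interp_checked(x, xp, fp):
        # One O(n) sweep classifies xp.  (Once monotonicity is otherwise
        # guaranteed, direction alone is an O(1) endpoint comparison.)
        dx = np.diff(xp)
        if np.all(dx > 0.):
            return np.interp(x, xp, fp)
        elif np.all(dx < 0.):
            # decreasing abscissae: reverse both arrays before interpolating
            return np.interp(x, xp[::-1], fp[::-1])
        raise ValueError("xp must be strictly monotonic")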
Seems reasonable, although it might add a bit of execution time.
The monotonicity test should be an option if it is available at all; when interpolating a small number of points from a large pair of arrays, the single sweep through the whole array could dominate the execution time. Checking for increasing versus decreasing, in contrast, can be done fast, so handling the decreasing case transparently is reasonable.
Eric
________________________________________________________
Jonathan D. Slavin                Harvard-Smithsonian CfA
jslavin@cfa.harvard.edu          60 Garden Street, MS 83
phone: (617) 496-7981            Cambridge, MA 02138-1516
fax: (617) 496-7577              USA
________________________________________________________

On Wed, Jun 5, 2013 at 3:16 PM, Slavin, Jonathan jslavin@cfa.harvard.edu wrote:
The simplest monotonicity test that I've seen is:
    dx = np.diff(x)
    monotonic = np.all(dx < 0.) or np.all(dx > 0.)
I expect that this is pretty fast, though I haven't tested it yet. If we want to make checking optional, then I think the default should be to check with the option to skip the check.
The slowdown people are worried about is this: suppose 'xp' has 1,000,000 entries, and the user wants to interpolate 1 point. If we can assume the array is sorted, then we can find which bin the point falls into by binary search (np.searchsorted), which requires examining ~log2(1,000,000) = 20 entries in the array. Checking that the array is sorted, though, requires examining all 1,000,000 entries -- it turns an O(log n) operation into an O(n) operation. And if this is being called as part of some larger algorithm, the effect can cascade: if it gets called m times from a loop, that's O(mn) instead of O(m log n), etc. That's often the difference between tractable and intractable.
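To make the asymmetry concrete, a sketch (array sizes and values are illustrative):

    import numpy as np

    n = 1_000_000
    xp = np.linspace(0.0, 1.0, n)        # sorted abscissae
    fp = np.sin(2 * np.pi * xp)

    i = np.searchsorted(xp, 0.42)        # O(log n): ~20 comparisons for n = 1e6
    ok = bool(np.all(np.diff(xp) > 0))   # O(n): touches every element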
If you submit a PR that gives interp the option to check for monotonicity, then we'll almost certainly merge it :-). If you also make it the default then there might be some debate, though I'd argue for it...
Whether interp should accept descending inputs is a separate issue and probably more contentious; I'm weakly against it myself, as unnecessary magic when you can just say np.interp(x, xp[::-1], fp[::-1]).
-n

On 05.06.2013 16:33, Nathaniel Smith wrote:
[Jonathan's monotonicity test and the O(log n) vs. O(n) argument quoted in full above; snipped]
If you submit a PR that gives interp the option to check for monotonicity, then we'll almost certainly merge it :-). If you also make it the default then there might be some debate, though I'd argue for it...
I would not like it if the performance went from log to linear by default. It is documented that the coordinates must be increasing, after all.
How about, as a compromise, the function checks the one or two closest neighbors around each interpolation point and warns/errors if those are not monotonic?
It's not foolproof, but it should at least catch the very wrong cases with an acceptable performance loss.
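A rough sketch of that compromise (hypothetical; it checks only the bin each query point lands in):

    import numpy as np

    def interp_localcheck(x, xp, fp):
        idx = np.searchsorted(xp, x)              # O(m log n) bin location
        lo = np.clip(idx - 1, 0, len(xp) - 2)
        # inspect only the edges adjacent to each located bin: O(m), not O(n)
        if np.any(xp[lo] >= xp[lo + 1]):
            raise ValueError("xp is not increasing near the requested points")
        return np.interp(x, xp, fp)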

On Wed, Jun 5, 2013 at 6:08 PM, Julian Taylor jtaylor.debian@googlemail.com wrote:
On 05.06.2013 16:33, Nathaniel Smith wrote:
[Nathaniel's O(log n) vs. O(n) argument quoted above; snipped]
I would not like it if the performance went from log to linear by default. It is documented that the coordinates must be increasing, after all.
I agree, I don't like it either. But the problem is there are two contradictory things, and I don't like either of them:
1) performance going from log to linear by default (for a subset of somewhat extreme cases)
2) people silently getting the wrong results (and according to reports in this thread, the warning in the docs does not actually prevent this)
The question is which one do we dislike more. My feeling is that in the cases where (1) comes up, it will annoy people and get them to check the docs and find the go-faster switch, while the first warning of (2) is when your spaceship crashes, so we should worry more about (2).
How about, as a compromise, the function checks the one or two closest neighbors around each interpolation point and warns/errors if those are not monotonic?
It's not foolproof, but it should at least catch the very wrong cases with an acceptable performance loss.
There are tons of ways for data to accidentally end up sorted within each local region, but unsorted overall, e.g., if you're combining results from parallel simulation runs. Such data would systematically pass this check, but still give nonsensical answers.
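For example (constructed data): two runs that are each sorted internally but concatenated out of order would pass a neighbor-only check almost everywhere:

    import numpy as np

    run1 = np.linspace(5.0, 9.0, 1000)   # each run internally increasing
    run2 = np.linspace(0.0, 4.0, 1000)
    xp = np.concatenate([run1, run2])    # globally unsorted at a single seam
    # Only queries whose binary search lands on the seam would trip a
    # neighbor-only check; everywhere else the wrong answer looks plausible.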
-n

On Wed, Jun 5, 2013 at 11:48 AM, Nathaniel Smith njs@pobox.com wrote:
[the preceding exchange quoted in full; snipped]
Some actual benchmarks might be useful to the discussion.
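A minimal timing harness one could start from (setup only; sizes are illustrative and no results are claimed):

    import numpy as np
    import timeit

    xp = np.sort(np.random.rand(1_000_000))
    fp = np.random.rand(1_000_000)

    t_interp = timeit.timeit(lambda: np.interp(0.5, xp, fp), number=100)
    t_check = timeit.timeit(lambda: np.all(np.diff(xp) > 0), number=100)
    print("interp, 1 point:", t_interp, " full monotonicity check:", t_check)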
Chuck

On Wed, Jun 5, 2013 at 11:59 AM, Charles R Harris <charlesr.harris@gmail.com> wrote:
[the preceding exchange quoted in full; snipped]
Some actual benchmarks might be useful to the discussion.
And perhaps a fast C-level ismonotone function would be generally useful.
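For illustration, the semantics such a function might have, sketched in pure Python (the value of the real thing would come from a single early-exiting pass in C):

    def ismonotone(a):
        increasing = decreasing = True
        for prev, cur in zip(a, a[1:]):
            if cur <= prev:
                increasing = False
            if cur >= prev:
                decreasing = False
            if not (increasing or decreasing):
                return False             # early exit at the first violation
        return True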
Chuck

On Wed, Jun 5, 2013 at 12:00 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
[the preceding exchange quoted in full; snipped]
And perhaps a fast C-level ismonotone function would be generally useful.
Or make greater.reduce operate correctly.
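For context on that remark (an editorial aside): a reduce is a left fold, which chains comparisons rather than testing them pairwise, so it cannot serve as a sortedness test as-is:

    import numpy as np

    a = np.array([3.0, 2.0, 1.0])
    # A left fold computes ((a[0] > a[1]) > a[2]) > ..., feeding each boolean
    # result back into the next comparison -- not a pairwise sortedness test.
    # The pairwise check one actually wants:
    strictly_decreasing = bool(np.all(a[:-1] > a[1:]))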
Chuck
participants (4):
- Charles R Harris
- Julian Taylor
- Nathaniel Smith
- Slavin, Jonathan