[Numpy-discussion] suggested change of behavior for interp
Charles R Harris
charlesr.harris at gmail.com
Wed Jun 5 14:00:57 EDT 2013
On Wed, Jun 5, 2013 at 11:59 AM, Charles R Harris <charlesr.harris at gmail.com
> wrote:
>
>
> On Wed, Jun 5, 2013 at 11:48 AM, Nathaniel Smith <njs at pobox.com> wrote:
>
>> On Wed, Jun 5, 2013 at 6:08 PM, Julian Taylor
>> <jtaylor.debian at googlemail.com> wrote:
>> > On 05.06.2013 16:33, Nathaniel Smith wrote:
>> >> The slow down people are worried about is, suppose that 'xp' has
>> >> 1,000,000 entries, and the user wants to interpolate 1 point. If we
>> >> can assume the array is sorted, then we can find which bin the 1 point
>> >> falls into by using binary search (np.searchsorted), which will
>> >> require examining ~log2(1,000,000) = 20 entries in the array. Checking
>> >> that it's sorted, though, will require examining all 1,000,000 points
>> >> -- it turns an O(log n) operation into an O(n) operation. And if this
>> >> is being called as part of some larger algorithm, this effect can
>> >> cascade -- if it gets called m times from a loop, now that's O(mn)
>> >> instead of (m log n), etc. That's often the difference between
>> >> tractable and intractable.
>> >>
>> >> If you submit a PR that gives interp the option to check for
>> >> monotonicity, then we'll almost certainly merge it :-). If you also
>> >> make it the default then there might be some debate, though I'd argue
>> >> for it...
>> >
>> > I would not like it when the performance goes from log to linear by
>> default.
>> > It is documented that the coordinates must be increasing after all.
>>
>> I agree, I don't like it either. But the problem is there are two
>> contradictory things and I don't like either of them:
>> 1) performance going from log to linear by default (for a subset of
>> somewhat extreme cases)
>> 2) people silently getting the wrong results (and according to reports
>> in this thread, the warning in the docs does not actually prevent
>> this)
>>
>> The question is which one do we dislike more. My feeling is that in
>> the cases where (1) comes up, it will annoy people and get them to
>> check the docs and find the go-faster switch, while the first warning
>> of (2) is when your spaceship crashes, so we should worry more about
>> (2).
>>
>> > How about as a compromise the function checks one or two closest
>> > neighbors around the interpolation point and warns/errors if those are
>> > not monotonic.
>> >
>> > Its not fool proof but should at least catch the very wrong cases with
>> > an acceptable performance loss.
>>
>> There are tons of ways for data to accidentally end up sorted within
>> each local region, but unsorted overall, e.g., if you're combining
>> results from parallel simulation runs. Such data would systematically
>> pass this check, but still give nonsensical answers.
>>
>
> Some actual benchmarks might be useful to the discussion.
>
And perhaps an inplace C function ismonotone would be generally useful.
Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20130605/101aa11a/attachment.html>
More information about the NumPy-Discussion
mailing list