[Numpy-discussion] suggested change of behavior for interp
Charles R Harris
charlesr.harris at gmail.com
Wed Jun 5 13:59:09 EDT 2013
On Wed, Jun 5, 2013 at 11:48 AM, Nathaniel Smith <njs at pobox.com> wrote:
> On Wed, Jun 5, 2013 at 6:08 PM, Julian Taylor
> <jtaylor.debian at googlemail.com> wrote:
> > On 05.06.2013 16:33, Nathaniel Smith wrote:
> >> The slow down people are worried about is, suppose that 'xp' has
> >> 1,000,000 entries, and the user wants to interpolate 1 point. If we
> >> can assume the array is sorted, then we can find which bin the 1 point
> >> falls into by using binary search (np.searchsorted), which will
> >> require examining ~log2(1,000,000) = 20 entries in the array. Checking
> >> that it's sorted, though, will require examining all 1,000,000 points
> >> -- it turns an O(log n) operation into an O(n) operation. And if this
> >> is being called as part of some larger algorithm, this effect can
> >> cascade -- if it gets called m times from a loop, now that's O(mn)
> >> instead of (m log n), etc. That's often the difference between
> >> tractable and intractable.
> >>
> >> If you submit a PR that gives interp the option to check for
> >> monotonicity, then we'll almost certainly merge it :-). If you also
> >> make it the default then there might be some debate, though I'd argue
> >> for it...
> >
> > I would not like it when the performance goes from log to linear by
> default.
> > It is documented that the coordinates must be increasing after all.
>
> I agree, I don't like it either. But the problem is there are two
> contradictory things and I don't like either of them:
> 1) performance going from log to linear by default (for a subset of
> somewhat extreme cases)
> 2) people silently getting the wrong results (and according to reports
> in this thread, the warning in the docs does not actually prevent
> this)
>
> The question is which one do we dislike more. My feeling is that in
> the cases where (1) comes up, it will annoy people and get them to
> check the docs and find the go-faster switch, while the first warning
> of (2) is when your spaceship crashes, so we should worry more about
> (2).
>
> > How about as a compromise the function checks one or two closest
> > neighbors around the interpolation point and warns/errors if those are
> > not monotonic.
> >
> > Its not fool proof but should at least catch the very wrong cases with
> > an acceptable performance loss.
>
> There are tons of ways for data to accidentally end up sorted within
> each local region, but unsorted overall, e.g., if you're combining
> results from parallel simulation runs. Such data would systematically
> pass this check, but still give nonsensical answers.
>
Some actual benchmarks might be useful to the discussion.
Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20130605/8c3c674d/attachment.html>
More information about the NumPy-Discussion
mailing list