[Numpy-discussion] sorting -inf, nan, inf
Tim Hochberg
tim.hochberg at ieee.org
Tue Sep 19 17:40:21 EDT 2006
A. M. Archibald wrote:
> On 19/09/06, Tim Hochberg <tim.hochberg at ieee.org> wrote:
>
>
>> I'm not sure where the breakpoint is, but I was seeing failures for all
>> three sort types with N as high as 10000. I suspect that they're all
>> broken in the presence of NaNs. I further suspect you'd need some
>> punishingly slow n**2 algorithm to be robust in the presence of NaNs.
>>
>
> Not at all. Just temporarily make NaNs compare greater than any other
> floating-point value and use quicksort (say). You can even do this for
> python lists without much trouble.
>
I misspoke. What I meant here was keeping the behavior that people think
that we already have but don't: NaNs stay in place and everything is
sorted around them. And even that's not true, since you could just
record where the NaNs are, remove them, sort and put them back. What I
was really getting at was, that I'm guessing, and it's just a guess,
that (a) none of the fast sorting algorithms do anything sensible unless
special cased and (b) one could come up with a naive n**2 sort that does
do something sensible without special casing (where sensible means leave
the NaNs alone).
> That's actually a viable suggestion for numpy's sorting, although it
> would be kind of ugly to implement: do a quick any(isnan(A)), and if
> not, use the fast stock sorting algorithms; if there is a NaN
> somewhere in the array, use a version of the sort that has a tweaked
> comparison function so the NaNs wind up at the end and are easy to
> trim off.
>
> But the current situation, silently returning arrays in which the
> non-NaNs are unsorted, is really bad.
>
If your going to do isnan anyway, why not just raise an exception. An
array with NaNs in it can't be sorted by any common sense definition of
sorting. Any treatment of NaNs is going to be arbitrary, so we might as
well make the user specify what they want. "In the face of ambiguity,
refuse the temptation to guess" and all that.
My favorite solution would be to make sort respect the invalid mode of
seterr/geterr. However at the moment it doesn't seem to (in beta4 at
least) but neither does add or multiply so those probably need to be
looked at again....
-tim
More information about the NumPy-Discussion
mailing list