[Numpy-discussion] sorting -inf, nan, inf

Tim Hochberg tim.hochberg at ieee.org
Tue Sep 19 17:40:21 EDT 2006


A. M. Archibald wrote:
> On 19/09/06, Tim Hochberg <tim.hochberg at ieee.org> wrote:
>
>   
>> I'm not sure where the breakpoint is, but I was seeing failures for all
>> three sort types with N as high as 10000. I suspect that they're all
>> broken in the presence of  NaNs.  I further suspect you'd need some
>> punishingly slow n**2 algorithm to be robust in the presence of NaNs.
>>     
>
> Not at all. Just temporarily make NaNs compare greater than any other
> floating-point value and use quicksort (say). You can even do this for
> python lists without much trouble.
>   
I misspoke. What I meant here was keeping the behavior that people think 
that we already have but don't: NaNs stay in place and everything is 
sorted around them. And even that's not true, since you could just 
record where the NaNs are, remove them, sort and put them back. What I 
was really getting at was, that I'm guessing, and it's just a guess, 
that (a) none of the fast sorting algorithms do anything sensible unless 
special cased and (b) one could come up with a naive n**2 sort that does 
do something sensible without special casing (where sensible means leave 
the NaNs alone).
> That's actually a viable suggestion for numpy's sorting, although it
> would be kind of ugly to implement: do a quick any(isnan(A)), and if
> not, use the fast stock sorting algorithms; if there is a NaN
> somewhere in the array, use a version of the sort that has a tweaked
> comparison function so the NaNs wind up at the end and are easy to
> trim off.
>
> But the current situation, silently returning arrays in which the
> non-NaNs are unsorted, is really bad.
>   
If your going to do isnan anyway, why not just raise an exception. An 
array with NaNs in it can't be sorted by any common sense definition of 
sorting. Any treatment of NaNs is going to be arbitrary, so we might as 
well make the user specify what they want. "In the face of ambiguity, 
refuse the temptation to guess" and all that.

My favorite solution would be to make sort respect the invalid mode of 
seterr/geterr. However at the moment it doesn't seem to (in beta4 at 
least) but neither does add or multiply so those probably need to be 
looked at again....

-tim







More information about the NumPy-Discussion mailing list