[Python-ideas] Re: NAN handling in statistics functions

Aug. 29, 2021

On Mon, Aug 30, 2021 at 1:33 PM Steven D'Aprano <steve@pearwood.info> wrote:
> ...
> However we could add a function, totalorder, which can be used as a key
> function to force an order on NANs. The 2008 version of the IEEE-754
> standard recommends such a function:
>
>     from some_module import totalorder
>     sorted([4, nan, 2, 5, 1, nan, 3, 0], key=totalorder)
>     # --> [nan, nan, 0, 1, 2, 3, 4, 5]
>
> It would be nice if such a totalorder function worked correctly on both
> floats and Decimals. Anyone feel up to writing one?

I really don't feel like buying the standards document itself, so I'm
going by this, which appears to quote the standard:

https://github.com/rust-lang/rust/issues/5585

Based on that, I don't think it's possible to have a totalorder
function that will work 100% correctly on float and Decimal in a
mixture. I suspect it's not even possible to make it generic while
still being fully compliant.

The differences from the vanilla less-than operator are:

1) Positive zero sorts after negative zero
2) NaNs sort at one end or the other depending on their sign bit
3) Signalling NaNs are closer to zero than quiet NaNs
4) NaNs are sorted by payload
5) Different representations of the same value are distinguished by
their exponents. I'm not sure when that would come up.

So here are two partial implementations.

1) Ensure that NaNs are at the end, but leave everything else unchanged.
Compatible with all numeric types.

def simpleorder(val): return (val != val, val)

2) Acknowledge signs on NaNs and zeroes. Compatible with floats, not
sure about Decimals. I haven't figured out how to make a negative NaN
in Decimal.

import math
def floatorder(val): return (math.copysign(1, val), val != val, val)
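
One thing to watch with floatorder as written: a NaN with its sign bit
set gets the key (-1.0, True, nan), so it lands after negative zero
rather than before every negative number. Here is a minimal sketch of a
variant that sends negative NaNs to the front, assuming that is what
point 2 in the list of differences calls for. The name signedorder is
made up for illustration.

```python
import math

def signedorder(val):
    # Hypothetical variant of floatorder, not from the original post.
    sign = math.copysign(1, val)
    # Middle field: -1.0 for negative NaNs, 1.0 for positive NaNs, zero otherwise.
    return (sign, sign * (val != val), val)

# copysign makes the sign bit of each NaN explicit.
neg_nan = math.copysign(math.nan, -1.0)
pos_nan = math.copysign(math.nan, 1.0)

result = sorted([1.0, pos_nan, -2.0, 0.0, neg_nan, -0.0], key=signedorder)
# Expected order: negative NaN, -2.0, -0.0, 0.0, 1.0, positive NaN
```

Like floatorder, this still ignores payloads and the signalling/quiet
distinction.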

Neither is fully compliant. A signalling NaN will probably cause an
error. (I have no idea. Never worked with them.) NaN payloads... I
don't know how to access those in Python other than with ctypes. And
if two floats represent the same number, under what circumstances
could their exponents differ?
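
On the payload question: the struct module can expose the raw bits of a
float without ctypes. Below is a rough sketch of a key built from the
64-bit pattern, assuming IEEE 754 binary64 floats; the name
bits_totalorder is invented here. Sign, exponent, quiet bit and payload
are all part of the integer, so they all take part in the ordering. As
far as I can tell, binary floats have only one encoding per value, so
the exponent rule (point 5) should only matter for decimal formats.
Whether a signalling NaN survives being passed around as a Python float
is platform-dependent, so treat that part as untested.

```python
import math
import struct

def bits_totalorder(x):
    # Hypothetical helper: reinterpret the float's 8 bytes as a signed 64-bit integer.
    (bits,) = struct.unpack("<q", struct.pack("<d", x))
    if bits < 0:
        # Sign bit set: flip the other 63 bits so larger magnitudes sort earlier.
        bits ^= 0x7FFF_FFFF_FFFF_FFFF
    return bits

neg_nan = math.copysign(math.nan, -1.0)
pos_nan = math.copysign(math.nan, 1.0)

values = [1.0, pos_nan, -0.0, 0.0, -math.inf, neg_nan, math.inf, -1.5]
result = sorted(values, key=bits_totalorder)
# Expected order: negative NaN, -inf, -1.5, -0.0, 0.0, 1.0, inf, positive NaN
```

This only accepts floats (struct.pack will convert small ints, but
Decimal and Fraction are out of scope).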

I doubt we'll get a fully compliant implementation in Python. If one
is to exist, it'd probably be best to write it in C, using someone
else's code:

https://www.gnu.org/software/libc/manual/html_node/FP-Comparison-Functions.h...

And it would be specific to float, not Decimal, which would need a
completely different implementation. I'm not sure how many of the same
concepts even exist.
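
For Decimal, the decimal module already has a compare_total method that
orders values by their representation, and Decimal('-NaN') constructs a
negative NaN. A small sketch of turning that into a sort key with
functools.cmp_to_key follows; the name decimal_totalorder is just for
illustration. It also avoids the ordinary < operator, which raises
InvalidOperation for Decimal NaNs under the default context, so a key
that falls back to comparing the values themselves may fail when several
distinct NaN objects are present.

```python
from decimal import Decimal
from functools import cmp_to_key

# compare_total returns Decimal('-1'), Decimal('0') or Decimal('1'),
# which cmp_to_key can use as a comparison result.
decimal_totalorder = cmp_to_key(Decimal.compare_total)

values = [Decimal("1.0"), Decimal("NaN"), Decimal("-0"),
          Decimal("1.00"), Decimal("-NaN"), Decimal("0")]
result = sorted(values, key=decimal_totalorder)
print([str(d) for d in result])
# --> ['-NaN', '-0', '0', '1.00', '1.0', 'NaN']
```

Note that 1.00 and 1.0 are the same value with different exponents,
which is where the exponent rule shows up. This key only works on
Decimal instances, not on floats or a mixture.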

TBH, I would just use simpleorder for most situations. It's simple,
easy, and doesn't care about data types. All NaNs get shoved to the
end, everything else gets compared normally.
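
A quick illustration of simpleorder on a mixed list of ints, floats and
a Fraction (the sample data is arbitrary):

```python
import math
from fractions import Fraction

def simpleorder(val):
    # NaN is the only value that compares unequal to itself,
    # so the first field is True only for NaNs.
    return (val != val, val)

data = [4, math.nan, 2.5, Fraction(1, 3), 1, math.nan, 3, 0]
result = sorted(data, key=simpleorder)
# The six ordinary numbers come first in ascending order, then both NaNs.
```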

ChrisA

Chris Angelico