On 3/5/20 1:37 PM, Andrew Barnert via Python-ideas wrote:
On Mar 5, 2020, at 05:24, Richard Damon <Richard@damon-family.org> wrote:
On 3/4/20 11:07 PM, Andrew Barnert via Python-ideas wrote:
On Mar 4, 2020, at 19:12, Richard Damon <Richard@damon-family.org> wrote: Yes, because of the NaN issue, you sort of need an 'Almost Total Order' and 'Really Truly a Total Order', the first allowing the small exception of a very limited (maybe only one) special value that breaks the true definition of Total Order so that we could call Floats to be an Almost Total Order. The obvious answer is to have PartialOrder < NaNOrder < TotalOrder, where a type is NaN-ordered if it’s partially ordered, and its subset of all self-equal objects is totally ordered.
I can’t imagine when a generic AlmostTotalOrder that didn’t specify how it fails could be useful; a NaNOrder is useful all over the place. A median that ignores NaN values requires NaNOrdered input. Even a median that does something stupid with NaN—see below.
Of course there might be other kinds of almost total orders that might be useful. If anyone runs into one, they can write their own ABC with the proper relationships to the stdlib ones. Or even propose it for the stdlib. But they couldn’t do anything useful with a general AlmostTotalOrder if it had to handle both NaN and whatever their different case was. I think your NaNOrder is my Almost Total Order, except that the name NaNOrder implies that it is only NaN is the special case, and thus we are dealing with numbers, but there may well be some other user type that has a similar 'exceptional' value with properties like NaN, but since the items aren't numbers, the Not-A-Number tag isn't really appropriate. The important thing is that it implies that there’s a subset of values that are totally ordered, and that the only values outside that subset are not self-equal. This means, for example, that you can test whether set is actually orderable in linear time with all(x==x for x in xs). So you can write a median that skips NaN values, or raises on them, just by checking each value as it comes up. But your almost total order only implies that there are some values that break the total order in some way. So you can only test whether a set is actually orderable in quadratic time with all(not(x<y and y<x) for x in xs for y in xs)). The only way to raise in bad values is that quadratic check. There is no way to skip bad values, because even when the check fails, you can’t tell whether it’s x or y that’s bad. So it’s a useless notion.
The other important thing is that NaN order comes up all the time when sorting float, Decimal, Pandas datetime (its NaT has NaN semantics), etc. Of course you could imagine other almost total orders, even if nobody ever comes up with one when asked. Here’s one: the projective reals (real values extended with a single infinity that’s both negative and positive) are almost totally ordered, except that inf is not comparable to anything but itself (which it’s equal to). You could easily write a sort or median or whatever algorithm that does something useful with this order. But only if you know it’s this order. You can’t write something that’s useful with both this order and NaN order. So almost total still doesn’t help with anything; you need to write a new ABC that’s also a subclass of Partial and superclass of Total but has no relation to NaN.
Yes, the Almost Total Order classification needs a bit of work for a full definition. One simple one is that you have functions that really want Total Order classes, but will take an Almost Total Order with a promise that you aren't giving it any of the funny values. Is slightly more involved version might provide a testing function to all a routine to verify this promise. It is quite possible to write something that does something reasonable with an Almost Total Order, even without knowing the characteristics of the breakdown of total order, it the usage is defined as only working on the values that don't include those that break the total order. Float with its NaN issue is a good common example, many programs deal with floats, and don't intend to have NaNs creep in, so treating the float class as being float - nans and thus a total order might be reasonable. -- Richard Damon