On 3/4/20 11:07 PM, Andrew Barnert via Python-ideas wrote:
On Mar 4, 2020, at 19:12, Richard Damon Richard@damon-family.org wrote:
Yes, because of the NaN issue, you sort of need an 'Almost Total Order' and 'Really Truly a Total Order', the first allowing the small exception of a very limited (maybe only one) special value that breaks the true definition of Total Order so that we could call Floats to be an Almost Total Order.
The obvious answer is to have PartialOrder < NaNOrder < TotalOrder, where a type is NaN-ordered if it’s partially ordered, and its subset of all self-equal objects is totally ordered.
I can’t imagine when a generic AlmostTotalOrder that didn’t specify how it fails could be useful; a NaNOrder is useful all over the place. A median that ignores NaN values requires NaNOrdered input. Even a median that does something stupid with NaN—see below.
Of course there might be other kinds of almost total orders that might be useful. If anyone runs into one, they can write their own ABC with the proper relationships to the stdlib ones. Or even propose it for the stdlib. But they couldn’t do anything useful with a general AlmostTotalOrder if it had to handle both NaN and whatever their different case was.
I think your NaNOrder is my Almost Total Order, except that the name NaNOrder implies that it is only NaN is the special case, and thus we are dealing with numbers, but there may well be some other user type that has a similar 'exceptional' value with properties like NaN, but since the items aren't numbers, the Not-A-Number tag isn't really appropriate.
This wouldn't help the median issue as having median reject being given Float values because they don't form a true Total Order would be much worse than the issue of it getting confused with NaNs.
Well, if you want to handle floats and Decimals and do something stupid with NaN values so the user has to be careful, it seems to me the right thing to do is require NaNOrder but document that you do something stupid with NaN values so the user has to be careful.
Yes, that is the idea of AlmostTotalOrder, to have algorithms that really require a total order (like sorting) but we really need to use a type that has these exceptional values. Imagine that sort/median was defined to type check its parameter, and that meant that you couldn't take the median of a list of floats (because float has the NaN value that breaks TotalOrder).
The presence of NaN in the float system does add a significant complexity to dealing with floating point numbers. Sometimes I wonder if since Python supports dynamic typing of results, might not do better by removing the NaN value from Floats and Decimals, and make the operations that generate the NaN generate an object of a special NaN type.
The thing is, in most cases you’d be duck typing and get no benefit, because float and floatnan both have the same methods, etc. Sure, when you’re type-checking with isinstance you could choose whether to check float|floatnan or just float, but how often do you write such type checks? And if you really want that, you could write a Float ABC that does the same thing, although IIRC ABCs only have a subclasshook rather than an instancehook so you need a new metaclass:
class FloatMeta(ABCMeta): def __instancecheck__(self, instance): return isinstance(instance, float) and not math.isnan(instance) class Float(metaclass=FloatMeta): pass
People have experimented a lot with similar things and beyond in static type systems: you can define a new type which is just the non-NaN floats, or just the finite floats, or just the floats from -1 to 1, or just the three floats -1 or 0 or 1, and the compiler can check that you’re always using it safely. (If you call a function that takes one of those non-NaN floats with a float, unless the compiler can see an if not isnan(x) or a pattern match that excludes NaN or something else that guarantees correctness, it’s a TypeError.) Fitting this into the type algebra (& and | and -) is pretty easy. I’m not aware of anyone translating that idea to dynamic type systems, but it could be interesting.
The problem with trying to make NaN a special type in a static typed language is that suddenly you can't define any of your variables to be of type 'float' if they might have a NaN value, as that would have the wrong type to store it. Your basic fundamental float type suddenly needs to use the dynamic typing system within the static type system that handles type hierarchies. That would be a BIG efficiency hit in a language like C++. A Dynamically typed language like Python, since it already takes that hit (and leverages that it always takes the hit to add functionality) has much more flexibility here.
I will add, that the whole concept of the NaN value was an invention to handle the problem that a totally statically typed language, like C, couldn't efficiently handle those sorts of errors except by a funny value that didn't really act like a floating point number. Except for the fact that so many people have gotten used to the fact that certain erroneous math computation create this weird NaN value that we think of as a number but also not really a number, there is no reason in a dynamically typed language that this error code needs to be considered to actually be a Float value.