Magnitude and ProtoMagnitude ABCs — primarily for argument validation
I have encountered cases in which I would like to validate that an argument can be properly compared with other instances of its type. This is true of numbers, strings, dates, … but not of `NoneType`, `type`, ….

One way that I have tried to handle this is to check whether the object can be compared to itself using `>`, `<`, `>=`, and `<=`, and that it is neither `>` nor `<` itself and is both `>=` and `<=` itself. The most glaring example of why this is insufficient is the `set` type. A `set` object meets all of those criteria, but given any two instances, it is not true that if `a > b` is `False` then `a <= b` is `True`. The operators are not acting as comparisons of relative magnitude in this case but as tests for superset/subset relations — which is fine and good but doesn't help with this situation.

What I think would be helpful is to have a `Magnitude` abstract base class that is a subclass of `ProtoMagnitude` (or whatever better names anyone can imagine).

The subclass hook for `Magnitude` would return `True` for any class with instance methods for all of the rich comparison methods, but it would skip that check and return `False` for any real or virtual subclass of `ProtoMagnitude` (overridable by registering as a `Magnitude` subclass). The `set` type would then be registered as a virtual subclass of `ProtoMagnitude` but not `Magnitude`, so that `issubclass(set, Magnitude)` would return `False`.

As a performance optimization, the module that defines these ABCs would register the obviously appropriate built-in and standard-library types with `Magnitude`: `Number`, `str`, `list`, `tuple`, `date`, …

Why not have this be a separate distribution package? This concept is only reliable if all of the code that makes use of it shares a common implementation. It does no good to register a class as `ProtoMagnitude`, for instance, if an instance of that class will be passed to code in another library that is unaware of the `ProtoMagnitude` and `Magnitude` ABCs in the package, or that has its own independent system for attempting to accomplish the same goal.
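[A minimal sketch of what the proposal could look like. All code is illustrative, not a finished design, and it deviates from the post in one respect: it keeps `Magnitude` and `ProtoMagnitude` as sibling ABCs rather than making one a subclass of the other, because a hook that calls `issubclass` against its own base class can recurse.]

```python
from abc import ABCMeta
from datetime import date
from numbers import Number


class ProtoMagnitude(metaclass=ABCMeta):
    """Marker ABC: comparison behavior has been declared explicitly."""


class Magnitude(metaclass=ABCMeta):
    """Types whose rich comparisons express relative magnitude."""

    @classmethod
    def __subclasshook__(cls, C):
        if cls is not Magnitude:
            return NotImplemented
        if issubclass(C, ProtoMagnitude):
            # Explicitly flagged: don't infer. Returning NotImplemented
            # falls back to Magnitude's own registry, so an explicit
            # Magnitude registration can still override the block.
            return NotImplemented
        # Infer from the presence of rich comparison methods, ignoring
        # the default stubs every class inherits from object.
        methods = ("__lt__", "__le__", "__gt__", "__ge__")
        if all(any(m in B.__dict__ for B in C.__mro__ if B is not object)
               for m in methods):
            return True
        return NotImplemented


# set opts out: registered with ProtoMagnitude but not Magnitude.
ProtoMagnitude.register(set)
ProtoMagnitude.register(frozenset)

# Pre-register the obviously appropriate built-in and stdlib types.
for t in (Number, str, bytes, list, tuple, date):
    Magnitude.register(t)

print(issubclass(int, Magnitude))   # True
print(issubclass(set, Magnitude))   # False: blocked via ProtoMagnitude
```

Here `set` still reports `issubclass(set, ProtoMagnitude)` as `True`, so callers can distinguish "declared non-magnitude" from "unknown".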
I think it’s usually called Orderable. It’s a useful concept in static type checking too (e.g. mypy), where we’d use it as an upper bound for type variables, if we had it. I guess to exclude sets you’d have to introduce TotalOrderable. On Tue, Mar 3, 2020 at 04:03 Steve Jorgensen <stevej@stevej.name> wrote:
<snip>
-- --Guido (mobile)
Guido van Rossum wrote:
I think it’s usually called Orderable. It’s a useful concept in static type checking too (e.g. mypy), where we’d use it as an upper bound for type variables, if we had it. I guess to exclude sets you’d have to introduce TotalOrderable. On Tue, Mar 3, 2020 at 04:03 Steve Jorgensen stevej@stevej.name wrote:
<snip>
Right. That's a much better term. `Orderable` and `ProtoOrderable`.
On Tue, Mar 3, 2020 at 10:43 AM Steve Jorgensen <stevej@stevej.name> wrote:
<snip>
Right. That's a much better term. `Orderable` and `ProtoOrderable`.
Or even PartialOrderable and Orderable. This would follow Rust's PartialOrd and Ord (https://doc.rust-lang.org/std/cmp/trait.PartialOrd.html and https://doc.rust-lang.org/std/cmp/trait.Ord.html).

But beware, IIRC there are pathological cases involving floats, (long) ints and rounding where transitivity may be violated in Python (though I believe only Tim Peters can produce an example :-). I'm honestly not sure that that's enough to sink the idea. (If it were, NaN would be a bigger problem.)

--
--Guido van Rossum (python.org/~guido)
On Tue, Mar 3, 2020 at 5:35 PM Guido van Rossum <guido@python.org> wrote:
But beware, IIRC there are pathological cases involving floats, (long) ints and rounding where transitivity may be violated in Python (though I believe only Tim Peters can produce an example :-). I'm honestly not sure that that's enough to sink the idea. (If it were, NaN would be a bigger problem.)
Floats cannot violate transitivity of inequality. But they are also not a total order when you include nans and infs (which are part of floats). I'm not as certain about unbounded ints, but I would be pretty surprised if they could somehow violate transitivity either. Still, including floats in TotalOrderableExceptWeirdValues is fine... and that can alias Orderable for practical purposes. -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
Guido van Rossum wrote:

<snip>

But beware, IIRC there are pathological cases involving floats, (long) ints and rounding where transitivity may be violated in Python (though I believe only Tim Peters can produce an example :-). I'm honestly not sure that that's enough to sink the idea. (If it were, NaN would be a bigger problem.)
Yeah. Violations of transitivity are already breaking their contracts, so having a new way of expressing the contract has no effect on that.

The problem I came up with trying to spike out my proposal last night is that there doesn't seem to be any way to implement it without creating infinite recursion in the `issubclass` call. If I make `Orderable` a real or virtual subclass of `ProtoOrderable`, and `Orderable`'s `__subclasshook__` or metaclass `__subclasscheck__` (I tried both ways) tries to check whether `C` is a subclass of `ProtoOrderable`, then infinite recursion occurs. It wasn't immediately obvious to me why that is the case, but when I thought about it deeply, I could see why that must happen.

An alternative that I thought about previously, but that seems very smelly to me for several reasons, is to have both `Orderable` and `NonOrderable` ABCs. In that case, what should be done to prevent a class from being both orderable and non-orderable, or to figure out which should take precedence in that case?

As a meta-solution (wild-assed idea), what if ABC registration could accept keyword arguments, similar to passing keyword arguments to a class definition? That way, a single ABC (`ProtoOrderable` or whatever better name) could have real or virtual subclasses that are explicitly orderable or non-orderable depending on `orderable=<True/False>`.
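[For the record, the recursion is easy to reproduce. The hook body below is a reconstruction of the attempted implementation described above: `ABCMeta.__subclasscheck__` answers `issubclass(C, ProtoOrderable)` partly by walking `ProtoOrderable.__subclasses__()`, which includes `Orderable`, and that walk re-invokes the hook with no terminating case.]

```python
from abc import ABCMeta


class ProtoOrderable(metaclass=ABCMeta):
    pass


class Orderable(ProtoOrderable):
    @classmethod
    def __subclasshook__(cls, C):
        # issubclass(C, ProtoOrderable) walks ProtoOrderable's
        # subclasses -- including Orderable -- which calls this
        # hook again before any cache entry has been written.
        if issubclass(C, ProtoOrderable):
            return NotImplemented
        return all(hasattr(C, m)
                   for m in ("__lt__", "__le__", "__gt__", "__ge__"))


try:
    issubclass(int, Orderable)
except RecursionError:
    print("infinite recursion, as described")
```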
[Guido]
But beware, IIRC there are pathological cases involving floats, (long) ints and rounding where transitivity may be violated in Python (though I believe only Tim Peters can produce an example :-).
Not anymore ;-) That is, while comparisons mixing bigints and floats may have suffered rounding surprises long ago, all cases of float-or-int compared-to float-or-int now act as if infinite precision were used. You can throw fractions.Fraction and (I believe) decimal.Decimal into that too. Except for NaNs in the floating types, there's a well-behaved total ordering now (whether doing mixed-type or intra-type comparisons). That doesn't make it surprise-free, though. For example, just last week someone on bugs.python.org flipped out ;-) upon learning
>>> 10**100 == 1e100
False
>>> 10**100 < 1e100
True
Because, indeed, they don't denote the same "mathematical" value:
>>> int(1e100)
10000000000000000159028911097599180468360808563945281389781327557747838772170381060813469985856815104
>>> hex(_)
'0x1249ad2594c37d0000000000000000000000000000000000000000000000000000000000000000000000'
>>> (1e100).hex()
'0x1.249ad2594c37dp+332'
On 2020-03-04 00:58, Tim Peters wrote:
<snip>

>>> 10**100 == 1e100
False
>>> 10**100 < 1e100
True

<snip>
In the Ada programming language, `1e100` is an integer literal, whereas `1.0e100` is a float, so if Python had done that in this example, `10**100 == 1e100` would indeed be `True`.
Greg Ewing wrote:
On 4/03/20 7:42 am, Steve Jorgensen wrote:
That's a much better term. Orderable and ProtoOrderable.

I would suggest "TotallyOrdered" and "PartiallyOrdered".
Possibly, but the reasoning is not obvious to me. Can you explain? I get that `TotallyOrdered` is consistent with https://docs.python.org/2/library/functools.html#functools.total_ordering, but I don't get the `PartiallyOrdered` term.

In case I was not sufficiently clear about my proposal (just making sure): the `Proto`… in my concept simply means that the determination of whether the class is orderable is explicit and not determined by whether the rich comparison methods are present. A class that is an actual or virtual subclass of `ProtoOrderable` but not of `Orderable` is not orderable, but a class that is not a subclass of either is assumed to be orderable if it implements all the rich comparison methods.
On Wed, Mar 4, 2020 at 6:04 PM Steve Jorgensen <stevej@stevej.name> wrote:
<snip>
Possibly, but the reasoning is not obvious to me. Can you explain? I get that `TotallyOrdered` is consistent with https://docs.python.org/2/library/functools.html#functools.total_ordering, but I don't get the `PartiallyOrdered` term.
It's a mathematical concept: https://en.wikipedia.org/wiki/Partially_ordered_set "Partially ordered" means you can compare pairs of elements and find which one comes first. "Totally ordered" means you can compare ANY pair of elements, and you'll always know which comes first. ChrisA
Chris Angelico wrote:
On Wed, Mar 4, 2020 at 6:04 PM Steve Jorgensen stevej@stevej.name wrote: <snip> https://en.wikipedia.org/wiki/Partially_ordered_set "Partially ordered" means you can compare pairs of elements and find which one comes first. "Totally ordered" means you can compare ANY pair of elements, and you'll always know which comes first. ChrisA
Ah. Good to know.

I don't think "partially ordered" actually applies, then, because that still seems to imply that transitivity would apply to comparisons between any given pair of objects. Simply having implementations of all the rich comparison operators does not make that true, however, and in particular, that's not true for sets. If we consider just the sets `{1, 2}` and `{1, 3}`, …

```
In [1]: {1, 2} < {1, 3}
Out[1]: False

In [2]: {1, 2} >= {1, 3}
Out[2]: False
```

Neither is a subset of the other, so both of those tests return `False`.
Steve Jorgensen wrote:
<snip>
Ah. Maybe I'm arguing against a different point than what you were making, then. Just because sets are not partially ordered does not mean that "partially ordered" is not a useful distinction in addition to "totally ordered". In that case, maybe the hierarchy would be something like…

* ProtoOrdered (or ProtoOrderable): Orderability is explicit and never inferred. Unordered unless also a subclass of PartiallyOrdered or TotallyOrdered.
  * PartiallyOrdered
    * TotallyOrdered

A class that does not directly or virtually subclass any of those but implements all the rich comparison operators would be treated as an inferred virtual subclass of `TotallyOrdered`.
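[Registration-wise, that hierarchy could be sketched as below. The inference step for unregistered classes is left out; this just shows that explicit registration composes the way you'd want, since `ABCMeta` walks real subclasses when answering `issubclass` for a base.]

```python
from abc import ABCMeta


class ProtoOrdered(metaclass=ABCMeta):
    """Orderability is explicit and never inferred."""


class PartiallyOrdered(ProtoOrdered):
    pass


class TotallyOrdered(PartiallyOrdered):
    pass


# Explicit registrations (the inference hook is omitted here).
PartiallyOrdered.register(set)
TotallyOrdered.register(int)
TotallyOrdered.register(str)

print(issubclass(set, PartiallyOrdered))  # True
print(issubclass(set, TotallyOrdered))    # False
print(issubclass(int, PartiallyOrdered))  # True: total implies partial
```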
On Wed, Mar 4, 2020 at 8:24 AM Steve Jorgensen <stevej@stevej.name> wrote:
<snip>
Ah. Good to know. I don't think "Partially ordered" actually applies, then, because that still seems to imply that transitivity would apply to comparisons between any given pair of objects. Simply having implementations of all the rich comparison operators does not make that true, however, and in particular, that's not true for sets.
Not quite: https://en.wikipedia.org/wiki/Partially_ordered_set#Examples (see example 2).

Or: https://math.stackexchange.com/questions/1305004/what-is-meant-by-ordering-o...

S.

--
Stefane Fermigier - http://fermigier.com/ - http://twitter.com/sfermigier - http://linkedin.com/in/sfermigier
Founder & CEO, Abilian - Enterprise Social Software - http://www.abilian.com/
Chairman, National Council for Free & Open Source Software (CNLL) - http://cnll.fr/
Founder & Organiser, PyParis & PyData Paris - http://pyparis.org/ & http://pydata.fr/
On Wed, Mar 4, 2020 at 9:26 AM Steve Jorgensen <stevej@stevej.name> wrote:
<snip>
Ah. Good to know. I don't think "Partially ordered" actually applies, then, because that still seems to imply that transitivity would apply to comparisons between any given pair of objects. Simply having implementations of all the rich comparison operators does not make that true, however, and in particular, that's not true for sets.
Set comparison is transitive. If A is a subset of B and B is a subset of C, A is a subset of C. Furthermore, Wikipedia gives sets as an example of a partial ordering. An ABC which means nothing more than "the comparison operators are implemented" is not very useful. ABCs are most useful when they tell the user that the implementations satisfy some contract.
If we consider just the sets `{1, 2}` and `{1, 3}`, …

```
In [1]: {1, 2} < {1, 3}
Out[1]: False

In [2]: {1, 2} >= {1, 3}
Out[2]: False
```

Neither is a subset of the other, so both of those tests return `False`.
From Wikipedia: "For a, b, elements of a partially ordered set P, if a ≤ b or b ≤ a, then a and b are comparable (https://en.wikipedia.org/wiki/Comparability). Otherwise they are incomparable. In the figure on top-right, e.g. {x} and {x,y,z} are comparable, while {x} and {y} are not. A partial order under which every pair of elements is comparable is called a total order (https://en.wikipedia.org/wiki/Totally_ordered_set) or linear order."
Stéfane Fermigier wrote:
<snip>

Not quite: https://en.wikipedia.org/wiki/Partially_ordered_set#Examples (see example 2).
Ah! That Wikipedia article is very helpful. I see that it is not necessary for all items in a partially ordered set to be comparable.

Taking one step back out of the realm of mathematical definition, however, the original idea was simply to distinguish what I now understand to be "totally ordered" types from other types, be they "partially ordered" or unordered — whether lacking a full complement of rich comparison operators or having them all but using them in weirder ways than sets do.
Taking one step back out of the realm of mathematical definition, however, the original idea was simply to distinguish what I now understand to be "totally ordered" types from other types, be they "partially ordered" or unordered — not even having a full complement of rich comparison operators or having all but using them in weirder ways than sets do.
It sounds like you want a mechanism to blacklist certain types (such as sets) from being considered totally ordered, while assuming that everything else with the right operators is totally ordered.

ABCs and classes in general are usually whitelists that inform the user that the implementation satisfies certain properties. They're explicit rather than implicit. If you want to validate arguments, you're being conservative and safe. It's safer to require that an argument is on a whitelist - if the user forgets the whitelist, the validation will fail, the user will be reminded, and they can fix it. For example they can register their class as a virtual subclass. If the user forgets to put their type on a blacklist, then the argument validation can't do its job.

I suggest having ABCs PartiallyOrdered and TotallyOrdered, where:

- TotallyOrdered is a subclass of PartiallyOrdered, as is the case mathematically.
- Most familiar types with the operators are TotallyOrdered.
- Sets are merely PartiallyOrdered.
- sorted, min and max will require Iterable[TotallyOrdered].
- @functools.total_ordering automatically registers the class as TotallyOrdered.

Some use cases probably exist that require PartiallyOrdered but where TotallyOrdered is optional and irrelevant. Users that want to take a generous blacklist approach can check that `<operators are present> and not (isinstance(x, PartiallyOrdered) and not isinstance(x, TotallyOrdered))`, or equivalently `<operators are present> and (not isinstance(x, PartiallyOrdered) or isinstance(x, TotallyOrdered))`, which would exclude sets and any other case where the person who defined the class thought about this stuff.
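[The "generous blacklist" check described above could look roughly like this. It's a sketch that assumes the two ABCs exist and common types are registered; the operator check deliberately ignores the default comparison stubs every class inherits from `object`.]

```python
from abc import ABCMeta


class PartiallyOrdered(metaclass=ABCMeta):
    pass


class TotallyOrdered(PartiallyOrdered):
    pass


PartiallyOrdered.register(set)   # sets: merely partially ordered
TotallyOrdered.register(int)
TotallyOrdered.register(str)

_CMP = ("__lt__", "__le__", "__gt__", "__ge__")


def has_cmp_operators(cls):
    # Look only at methods defined somewhere other than object.
    return all(any(m in B.__dict__ for B in cls.__mro__ if B is not object)
               for m in _CMP)


def assume_totally_ordered(cls):
    """Operators present, and not known to be only partially ordered."""
    return has_cmp_operators(cls) and (
        not issubclass(cls, PartiallyOrdered)
        or issubclass(cls, TotallyOrdered))


print(assume_totally_ordered(int))    # True
print(assume_totally_ordered(set))    # False: flagged as partial only
print(assume_totally_ordered(float))  # True (NaN caveats aside)
```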
On Mar 4, 2020, at 00:07, Steve Jorgensen <stevej@stevej.name> wrote:
Taking one step back out of the realm of mathematical definition, however, the original idea was simply to distinguish what I now understand to be "totally ordered" types from other types, be they "partially ordered" or unordered — not even having a full complement of rich comparison operators or having all but using them in weirder ways than sets do.
Is there any commonly used or even imaginable useful type that uses them in weirder ways than set and float (which are both partially ordered) or np.array (where they aren't even Boolean-valued)? In particular, transitivity keeps coming up, but all of those examples are transitive (it's never true that a<b and true that b<c without it being true that a<c for any of them).

If there are such uses it might be important to distinguish them, but if there aren't, it doesn't seem unreasonable for PartiallyOrdered to "wrongly" pick up hypothetical pathological types that no one will ever write, in exchange for automatically being right about every actual type anyone uses. After all, Iterable is a virtual superclass of any type with __iter__, even if it returns the number 42 instead of an Iterator, and so on; technically every implicit ABC in Python is "wrong" like this, but in practice it doesn't come up and implicit ABCs are very useful.
On Wed, Mar 4, 2020 at 9:22 AM Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
Is there any commonly used or even imaginable useful type that uses them in weirder ways than set and float (which are both partially ordered) [...]
Nitpick: why do you say "partially ordered" for float? If we take NaNs into account, then the float type is neither partially nor totally ordered, since x <= x is False when x is a NaN. If we don't take NaNs into account (which is probably the way we'd want to go from a practicality standpoint), it's both partially and totally ordered. (To David Mertz's earlier comment, "not a total order when you include nans and infs", infinities don't affect the existence of a total or partial order here; it's only the NaNs that we have to worry about.) Mark
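[The NaN point is easy to verify directly; plain Python, no assumptions beyond IEEE 754 float semantics:]

```python
nan = float("nan")

# Reflexivity of <= fails for NaN, so float-with-NaN is not even a
# partial order (a partial order requires x <= x for every x).
print(nan <= nan)   # False
print(nan == nan)   # False

# Infinities, by contrast, compare consistently with everything finite.
inf = float("inf")
print(-inf < 0.0 < inf)   # True
```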
Is there any commonly used or even imaginable useful type that uses them in weirder ways than set and float (which are both partially ordered) or np.array (where they aren't even Boolean-valued)?

<snip>
I think that np.array is a great example of something I wouldn't want to be PartiallyOrdered. An algorithm which requires PartiallyOrdered probably needs to use the results of comparisons as booleans, and for np.array that will fail immediately. Same thing for pandas Series and DataFrames. Another example which comes to mind is SQLAlchemy columns, which produce part of a query for the ORM - these presumably can be used as booleans but not in an informative way.

There are 3 possible ABCs: Comparable (has the operators, makes no promises about what they do), PartiallyOrdered, and TotallyOrdered. We could have all three. If we don't want that but we want to distinguish between TotallyOrdered and others, we need one of the other two. Which option is more useful? It's probably rare that someone specifically needs PartiallyOrdered and not the other two. But Comparable is just a protocol that anyone can implement.

I should note that I've never personally needed any of this, I'm just trying to imagine use cases. But I think a 'negative' ABC as I believe was originally proposed is the wrong path.
This thread feels very much like a solution looking for a problem. But on one narrow point, I'm trying to think of everything in the standard library or builtins that actually forms a total order with elements of the same type.

* Not floats
* I think not strings with unicode canonical equivalence, decomposition, and normalization
* Definitely not sets, tuples, lists, dicts, etc.
* Arguably not Decimal because it is sensitive to decimal context
* Complex clearly not
* I think not datetimes under timezone issues (although any ordering is certainly *wrong* given the vagaries of timezones)
* Queues, deques, etc. don't even try, nor should they
* Do array.array's lack even a partial order?

So really I think we are left with wanting a test for whether something is "int or Fraction".
On Mar 4, 2020, at 10:09 AM, David Mertz <mertz@gnosis.cx> wrote:
This thread feels very much like a solution looking for a problem.
But on one narrow point, I'm trying to think of everything in the standard library or builtins that actually forms a total order with elements of the same type.
* Not floats
* I think not strings with unicode canonical equivalence, decomposition, and normalization
* Definitely not sets, tuples, lists, dicts, etc.
* Arguably not Decimal because it is sensitive to decimal context
* Complex clearly not
* I think not datetimes under timezone issues (although any ordering is certainly *wrong* given the vagaries of timezones)
* Queues, deques, etc. don't even try, nor should they
* Do array.array's lack even a partial order?
So really I think we are left with wanting a test for whether something is "int or Fraction".
I think strings do have the total order property, because string equality etc. does NOT get into the issues of Unicode canonical equivalence. I.e., the equality test is that they are the same sequence of code points, ignoring that multiple sequences of code points might represent the same canonical character, in part because there isn't a single definition of that, as there are NFD/NFC and NFKD/NFKC equivalences, which are different.
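[A quick illustration of that distinction, using `unicodedata` from the stdlib: the same user-perceived text in precomposed and decomposed forms compares unequal, and ordering is a plain lexicographic total order on code points.]

```python
import unicodedata

s1 = "caf\u00e9"     # precomposed: ... U+00E9 LATIN SMALL LETTER E WITH ACUTE
s2 = "cafe\u0301"    # decomposed: 'e' + U+0301 COMBINING ACUTE ACCENT

# Equality is by code points, not canonical equivalence:
print(s1 == s2)                                  # False
print(unicodedata.normalize("NFC", s2) == s1)    # True

# Ordering is likewise code-point lexicographic, hence a total order:
print(s1 > s2)   # True: U+00E9 > U+0065 at index 3
```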
On Wed, Mar 4, 2020 at 10:22 AM Richard Damon <Richard@damon-family.org> wrote:
<snip>
I think strings do have the Total Order property, because string equality & etc does NOT get into the issues of Unicode canonical equivalence, I.e. the equality test is that they are the same sequence of code points, ignoring that multiple sequences of code points might be represent the same same canonical character, in part because there isn’t a single definition of that, as there is NFD/NFC and NFKD/NFKC equivalences which are different.
Fair enough. I'm kinda conflating (kinda deliberately) the Python level of "what does the less-than operator do" with "what is meaningfully less than what for the actual problem?" And I'm also fudging the Decimal question too. The relationship between *expressions* can vary by decimal context, but not that of a value named by a single variable. However, the expression can be quite simple:
>>> with decimal.localcontext() as ctx:
...     ctx.prec = 2
...     print(+a < b)
True
>>> with decimal.localcontext() as ctx:
...     ctx.prec = 10
...     print(+a < b)
False
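The session above omits the values of `a` and `b`; the following hypothetical values (my choice, not from the original message) are picked so that rounding `a` to two significant digits flips the comparison:

```python
from decimal import Decimal, localcontext

# Chosen so that +a rounds below b at prec=2 but exceeds b at prec=10.
a = Decimal("0.125")
b = Decimal("0.1234")

with localcontext() as ctx:
    ctx.prec = 2
    low = +a < b   # +a rounds to 0.12, and 0.12 < 0.1234
with localcontext() as ctx:
    ctx.prec = 10
    high = +a < b  # +a is unchanged, and 0.125 > 0.1234

assert low is True and high is False
```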
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
On Wed, Mar 4, 2020 at 4:08 PM David Mertz <mertz@gnosis.cx> wrote:
And I'm also fudging the Decimal question too. The relationship between *expressions* can vary by decimal context. But not of a value named by a single variable.
Right: Decimal objects don't know anything about context, and comparisons of Decimal objects don't make use of the context, so if we ignore NaNs, then any set of Decimal objects is totally orderable. So Decimal is totally orderable to exactly the same extent that float is. (Though the behaviour with NaNs is a little more extreme, since comparisons involving sNaNs will raise rather than return False.) I'd argue that on a practicality-beats-purity basis, it wouldn't be unreasonable to register both `Decimal` and `float` as implementing `TotalOrdering` (or whatever the ABC ends up being called). -- Mark
On Wed, Mar 04, 2020 at 04:21:46PM +0000, Mark Dickinson wrote:
I'd argue that on a practicality-beats-purity basis, it wouldn't be unreasonable to register both `Decimal` and `float` as implementing `TotalOrdering` (or whatever the ABC ends up being called).
And that seems perfectly practical until somebody tries to compare or sort floats and discovers that they actually don't implement a total ordering if any NANs have crept into their data. This can be a genuine pain point: https://mail.python.org/archives/list/python-ideas@python.org/message/K35GTW... https://bugs.python.org/issue33084 and various other discussions. -- Steven
On Wed, Mar 4, 2020 at 11:23 AM Mark Dickinson <dickinsm@gmail.com> wrote:
So Decimal is totally orderable to exactly the same extent that float is. (Though the behaviour with NaNs is a little more extreme, since comparisons involving sNaNs will raise rather than return False.) I'd argue that on a practicality-beats-purity basis, it wouldn't be unreasonable to register both `Decimal` and `float` as implementing `TotalOrdering` (or whatever the ABC ends up being called).
I do not disagree on the practicality-beats-purity point. But the "failure" modes are different between Decimal and float. Technically, Decimal objects themselves are totally ordered, but an expression as simple as "+x" creates a different Decimal object that depends on context... and I can very easily see folks tripping on that with expected order properties. Floats are totally ordered *if you ignore a bunch of them* (the NaNs, which are easy to arrive at by a variety of operations on other floats). That said, I still haven't actually seen a real use case where we need TotalOrdered as an ABC. I love mathematical concepts as much as the next person, but what are we actually trying to get in our programs?
Andrew Barnert wrote:

On Mar 4, 2020, at 00:07, Steve Jorgensen <stevej@stevej.name> wrote: Taking one step back out of the realm of mathematical definition, however, the original idea was simply to distinguish what I now understand to be "totally ordered" types from other types, be they "partially ordered" or unordered — not even having a full complement of rich comparison operators or having all but using them in weirder ways than sets do.

Is there any commonly used or even imaginable useful type that uses them in weirder ways than set and float (which are both partially ordered) or np.array (where they aren't even Boolean-valued)? In particular, transitivity keeps coming up, but all of those examples are transitive (it's never true that a<b and true that b<c without being true that a<c for any of them). If there are such uses it might be important to distinguish them, but if there aren't, it doesn't seem unreasonable for PartiallyOrdered to "wrongly" pick up hypothetical pathological types that no one will ever write in exchange for automatically being right about every actual type anyone uses. After all, Iterable is a virtual superclass of any type with __iter__, even if it returns the number 42 instead of an Iterator, and so on; technically every implicit ABC in Python is "wrong" like this, but in practice it doesn't come up and implicit ABCs are very useful.
I see what you're saying. I guess what I was getting at is that for purposes of determining whether something is totally orderable or not, it doesn't matter what kind of not-totally-orderable the thing is — partially orderable (like sets), non-orderable (without a full complement of operators), or some other weird thing that has the full complement of operators.
On Mar 4, 2020, at 01:56, Mark Dickinson <dickinsm@gmail.com> wrote:
On Wed, Mar 4, 2020 at 9:22 AM Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
Is there any commonly used or even imaginable useful type that uses them in weirder ways than set and float (which are both partially ordered) [...]
Nitpick: why do you say "partially ordered" for float? If we take NaNs into account, then the float type is neither partially nor totally ordered, since x <= x is False when x is a NaN.
In math, non-strict orders (<=) are usually considered the more fundamental and useful concept, but in programming it's the other way around: Python's sort is based on < rather than <=, and the same is true in C++, Haskell, etc. And < (and IEEE lessThan) is a strict partial order: it's irreflexive, transitive, and antisymmetric. And it's properly partial, because of NaN values. So, how can < be a strict partial order when <= is not a non-strict partial order, even though they're related in the usual way (x<=y iff x<y or x==y)? Because == is not an equivalence relation. Define any natural equivalence relation on floats (e.g., make each NaN value equal itself), and (x<y or x==y) is a correspondingly natural non-strict partial order (e.g., with all the NaNs as an antichain). And I'm pretty sure this is all intentional on the part of the IEEE 754 committee and various language designers: if == is not going to be an equivalence relation, you can't have a natural non-strict partial order—but you can still have a natural strict partial order, and it's useful, so that's what they defined. And that's what languages use to sort numbers (and, usually, everything else—it's also more natural to use strict orders to do topological sorts on DAGs, etc.).
If we don't take NaNs into account (which is probably the way we'd want to go from a practicality standpoint), it's both partially and totally ordered. (To David Mertz's earlier comment, "not a total order when you include nans and infs", infinities don't affect the existence of a total or partial order here; it's only the NaNs that we have to worry about.)
You do have to worry about -0 and +0, however, but only very briefly: -0 == 0, not -0 < 0, not 0 < -0, so they aren’t breaking the equivalence or the order. Only NaNs break the equivalence, and therefore the non-strict partial order, and they also turn the strict total order into a strict properly partial order.
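A minimal sketch of the signed-zero and NaN behaviour just described:

```python
# -0.0 and 0.0 form an equivalence class rather than breaking the order:
neg, pos = -0.0, 0.0
assert neg == pos
assert not (neg < pos) and not (pos < neg)

# whereas NaN is not even equal to (or <=) itself:
nan = float("nan")
assert not (nan <= nan)
assert nan != nan
```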
Andrew Barnert via Python-ideas writes:
Is there any commonly used or even imaginable useful type that uses them in weirder ways than set and float (which are both partially ordered) or np.array (where they aren’t even Boolean-values)?
I've had occasion (a class for outcomes in two-player games) to define both < and <= as complete preorders, with "symmetry" in the sense that A < B iff B > A and A <= B iff B >= A, but < and <= were completely independent: they could be identical ("game of pure coordination"), they could be inverse ("game of pure competition"), and they could be anything else (eg, "prisoners' dilemma"). This system took a little getting used to, but writing "A > B and A >= B and A != B" [1] to implement "A Pareto dominates B" was one expressive convenience among others. I'm not sure this is "weird" in the sense you mean, and I greatly doubt this is sufficiently common to deserve an ABC :-), but it's a real example (long since archived on a disc whose exact location has slipped from memory ;-) that shows how flexible "ordering" in Python can be. Footnotes: [1] How the class, and specifically "A != B", were implemented is left for the reader who knows game theory to imagine. ;-)
Is there any commonly used or even imaginable useful type that uses them in weirder ways than set and float
I don't think this is relevant — Python is very dynamic, and operator overloading is a way to make your custom classes work with the operators. There is no guarantee at all that any of the operators will "mean" the same thing for all objects. And classes in the standard library use them in non-standard ways: sets (as is being discussed), / for Path concatenation, even + for lists (and aren't we adding something for dicts as we speak?), not to mention numpy arrays... So the presence of a given operator(s) does not mean anything about behavior. If we want a way to know whether an arbitrary object is orderable, an ABC is the way to do that. This is in contrast to the presence of __iter__, for example. While you could define an __iter__ method that returns anything, the whole point is to make an object iterable, so there is no reason whatsoever to (ab)use it in any other way. Personally, the discussion about floats makes me have no interest in a well defined ABC, but then again, I have not gotten on the type checking bandwagon yet anyway :-) -CHB -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
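The point that identical operators carry unrelated meanings across standard types is easy to demonstrate:

```python
from pathlib import PurePosixPath

# '/' concatenates path components rather than dividing:
p = PurePosixPath("/tmp") / "data.txt"
assert str(p) == "/tmp/data.txt"

# '+' concatenates lists rather than adding numbers:
assert [1, 2] + [3] == [1, 2, 3]

# '<' on sets tests for proper subset, not relative magnitude:
assert {1} < {1, 2}
assert not ({1} < {2})
```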
On 3/4/20 11:43 AM, Steven D'Aprano wrote:
On Wed, Mar 04, 2020 at 04:21:46PM +0000, Mark Dickinson wrote:
I'd argue that on a practicality-beats-purity basis, it wouldn't be unreasonable to register both `Decimal` and `float` as implementing `TotalOrdering` (or whatever the ABC ends up being called). And that seems perfectly practical until somebody tries to compare or sort floats and discovers that they actually don't implement a total ordering if any NANs have crept into their data.
This can be a genuine pain point:
https://mail.python.org/archives/list/python-ideas@python.org/message/K35GTW...
https://bugs.python.org/issue33084
and various other discussions.
Yes, because of the NaN issue, you sort of need an 'Almost Total Order' and a 'Really Truly a Total Order', the first allowing the small exception of a very limited set (maybe only one) of special values that break the true definition of Total Order, so that we could call floats an Almost Total Order. This wouldn't help the median issue, as having median reject float values because they don't form a true Total Order would be much worse than the issue of it getting confused by NaNs. The presence of NaN in the float system does add significant complexity to dealing with floating point numbers. Sometimes I wonder whether Python, since it supports dynamic typing of results, might not do better by removing the NaN value from Float and Decimal, and making the operations that generate NaN return an object of a special NaN type. -- Richard Damon
On Mar 4, 2020, at 19:12, Richard Damon <Richard@damon-family.org> wrote:
Yes, because of the NaN issue, you sort of need an 'Almost Total Order' and 'Really Truly a Total Order', the first allowing the small exception of a very limited (maybe only one) special value that breaks the true definition of Total Order so that we could call Floats to be an Almost Total Order.
The obvious answer is to have PartialOrder < NaNOrder < TotalOrder, where a type is NaN-ordered if it’s partially ordered, and its subset of all self-equal objects is totally ordered. I can’t imagine when a generic AlmostTotalOrder that didn’t specify how it fails could be useful; a NaNOrder is useful all over the place. A median that ignores NaN values requires NaNOrdered input. Even a median that does something stupid with NaN—see below. Of course there might be other kinds of almost total orders that might be useful. If anyone runs into one, they can write their own ABC with the proper relationships to the stdlib ones. Or even propose it for the stdlib. But they couldn’t do anything useful with a general AlmostTotalOrder if it had to handle both NaN and whatever their different case was.
This wouldn't help the median issue as having median reject being given Float values because they don't form a true Total Order would be much worse than the issue of it getting confused with NaNs.
Well, if you want to handle floats and Decimals and do something stupid with NaN values so the user has to be careful, it seems to me the right thing to do is require NaNOrder but document that you do something stupid with NaN values so the user has to be careful.
The presence of NaN in the float system does add a significant complexity to dealing with floating point numbers. Sometimes I wonder if since Python supports dynamic typing of results, might not do better by removing the NaN value from Floats and Decimals, and make the operations that generate the NaN generate an object of a special NaN type.
The thing is, in most cases you'd be duck typing and get no benefit, because float and floatnan both have the same methods, etc. Sure, when you're type-checking with isinstance you could choose whether to check float|floatnan or just float, but how often do you write such type checks? And if you really want that, you could write a Float ABC that does the same thing, although IIRC ABCs only have a subclasshook rather than an instancehook, so you need a new metaclass:

class FloatMeta(ABCMeta):
    def __instancecheck__(self, instance):
        return isinstance(instance, float) and not math.isnan(instance)

class Float(metaclass=FloatMeta):
    pass

People have experimented a lot with similar things and beyond in static type systems: you can define a new type which is just the non-NaN floats, or just the finite floats, or just the floats from -1 to 1, or just the three floats -1 or 0 or 1, and the compiler can check that you're always using it safely. (If you call a function that takes one of those non-NaN floats with a float, unless the compiler can see an `if not isnan(x)` or a pattern match that excludes NaN or something else that guarantees correctness, it's a TypeError.) Fitting this into the type algebra (& and | and -) is pretty easy. I'm not aware of anyone translating that idea to dynamic type systems, but it could be interesting.
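Filling in the imports, that metaclass sketch runs as written:

```python
import math
from abc import ABCMeta

class FloatMeta(ABCMeta):
    def __instancecheck__(self, instance):
        # A "Float" is any float value except NaN.
        return isinstance(instance, float) and not math.isnan(instance)

class Float(metaclass=FloatMeta):
    pass

assert isinstance(1.5, Float)
assert not isinstance(float("nan"), Float)
assert isinstance(float("inf"), Float)  # inf is self-equal, so it qualifies
```

Note that this only customizes isinstance, not issubclass; issubclass(float, Float) still follows the ordinary ABC rules.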
On 5/03/20 4:08 pm, Richard Damon wrote:
Sometimes I wonder if since Python supports dynamic typing of results, might not do better by removing the NaN value from Floats and Decimals, and make the operations that generate the NaN generate an object of a special NaN type.
I don't see what problem that would solve. All it would do is enable us to claim that the float type was totally ordered, without technically lying. Any operation on floats would still be able to return a NaNType value that would get mixed in with your other float operations, so it would make no practical difference. -- Greg
Steve Jorgensen wrote: <snip>
The problem I came up with trying to spike out my proposal last night is that there doesn't seem to be any way to implement it without creating infinite recursion in the issubclass call. If I make Orderable a real or virtual subclass of ProtoOrderable, and Orderable's __subclasshook__ or metaclass __subclasscheck__ (I tried both ways) tries to check whether C is a subclass of ProtoOrderable, then an infinite recursion occurs. It wasn't immediately obvious to me why that is the case, but when I thought about it deeply, I could see why that must happen. An alternative that I thought about previously, but which seems very smelly to me for several reasons, is to have both Orderable and NonOrderable ABCs. In that case, what should be done to prevent a class from being both orderable and non-orderable, or to figure out which should take precedence in that case? As a meta-solution (wild-assed idea), what if metaclass registration could accept keyword arguments, similar to passing keyword arguments to a class definition? That way, a single ABC (ProtoOrderable or whatever better name) could have real or virtual subclasses that are explicitly orderable or non-orderable depending on orderable=<True/False>.
I have been unable to implement the class hierarchy that I proposed, and I think I've determined that it's just not a practical fit with how the virtual base class mechanism works, so… Maybe just a single `TotalOrdered` or `TotalOrderable` ABC with a `register_explicit_only` method. The `__subclasshook__` method would skip the rich comparison methods check and return `NotImplemented` for any class registered using `register_explicit_only` (or any of its true subclasses). The only weird edge case in the above is that if someone registers another ABC using `TotalOrdered.register_explicit_only` and uses that as a virtual base class of something else, the `register_explicit_only` registration will not apply to the virtual subclass. I'm thinking that's completely acceptable as a known limitation if documented?
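A rough sketch of how such a `TotalOrdered` ABC with `register_explicit_only` might look; the hook logic and bookkeeping here are my guesses at the proposal, not a tested design:

```python
from abc import ABC

class TotalOrdered(ABC):
    """Sketch: duck-types total ordering unless a class opts out."""
    _explicit_only = set()

    @classmethod
    def register_explicit_only(cls, C):
        # Classes registered here (and their real subclasses) are never
        # picked up implicitly by the rich-comparison duck test below.
        cls._explicit_only.add(C)
        return C

    @classmethod
    def __subclasshook__(cls, C):
        if cls is TotalOrdered:
            if any(issubclass(C, X) for X in cls._explicit_only):
                return NotImplemented  # only explicit .register() counts
            ops = ("__lt__", "__le__", "__gt__", "__ge__")
            if all(any(op in B.__dict__ for B in C.__mro__ if B is not object)
                   for op in ops):
                return True
        return NotImplemented

TotalOrdered.register_explicit_only(set)

assert issubclass(int, TotalOrdered)
assert issubclass(str, TotalOrdered)
assert not issubclass(set, TotalOrdered)         # opted out despite having the operators
assert not issubclass(type(None), TotalOrdered)  # no comparison methods at all
```

`set` could still be made a `TotalOrdered` subclass with an explicit `TotalOrdered.register(set)`, which is the override behavior described above.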
On 3/5/20 1:27 AM, Greg Ewing wrote:
On 5/03/20 4:08 pm, Richard Damon wrote:
Sometimes I wonder if since Python supports dynamic typing of results, might not do better by removing the NaN value from Floats and Decimals, and make the operations that generate the NaN generate an object of a special NaN type.
I don't see what problem that would solve. All it would do is enable us to claim that the float type was totally ordered, without technically lying. Any operation on floats would still be able to return a NaNType value that would get mixed in with your other float operations, so it would make no practical difference.
What it would allow would be for the float type to say it was Totally Ordered, and sort (and related functions) could check that the sequence passed to them consisted of just totally ordered items, and throw an exception if some item passed to them wasn't. I suppose that since Python is not only dynamically typed but also supports duck typing, the float type could declare itself to be Totally Ordered and NaN could just remove that characteristic from itself. (Since float is built in, that might need some implementation magic.) -- Richard Damon
On 3/4/20 11:07 PM, Andrew Barnert via Python-ideas wrote:

On Mar 4, 2020, at 19:12, Richard Damon <Richard@damon-family.org> wrote: Yes, because of the NaN issue, you sort of need an 'Almost Total Order' and 'Really Truly a Total Order', the first allowing the small exception of a very limited (maybe only one) special value that breaks the true definition of Total Order so that we could call Floats to be an Almost Total Order.

The obvious answer is to have PartialOrder < NaNOrder < TotalOrder, where a type is NaN-ordered if it's partially ordered, and its subset of all self-equal objects is totally ordered. I can't imagine when a generic AlmostTotalOrder that didn't specify how it fails could be useful; a NaNOrder is useful all over the place. A median that ignores NaN values requires NaNOrdered input. Even a median that does something stupid with NaN—see below. Of course there might be other kinds of almost total orders that might be useful. If anyone runs into one, they can write their own ABC with the proper relationships to the stdlib ones. Or even propose it for the stdlib. But they couldn't do anything useful with a general AlmostTotalOrder if it had to handle both NaN and whatever their different case was.

I think your NaNOrder is my Almost Total Order, except that the name NaNOrder implies that only NaN is the special case, and thus that we are dealing with numbers; there may well be some other user type that has a similar 'exceptional' value with properties like NaN, but since the items aren't numbers, the Not-A-Number tag isn't really appropriate.
This wouldn't help the median issue as having median reject being given Float values because they don't form a true Total Order would be much worse than the issue of it getting confused with NaNs. Well, if you want to handle floats and Decimals and do something stupid with NaN values so the user has to be careful, it seems to me the right thing to do is require NaNOrder but document that you do something stupid with NaN values so the user has to be careful.
Yes, that is the idea of AlmostTotalOrder: to have algorithms that really require a total order (like sorting) even though we really need to use a type that has these exceptional values. Imagine that sort/median was defined to type-check its parameter, and that meant you couldn't take the median of a list of floats (because float has the NaN value that breaks TotalOrder).
The presence of NaN in the float system does add a significant complexity to dealing with floating point numbers. Sometimes I wonder if since Python supports dynamic typing of results, might not do better by removing the NaN value from Floats and Decimals, and make the operations that generate the NaN generate an object of a special NaN type. The thing is, in most cases you’d be duck typing and get no benefit, because float and floatnan both have the same methods, etc. Sure, when you’re type-checking with isinstance you could choose whether to check float|floatnan or just float, but how often do you write such type checks? And if you really want that, you could write a Float ABC that does the same thing, although IIRC ABCs only have a subclasshook rather than an instancehook so you need a new metaclass:
class FloatMeta(ABCMeta): def __instancecheck__(self, instance): return isinstance(instance, float) and not math.isnan(instance)
class Float(metaclass=FloatMeta): pass
People have experimented a lot with similar things and beyond in static type systems: you can define a new type which is just the non-NaN floats, or just the finite floats, or just the floats from -1 to 1, or just the three floats -1 or 0 or 1, and the compiler can check that you’re always using it safely. (If you call a function that takes one of those non-NaN floats with a float, unless the compiler can see an if not isnan(x) or a pattern match that excludes NaN or something else that guarantees correctness, it’s a TypeError.) Fitting this into the type algebra (& and | and -) is pretty easy. I’m not aware of anyone translating that idea to dynamic type systems, but it could be interesting.
The problem with trying to make NaN a special type in a statically typed language is that suddenly you can't define any of your variables to be of type 'float' if they might hold a NaN value, as that would have the wrong type to store it. Your basic fundamental float type suddenly needs to use a dynamic typing system within the static type system that handles type hierarchies. That would be a BIG efficiency hit in a language like C++. A dynamically typed language like Python, since it already takes that hit (and leverages that it always takes the hit to add functionality), has much more flexibility here. I will add that the whole concept of the NaN value was an invention to handle the problem that a totally statically typed language, like C, couldn't efficiently handle those sorts of errors except via a funny value that didn't really act like a floating point number. Except for the fact that so many people have gotten used to certain erroneous math computations creating this weird NaN value that we think of as a number but also not really a number, there is no reason in a dynamically typed language that this error code needs to be considered an actual float value. -- Richard Damon
On Thu, Mar 05, 2020 at 08:23:22AM -0500, Richard Damon wrote:
Yes, that is the idea of AlmostTotalOrder, to have algorithms that really require a total order (like sorting)
Sorting doesn't require a total order. Sorting only requires a weak order where the only operator required is the "comes before" operator, or less than. That's precisely how sorting in Python is implemented. Here is an interesting discussion of a practical use-case of sorting data with a partial order: https://blog.thecybershadow.net/2018/11/18/d-compilation-is-too-slow-and-i-a...
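A minimal illustration that Python's sort really does need only the less-than operator:

```python
class Lt:
    """Defines only __lt__; list.sort and sorted() need nothing else."""
    def __init__(self, v):
        self.v = v
    def __lt__(self, other):
        return self.v < other.v

xs = [Lt(3), Lt(1), Lt(2)]
assert [x.v for x in sorted(xs)] == [1, 2, 3]

# <= is never consulted by sorted(), and in fact raises if attempted:
raised = False
try:
    Lt(1) <= Lt(2)
except TypeError:
    raised = True
assert raised
```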
but we really need to use a type that has these exceptional values. Imagine that sort/median was defined to type check its parameter,
No need to imagine it; sort already type-checks its arguments:

py> sorted([1, 3, 5, "Hello", 2])
TypeError: '<' not supported between instances of 'str' and 'int'
and that meant that you couldn't take the median of a list of floats (because float has the NaN value that breaks TotalOrder).
Dealing with NANs depends on what you want to do with the data. If you are sorting for presentation purposes, what you probably want is to sort with a custom key that pushes all the NANs to the front (or rear) of the list. If you are sorting for the purposes of calculating the median, it depends. There are at least three reasonable strategies for median: - ignore the NANs; - return a NAN; - raise an exception. Personally, I think that the first is by far the most practical: if you have NANs in your statistical data, that's probably because they've come from some other library or application that is using them to represent missing values, and if that's the case, the right thing to do is to ignore them. -- Steven
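Hedged sketches of the first and third strategies for float data (the helper names are invented for illustration):

```python
import math
import statistics

def median_ignore_nan(data):
    """Strategy 1: silently drop NaNs, as stats users often want."""
    return statistics.median(x for x in data if not math.isnan(x))

def median_raise_on_nan(data):
    """Strategy 3: refuse to guess when NaNs are present."""
    data = list(data)
    if any(math.isnan(x) for x in data):
        raise ValueError("median of data containing NaN")
    return statistics.median(data)

data = [3.0, float("nan"), 1.0, 2.0]
assert median_ignore_nan(data) == 2.0
try:
    median_raise_on_nan(data)
except ValueError:
    pass  # expected for this input
```

Strategy 2 (return a NaN) is just an early `return float("nan")` in place of the `raise`.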
On Mar 5, 2020, at 05:24, Richard Damon <Richard@damon-family.org> wrote:
On 3/4/20 11:07 PM, Andrew Barnert via Python-ideas wrote:
On Mar 4, 2020, at 19:12, Richard Damon <Richard@damon-family.org> wrote: Yes, because of the NaN issue, you sort of need an 'Almost Total Order' and 'Really Truly a Total Order', the first allowing the small exception of a very limited (maybe only one) special value that breaks the true definition of Total Order so that we could call Floats to be an Almost Total Order. The obvious answer is to have PartialOrder < NaNOrder < TotalOrder, where a type is NaN-ordered if it’s partially ordered, and its subset of all self-equal objects is totally ordered.
I can’t imagine when a generic AlmostTotalOrder that didn’t specify how it fails could be useful; a NaNOrder is useful all over the place. A median that ignores NaN values requires NaNOrdered input. Even a median that does something stupid with NaN—see below.
Of course there might be other kinds of almost total orders that might be useful. If anyone runs into one, they can write their own ABC with the proper relationships to the stdlib ones. Or even propose it for the stdlib. But they couldn’t do anything useful with a general AlmostTotalOrder if it had to handle both NaN and whatever their different case was. I think your NaNOrder is my Almost Total Order, except that the name NaNOrder implies that it is only NaN is the special case, and thus we are dealing with numbers, but there may well be some other user type that has a similar 'exceptional' value with properties like NaN, but since the items aren't numbers, the Not-A-Number tag isn't really appropriate.
The important thing is that it implies that there's a subset of values that are totally ordered, and that the only values outside that subset are not self-equal. This means, for example, that you can test whether a set of values is actually orderable in linear time with all(x == x for x in xs). So you can write a median that skips NaN values, or raises on them, just by checking each value as it comes up. But your almost total order only implies that there are some values that break the total order in some way. So you can only test whether a set is actually orderable in quadratic time with all(not (x < y and y < x) for x in xs for y in xs). The only way to raise on bad values is that quadratic check. There is no way to skip bad values, because even when the check fails, you can't tell whether it's x or y that's bad. So it's a useless notion. The other important thing is that NaN order comes up all the time when sorting float, Decimal, Pandas datetime (its NaT has NaN semantics), etc. Of course you could imagine other almost total orders, even if nobody ever comes up with one when asked. Here's one: the projective reals (real values extended with a single infinity that's both negative and positive) are almost totally ordered, except that inf is not comparable to anything but itself (which it's equal to). You could easily write a sort or median or whatever algorithm that does something useful with this order. But only if you know it's this order. You can't write something that's useful with both this order and NaN order. So almost total still doesn't help with anything; you need to write a new ABC that's also a subclass of Partial and superclass of Total but has no relation to NaN.
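The two checks described can be sketched directly (the `Bad` class is an invented pathological example):

```python
def nan_orderable(xs):
    """Linear check: valid when the only possible order-breakers are
    non-self-equal (NaN-like) values."""
    return all(x == x for x in xs)

def almost_orderable(xs):
    """Quadratic check against asymmetry violations; even when it
    fails, it cannot say which element was at fault."""
    return all(not (x < y and y < x) for x in xs for y in xs)

nan = float("nan")
assert nan_orderable([1.0, 2.0])
assert not nan_orderable([1.0, nan])

class Bad:
    # Pathological: every instance claims to be less than every other.
    def __lt__(self, other):
        return True

bad = [Bad(), Bad()]
assert nan_orderable(bad)         # the linear test cannot see this failure
assert not almost_orderable(bad)  # only the quadratic test catches it
```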
The presence of NaN in the float system does add a significant complexity to dealing with floating point numbers. Sometimes I wonder if since Python supports dynamic typing of results, might not do better by removing the NaN value from Floats and Decimals, and make the operations that generate the NaN generate an object of a special NaN type. The thing is, in most cases you’d be duck typing and get no benefit, because float and floatnan both have the same methods, etc. Sure, when you’re type-checking with isinstance you could choose whether to check float|floatnan or just float, but how often do you write such type checks? And if you really want that, you could write a Float ABC that does the same thing, although IIRC ABCs only have a subclasshook rather than an instancehook so you need a new metaclass:
class FloatMeta(ABCMeta): def __instancecheck__(self, instance): return isinstance(instance, float) and not math.isnan(instance)
class Float(metaclass=FloatMeta): pass
People have experimented a lot with similar things and beyond in static type systems: you can define a new type which is just the non-NaN floats, or just the finite floats, or just the floats from -1 to 1, or just the three floats -1 or 0 or 1, and the compiler can check that you’re always using it safely. (If you call a function that takes one of those non-NaN floats with a float, unless the compiler can see an if not isnan(x) or a pattern match that excludes NaN or something else that guarantees correctness, it’s a TypeError.) Fitting this into the type algebra (& and | and -) is pretty easy. I’m not aware of anyone translating that idea to dynamic type systems, but it could be interesting.
The problem with trying to make NaN a special type in a static typed language is that suddenly you can't define any of your variables to be of type 'float' if they might have a NaN value, as that would have the wrong type to store it.
That's not a problem in a statically typed language unless the language's type system isn't powerful enough to help. If you think of C and C++ (without template metaprogramming) as exemplars of static typing, you really should go learn TypeScript or Scala, or at least Swift or C# or Rust. It will open your eyes about what type systems can do (and some of those insights are transferable to implicit dynamic type systems like Python's). Let's break down float like this (for simplicity, pretend that there's only a single NaN value):

floatnan = NaN
floatnum = float - floatnan
floatinf = -inf | inf
floatzero = 0
floatfinite = floatnum - floatinf - floatzero

Now, you know that floatfinite + floatfinite gives you a floatnum, and so does floatnum + floatfinite, and so on, but floatnum + floatnum only gives you float. You write up all those rules, and now the compiler can verify that all of your variables have the right type. And it can also verify this:

def spam(x: floatfinite) -> floatfinite: ...
def eggs(x: float) -> auto: return 1

x: float
y: auto
...
if isfinite(x):
    y = spam(x)
else:
    y = eggs(x)
z: floatfinite = spam(y)

And so on. You don't need any runtime type checks. You don't even need the types to exist at runtime—all of these values can be stored as just a 64-bit C-style double. You also don't need value checks at every operation—in this example, we only needed a single value check (which was needed anyway for the algorithm to be correct) for the compiler to verify that our code is correct. If we screwed up and called spam(x) at the top level, the compiler would give us a TypeError telling us that we're calling a function that requires a floatfinite with a float that can't be proven to be a floatfinite. And this isn't a fantasy. There are real experimental languages that do this.
There are real actually-used-in-production-code languages like TypeScript and Haskell that do the equivalent for smaller cases like small integer ranges (and proposals for ways to do it for floats by adding support for continuous ranges or huge ranges or subtraction types or … but as far as I know none of them have panned out yet). The fact that C++ can’t do it isn’t a limitation of static typing, it’s a limitation of the C++ type system. In fact, even C++ can technically do it, but it requires a huge mess of hideous template metaprogramming code. There’s a proof of concept out there for checking bytes for overflow that requires thousands of lines of unreadable generated code and takes minutes to compile, but it works. The equivalent trick for doubles would be technically valid C++ code, but it would take O(2**(2**64)) compile time memory.
Your basic fundamental float type suddenly needs to use the dynamic typing system within the static type system that handles type hierarchies.
No it doesn’t. At runtime, all of those float subset types compile to the same 64-bit IEEE double with no extra information, and all operations on them work the same as in C (in fact, even simpler and faster than in C, because you can eliminate a few value checks that C requires), because the whole point of the type system is that it proved that your code’s assumptions are correct and the value checks you wrote are sufficient so nothing needs to be checked at runtime.
That would be a BIG efficiency hit in a language like C++.
Sure, but again, only in a language like C++, not in a language with a better type system. Also, even if that efficiency hit were ubiquitous, it would still be faster than, say, Python (which has to do the same boxing and unboxing and dynamic checking, but also has to look up operations by name and interpret bytecode), which is hardly an unusable language.

So, the interesting question is: could any of this be leveraged by a powerful dynamic type system? Ideally you could make a runtime that can skip the checking and/or dispatch costs in some cases. If not, maybe you can at least borrow some of the same ideas for defining single-value, range-value, union, intersection, and subtraction types for clarity, even if they can’t help the compiler optimize anything.

In particular, would being able to check isinstance(x, floatnum) or floatnan instead of just float (which is floatnum|floatnan) be useful? Then write the ABCs and use them. The fact that they’re not quite ABCs, so you need a custom metaclass to do it in Python, is a bit annoying, but only a bit; it’s still doable.
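A sketch of what those value-checking pseudo-ABCs might look like in Python (the names floatnum and floatnan, and the metaclass, are hypothetical, following the types invented above):

```python
import math

class _ValueCheckMeta(type):
    """Metaclass whose isinstance() test inspects the value, not just the type."""
    def __instancecheck__(cls, obj):
        return isinstance(obj, float) and cls._predicate(obj)

class floatnan(metaclass=_ValueCheckMeta):
    """The NaN subset of float."""
    _predicate = staticmethod(math.isnan)

class floatnum(metaclass=_ValueCheckMeta):
    """float minus NaN."""
    _predicate = staticmethod(lambda x: not math.isnan(x))

assert isinstance(float("nan"), floatnan)
assert isinstance(1.5, floatnum)
assert not isinstance(1.5, floatnan)
```

At runtime every value is still an ordinary float; the classes exist only to make the isinstance checks expressible.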
Steve Jorgensen wrote:
Steve Jorgensen wrote: <snip>
The problem I came up with trying to spike out my proposal last night is that there doesn't seem to be any way to implement it without creating infinite recursion in the issubclass call. If I make Orderable a real or virtual subclass of ProtoOrderable, and Orderable's __subclasshook__ or metaclass __subclasscheck__ (I tried both ways) tries to check whether C is a subclass of ProtoOrderable, then an infinite recursion occurs. It wasn't immediately obvious to me why that is the case, but when I thought about it deeply, I could see why that must happen.

An alternative that I thought about previously, but that seems very smelly to me for several reasons, is to have both Orderable and NonOrderable ABCs. In that case, what should be done to prevent a class from being both orderable and non-orderable, or to figure out which should take precedence in that case?

As a meta-solution (wild-assed idea), what if metaclass registration could accept keyword arguments, similar to passing keyword arguments to a class definition? That way, a single ABC (ProtoOrderable or whatever better name) could have real or virtual subclasses that are explicitly orderable or non-orderable depending on orderable=<True/False>.

I have been unable to implement the class hierarchy that I proposed, and I think I've determined that it's just not a practical fit with how the virtual base class mechanism works, so… Maybe just a single TotalOrdered or TotalOrderable ABC with a register_explicit_only method. The __subclasshook__ method would skip the rich comparison methods check and return NotImplemented for any class registered using register_explicit_only (or any of its true subclasses).

The only weird edge case in the above is that if someone registers another ABC using TotalOrdered.register_explicit_only and uses that as a virtual base class of something else, the register_explicit_only registration will not apply to the virtual subclass. I'm thinking that's completely acceptable as a known limitation if documented?
Code spike of that idea:

```
from abc import ABCMeta
from weakref import WeakSet

class TotallyOrderable(metaclass=ABCMeta):
    _explicit_only_registry = WeakSet()

    @classmethod
    def register_explicit_only(cls, C):
        if cls is not TotallyOrderable:
            raise NotImplementedError(
                f"{cls.__name__} does not implement 'register_explicit_only'")
        cls._explicit_only_registry.add(C)

    @classmethod
    def __subclasshook__(cls, C):
        if cls is not TotallyOrderable:
            return NotImplemented
        for B in C.__mro__:
            if B in cls._explicit_only_registry:
                return NotImplemented
        return cls._check_overrides_rich_comparison_methods(C)

    @classmethod
    def _check_overrides_rich_comparison_methods(cls, C):
        mro = C.__mro__
        for method in ('__lt__', '__le__', '__gt__', '__ge__'):
            for B in mro:
                if B is not object and method in B.__dict__:
                    if B.__dict__[method] is None:
                        return NotImplemented
                    break
            else:
                return NotImplemented
        return True
```
On Mar 5, 2020, at 11:05, Steve Jorgensen <stevej@stevej.name> wrote:
Steve Jorgensen wrote:
Steve Jorgensen wrote: <snip>
The problem I came up with trying to spike out my proposal last night is that there doesn't seem to be any way to implement it without creating infinite recursion in the issubclass call.
Is this something we should be looking to add to the ABC mechanism in general? Would a way to “unregister” classes that would be implicitly accepted be simpler than a way to “register_explicit_only” classes so they skip the implicit test?
<snip>

_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-leave@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/2OZBPQ...
Code of Conduct: http://python.org/psf/codeofconduct/
[Steven D'Aprano <steve@pearwood.info>]
Sorting doesn't require a total order. Sorting only requires a weak order where the only operator required is the "comes before" operator, or less than. That's precisely how sorting in Python is implemented.
Let's be careful here. Python's list.sort() guarantees that _if_ the elements are totally orderable, it will deliver the unique stable permutation consistent with that ordering. Otherwise, the only thing it guarantees is that the output will be _some_ permutation of the input. There was no attempt to do something "useful" in the absence of a total order - whatever happens then is purely down to implementation accidents.

The reason it sticks to __lt__ is pragmatic, not theoretical: at the time it was designed, rich comparisons weren't yet implemented, and all comparisons were implemented via __cmp__ methods. But rich comparisons were scheduled for a future release, and sticking to __lt__ was my deliberate nod to making it as easy as possible then for classes that wanted to define their own order to participate in sorting. Implement __lt__, and you're good to go. I don't think it's documented, but the guts of the bisect and heapq modules were reworked (I think by Raymond) to stick to __lt__ too, for the same reason. But, e.g.,
sorted([{1, 2}, {3}, {1}])
[{1, 2}, {3}, {1}]
despite that {1} < {1, 2}. As far as sorted() is concerned, that's "garbage in garbage out". What happens today (and since the current list.sort() was written - but may not be so tomorrow): First it asks whether {3} < {1, 2}. It gets back False, so concludes the first two elements are already in order. Then it asks whether {1} < {3}. Again False, so it concludes those two are also in order. And that's it. It's done. It never compares {1, 2} to {1} at all. By comparing only adjacent pairs of elements, it concludes that the entire input consists of one non-descending run. For totally ordered input elements, that means "it's already sorted".
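On current CPython, that comparison sequence can be observed by instrumenting __lt__ (a sketch; which comparisons get made is an implementation accident and may change):

```python
calls = []

class LoggedSet(set):
    """A set that records every __lt__ call made on it."""
    def __lt__(self, other):
        calls.append((set(self), set(other)))
        return set.__lt__(self, other)

data = [LoggedSet({1, 2}), LoggedSet({3}), LoggedSet({1})]
out = sorted(data)

# Only the two adjacent pairs were asked about; both answers were False,
# so the input was treated as one non-descending run and left unchanged.
assert out == data
assert calls == [({3}, {1, 2}), ({1}, {3})]
```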
Here is an interesting discussion of a practical use-case of sorting data with a partial order:
https://blog.thecybershadow.net/2018/11/18/d-compilation-is-too-slow-and-i-a...
list.sort() is of no use for finding a linear ordering that respects a collection of "A must precede B" constraints. But Python 3.9 introduces a capable new TopologicalSorter class (in the graphlib module as of the final release) that is good for that. Although, unlike the approach in that blog, it doesn't cater to cycles - if there's a cycle in the input, it raises CycleError.
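A minimal sketch of its use (note that in the final 3.9 release the class lives in graphlib, not functools):

```python
# TopologicalSorter shipped in the graphlib module in the final 3.9 release.
from graphlib import CycleError, TopologicalSorter

# Each key must come after everything in its predecessor set.
deps = {"link": {"compile"}, "compile": {"preprocess"}}
order = list(TopologicalSorter(deps).static_order())
assert order == ["preprocess", "compile", "link"]

# A cycle in the input raises CycleError rather than yielding a partial order.
try:
    list(TopologicalSorter({"a": {"b"}, "b": {"a"}}).static_order())
except CycleError:
    pass
else:
    raise AssertionError("expected CycleError")
```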
but we really need to use a type that has these exceptional values. Imagine that sort/median was defined to type check its parameter,
No need to imagine it, sort already type-checks its arguments:
py> sorted([1, 3, 5, "Hello", 2])
TypeError: '<' not supported between instances of 'str' and 'int'
Yes and no ;-) list.sort() makes no attempt to check element types for the sake of checking them. In the example, it's just passing on an exception raised by asking whether "Hello" < 5 (as in the example above, it first compares adjacent pairs to identify the longest initial run). If, e.g., we sorted a list like [a, b, c], where (b < a) == (c < b) == False it would stop then, even if comparing `a` with `c` would raise an exception.
[snipped stuff about NaNs]
On 3/5/20 9:10 AM, Steven D'Aprano wrote:
On Thu, Mar 05, 2020 at 08:23:22AM -0500, Richard Damon wrote:
Yes, that is the idea of AlmostTotalOrder, to have algorithms that really require a total order (like sorting) Sorting doesn't require a total order. Sorting only requires a weak order where the only operator required is the "comes before" operator, or less than. That's precisely how sorting in Python is implemented.
Here is an interesting discussion of a practical use-case of sorting data with a partial order:
https://blog.thecybershadow.net/2018/11/18/d-compilation-is-too-slow-and-i-a...

Reading that, yes, there are applications of sorting that don't need a total order, but as the article points out, many of the general-purpose sorting algorithms do (like the one that Python uses in sort).
but we really need to use a type that has these exceptional values. Imagine that sort/median was defined to type check its parameter, No need to imagine it, sort already type-checks its arguments:
py> sorted([1, 3, 5, "Hello", 2])
TypeError: '<' not supported between instances of 'str' and 'int'
If you consider that proper type checking, then you must consider that the proper answer for the median of a list of numbers that contains a NaN is any of the numbers in the list. If sort had an easy/cheap way to confirm that the values passed to it met its assumptions, then it could make a reasonable response.
and that meant that you couldn't take the median of a list of floats (because float has the NaN value that breaks TotalOrder). Dealing with NANs depends on what you want to do with the data. If you are sorting for presentation purposes, what you probably want is to sort with a custom key that pushes all the NANs to the front (or rear) of the list. If you are sorting for the purposes of calculating the median, it depends. There are at least three reasonable strategies for median:
- ignore the NANs; - return a NAN; - raise an exception.
Personally, I think that the first is by far the most practical: if you have NANs in your statistical data, that's probably because they've come from some other library or application that is using them to represent missing values, and if that's the case, the right thing to do is to ignore them.
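Both ideas can be sketched in a few lines (median_nan and nan_sort_key are hypothetical helpers, not stdlib functions):

```python
import math
from statistics import median

def nan_sort_key(x):
    # For presentation: the (is-NaN, value) key pushes all NaNs to the rear.
    return (math.isnan(x), 0.0 if math.isnan(x) else x)

def median_nan(data, policy="ignore"):
    # The three strategies: ignore the NaNs, return a NaN, or raise.
    clean = [x for x in data if not math.isnan(x)]
    if len(clean) != len(data):
        if policy == "raise":
            raise ValueError("NaN in data")
        if policy == "nan":
            return math.nan
    return median(clean)

data = [2.0, math.nan, 1.0, 3.0]
assert sorted(data, key=nan_sort_key)[:3] == [1.0, 2.0, 3.0]
assert median_nan(data) == 2.0
assert math.isnan(median_nan(data, policy="nan"))
```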
There was a discussion not that long ago about that very topic. All those options can be reasonable, but ignoring seems to me to be one of the worse options for a simple package (though reasonable for one where the whole package uses that convention). The danger of it is that if you get a NaN as a result of a computation generating your data, that error gets hidden by having the data just be ignored. I would say that in Python, it would make a lot more sense to use None as the missing-data code, and leave NaN for invalid data/computations. That way you keep things explicit. The use of NaN here goes back to the use of strictly static typed languages for doing this, where NaN was a convenient special value to mark it (prior to the invention of NaN you just used an impossible value for these). -- Richard Damon
Andrew Barnert wrote:
On Mar 5, 2020, at 11:05, Steve Jorgensen stevej@stevej.name wrote:
<snip>

Is this something we should be looking to add to the ABC mechanism in general? Would a way to “unregister” classes that would be implicitly accepted be simpler than a way to “register_explicit_only” classes so they skip the implicit test?

<snip>
Maybe so, because I found a limitation with my code spike. Calling `register_explicit_only` doesn't bust the cache, so if `issubclass(<myclass>, TotallyOrderable)` is called prior to calling `TotallyOrderable.register_explicit_only(<myclass>)`, then the registration doesn't have any effect. Perhaps that's also not a big deal as a documented limitation, though?
On 3/5/20 1:37 PM, Andrew Barnert via Python-ideas wrote:
On Mar 5, 2020, at 05:24, Richard Damon <Richard@damon-family.org> wrote:
On 3/4/20 11:07 PM, Andrew Barnert via Python-ideas wrote:
On Mar 4, 2020, at 19:12, Richard Damon <Richard@damon-family.org> wrote: Yes, because of the NaN issue, you sort of need an 'Almost Total Order' and 'Really Truly a Total Order', the first allowing the small exception of a very limited (maybe only one) special value that breaks the true definition of Total Order so that we could call Floats to be an Almost Total Order. The obvious answer is to have PartialOrder < NaNOrder < TotalOrder, where a type is NaN-ordered if it’s partially ordered, and its subset of all self-equal objects is totally ordered.
I can’t imagine when a generic AlmostTotalOrder that didn’t specify how it fails could be useful; a NaNOrder is useful all over the place. A median that ignores NaN values requires NaNOrdered input. Even a median that does something stupid with NaN—see below.
Of course there might be other kinds of almost total orders that might be useful. If anyone runs into one, they can write their own ABC with the proper relationships to the stdlib ones. Or even propose it for the stdlib. But they couldn’t do anything useful with a general AlmostTotalOrder if it had to handle both NaN and whatever their different case was.

I think your NaNOrder is my Almost Total Order, except that the name NaNOrder implies that only NaN is the special case, and thus that we are dealing with numbers, but there may well be some other user type that has a similar 'exceptional' value with properties like NaN, but since the items aren't numbers, the Not-A-Number tag isn't really appropriate.

The important thing is that it implies that there’s a subset of values that are totally ordered, and that the only values outside that subset are not self-equal. This means, for example, that you can test whether a set of values is actually orderable in linear time with all(x==x for x in xs). So you can write a median that skips NaN values, or raises on them, just by checking each value as it comes up. But your almost total order only implies that there are some values that break the total order in some way. So you can only test whether a set of values is actually orderable in quadratic time with all(not(x<y and y<x) for x in xs for y in xs). The only way to raise on bad values is that quadratic check. There is no way to skip bad values, because even when the check fails, you can’t tell whether it’s x or y that’s bad. So it’s a useless notion.
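The linear-time check, and the NaN-skipping median it enables, can be sketched directly (the helper names are hypothetical):

```python
import math

def cleanly_orderable(xs):
    # In a NaN-style order, the only bad values are the non-self-equal ones,
    # so one pass over the data suffices.
    return all(x == x for x in xs)

def median_skipping_bad(xs):
    # Skip each non-self-equal value as it comes up, then take the middle.
    good = sorted(x for x in xs if x == x)
    if not good:
        raise ValueError("no orderable values")
    return good[len(good) // 2]

assert not cleanly_orderable([1.0, math.nan, 2.0])
assert cleanly_orderable([3.0, 1.0, 2.0])
assert median_skipping_bad([3.0, math.nan, 1.0, 2.0]) == 2.0
```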
The other important thing is that NaN order comes up all the time when sorting float, Decimal, Pandas datetime (its NaT has NaN semantics), etc. Of course you could imagine other almost total orders, even if nobody ever comes up with one when asked. Here’s one: the projective reals (real values extended with a single infinity that’s both negative and positive) are almost totally ordered, except that inf is not comparable to anything but itself (which it’s equal to). You could easily write a sort or median or whatever algorithm that does something useful with this order. But only if you know it’s this order. You can’t write something that’s useful with both this order and NaN order. So almost total still doesn’t help with anything; you need to write a new ABC that’s also a subclass of Partial and superclass of Total but has no relation to NaN.
Yes, the Almost Total Order classification needs a bit of work for a full definition. One simple one is that you have functions that really want Total Order classes, but will take an Almost Total Order with a promise that you aren't giving it any of the funny values. A slightly more involved version might provide a testing function to allow a routine to verify this promise. It is quite possible to write something that does something reasonable with an Almost Total Order, even without knowing the characteristics of the breakdown of total order, if the usage is defined as only working on the values that don't break the total order. Float with its NaN issue is a good common example: many programs deal with floats and don't intend to have NaNs creep in, so treating the float class as being float minus NaNs, and thus a total order, might be reasonable. -- Richard Damon
On Thu, Mar 05, 2020 at 04:46:14PM -0600, Tim Peters wrote:
[Steven D'Aprano <steve@pearwood.info>]
Sorting doesn't require a total order. Sorting only requires a weak order where the only operator required is the "comes before" operator, or less than. That's precisely how sorting in Python is implemented.
Let's be careful here. Python's list.sort() guarantees that _if_ the elements are totally orderable, it will deliver the unique stable permutation consistent with that ordering. Otherwise, the only thing it guarantees is that the output will be _some_ permutation of the input.
Thank you for the correction, I did know that but didn't express myself as well as I should have. I wasn't intending to imply that sort could magically tease out a stable, intuitive smallest-to-largest order from data that doesn't have a stable, intuitive smallest-to-largest order :-) All I intended was to say that sort would "work" in the sense you state: it will give *some* permutation of the data, where (I think, correct me if I'm wrong) each element compares less than the following element. Wait, no, that's not right either... each element doesn't compare less than the previous element? [...]
No need to imagine it, sort already type-checks its arguments:
py> sorted([1, 3, 5, "Hello", 2])
TypeError: '<' not supported between instances of 'str' and 'int'
Yes and no ;-) list.sort() makes no attempt to check element types for the sake of checking them.
Again, mea culpa, I wasn't as clear as I intended. I didn't mean sort itself did explicit type-checks. As you say, it's just passing on the type errors from comparing incomparable types. Thanks for clarifying what I meant :-) -- Steven
On 3/9/20 12:41 AM, Steven D'Aprano wrote:
Wait, no, that's not right either... each element doesn't compare less than the previous element?
If the elements of the list don't form a total order, then sort makes NO promises about the data returned, other than it is the input data in some order. I suppose the last elements compared will be in order, but it doesn't promise that for the rest of them. In particular, if a comparison causes a swap, it doesn't go back and check that the resulting new pairs are in order, unless the algorithm, based on the assumption of total order, thinks it might need to. -- Richard Damon
On Mon, Mar 09, 2020 at 06:39:10AM -0400, Richard Damon wrote:
On 3/9/20 12:41 AM, Steven D'Aprano wrote:
Wait, no, that's not right either... each element doesn't compare less than the previous element?
If the elements of the list don't form a total order, then sort makes NO promises about the data returned, other than it is the input data in some order.
I think we can tell more than that from the guarantee that sort only calls `<` and not any other comparison. If the output of sort is [a, b, c, d] then I think that we know that:

b < a returns False
c < b returns False
d < c returns False

but not necessarily anything else.

I'm making this judgement based on Tim's earlier post, where he describes the comparisons used in sorting a list of sets, not from reading the source code or experimentation. There may be more, or less, that we can be sure about.

For example, I'm confident that sort has to do at least one comparison on each element (there are no elements that never get looked at; every element gets inspected at least once). I'm also confident (absolutely sure actually) that it *doesn't* compare every *pair* of elements. But it might compare every pair of *consecutive* elements.

Of course this isn't a language guarantee, it will depend on the implementation. Bubblesort will probably behave very differently from Heapsort. But we know the implementation used in CPython, and that's what I'm discussing. -- Steven
On 3/9/20 7:45 AM, Steven D'Aprano wrote:
On Mon, Mar 09, 2020 at 06:39:10AM -0400, Richard Damon wrote:
On 3/9/20 12:41 AM, Steven D'Aprano wrote:
Wait, no, that's not right either... each element doesn't compare less than the previous element?
If the elements of the list don't form a total order, then sort makes NO promises about the data returned, other than it is the input data in some order. I think we can tell more than that from the guarantee that sort only calls `<` and not any other comparison.
<snip>
But you can create a type that behaves sufficiently badly that it can't do that. Say we have the opposite of a NaN; let's call it an AnA (it's Any Number), which returns true for all comparisons, since there is some number that makes the comparison true. Such a type, for a start, makes your guarantee impossible, and will totally mess up any algorithm that is based on transitivity (if a < b and b < c then a < c; if b is an AnA that won't necessarily hold). NaNs perhaps cause a smaller problem because people tend to not use the form of transitivity that comes from a total order in the negative, i.e. if a is not less than b, and b is not less than c, then a is not less than c. -- Richard Damon
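A sketch of such a value (AnA is the hypothetical name above, not a real type):

```python
class AnA:
    """'Any number': answers True to every ordering comparison."""
    def __lt__(self, other): return True
    def __le__(self, other): return True
    def __gt__(self, other): return True
    def __ge__(self, other): return True

a, b, c = 1, AnA(), 0
# a < b falls back to the reflected AnA.__gt__, so it is True;
# b < c is True directly; yet a < c is 1 < 0, which is False.
assert a < b and b < c
assert not (a < c)  # transitivity fails
```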
[Steven D'Aprano <steve@pearwood.info>]
... All I intended was to say that sort would "work" in the sense you state: it will give *some* permutation of the data, where (I think, correct me if I'm wrong) each element compares less than the following element.
Wait, no, that's not right either... each element doesn't compare less than the previous element?
There's really nothing useful that can be said without knowing precise implementation details, and essentially nothing at all that can be said about sorting-in-general, when a total order doesn't exist. Even for a 2-element list! If the result of Python's sort is [x, y] then I happen to know (because I know everything about its implementation) that one of two things is true:

- The original list was also [x, y], and y < x was False. But we can't deduce anything about what x < y, x <= y, x == y, x != y, x > y, x >= y, y <= x, y > x, y >= x, y == x, or y != x would do.

- The original list was [y, x], and x < y was True (but again we can't deduce anything about what any other comparisons involving x and y would do).

For longer lists it can get much more complicated; comparing adjacent elements is just the first step in an algorithm with several stages, which dynamically adjust to patterns of comparison outcomes that are discovered as the algorithm goes along. If we don't know the precise implementation, there's really nothing crisp I can think of that necessarily follows from that the result is [x, y].

And I think this must be so for any "reasonably efficient" sorting algorithm. If there are N elements in the list, there are N*(N-1) _possible_ pairwise __lt__ invocations that could be made, but an efficient sort will never invoke more than O(N*log(N)) of them. The ratio of comparisons made to the number that _could_ be made tends to 0 as N increases - and in that sense the sort explores "almost none" of the possible comparison outcomes.

For that reason, it would be much more expensive to check that a consistent total order exists than to do the sort. For the same reason, you can't deduce much from the result without knowing precisely which of the "nearly all of 'em" possible comparisons were skipped.
On Mon, Mar 09, 2020 at 04:50:49PM -0500, Tim Peters wrote:
[Steven D'Aprano <steve@pearwood.info>]
... All I intended was to say that sort would "work" in the sense you state: it will give *some* permutation of the data, where (I think, correct me if I'm wrong) each element compares less than the following element.
Wait, no, that's not right either... each element doesn't compare less than the previous element?
There's really nothing useful that can be said without knowing precise implementation details, and essentially nothing at all that can be said about sorting-in-general,
Understood, I'm not discussing sorting in general, I'm discussing Timsort specifically :-)
when a total order doesn't exist. Even for a 2-element list! If the result of Python's sort is
[x, y]
then I happen to know (because I know everything about its implementation) that one of two things is true:
- The original list was also [x, y], and y < x was False.
That's my reasoning, based on your earlier example with the list of three sets.
But we can't deduce anything about what x < y, x <= y, x == y, x != y, x > y, x >= y, y <= x, y > x, y >= x, y == x, or y != x would do.
This doesn't surprise me, because comparison methods are independent of each other and the law of trichotomy doesn't necessarily hold.
- The original list was [y, x], and x < y was True (but again we can't deduce anything about what any other comparisons involving x and y would do).
Now *that* surprises me! Should I understand from this that the direction of comparisons (whether timsort compares `x < y` or `y < x`) could be chosen arbitrarily? My understanding was that the direction of comparisons was controlled by the reverse parameter. So if we're referring only to the default reverse=False, the comparisons must go in one direction rather than the other. Is that wrong?
For longer lists it can get much more complicated; comparing adjacent elements is just the first step in an algorithm with several stages, which dynamically adjust to patterns of comparison outcomes that are discovered as the algorithm goes along.
Ack. But my expectation is that by the time all the complicated stuff is done, we might not know all the gory details of what elements got compared, and we don't know anything about any "global" order over the entire set, but we do know the "local" order: if x ends up next to y, then *at some point* in the sorting x must have been compared to y and so we know that y doesn't compare less than x. Is this not accurate?

Thanks for the fascinating glimpse into timsort.

-- 
Steven
[Tim]
.... If the result of Python's sort is
[x, y]
then I happen to know (because I know everything about its implementation) that one of two things is true:
- The original list was also [x, y], and y < x was False.
[Steven]
That's my reasoning, based on your earlier example with the list of three sets.
But we can't deduce anything about what x < y, x <= y, x == y, x != y, x > y, x >= y, y <= x, y > x, y >= x, y == x, or y != x would do.
This doesn't surprise me, because comparison methods are independent of each other and the law of trichotomy doesn't necessarily hold.
The code doesn't even assume that x == x holds - or that x < y will return the same value if called a second time. When user-supplied operators are called, nothing can be assumed. This isn't empty ;-) There's a long history of segfaults due to assuming _any_ kind of sanity in user-supplied code.
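That defensiveness can be poked at with a deliberately "insane" comparison (a sketch; the Chaotic class is made up - its answers change on every call, yet the sort completes without error):

```python
import random

# Made-up class whose __lt__ gives a different answer every time it's asked.
class Chaotic:
    def __init__(self, tag):
        self.tag = tag

    def __lt__(self, other):
        return random.random() < 0.5

xs = [Chaotic(i) for i in range(100)]
ys = sorted(xs)  # no exception; the result is *some* permutation of the same objects
```

Nothing more than "some permutation" is promised here - the point is only that inconsistent user-supplied comparisons don't crash the sort.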
- The original list was [y, x], and x < y was True (but again we can't deduce anything about what any other comparisons involving x and y would do).
Now *that* surprises me!
Should I understand from this that the direction of comparisons (whether timsort compares `x < y` or `y < x`) could be chosen arbitrarily?
I'm not sure what this is about. In both cases above, I'm referring to elements by their names in the second (output) list, but they're treated exactly the same way in the original list. If the list is named, say, "values", the only comparison done is

    values[1] < values[0]

If that's True, the values are swapped; if False, they're left alone.
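This claim about the 2-element case is easy to check empirically (a sketch; the Traced class is made up to record every __lt__ call):

```python
# Record every __lt__ call made while sorting a 2-element list.
log = []

class Traced:
    def __init__(self, name):
        self.name = name

    def __lt__(self, other):
        log.append((self.name, other.name))
        return False  # "not less than": the list is left alone

values = [Traced('a'), Traced('b')]
values.sort()
print(log)  # [('b', 'a')] - the single comparison is values[1] < values[0]
```

Exactly one comparison is logged, and it is indeed values[1] < values[0].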
My understanding was that the direction of comparisons was controlled by the reverse parameter.
If reverse=True is passed, values[1] < values[0] is _still_ the only comparison done (although, as about to be explained, the elements are swapped _before_ the sort sees them). `reverse` has no effect whatsoever on any internal step of the algorithm. Instead it's a pre/post-processing gimmick:

    values.sort(reverse=True)

acts like, and is actually _implemented_ as:

    values.reverse()
    values.sort()
    values.reverse()

This is subtle, and the docs could be clearer about what they mean by "as if each comparison were reversed" - I didn't write that (I didn't implement "reverse=" either), but I appreciate that what it's trying to say is subtle. It's NOT the same as:

    values.sort()
    values.reverse()

and in my experience barely anyone knows why. It's to make list.sort() stable even if reverse=True is passed: equal items remain in their original order regardless of the value of the reverse flag. The second shorter spelling would violate stability, but the first longer spelling preserves it (the order of equal items is first reversed, then the stable regular sort preserves that reversed order, and then the final reverse restores the original order of equal items).
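The stability point can be seen with a tiny experiment (a sketch; the records and key function are made up):

```python
# Records that compare equal under the key, tagged so we can see their order.
records = [('spam', 1), ('spam', 2), ('eggs', 3)]
key = lambda r: r[0]

a = sorted(records, key=key, reverse=True)   # stable descending sort
b = sorted(records, key=key)[::-1]           # sort, then reverse

print(a)  # [('spam', 1), ('spam', 2), ('eggs', 3)] - equal items keep their order
print(b)  # [('spam', 2), ('spam', 1), ('eggs', 3)] - equal items flipped
```

With reverse=True the two 'spam' records stay in their original relative order; the sort-then-reverse spelling flips them.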
So if we're referring only to the default reverse=False, the comparisons must go in one direction rather than the other. Is that wrong?
Still not clear what the question is, but hope the above answered it. In particular, the reverse flag is mostly a red herring.
For longer lists it can get much more complicated; comparing adjacent elements is just the first step in an algorithm with several stages, which dynamically adjust to patterns of comparison outcomes that are discovered as the algorithm goes along.
Ack. But my expectation is that by the time all the complicated stuff was done, we might not know all the gory details of what elements got compared, and we don't know anything about any "global" order over the entire set, but we do know the "local" order: if x ends up next to y, then *at some point* in the sorting x must have been compared to y and so we know that y doesn't compare less than x.
Is this not accurate?
I really don't know! It sounds plausible ;-) , for the current sort implementation, which is a mix of strategies mostly based on binary search and merging. Offhand I couldn't think of a counterexample - but there was no _intent_ to make that so. It's not true of sorts in general. For an easy example, here's a simple stable(!) quicksort, based on 3-way partitioning (extremely effective in the presence of many equal elements):

    def quick3(xs):
        from random import choice
        if len(xs) <= 1:
            return xs
        lt = []
        eq = []
        gt = []
        pivot = choice(xs)
        for x in xs:
            if x is pivot or x == pivot:  # "is" to guard against insane __eq__
                eq.append(x)
            elif x < pivot:
                lt.append(x)
            else:
                gt.append(x)
        return quick3(lt) + eq + quick3(gt)

Pass it the list [1, 2a, 2b, 2c, 3] where the three instances of "2" are tagged with letters so we can visually distinguish them. If the pivot is 2b, then the only comparisons ever done involve 2b, the output list is the same, but 1 is never compared with 2a nor is 2c ever compared with 3.
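That scenario can be reproduced deterministically (a sketch: this variant of quick3 picks the middle element as pivot instead of random.choice, and the Tagged class is made up to log every comparison):

```python
# Log of (left_tag, right_tag) for every comparison performed.
comparisons = []

# Made-up wrapper: compares by value, logs every __eq__/__lt__ call.
class Tagged:
    def __init__(self, value, tag):
        self.value, self.tag = value, tag

    def __eq__(self, other):
        comparisons.append((self.tag, other.tag))
        return self.value == other.value

    def __lt__(self, other):
        comparisons.append((self.tag, other.tag))
        return self.value < other.value

def quick3(xs):
    if len(xs) <= 1:
        return xs
    lt, eq, gt = [], [], []
    pivot = xs[len(xs) // 2]  # deterministic: picks 2b for the list below
    for x in xs:
        if x is pivot or x == pivot:
            eq.append(x)
        elif x < pivot:
            lt.append(x)
        else:
            gt.append(x)
    return quick3(lt) + eq + quick3(gt)

data = [Tagged(1, '1'), Tagged(2, '2a'), Tagged(2, '2b'),
        Tagged(2, '2c'), Tagged(3, '3')]
out = quick3(data)
print([t.tag for t in out])
```

Inspecting `comparisons` afterward shows every logged pair involves the pivot 2b, and 1 was never compared with 2a.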
Steve Jorgensen wrote:
Steve Jorgensen wrote: <snip>

The problem I came up with trying to spike out my proposal last night is that there doesn't seem to be any way to implement it without creating infinite recursion in the issubclass call. If I make Orderable a real or virtual subclass of ProtoOrderable, and Orderable's __subclasshook__ or metaclass __subclasscheck__ (I tried both ways) tries to check whether C is a subclass of ProtoOrderable, then an infinite recursion occurs. It wasn't immediately obvious to me why that is the case, but when I thought about it deeply, I could see why that must happen.

An alternative that I thought about previously, but which seems very smelly to me for several reasons, is to have both Orderable and NonOrderable ABCs. In that case, what should be done to prevent a class from being both orderable and non-orderable, or to figure out which should take precedence in that case?

As a meta-solution (wild-assed idea), what if metaclass registration could accept keyword arguments, similar to passing keyword arguments to a class definition? That way, a single ABC (ProtoOrderable or whatever better name) could be a real or virtual subclass that is explicitly orderable or non-orderable depending on orderable=<True/False>.

I have been unable to implement the class hierarchy that I proposed, and I think I've determined that it's just not a practical fit with how the virtual base class mechanism works, so… Maybe just a single TotalOrdered or TotalOrderable ABC with a register_explicit_only method. The __subclasshook__ method would skip the rich comparison methods check and return NotImplemented for any class registered using register_explicit_only (or any of its true subclasses).

The only weird edge case in the above is that if someone registers another ABC using TotalOrdered.register_explicit_only and uses that as a virtual base class of something else, the register_explicit_only registration will not apply to the virtual subclass.
I'm thinking that's completely acceptable as a known limitation if documented?

Code spike of that idea:

    from abc import ABCMeta
    from weakref import WeakSet

    class TotallyOrderable(metaclass=ABCMeta):
        _explicit_only_registry = WeakSet()

        @classmethod
        def register_explicit_only(cls, C):
            if cls is not TotallyOrderable:
                raise NotImplementedError(
                    f"{cls.__name__} does not implement 'register_explicit_only'")
            cls._explicit_only_registry.add(C)

        @classmethod
        def __subclasshook__(cls, C):
            if cls is not TotallyOrderable:
                return NotImplemented
            for B in C.__mro__:
                if B in cls._explicit_only_registry:
                    return NotImplemented
            return cls._check_overrides_rich_comparison_methods(C)

        @classmethod
        def _check_overrides_rich_comparison_methods(cls, C):
            mro = C.__mro__
            for method in ('__lt__', '__le__', '__gt__', '__ge__'):
                for B in mro:
                    if B is not object and method in B.__dict__:
                        if B.__dict__[method] is None:
                            return NotImplemented
                        break
                else:
                    return NotImplemented
            return True
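Condensed slightly, the spike can be exercised like this (a sketch; note that ABCMeta caches issubclass results and register_explicit_only doesn't invalidate that cache, so opt-outs must be registered before the first check against that class):

```python
from abc import ABCMeta
from weakref import WeakSet

# Condensed copy of the code spike, just to exercise its behavior.
class TotallyOrderable(metaclass=ABCMeta):
    _explicit_only_registry = WeakSet()

    @classmethod
    def register_explicit_only(cls, C):
        cls._explicit_only_registry.add(C)

    @classmethod
    def __subclasshook__(cls, C):
        if cls is not TotallyOrderable:
            return NotImplemented
        if any(B in cls._explicit_only_registry for B in C.__mro__):
            return NotImplemented  # opted out: fall back to explicit registration
        # Require each rich comparison to be overridden somewhere below object.
        for method in ('__lt__', '__le__', '__gt__', '__ge__'):
            for B in C.__mro__:
                if B is not object and method in B.__dict__:
                    if B.__dict__[method] is None:
                        return NotImplemented
                    break
            else:
                return NotImplemented
        return True

# Register set *before* any issubclass check, because of the ABCMeta cache.
TotallyOrderable.register_explicit_only(set)

print(issubclass(int, TotallyOrderable))         # True  - defines all four methods
print(issubclass(set, TotallyOrderable))         # False - opted out via registry
print(issubclass(type(None), TotallyOrderable))  # False - no rich comparisons
```

`int` passes because it overrides all four operators; `set` also overrides them but is excluded by the explicit-only registry; `NoneType` fails the method check outright.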
Naming question: Should an abstract base class for this concept be named `TotalOrderable`, `TotallyOrderable`, `TotalOrdered`, or `TotallyOrdered`?
participants (16)
- Alex Hall
- Andrew Barnert
- Chris Angelico
- Christopher Barker
- David Mertz
- Greg Ewing
- Guido van Rossum
- Mark Dickinson
- MRAB
- Richard Damon
- Richard Damon
- Stephen J. Turnbull
- Steve Jorgensen
- Steven D'Aprano
- Stéfane Fermigier
- Tim Peters