Rich Comparisons Gotcha

Tue Jan 6 18:10:08 EST 2009

On Tue, 06 Jan 2009 12:42:13 +0000, Mark Wooding wrote:

> Steven D'Aprano <steven at REMOVE.THIS.cybersource.com.au> wrote:
> 
>> Such assumptions only hold under particular domains though. You can't
>> assume equality is an equivalence relation once you start thinking
>> about arbitrary domains.
> 
> From a formal mathematical point of view, equality /is/ an equivalence
> relation.  If you have a relation on some domain, and it's not an
> equivalence relation, then it ain't the equality relation, and that's
> flat.

Okay, fair enough. In the formal mathematical sense, equality is always 
an equivalence relation. So there are certain domains which don't have 
equality, e.g. floating point, since nan != nan. Also Python objects, 
since x.__eq__(y) is not necessarily the same as y.__eq__(x).

>> But there cannot be any such function which is a domain-independent
>> equivalence relation, not if we're talking about arbitrarily wacky
>> domains.
> 
> That looks like a claim which requires a proof to me.  But it could also
> do with a definition of `domain', so I'll settle for one of those first.

I'm talking about domain in the sense of "a particular problem domain". 
That is, the model, data and operations used to solve a problem. I don't 
know that I can be more formal than that.

To prove my claim, all you need is two domains with a mutually 
incompatible definition of equality. That's not so difficult, surely? How 
about equality of integers, versus equality of integers modulo some N?

> If we're dealing with sets (i.e., `domain's form a subclass of `sets')
> then the claim is clearly false, and equality (determined by comparison
> of elements) is indeed a domain-independent equivalence relation.

It isn't domain-independent in my sense, because you have specified one 
specific domain, namely set equality.

>> Even something as straight-forward as "is" can't be an equivalence
>> relation under a domain where identity isn't well-defined.
> 
> You've completely lost me here.  The Python `is' operator is (the
> characteristic function of) an equivalence relation on Python values:
> that's its definition.

Yes, that's because identity is well-defined in Python. I'm saying that 
if identity isn't well-defined, then neither is the 'is' operator, and 
therefore it isn't an equivalence relation. That shouldn't be 
controversial.

> All Python objects are instances of `object' or of some more specific
> class.  The `==' operator on `object' is (the characteristic function
> of) an equivalence relation.  In, fact, it's the same as `is' -- but
> `==' can be overridden by subclasses, and subclasses are permitted --
> according to the interface definition -- to coarsen the relation.  In
> fact, they're permitted to make it not be an equivalence class at all.
> 
> I claim that this is a problem.  

It *can* be a problem, if you insist on using == on arbitrary types while 
still expecting it to be an equivalence relation.

If you drop the requirement that it remain an e-r, then you can apply == 
to arbitrary types. And if you limit yourself to non-arbitrary types, 
then you can safely use (say) any strings you like, and == will remain an 
e-r. 

I /agree/ that domain-specific
> predicates are useful, and can be sufficiently useful that they deserve
> the `==' name -- as well as floats and numpy, I've provided SAGE and
> sympy as examples myself.  But I also believe that there are good
> reasons to want an `equivalence' operator (I'll write it as `=~', though
> I don't propose this as Python syntax -- see below) with the following
> properties:
> 
>   * `=~' is the characteristic function[1] of an equivalence relation,
>     i.e., for all values x, y, z: x =~ y in (True, False); (x =~ x) ==
>     True; if x =~ y then y =~ x; and if x =~ y and y =~ z then x =~ z
>
>   * Moreover, `=~' is a coarsening of `is', i.e. for all values x, y: if
>     x is y then x =~ y.

Ah, but you can't have such a generic e-r that applies across all problem 
domains. Consider:

Let's denote regular, case-sensitive strings using "abc", and special, 
case-insensitive strings using i"abc". So for regular strings, equality 
is an e-r; for case-insensitive strings, equality is also an e-r  (I 
trust that the truth of this is obvious). But if you try to use equality 
on *both* regular and case-insensitive strings, it fails to be an e-r:

i"abc" =~ "ABC" returns True if you use the case-insensitive definition 
of equality, but returns False if you use the case-sensitive definition. 
There is no single definition of equality that is *simultaneously* case-
sensitive and case-insensitive.

> A valuable property might be that x =~ y if x and y are
> indistinguishable without using `is'.

That's a little strong, because it implies that equality must look at 
*everything* about a particular object, not just whatever bits of data 
are relevant for the problem domain.

For example, consider storing data in a dict.

>>> D1 = {-1: 0, -2: 0}
>>> D2 = {-2: 0}
>>> D2[-1] = 0
>>> D1 == D2
True

We certainly want D1 and D2 to be equal. But their history is different, 
and that makes their internal details different, which has detectable 
consequences:

>>> D1
{-2: 0, -1: 0}
>>> D2
{-1: 0, -2: 0}

The same happens with trees. Given a tree structure defined as:

(payload, left-subtree, right-subtree)

do you want the following two trees to be equal?

('b', ('a', None, None), ('c', None, None))

('a', None, ('b', None, ('c', None, None)))

Unless I've made a silly mistake, not only are the payloads of the two 
trees equal, but so are the in-order representation of both. Only the 
specific order the nodes are stored in differ, and that may not be 
important for the specific problem you are trying to solve.

There may be problem domains where the order of elements in a list (or 
tree structure) *is* important, and other problem domains where order is 
irrelevant. One single relation can't cover all such conflicting 
requirements.

-- 
Steven