Rich Comparisons Gotcha

Mark Wooding mdw at distorted.org.uk
Tue Jan 6 07:42:13 EST 2009


Steven D'Aprano <steven at REMOVE.THIS.cybersource.com.au> wrote:

> Such assumptions only hold under particular domains though. You can't
> assume equality is an equivalence relation once you start thinking
> about arbitrary domains.

>From a formal mathematical point of view, equality /is/ an equivalence
relation.  If you have a relation on some domain, and it's not an
equivalence relation, then it ain't the equality relation, and that's
flat.

> But there cannot be any such function which is a domain-independent 
> equivalence relation, not if we're talking about arbitrarily wacky 
> domains.

That looks like a claim which requires a proof to me.  But it could also
do with a definition of `domain', so I'll settle for one of those first.

If we're dealing with sets (i.e., `domain's form a subclass of `sets')
then the claim is clearly false, and equality (determined by comparison
of elements) is indeed a domain-independent equivalence relation.

> Even something as straight-forward as "is" can't be an equivalence
> relation under a domain where identity isn't well-defined.

You've completely lost me here.  The Python `is' operator is (the
characteristic function of) an equivalence relation on Python values:
that's its definition.  You could describe an extension of the `is'
relation to a larger set of items, such that it fails to be an
equivalence relation on that set, but you'd be (rightly) criticized for
failing to preserve one of its two defining properties.  (The other is
that `is' makes distinctions between values which are at least as fine
as any other method, and this property should also be extended .)

Let me have another go.

All Python objects are instances of `object' or of some more specific
class.  The `==' operator on `object' is (the characteristic function
of) an equivalence relation.  In, fact, it's the same as `is' -- but
`==' can be overridden by subclasses, and subclasses are permitted --
according to the interface definition -- to coarsen the relation.  In
fact, they're permitted to make it not be an equivalence class at all.

I claim that this is a problem.  I /agree/ that domain-specific
predicates are useful, and can be sufficiently useful that they deserve
the `==' name -- as well as floats and numpy, I've provided SAGE and
sympy as examples myself.  But I also believe that there are good
reasons to want an `equivalence' operator (I'll write it as `=~', though
I don't propose this as Python syntax -- see below) with the following
properties:

  * `=~' is the characteristic function[1] of an equivalence relation,
    i.e., for all values x, y, z: x =~ y in (True, False); (x =~ x) ==
    True; if x =~ y then y =~ x; and if x =~ y and y =~ z then x =~ z

  * Moreover, `=~' is a coarsening of `is', i.e. for all values x, y: if
    x is y then x =~ y.

A valuable property might be that x =~ y if x and y are
indistinguishable without using `is'.  That would mean immediately that
'xyz' =~ 'xy' + 'z' (regardless of interning, because strings are
immutable).  But for tuples this would imply elementwise comparison,
which may be expensive -- and, in the case of tuples manufactured by C
extensions, nontrivial because manufactured tuples need not be acyclic.
On the other hand, `==' is already recursive on tuples.

We can envisage a collection of different relations, according to which
distinguishing methods we're willing to disallow.  For example, for
numerical types, there are actually a number of interesting relations,
according to whether you think the answers to the following questions
are true or false.

  * Is 1 =~ 1/1?  (Here, 1 is an integer, and 1/1 is a rational number;
    both are the multiplicative identities of their respective rings.
    I'd suggest that it doesn't seem very useful to say `no' here, but
    there might be reasons why one would want type(x) is type(y) if
    x =~ y.)

  * Is 1 =~ 1.0?  (This is trickier.  Numerically the values are equal;
    but the former is exact and the latter inexact, and this is a good
    reason to want a separation.)

Essentially, these are asking whether `type' is a legitimate
distinguisher, and I think that the answer, unhelpful as it may be, is
`sometimes'.

A third useful distinguishing technique is mutation.  Given two
singleton lists whose respective elements compare equivalent, I can
mutate one of them to decide whether the other is in fact the same.  Is
this something which `=~' should distinguish?  Again, the answer is
probably `sometimes'.

To summarize: we're left with at least three different characteristics
which an equivalence predicate might have:

  * efficient (e.g., bounded recursion depth, works on circular values);
  * neglects irrelevant (to whom?) differences of type; and
  * neglects differences due to mutability.

A predicate used to compare set elements or hash-table keys should
probably /respect/ mutability.  (Associating hashing with this
predicate, rather than `==', would coherently allow mutable objects such
as lists to be used as dictionary keys, though they'd be compared by
address.  I don't actually know how useful this would be, but suspect
that it wouldn't.)

Oh, before I go, let me make this very clear: I am /not/ proposing a
language change.  I think the right way to addres these problems is
using existing mechanisms such as generic functions with multimethods.
Syntax can come later if it seems sufficiently important.

[1] I'll settle for it being a partial function, i.e., attempting to
    evaluate x =~ y might raise exceptions, e.g., if x is in some
    invalid state, or perhaps if one or both of x or y is circular,
    though it would be good to minimize such cases.

-- [mdw]



More information about the Python-list mailing list