[Python-checkins] CVS: python/nondist/peps pep-0207.txt,1.5,1.6

Guido van Rossum python-dev@python.org
Mon, 4 Dec 2000 12:32:16 -0800


Update of /cvsroot/python/python/nondist/peps
In directory slayer.i.sourceforge.net:/tmp/cvs-serv12181

Modified Files:
	pep-0207.txt 
Log Message:
Checking in some text.  Most of this is simply an Appendix repeating
what David Ascher said in 1998 (before the starship bites the dust
again).


Index: pep-0207.txt
===================================================================
RCS file: /cvsroot/python/python/nondist/peps/pep-0207.txt,v
retrieving revision 1.5
retrieving revision 1.6
diff -C2 -r1.5 -r1.6
*** pep-0207.txt	2000/11/28 22:23:25	1.5
--- pep-0207.txt	2000/12/04 20:32:13	1.6
***************
*** 2,9 ****
  Title: Rich Comparisions
  Version: $Revision$
! Author: mal@lemburg.com (Marc-Andre Lemburg), guido@python.org (Guido van Rossum)
  Python-Version: 2.1
  Status: Incomplete
  
  
  
--- 2,330 ----
  Title: Rich Comparisions
  Version: $Revision$
! Author: guido@python.org (Guido van Rossum), mal@lemburg.com (Marc-Andre Lemburg)
  Python-Version: 2.1
  Status: Incomplete
  
+ 
+ Abstract
+ 
+     This PEP proposes several new features for comparisons:
+ 
+     - Allow separately overloading of <, >, <=, >=, ==, !=, both in
+       classes and in C extensions.
+ 
+     - Allow any of those overloaded operators to return something else
+       besides a Boolean result.
+ 
+ 
+ Motivation
+ 
+     The main motivation comes from NumPy, whose users agree that A<B
+     should return an array of elementwise comparison outcomes; they
+     currently have to spell this as less(A,B) because A<B can only
+     return a Boolean result or an exception.
+ 
+     An additional motivation is that frequently, types don't have a
+     natural ordering, but still need to be compared for equality.
+     Currenlty such a type *must* implement comparison and thus assign
+     an arbitrary ordering, just so that equality can be tested.
+ 
+     More motivation can be found in the proposals listed under
+     previous work below.
+ 
+ 
+ Previous Work
+ 
+     Rich Comparisons have been proposed before; in particular by David
+     Ascher, after experience with Numerical Python:
+ 
+       http://starship.python.net/crew/da/proposals/richcmp.html
+ 
+     It is also included as an appendix.  In this proposal, David also
+     proposes the addition of an optional 3rd argument to cmp(), as in:
+     cmp(a, b, "<") or cmp(a, b, "!=").
+ 
+ 
+ Concerns
+ 
+     - Backwards compatibility, both at the Python level (classes using
+       __cmp__ need not be changed) and at the C level (extensions
+       defining tp_compare need not be changed).
+ 
+     - When A<B returns a matrix of elementwise comparisons, an easy
+       mistake to make is to use this expression in a Boolean context.
+       Without special precautions, it would always be true.  This use
+       should raise an exception instead.
+ 
+     - If a class overrides x==y but nothing else, should x!=y be
+       computed as not(x==y), or fail?  (I think this is OK; David
+       disagrees.)
+ 
+     - Similarly, should we allow x<y to be calculated from y>x?  (I
+       think this is OK; David agrees.)
+ 
+     - Similarly, should we allow x<=y to be calculated from not(x>y)?
+       (I think this is *not* OK; neither does David.)
+ 
+     - When using comparisons to generate elementwise comparisons, what
+       to do about shortcut operators like A<B<C or ``A<B and C<D''?
+       (David proposes a solution for A<B<C, but it means that ``if
+       A<B:...'' will assume ``if true:...''.
+ 
+ 
+ Solution
+ 
+     To be done.
+ 
+ 
+ Copyright
+ 
+     This document has been placed in the public domain.
+ 
+ 
+ Appendix
+ 
+     Here, for posterity, is most of David Ascher's original proposal.
+     It addresses almost all concerns.
+ 
+ 
+ Abstract
+ 
+     A new mechanism allowing comparisons of Python objects to return
+     values other than -1, 0, or 1 (or raise exceptions) is
+     proposed. This mechanism is entirely backwards compatible, and can
+     be controlled at the level of the C PyObject type or of the Python
+     class definition. There are three cooperating parts to the
+     proposed mechanism:
+ 
+     - the use of the last slot in the type object structure to store a
+     pointer to a rich comparison function
+ 
+     - the addition of special methods for classes
+ 
+     - the addition of an optional argument to the builtin cmp()
+     function.
+ 
+ 
+ Motivation
+ 
+     The current comparison protocol for Python objects assumes that
+     any two Python objects can be compared (as of Python 1.5, object
+     comparisons can raise exceptions), and that the return value for
+     any comparison should be -1, 0 or 1. -1 indicates that the first
+     argument to the comparison function is less than the right one, +1
+     indicating the contrapositive, and 0 indicating that the two
+     objects are equal. While this mechanism allows the establishment
+     of a order relationship (e.g. for use by the sort() method of list
+     objects), it has proven to be limited in the context of Numeric
+     Python (NumPy).
+ 
+     Specifically, NumPy allows the creation of multidimensional
+     arrays, which support most of the numeric operators. Thus:
+ 
+              x = array((1,2,3,4))        y = array((2,2,4,4))
+ 
+     are two NumPy arrays. While they can be added elementwise,:
+ 
+              z = x + y   # z == array((3,4,7,8))
+ 
+     they cannot be compared in the current framework - the released
+     version of NumPy compares the pointers, (thus yielding junk
+     information) which was the only solution before the recent
+     addition of the ability (in 1.5) to raise exceptions in comparison
+     functions.
+ 
+     Even with the ability to raise exceptions, the current protocol
+     makes array comparisons useless. To deal with this fact, NumPy
+     includes several functions which perform the comparisons: less(),
+     less_equal(), greater(), greater_equal(), equal(),
+     not_equal(). These functions return arrays with the same shape as
+     their arguments (modulo broadcasting), filled with 0's and 1's
+     depending on whether the comparison is true or not for each
+     element pair. Thus, for example, using the arrays x and y defined
+     above:
+ 
+              less(x,y) 
+ 
+     would be an array containing the numbers (1,0,0,0).
+ 
+     The current proposal is to modify the Python object interface to
+     allow the NumPy package to make it so that x < y returns the same
+     thing as less(x,y). The exact return value is up to the NumPy
+     package -- what this proposal really asks for is changing the
+     Python core so that extension objects have the ability to return
+     something other than -1, 0, 1, should their authors choose to do
+     so.
+ 
+ Current State of Affairs
+ 
+     The current protocol is, at the C level, that each object type
+     defines a tp_compare slot, which is a pointer to a function which
+     takes two PyObject* references and returns -1, 0, or 1. This
+     function is called by the PyObject_Compare() function defined in
+     the C API. PyObject_Compare() is also called by the builtin
+     function cmp() which takes two arguments.
+ 
+     1. Changes to the C structure for type objects
+ 
+     The last availabel slot in the PyTypeObject, reserved up to now
+     for future expansion, is used to optionally store a pointer to a
+     new comparison function, of type richcmpfunc defined by:
+ 
+            typedef PyObject *(*richcmpfunc)
+                 Py_PROTO((PyObject *, PyObject *, int));
+ 
+     This function takes three arguments. The first two are the objects
+     to be compared, and the third is an integer corresponding to an
+     opcode (one of LT, LE, EQ, NE, GT, GE). If this slot is left NULL,
+     then rich comparison for that object type is not supported (except
+     for class instances whose class provide the special methods
+     described below).
+ 
+     The above opcodes need to be added to the published Python/C API
+     (probably under the names Py_LT, Py_LE, etc.)
+ 
+     2. Additions of special methods for classes
+ 
+     Classes wishing to support the rich comparison mechanisms must add
+     one or more of the following new special methods:
+ 
+              def __lt__(self, other):
+                 ...
+              def __le__(self, other):
+                 ...
+              def __gt__(self, other):
+                 ...
+              def __ge__(self, other):
+                 ...
+              def __eq__(self, other):
+                 ...
+              def __ne__(self, other):
+                 ...
+ 
+     Each of these is called when the class instance is the on the
+     left-hand-side of the corresponding operators (<, <=, >, >=, ==,
+     and != or <>). The argument other is set to the object on the
+     right side of the operator. The return value of these methods is
+     up to the class implementor (after all, that's the entire point of
+     the proposal).
+ 
+     If the object on the left side of the operator does not define an
+     appropriate rich comparison operator (either at the C level or
+     with one of the special methods, then the comparison is reversed,
+     and the right hand operator is called with the opposite operator,
+     and the two objects are swapped. This assumes that a < b and b > a
+     are equivalent, as are a <= b and b >= a, and that == and != are
+     commutative (e.g. a == b if and only if b == a).
+ 
+     For example, if obj1 is an object which supports the rich
+     comparison protocol and x and y are objects which do not support
+     the rich comparison protocol, then obj1 < x will call the __lt__
+     method of obj1 with x as the second argument. x < obj1 will call
+     obj1's __gt__ method with x as a second argument, and x < y will
+     just use the existing (non-rich) comparison mechanism.
+ 
+     The above mechanism is such that classes can get away with not
+     implementing either __lt__ and __le__ or __gt__ and
+     __ge__. Further smarts could have been added to the comparison
+     mechanism, but this limited set of allowed "swaps" was chosen
+     because it doesn't require the infrastructure to do any processing
+     (negation) of return values. The choice of six special methods was
+     made over a single (e.g. __richcmp__) method to allow the
+     dispatching on the opcode to be performed at the level of the C
+     implementation rather than the user-defined method.
+ 
+     3. Addition of an optional argument to the builtin cmp()
+ 
+     The builtin cmp() is still used for simple comparisons. For rich
+     comparisons, it is called with a third argument, one of "<", "<=",
+     ">", ">=", "==", "!=", "<>" (the last two have the same
+     meaning). When called with one of these strings as the third
+     argument, cmp() can return any Python object. Otherwise, it can
+     only return -1, 0 or 1 as before.
+ 
+ Chained Comparisons
+ 
+     Problem
+ 
+     It would be nice to allow objects for which the comparison returns
+     something other than -1, 0, or 1 to be used in chained
+     comparisons, such as:
+ 
+              x < y < z
+ 
+     Currently, this is interpreted by Python as:
+ 
+              temp1 = x < y
+              if temp1:
+                return y < z
+              else:
+                return temp1     
+ 
+     Note that this requires testing the truth value of the result of
+     comparisons, with potential "shortcutting" of the right-side
+     comparison testings. In other words, the truth-value of the result
+     of the result of the comparison determines the result of a chained
+     operation. This is problematic in the case of arrays, since if x,
+     y and z are three arrays, then the user expects:
+ 
+             x < y < z
+ 
+     to be an array of 0's and 1's where 1's are in the locations
+     corresponding to the elements of y which are between the
+     corresponding elements in x and z. In other words, the right-hand
+     side must be evaluated regardless of the result of x < y, which is
+     incompatible with the mechanism currently in use by the parser.
+ 
+     Solution
+ 
+     Guido mentioned that one possible way out would be to change the
+     code generated by chained comparisons to allow arrays to be
+     chained-compared intelligently. What follows is a mixture of his
+     idea and my suggestions. The code generated for x < y < z would be
+     equivalent to:
+ 
+              temp1 = x < y
+              if temp1:
+                temp2 = y < z
+                return boolean_combine(temp1, temp2)
+              else:
+                return temp1     
+ 
+     where boolean_combine is a new function which does something like
+     the following:
+ 
+              def boolean_combine(a, b):
+                  if hasattr(a, '__boolean_and__') or \
+                     hasattr(b, '__boolean_and__'):
+                      try:
+                          return a.__boolean_and__(b)
+                      except:
+                          return b.__boolean_and__(a)
+                  else: # standard behavior
+                      if a:
+                          return b
+                      else: 
+                          return 0
+ 
+     where the __boolean_and__ special method is implemented for
+     C-level types by another value of the third argument to the
+     richcmp function. This method would perform a boolean comparison
+     of the arrays (currently implemented in the umath module as the
+     logical_and ufunc).
+ 
+     Thus, objects returned by rich comparisons should always test
+     true, but should define another special method which creates
+     boolean combinations of them and their argument.
+ 
+     This solution has the advantage of allowing chained comparisons to
+     work for arrays, but the disadvantage that it requires comparison
+     arrays to always return true (in an ideal world, I'd have them
+     always raise an exception on truth testing, since the meaning of
+     testing "if a>b:" is massively ambiguous.
+ 
+     The inlining already present which deals with integer comparisons
+     would still apply, resulting in no performance cost for the most
+     common cases.