[Python-checkins] CVS: python/nondist/peps pep-0207.txt,1.5,1.6
Guido van Rossum
python-dev@python.org
Mon, 4 Dec 2000 12:32:16 -0800
Update of /cvsroot/python/python/nondist/peps
In directory slayer.i.sourceforge.net:/tmp/cvs-serv12181
Modified Files:
pep-0207.txt
Log Message:
Checking in some text. Most of this is simply an Appendix repeating
what David Ascher said in 1998 (before the starship bites the dust
again).
Index: pep-0207.txt
===================================================================
RCS file: /cvsroot/python/python/nondist/peps/pep-0207.txt,v
retrieving revision 1.5
retrieving revision 1.6
diff -C2 -r1.5 -r1.6
*** pep-0207.txt 2000/11/28 22:23:25 1.5
--- pep-0207.txt 2000/12/04 20:32:13 1.6
***************
*** 2,9 ****
Title: Rich Comparisions
Version: $Revision$
! Author: mal@lemburg.com (Marc-Andre Lemburg), guido@python.org (Guido van Rossum)
Python-Version: 2.1
Status: Incomplete
--- 2,330 ----
Title: Rich Comparisions
Version: $Revision$
! Author: guido@python.org (Guido van Rossum), mal@lemburg.com (Marc-Andre Lemburg)
Python-Version: 2.1
Status: Incomplete
+
+ Abstract
+
+ This PEP proposes several new features for comparisons:
+
+ - Allow separately overloading of <, >, <=, >=, ==, !=, both in
+ classes and in C extensions.
+
+ - Allow any of those overloaded operators to return something else
+ besides a Boolean result.
+
+
+ Motivation
+
+ The main motivation comes from NumPy, whose users agree that A<B
+ should return an array of elementwise comparison outcomes; they
+ currently have to spell this as less(A,B) because A<B can only
+ return a Boolean result or an exception.
+
+ An additional motivation is that frequently, types don't have a
+ natural ordering, but still need to be compared for equality.
+ Currenlty such a type *must* implement comparison and thus assign
+ an arbitrary ordering, just so that equality can be tested.
+
+ More motivation can be found in the proposals listed under
+ previous work below.
+
+
+ Previous Work
+
+ Rich Comparisons have been proposed before; in particular by David
+ Ascher, after experience with Numerical Python:
+
+ http://starship.python.net/crew/da/proposals/richcmp.html
+
+ It is also included as an appendix. In this proposal, David also
+ proposes the addition of an optional 3rd argument to cmp(), as in:
+ cmp(a, b, "<") or cmp(a, b, "!=").
+
+
+ Concerns
+
+ - Backwards compatibility, both at the Python level (classes using
+ __cmp__ need not be changed) and at the C level (extensions
+ defining tp_compare need not be changed).
+
+ - When A<B returns a matrix of elementwise comparisons, an easy
+ mistake to make is to use this expression in a Boolean context.
+ Without special precautions, it would always be true. This use
+ should raise an exception instead.
+
+ - If a class overrides x==y but nothing else, should x!=y be
+ computed as not(x==y), or fail? (I think this is OK; David
+ disagrees.)
+
+ - Similarly, should we allow x<y to be calculated from y>x? (I
+ think this is OK; David agrees.)
+
+ - Similarly, should we allow x<=y to be calculated from not(x>y)?
+ (I think this is *not* OK; neither does David.)
+
+ - When using comparisons to generate elementwise comparisons, what
+ to do about shortcut operators like A<B<C or ``A<B and C<D''?
+ (David proposes a solution for A<B<C, but it means that ``if
+ A<B:...'' will assume ``if true:...''.
+
+
+ Solution
+
+ To be done.
+
+
+ Copyright
+
+ This document has been placed in the public domain.
+
+
+ Appendix
+
+ Here, for posterity, is most of David Ascher's original proposal.
+ It addresses almost all concerns.
+
+
+ Abstract
+
+ A new mechanism allowing comparisons of Python objects to return
+ values other than -1, 0, or 1 (or raise exceptions) is
+ proposed. This mechanism is entirely backwards compatible, and can
+ be controlled at the level of the C PyObject type or of the Python
+ class definition. There are three cooperating parts to the
+ proposed mechanism:
+
+ - the use of the last slot in the type object structure to store a
+ pointer to a rich comparison function
+
+ - the addition of special methods for classes
+
+ - the addition of an optional argument to the builtin cmp()
+ function.
+
+
+ Motivation
+
+ The current comparison protocol for Python objects assumes that
+ any two Python objects can be compared (as of Python 1.5, object
+ comparisons can raise exceptions), and that the return value for
+ any comparison should be -1, 0 or 1. -1 indicates that the first
+ argument to the comparison function is less than the right one, +1
+ indicating the contrapositive, and 0 indicating that the two
+ objects are equal. While this mechanism allows the establishment
+ of a order relationship (e.g. for use by the sort() method of list
+ objects), it has proven to be limited in the context of Numeric
+ Python (NumPy).
+
+ Specifically, NumPy allows the creation of multidimensional
+ arrays, which support most of the numeric operators. Thus:
+
+ x = array((1,2,3,4)) y = array((2,2,4,4))
+
+ are two NumPy arrays. While they can be added elementwise,:
+
+ z = x + y # z == array((3,4,7,8))
+
+ they cannot be compared in the current framework - the released
+ version of NumPy compares the pointers, (thus yielding junk
+ information) which was the only solution before the recent
+ addition of the ability (in 1.5) to raise exceptions in comparison
+ functions.
+
+ Even with the ability to raise exceptions, the current protocol
+ makes array comparisons useless. To deal with this fact, NumPy
+ includes several functions which perform the comparisons: less(),
+ less_equal(), greater(), greater_equal(), equal(),
+ not_equal(). These functions return arrays with the same shape as
+ their arguments (modulo broadcasting), filled with 0's and 1's
+ depending on whether the comparison is true or not for each
+ element pair. Thus, for example, using the arrays x and y defined
+ above:
+
+ less(x,y)
+
+ would be an array containing the numbers (1,0,0,0).
+
+ The current proposal is to modify the Python object interface to
+ allow the NumPy package to make it so that x < y returns the same
+ thing as less(x,y). The exact return value is up to the NumPy
+ package -- what this proposal really asks for is changing the
+ Python core so that extension objects have the ability to return
+ something other than -1, 0, 1, should their authors choose to do
+ so.
+
+ Current State of Affairs
+
+ The current protocol is, at the C level, that each object type
+ defines a tp_compare slot, which is a pointer to a function which
+ takes two PyObject* references and returns -1, 0, or 1. This
+ function is called by the PyObject_Compare() function defined in
+ the C API. PyObject_Compare() is also called by the builtin
+ function cmp() which takes two arguments.
+
+ 1. Changes to the C structure for type objects
+
+ The last availabel slot in the PyTypeObject, reserved up to now
+ for future expansion, is used to optionally store a pointer to a
+ new comparison function, of type richcmpfunc defined by:
+
+ typedef PyObject *(*richcmpfunc)
+ Py_PROTO((PyObject *, PyObject *, int));
+
+ This function takes three arguments. The first two are the objects
+ to be compared, and the third is an integer corresponding to an
+ opcode (one of LT, LE, EQ, NE, GT, GE). If this slot is left NULL,
+ then rich comparison for that object type is not supported (except
+ for class instances whose class provide the special methods
+ described below).
+
+ The above opcodes need to be added to the published Python/C API
+ (probably under the names Py_LT, Py_LE, etc.)
+
+ 2. Additions of special methods for classes
+
+ Classes wishing to support the rich comparison mechanisms must add
+ one or more of the following new special methods:
+
+ def __lt__(self, other):
+ ...
+ def __le__(self, other):
+ ...
+ def __gt__(self, other):
+ ...
+ def __ge__(self, other):
+ ...
+ def __eq__(self, other):
+ ...
+ def __ne__(self, other):
+ ...
+
+ Each of these is called when the class instance is the on the
+ left-hand-side of the corresponding operators (<, <=, >, >=, ==,
+ and != or <>). The argument other is set to the object on the
+ right side of the operator. The return value of these methods is
+ up to the class implementor (after all, that's the entire point of
+ the proposal).
+
+ If the object on the left side of the operator does not define an
+ appropriate rich comparison operator (either at the C level or
+ with one of the special methods, then the comparison is reversed,
+ and the right hand operator is called with the opposite operator,
+ and the two objects are swapped. This assumes that a < b and b > a
+ are equivalent, as are a <= b and b >= a, and that == and != are
+ commutative (e.g. a == b if and only if b == a).
+
+ For example, if obj1 is an object which supports the rich
+ comparison protocol and x and y are objects which do not support
+ the rich comparison protocol, then obj1 < x will call the __lt__
+ method of obj1 with x as the second argument. x < obj1 will call
+ obj1's __gt__ method with x as a second argument, and x < y will
+ just use the existing (non-rich) comparison mechanism.
+
+ The above mechanism is such that classes can get away with not
+ implementing either __lt__ and __le__ or __gt__ and
+ __ge__. Further smarts could have been added to the comparison
+ mechanism, but this limited set of allowed "swaps" was chosen
+ because it doesn't require the infrastructure to do any processing
+ (negation) of return values. The choice of six special methods was
+ made over a single (e.g. __richcmp__) method to allow the
+ dispatching on the opcode to be performed at the level of the C
+ implementation rather than the user-defined method.
+
+ 3. Addition of an optional argument to the builtin cmp()
+
+ The builtin cmp() is still used for simple comparisons. For rich
+ comparisons, it is called with a third argument, one of "<", "<=",
+ ">", ">=", "==", "!=", "<>" (the last two have the same
+ meaning). When called with one of these strings as the third
+ argument, cmp() can return any Python object. Otherwise, it can
+ only return -1, 0 or 1 as before.
+
+ Chained Comparisons
+
+ Problem
+
+ It would be nice to allow objects for which the comparison returns
+ something other than -1, 0, or 1 to be used in chained
+ comparisons, such as:
+
+ x < y < z
+
+ Currently, this is interpreted by Python as:
+
+ temp1 = x < y
+ if temp1:
+ return y < z
+ else:
+ return temp1
+
+ Note that this requires testing the truth value of the result of
+ comparisons, with potential "shortcutting" of the right-side
+ comparison testings. In other words, the truth-value of the result
+ of the result of the comparison determines the result of a chained
+ operation. This is problematic in the case of arrays, since if x,
+ y and z are three arrays, then the user expects:
+
+ x < y < z
+
+ to be an array of 0's and 1's where 1's are in the locations
+ corresponding to the elements of y which are between the
+ corresponding elements in x and z. In other words, the right-hand
+ side must be evaluated regardless of the result of x < y, which is
+ incompatible with the mechanism currently in use by the parser.
+
+ Solution
+
+ Guido mentioned that one possible way out would be to change the
+ code generated by chained comparisons to allow arrays to be
+ chained-compared intelligently. What follows is a mixture of his
+ idea and my suggestions. The code generated for x < y < z would be
+ equivalent to:
+
+ temp1 = x < y
+ if temp1:
+ temp2 = y < z
+ return boolean_combine(temp1, temp2)
+ else:
+ return temp1
+
+ where boolean_combine is a new function which does something like
+ the following:
+
+ def boolean_combine(a, b):
+ if hasattr(a, '__boolean_and__') or \
+ hasattr(b, '__boolean_and__'):
+ try:
+ return a.__boolean_and__(b)
+ except:
+ return b.__boolean_and__(a)
+ else: # standard behavior
+ if a:
+ return b
+ else:
+ return 0
+
+ where the __boolean_and__ special method is implemented for
+ C-level types by another value of the third argument to the
+ richcmp function. This method would perform a boolean comparison
+ of the arrays (currently implemented in the umath module as the
+ logical_and ufunc).
+
+ Thus, objects returned by rich comparisons should always test
+ true, but should define another special method which creates
+ boolean combinations of them and their argument.
+
+ This solution has the advantage of allowing chained comparisons to
+ work for arrays, but the disadvantage that it requires comparison
+ arrays to always return true (in an ideal world, I'd have them
+ always raise an exception on truth testing, since the meaning of
+ testing "if a>b:" is massively ambiguous.
+
+ The inlining already present which deals with integer comparisons
+ would still apply, resulting in no performance cost for the most
+ common cases.