[pypy-svn] rev 2692 - pypy/trunk/doc/objspace

arigo at codespeak.net
Sat Dec 27 17:44:46 CET 2003


Author: arigo
Date: Sat Dec 27 17:44:45 2003
New Revision: 2692

Modified:
   pypy/trunk/doc/objspace/multimethod.txt
Log:
A possible clean specification of multimethods.


Modified: pypy/trunk/doc/objspace/multimethod.txt
==============================================================================
--- pypy/trunk/doc/objspace/multimethod.txt	(original)
+++ pypy/trunk/doc/objspace/multimethod.txt	Sat Dec 27 17:44:45 2003
@@ -1,143 +1,251 @@
-========================
-PyPy MultiMethod
-========================
-
-Notes on Multimethods
-------------------------
-
-Interpreter-level classes correspond to implementations of application-level
-types. Several implementations can be given for the same type (e.g. several
-ways to code strings or dictionaries), and conversely the same implementation
-can cover several types (e.g. all instances of user-defined types currently
-share the same implementation).
-
-The hierarchy among the classes used for the implementations is convenient
-for implementation purposes. It is not related to any application-level type
-hierarchy.
-
-Dispatch
--------------------
-
-Multimethods dispatch by looking in a set of registered functions. Each
-registered function has a signature, which defines which object implementation
-classes are accepted at the corresponding argument position.
-
-The name 'W_ANY' is a synonym for 'W_Object' (currently, possibly 'object'
-later). As it accepts anything, it is the only way to guarantee that the
-registered function will be called with exactly the same object as was
-passed originally. ATTENTION: in all other cases the argument received by
-the function may have been converted in some way. It must thus not be
-considered to be 'id'entical to the original argument. For example it should
-not be stored in a data structure, nor be queried for type, nor be used for
-another multimethod dispatch -- the only thing you should do is read and
-write its internal data.
-
-For example, 'getattr(obj, attr)' is implemented with a ``W_StringObject`` second
-argument when all it needs is just the name of the attr, and with a W_ANY
-when the 'attr' object could be used as a key in ``obj.__dict__``.
-
-
-Delegation
----------------
-
-Delegation is a transparent conversion mechanism between object
-implementations. The conversion can give a result of a different type
-(e.g. int -> float) or of the same type (e.g. W_VeryLongString -> str).
-There is a global table of delegators. We should not rely on the delegators
-to be tried in any particular order, or at all (e.g. the int -> float delegator
-could be ignored when we know that no registered function will accept a float
-anyway).
-
-Delegation is also used to emulate inheritance between built-in types
-(e.g. bool -> int). This is done by delegation because there is no reason
-that a particular implementation of a sub-type can be trivially typecast
-to some other particular implementation of the parent type; the process might
-require some work.
+=========================
+MultiMethods and Coercion
+=========================
+
+Introduction
+------------
+
+A "multimethod" is a generalization of the OOP notion of "method".
+Theoretically, a method is a "message name" and signature attached to a
+particular base class, which is implemented in the class or its subclasses.
+To do a "method call" means to send a message to an object, using a message
+name and actual arguments.  We call "message dispatch" the operation of
+finding which actual implementation is suitable for a particular call.  For
+methods, a message is dispatched by looking up the class of the "self" object,
+and finding an implementation in that class, or in its base classes, in a
+certain order.
+
+A multimethod is a message name and signature that can have implementations
+that depend not only on the class of the first "self" argument, but on the
+classes of several arguments.  Because of this we cannot use Python's nice
+model of storing method implementations as functions in the attributes of the
+class.
+
+Here is a common implementation of multimethods: they are instances of a
+specific MultiMethod class, and the instances are callable (MultiMethod
+defines a ``__call__`` method).  When a MultiMethod is called, a dispatch
+algorithm finds which of the registered implementations should be called, and
+that implementation is then called immediately.  The most important
+difference from normal methods is that the MultiMethod object to call is no
+longer syntactically attached to classes.  In other words, whereas a method is
+called with ``obj.somemethod(args)``, a multimethod is called much like a
+function, e.g. ``dosomething(obj1, obj2, obj3...)``.  You have to find the
+MultiMethod object ``dosomething`` in some namespace; it is no longer
+implicitly looked up in the namespace of the "self" object.
+
+In PyPy the MultiMethod objects are stored in the object space instance, thus
+``space.add`` is the name of a MultiMethod.  The ``space`` argument is not
+used for the dispatch, but just as a convenient place to put the MultiMethod
+object.
+
+
+Concept Trees
+-------------
+
+A multimethod is a "signature" and a set of "implementations", which are
+regular Python functions.  The difficulty is to find an algorithm that
+determines, from the actual argument values, which implementation should be
+called.
+
+This algorithm should only depend on the types of the arguments. For
+explicitness we will *not* use the Python class inheritance at all (because it
+is better suited to normal methods, and anyway it can be emulated if needed).
+Thus the following diagram looks like an inheritance diagram, but it is
+actually just an explicitly specified tree::
+
+       Value
+          \
+           \
+          Float
+          /   \
+         /     \
+     Integer    \
+       / \       V
+      /   \
+     T     U
+
+This diagram contains three Python types T, U and V, and three "concepts"
+Integer, Float and Value, which are just names.  The types T and U could be
+two different ways to implement integers (e.g. machine-sized ints, and
+variable-sized longs), and the type V could be the IEEE doubles.  Note that in
+this model the real Python types are the "leaves" of the tree.
+
+Consider the multimethod ``add(Value, Value)``.  The signature specifies that
+the two arguments should be two Values (remember that Value is not a type,
+just some "concept"; you cannot ask whether a Python object is a Value or
+not).  Suppose that this multimethod has three implementations:
+``add(T,T)``, ``add(U,T)`` and ``add(V,V)``.  If you call ``add(t1,t2)`` with
+two objects of type ``T``, the first implementation is used; if you call it
+with a ``U`` and a ``T``, the second one is used; and if you call it with two
+``V``, the third one is used.
+
+But if you call it with another pattern of types, there is no direct match.
+To be able to satisfy the call, at least one of the arguments will have to be
+converted to another type.  This is where the shape of the tree becomes
+important.  Remember that the signature of the multimethod is ``add(Value,
+Value)``.  The two ``Value`` here mean that conversions are allowed inside
+the part of the tree that is below ``Value``.  (This is the tree shown above;
+maybe there are other "concepts" above ``Value`` outside the picture, but they
+are ignored.)  The intuitive meaning of the signature is: "add() is an
+operation between two Values".  It allows an object of type T to be converted
+into an object of type U or V and vice-versa, as long as the objects have the
+same "Value", in an intuitive sense.  An example of a conversion that destroys
+the Value would be casting a T object into an instance of ``object``, which is
+the base class of every Python class.
+
+
+Conversion
+----------
+
+All conversions that don't change the "Value" of an object can be registered
+as Python functions.  For example::
+
+    def f(x):
+        return V(x)
+
+might be a conversion from T to V.  But we can say more: this conversion is
+more precisely defined at the level of "Float" in the tree above, which is the
+lowest concept with both T and V below it.  Similarly, a conversion from T to
+U would probably be defined at the "Integer" level.
+
+Suppose that we have defined these two conversions, from T to V and from T to
+U.  Suppose now that we call ``add(t,v)`` where ``t`` is of type T and ``v``
+is of type V.  Clearly, we want ``t`` to be converted into a V, which allows
+``add(V,V)`` to be called.  To find this conversion, the algorithm looks into
+the subconcepts of ``Value``, starting from the leaves and going back to the
+higher levels, until it can satisfy the call request by inserting conversions
+registered at these levels.
+
+Starting from the lower levels and going up allows the tree to prioritize the
+solutions: it is better to try to convert between Integers, and only if that
+fails, to try to convert at the Float level, which might promote integer
+objects into floats.
+
+
+Multimethod signature
+---------------------
+
+The signature ``add(Value, Value)`` of the multimethod is essential to specify
+that conversions are indeed allowed for the addition.  In other multimethods,
+some arguments might play different roles.  Consider a multimethod for
+"in-place addition": as this operation might mutate its first argument, it
+must never be automatically converted.  This is expressed by saying that the
+signature of this multimethod is ``inplace_add(Identity, Value)`` where
+``Identity`` is another concept that intuitively means "a Python object whose
+identity is important".  ``Identity`` would not appear in a tree, or if it
+did, there would be no conversions between its subtypes.
+
+Note how a real Python object of type T can either be an "Integer" or an
+"Identity" depending on its role in a multimethod call.  This is why we cannot
+use normal inheritance as the (global) conversion tree: which tree to use
+depends on the role of the argument, which changes in different positions of
+different multimethods.
+
+This approach is general enough to allow arguments to play very different
+roles.  For example, the same mechanisms could be used for an argument that
+must be an integer: the multimethod signature would specify ``Integer``
+instead of ``Value``.  It still allows conversion between integer
+representations, but not from float to int.
+
+In PyPy, some "concepts" are tied to the application-level types: ``Integer``
+corresponds to the application-level ``class int``, and ``Float`` to ``class
+float``.  The T, U and V types are interpreter-level implementations, and they
+are normally not visible at application-level.  It is then natural to define
+what the method ``int.__add__(self, other)`` should do: it should require an
+``Integer`` as its first argument, but it could be T or U -- appropriate
+conversions can be done, as long as we don't convert ``self`` outside the
+realm of ``Integer``.
 
 
-Types
----------
-
-Types are implemented by the class W_TypeObject. This is where inheritance
-and the Method Resolution Order are defined, and where attribute look-ups
-are done.
-
-Instances of user-defined types are implemented as W_UserObjects.
-A user-defined type can inherit from built-in types (maybe more than one,
-although this is incompatible with CPython). The W_UserObject delegator
-converts the object into any of these "parent objects" if needed. This is
-how user-defined types appear to inherit all built-in operator
-implementations.
-
-Delegators should be able to invoke user code; this would let us
-implement special methods like __int__() by calling them within a
-W_UserObject -> int delegator.
-
-Specifics of multimethods
----------------------------
-
-Multimethods dispatch more-specific-first, left-to-right (i.e. if there is
-an exact match for the first argument it will always be tried first).
-
-Delegators are automatically chained (i.e. A -> B and B -> C would be
-combined to allow for A -> C delegation).
-
-Delegators do not publish the class of the converted object in advance,
-so that the W_UserObject delegator can potentially produce any other
-built-in implementation. This means chaining and chain loop detection cannot
-be done statically (at least without help from an analysis tool like the
-translator-to-C). To break loops, we can assume (unless a particular need
-arises) that delegators are looping when they return an object of an
-already-seen class.
-
-Registration
---------------------
-
-The register() method of multimethods adds a function to its database of
-functions, with the given signature. A function that raises
-!FailedToImplement causes the next match to be tried.
-
-'delegate' is the special unary multimethod that should try to convert
-its argument to something else. For greater control, it can also return
-a list of 2-tuples (class, object), or an empty list for failure to convert
-the argument to anything. All delegators will potentially be tried, and
-recursively on each other's results to do chaining.
-
-A priority ordering between delegators is used. See ``objspace.PRIORITY_*``.
+Conversion multimethods
+-----------------------
 
+As conversion functions are linked to levels in the tree, and there can be
+several conversions for each level, they are much like multimethods
+themselves.  In other words, for each "concept" (Value, Float, Integer) we can
+introduce a multimethod (valueconv, floatconv, integerconv) with the
+corresponding signature (``valueconv(Value)``, ``floatconv(Float)``,
+``integerconv(Integer)``).  Specific conversion functions are implementations
+of one of these multimethods.  For example, if ``g`` is a T-to-U conversion,
+it is an implementation of ``integerconv(Integer)``, with signature ``g(T)``.
+
+The job of the multimethod dispatcher algorithm is to insert the appropriate
+implementations of the allowed ``xxxconv(Xxx)`` multimethods until the call
+can be satisfied.
+
+A nice point of view is that these conversion multimethods are identity
+functions (i.e. functions that do nothing, and return their argument
+unmodified): ``integerconv(Integer)`` is the abstract function that takes an
+Integer and just returns it; an implementation like ``g(T)`` actually takes a
+T and returns a U, which is different, but when you look at it abstractly at
+the Integer level, you just see an Integer input argument and the same Integer
+result.
 
-Translation
------------------------
 
-The code in multimethod.py is not supposed to be read by the
-translator-to-C. Special optimized code will be generated instead
-(typically some kind of precomputed dispatch tables).
-
-Delegation is special-cased too. Most delegators will be found
-to return an object of a statically known class, which means that
-most of the chaining and loop detection can be done in advance.
-
-
-Multimethod slicing
-------------------------
-
-Multimethods are visible to user code as (bound or unbound) methods
-defined for the corresponding types. (At some point built-in functions
-like len() and the operator.xxx() should really directly map to the
-multimethods themselves, too.)
-
-To build a method from a multimethod (e.g. as in 'l.append' or
-'int.__add__'), the result is actually a "slice" of the whole
-multimethod, i.e. a sub-multimethod in which the registration table has
-been trimmed down. (Delegation mechanisms are not restricted for sliced
-multimethods.)
-
-Say that C is the class the new method is attached to (in the above
-examples, respectively, C=type(l) and C=int). The restriction is
-based on the registered class of the first argument ('self' for the
-new method) in the signature. If this class corresponds to a fixed
-type (as advertised by 'statictype'), and this fixed type is C or a
-superclass of C, then we keep it.
+Algorithm
+---------
 
-Some multimethods can also be sliced along their second argument,
-e.g. for __radd__().
+Here is a suggested algorithm.  Roughly, it assumes that arguments with the
+same abstract signature (e.g. ``Value`` in ``add(Value, Value)``) work
+together, but arguments with distinct signatures are independent.
+
+Assume that the signature of a multimethod is ``m(C1,...,Cn)``, and we want to
+dispatch the call ``m(A1,...,An)``, where the arguments have types
+``T1,...,Tn`` respectively.  Each type ``Ti`` must appear in the subtree below
+``Ci``, otherwise it is a TypeError.
+
+We use a single set S of types and concepts, which will grow until it is large
+enough to contain the appropriate conversion functions::
+
+    S = { }   # empty set
+    sortedmatches = []
+    while 1:
+        find_matches_in(S)
+        i = the largest integer in {1,...,n} such that Ci not in S
+                       or break if there isn't any such i any more
+        C = the first item in order(Ci) such that C not in S
+        add C into S
+        also add into S the whole subtree of C
+
+where ``order(C)`` is a kind of "method resolution order" of everything
+*under* ``C`` (instead of *over* ``C`` for Python's MRO).  For example,
+following Python 2.2::
+
+    def order(C):
+        lst = []
+        for j in range(n,0,-1):
+            if Tj in subtree(C):
+                lst += [Tj, parent(Tj), parent(parent(Tj)), ..., C]
+        for each D that appears more than once in lst:
+            remove all but the last occurrence of D in lst
+        return lst
+
+The algorithm in Python 2.3 (the C3 linearization) is slightly different, and
+could probably be used instead, though the difference should not be
+significant for the kind of trees we are using.
+
+Finally::
+
+    def find_matches_in(S):
+        matches = list_matches_in(S)
+        remove from matches the items already in sortedmatches
+        if len(matches) > 1:
+            warning ambiguity, or maybe use performance hints
+        sortedmatches += matches
+
+    def list_matches_in(S):
+        conv = { implementations of the conversion multimethod of C
+                 for C in S }
+        combine in all possible ways the functions in conv to change
+                 the types T1,...,Tn into the types U1,...,Un of an
+                 implementation of the multimethod to call
+
+The resulting ``sortedmatches`` list contains, in preference order, the
+implementations that are available to be dispatched to.  We generally just
+call the first one, but it (or any later one) may raise FailedToImplement; in
+this case the dispatcher tries the next one.
+
+The rest of the algorithm consists of implementation and performance tricks,
+e.g. it should try to call a given conversion function only once and remember
+the value of the converted argument in case we need it again after a
+FailedToImplement.
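
The Introduction of the new multimethod.txt describes a multimethod as a
callable MultiMethod instance that selects one of its registered
implementations from the types of its arguments. A minimal Python sketch of
that direct-match case follows. It assumes exact type matching with no
conversions at all. The register() method, the Box helper, the return types
of the three implementations and the classes T, U and V (standing in for the
document's leaf types) are all hypothetical.

```python
class MultiMethod:
    """Callable that dispatches on the exact types of all its arguments.

    Sketch only: no conversion is ever attempted, so a call succeeds only
    if an implementation was registered for exactly these argument types.
    """

    def __init__(self, name, arity):
        self.name = name
        self.arity = arity
        self.implementations = {}            # tuple of types -> function

    def register(self, function, *types):
        if len(types) != self.arity:
            raise ValueError("%s expects %d types" % (self.name, self.arity))
        self.implementations[types] = function

    def __call__(self, *args):
        key = tuple(type(arg) for arg in args)
        function = self.implementations.get(key)
        if function is None:
            names = ", ".join(t.__name__ for t in key)
            raise TypeError("%s(%s): no registered implementation"
                            % (self.name, names))
        return function(*args)


class Box:
    """Hypothetical helper so that the leaf types can carry a value."""
    def __init__(self, value):
        self.value = value

# Hypothetical leaf types: T and U for two integer representations, V for
# IEEE doubles. Dispatch looks at type(arg) only, so Box plays no part in it.
class T(Box): pass
class U(Box): pass
class V(Box): pass

add = MultiMethod("add", 2)
add.register(lambda x, y: T(x.value + y.value), T, T)
add.register(lambda x, y: U(x.value + y.value), U, T)
add.register(lambda x, y: V(x.value + y.value), V, V)
```

With these registrations add(T(1), T(2)) returns a T holding 3, while
add(T(1), V(2.0)) raises TypeError: no implementation matches directly, and
finding a suitable conversion is the problem the rest of the specification
addresses.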

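The concept tree from the "Concept Trees" section and the order(C) pseudocode
from the "Algorithm" section can be made concrete as follows. This is a sketch
under assumptions not found in the document: the tree is stored as
child-to-parent links in a hypothetical PARENT dict, concepts are plain
strings, the leaves are Python classes, and chain_up and in_subtree are
illustrative helpers. order() takes the argument types explicitly instead of
reading T1, ..., Tn from the surrounding dispatch call.

```python
class T: pass   # hypothetical machine-sized integers
class U: pass   # hypothetical variable-sized longs
class V: pass   # hypothetical IEEE doubles

# The tree from "Concept Trees" as child -> parent links.
# Concepts are plain strings; the leaves are the Python types.
PARENT = {"Float": "Value", "Integer": "Float",
          T: "Integer", U: "Integer", V: "Float"}


def chain_up(node, top):
    """Return [node, parent(node), ..., top], or None if top is not above node."""
    path = [node]
    while node != top:
        if node not in PARENT:
            return None                      # reached the root without meeting top
        node = PARENT[node]
        path.append(node)
    return path


def in_subtree(node, concept):
    """True if node is concept itself or lies somewhere below it."""
    return chain_up(node, concept) is not None


def order(concept, arg_types):
    """Python 2.2-style ordering of everything *under* concept.

    arg_types plays the role of T1, ..., Tn in the document's pseudocode.
    """
    lst = []
    for t in reversed(arg_types):            # j = n, n-1, ..., 1
        path = chain_up(t, concept)
        if path is not None:                 # Tj is in subtree(concept)
            lst += path
    # for each node appearing more than once, keep only its last occurrence
    return [node for i, node in enumerate(lst) if node not in lst[i + 1:]]
```

For the call add(t, v), order("Value", [T, V]) gives
[V, T, "Integer", "Float", "Value"]: the leaves come first and the concepts
follow from the bottom up, so the set S in the dispatch loop grows from the
leaves toward Value, which is the prioritization the "Conversion" section asks
for.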

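Putting the pieces together, the dispatch loop, find_matches_in and
list_matches_in can be sketched in runnable Python. Several choices here go
beyond the document and are marked in the code. Conversions are single-step
(no chaining). Argument i may only use conversions registered at concepts
that are both in S and under its signature concept Ci. Ambiguities are not
reported. S is kept as a list so the result is deterministic. Converted
values are not cached. All names (CONVERSIONS, NODES, LIMIT, add_tt, ADD,
SIG, dispatch) are hypothetical. add_tt raises FailedToImplement on a
hypothetical machine-word overflow, to show the fallback to the next entry of
sortedmatches.

```python
import itertools

class FailedToImplement(Exception):
    """Raised by an implementation to make the dispatcher try the next match."""

class Box:
    def __init__(self, value):
        self.value = value

class T(Box): pass   # hypothetical machine-sized integers
class U(Box): pass   # hypothetical variable-sized longs
class V(Box): pass   # hypothetical IEEE doubles

# Concept tree as child -> parent links; NODES fixes an order for determinism.
PARENT = {"Float": "Value", "Integer": "Float",
          T: "Integer", U: "Integer", V: "Float"}
NODES = ["Value", "Float", "Integer", T, U, V]

# One conversion multimethod per concept, as (source, target, function) triples.
CONVERSIONS = {
    "Integer": [(T, U, lambda x: U(x.value)), (U, T, lambda x: T(x.value))],
    "Float": [(T, V, lambda x: V(float(x.value))),
              (U, V, lambda x: V(float(x.value)))],
}

def chain_up(node, top):
    """[node, parent(node), ..., top], or None if top is not above node."""
    path = [node]
    while node != top:
        if node not in PARENT:
            return None
        node = PARENT[node]
        path.append(node)
    return path

def subtree(concept):
    return [n for n in NODES if chain_up(n, concept) is not None]

def order(concept, types):
    lst = []
    for t in reversed(types):                # j = n, ..., 1
        lst += chain_up(t, concept) or []
    return [n for i, n in enumerate(lst) if n not in lst[i + 1:]]

def list_matches_in(S, signature, args, implementations):
    options = []
    for concept, arg in zip(signature, args):
        opts = [(type(arg), None)]           # None: pass the argument unchanged
        for level in S:
            if level in subtree(concept):    # assumption: stay under Ci
                opts += [(dst, f) for src, dst, f in CONVERSIONS.get(level, [])
                         if src is type(arg)]  # assumption: single-step only
        options.append(opts)
    matches = []
    for combo in itertools.product(*options):
        impl = implementations.get(tuple(dst for dst, _ in combo))
        if impl is not None:
            matches.append((impl, tuple(f for _, f in combo)))
    return matches

def dispatch(signature, args, implementations):
    types = [type(a) for a in args]
    for concept, t in zip(signature, types):
        if t not in subtree(concept):
            raise TypeError("%s is not under %s" % (t.__name__, concept))
    S, sortedmatches = [], []
    while True:
        # find_matches_in(S); ambiguity warnings are omitted in this sketch
        for m in list_matches_in(S, signature, args, implementations):
            if m not in sortedmatches:
                sortedmatches.append(m)
        pending = [i for i, c in enumerate(signature) if c not in S]
        if not pending:
            break
        C = next(n for n in order(signature[pending[-1]], types) if n not in S)
        S += [n for n in subtree(C) if n not in S]   # C and its whole subtree
    for impl, convs in sortedmatches:
        converted = [a if f is None else f(a) for a, f in zip(args, convs)]
        try:
            return impl(*converted)
        except FailedToImplement:
            continue                         # try the next preferred match
    raise TypeError("no implementation accepted the arguments")

LIMIT = 2 ** 31   # hypothetical machine-word bound for T

def add_tt(x, y):
    if abs(x.value + y.value) >= LIMIT:
        raise FailedToImplement
    return T(x.value + y.value)

ADD = {(T, T): add_tt,
       (U, T): lambda x, y: U(x.value + y.value),
       (V, V): lambda x, y: V(x.value + y.value)}
SIG = ("Value", "Value")
```

With these definitions, dispatch(SIG, (T(1), V(2.5)), ADD) only finds a match
once Float enters S, converts the T argument to V and returns a V holding
3.5. dispatch(SIG, (T(2**31 - 1), U(1)), ADD) first tries add_tt, which raises
FailedToImplement, and then falls back to the (U, T) implementation, returning
a U holding 2**31.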