[Python-3000] Adaptation [was:Re: Iterators for dict keys, values, and items == annoying :)]

Sat Apr 1 20:50:51 CEST 2006

On Apr 1, 2006, at 8:31 AM, Aahz wrote:
    ...
> Seriously, I can almost see why you think adaptation is a huge  
> gain, but
> every time I start looking closer, I get bogged down in trying to
> understand adaptation registration.  Do you have a simple way of
> explaining how that works *well* and *simply*?  Because unless one can
> handle registration cleanly, I don't understand how adaptation can be
> generally useful for Python.  Conversely, if adaptation  
> registration is
> *NOT* required, please explain that in simple terms.

Yes, the ability to register adapters is indeed necessary to gain the  
full benefits of "protocol adaptation". Let's see if I'm up to  
explaining things a bit more clearly than I've managed in the past,  
by "abstracting out" just adaptation and registration from the many  
cognate issues (you can find more real-world approaches in Zope,  
Twisted or PyProtocols, but perhaps I can better show the underlying  
idea by sidestepping practical concerns such as performance  
optimization and effective error-checking).

Suppose, just for simplicity and to separate this issue from issues  
related to "how should interfaces or protocols be formalized", that  
we choose, as the one and only way to identify each protocol, a  
unique-string, such as the half-hearted 'com.python.stdlib.index'  
example I mentioned earlier.  It's probably an oversimplification --  
it entirely leaves the issue of defining what syntactic, semantic,  
and pragmatic assurances a protocol makes up to human-readable  
documentation, and attempts no enforcement of such promises; the  
reason I posit this hypothesis now is to sidestep the long,  
interesting and difficult discussion of that issue, and focus on  
adaptation and registration.

Second simplification: let's ignore inheritance. Again, I'm not  
advocating this as a pratical approach: the natural choice, when  
looking for an adaptation of type T to protocol P, is instead, no  
doubt, to walk T's MRO, looking for the adaptation of each of T's  
bases, and stop when the first one is met, and this has many  
advantages (but also some issues, partly due to the fact that  
sometimes inheritance is used more for implementation purposes than  
for Liskovian purity, etc, etc).  In the following, however, I'll  
just assume that "adaptation is not inherited" -- just as a  
simplification.  Similarly, I will assume "no transitivity":  
adaptation in the following is assume to go from a type T to a  
protocol P (the latter identified by a unique string), in a single  
step which either succeeds or fails, period. The advantages of  
transitivity, like those of inheritance, are no doubt many, but they  
can be discussed separately. I do not even consider the possibility  
of having an inheritance structure among protocols, even though it  
might be very handy; etc, etc.

So, each entity which we can call a "registration of adaptation" (ROA  
for short) is a tuple ( (T, P), A) where:
     T is a type;
     P is a unique string identifying a protocol;
     A is a callable, such that, for any direct instance t of T  
(i.e., one such that type(t) is T), A(t) returns an object (often t  
itself, or a wrapper over t) which claims "I satisfy all the  
constraints which, together, make up protocol P".  A(t) may also  
raise some exception, in which case the adaptation attempt fails and  
the exception propagates.

For example, A may often be the identity function:

def identity(x):
     return x

where a ROA of ((T, P), identity) means "all direct instances of T  
satisfy protocol P without need for wrappers or whatever".

The adaptation registry is a mapping from (T, P) to A, where each ROA  
is an item -- (T, P) the key, A the value.  The natural  
implementation would of course be a simple dict.

Consider for example the hypothetical protocol P  
'com.python.stdlib.index'.  We could define it to mean: an object o  
satisfies P iff int(o) is directly usable as index into a sequence,  
with no loss of information.

So, a sequence type's __getitem__ would be...:
     def __getitem__(self, index):
         adapted_index = adapt(index, 'com.python.stdlib.index')
         i = int(adapted_index)
         # continue indexing using the int 'i', which is asserted to  
be the correct index equivalent of argument 'index'

instead of today's (roughly):

     def __getitem__(self, index):
         i = index.__index__()
         # etc, as above

Whoever (python-dev, in this case;-) invents the protocol P, besides  
using it in methods such as __getitem__, might also pre-register  
adapters (often, identity) for the existing types which are known to  
satisfy protocol P.  I.e., during startup, it would execute:

register_adapter(int, 'com.python.stdlib.index', identity)
register_adapter(long, 'com.python.stdlib.index', identity)

This is similar to what the inventor would do today (inserting an  
__index__, at Python level, or tp_index, at C level, into the types)  
but not "invasive" of the types (which doesn't matter here, since the  
protocol's inventor also owns the types in question).

Once the protocol P is published, the author of a type T which does  
respect/satisfy P would also execute register_adapter calls (instead  
of modifying T by adding __index__/tp_index).  Say that instances of  
T do already satisfy the protocol, then identity is the correct adapter:

register_adapter(T, 'com.python.stdlib.index', identity)

But say that some T1 doesn't quite satisfy the protocol but can  
easily be adapted to it.  For example, instances t of T1 might not  
support int(t), or support it in a way that's not correct for the  
protocol, but (again for example) T1 might offer an existing as_index  
method that would do what's required.  Then, a non-identity adapter  
is needed:

class wrapT1(object):
     def __init__(self, t):
         self.t = t
     def __int__(self):
         return self.t.as_index()

register_adapter(T1, 'com.python.stdlib.index', wrapT1)

The big deal here is that protocol P and type T1 may have been  
developed independently and separately, and yet a third-party author  
can still author and register such a wrapT1 adapter, and as long as  
that startup code has executed, the application will be able to use  
T1 instances as indexes into sequences "transparently".  The author  
of the application code need not even be aware of the details: he or  
she may just choose to import a third-party module  
"implement_T1_adaptations.py" just as he or she imports the framework  
supplying T1 and the one(s) using protocol P.  This separation of  
concerns into up to 4 groups of developers (authors, respectively,  
of: a framework defining/using protocol P; a framework supplying type  
T1; a framework adapting T1 to P; an application using all of the  
above) always seems overblown for "toy" examples that are as simple  
as this one, of course -- and even in the real world in many cases a  
developer will be wearing more than one of these four hats.  But  
adaptation *allows* the separation of "ownership" concerns by  
affording *non-invasive* operation.

Here is a simple reference implementation of adaptation under all of  
these simplifying assumptions:

_global_registry = {}

def register_adapter(T, P, A, registry=_global_registry):
     registry[T, P] = A

def adapt(t, P, registry=_global_registry):
     return registry[type(t), P]

Now, a million enrichments and optimizations can obviously be  
imagined and discussed: the many simplifying assumptions I've been  
making to try to isolate adaptation and registration down to its  
simplest core leave _ample_ space for a lot of that;-).  But I hope  
that instead of immediately focusing on (premature?-) optimizations,  
and addition of functionality of many kinds, we can focus on the  
salient points I've been trying to make:

a. protocols may be identified quite arbitrarily (e.g. by unique- 
strings), though specific formalizations are also perfectly possible  
and no doubt offer many advantages (partial error-checking, less  
indirectness, ...): there is no strict need to formalize interfaces  
or protocols in order to add protocol-adaptation to Python

b. the possibility of supplying, consuming and adapting-to protocols  
becomes non-invasive thanks to registration

In this simplified outline I've supported *only* registration as the  
one and only way to obtain adaptation -- conceptually, through the  
identity function, registration can indeed serve the purpose,  
although optimizations are quite obviously possible.

I should probably also mention what I mean by "protocol": it's a very  
general term, potentially encompassing interfaces (sets of methods  
with given signatures -- a "syntactic" level), design-by-contract  
kinds of constraints (a "semantic" level), and also the fuzzier but  
important kinds of constraints that linguists call "pragmatic" (for  
example, the concept of "integer usable as an index into a sequence  
without significant loss of information" is definitely a pragmatic  
constraint, not formalizable as semantics).

Alex