Yet Another PEP: Query Protocol Interface or __query__
Carlos Alberto Reis Ribeiro
cribeiro at mail.inet.com.br
Tue Mar 27 12:19:15 EST 2001
Warning: this is a long message, but I think it's worth the time :-)
It's good to know that my Adapter/Proxy implementation was useful. I'm
also a bit surprised, because I'm relatively new to Python, but I have
found this technique to be extremely flexible while keeping the code
simple and readable ("pythonically correct" :-). I dreamed of similar
things in my Object Pascal/C++ days...
As for your example: the proposal seems to be moving toward a more
intelligent protocol adapter. I think it is better to have 'dumb' adapters
(in fact, as dumb as possible) and helper methods to do the adaptation. In
fact, I remember some thoughts by JvR and Alex Martelli about a related
topic - what to put inside constructors, and why it's good practice to use
class factories. So I'm advocating the following definitions, with a
thorough explanation to follow:
1) the "protocol specification" (which may be given as a class,
interface, or simply as a list of methods);
2) the "protocol instance", which may be any Python object that
conforms to the protocol specification;
3) the "adapter", or "proxy", which is a special Python object
that implements *only* the protocol specification, and
nothing more. The adapter hides all details; the adapted
object is "opaque".
4) the "protocol identifier" is a unique string that can be used
whenever a protocol specification is desired.
At the end I talk a little bit about some efficiency issues I'm worried about.
-----
1) What is the protocol specification
You see that I'm using the word "protocol" in a very broad sense here. A
protocol is an "abstract" beast, and it may be specified in a number of ways.
1.1) We can use a class to specify the protocol. For instance, the UserDict
class can be used as the specification of the generic dictionary protocol.
1.2) We can use an interface to specify the protocol. In fact, I don't see
much difference between an "abstract class" defined with stub methods and an
interface declaration. There are a few points that PEP 245 is already
tackling, such as the different behavior when inheriting from a class versus
implementing an interface. Also note that interfaces are a good place to
check method signatures (parameter types and so on).
1.3) A list of methods is a very simple and effective way to describe an
interface. In fact, while we don't have any syntactic support for
interfaces, I think that a simple list of strings is the best approach for
protocol specification. It allows us to focus on the adaptation layer,
while leaving the rest of the discussion for the types-sig and PEP 245 guys
<wink>.
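To make (1.3) concrete, here is a minimal sketch, where the protocol is
just a list of strings and conformance is a simple hasattr() check. The
names DictProtocol and conforms are mine, purely for illustration:

    # A "protocol specification" as a plain list of method names (case 1.3)
    DictProtocol = ['__getitem__', '__setitem__', 'keys', 'items']

    def conforms(obj, protocol):
        # An object is a "protocol instance" (see section 2) if it
        # supplies every method named in the specification
        for name in protocol:
            if not hasattr(obj, name):
                return 0
        return 1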
It is relatively easy to convert between these three formats. I'm thinking
about ways to support all three approaches in my adapter/proxy module.
I have done some work on it already; I'll post it soon.
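For instance, a class (1.1) can be flattened into the list-of-strings form
(1.3) along these lines - a rough sketch only, ignoring signatures and
special methods:

    # Derive a list-of-methods specification from a class or interface
    def protocol_from_class(klass):
        names = []
        for name in dir(klass):
            if name.startswith('_'):
                continue               # skip private and special names
            if callable(getattr(klass, name)):
                names.append(name)
        return names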
-----
2) What is the protocol instance
2.1) Any Python object that implements a protocol is a protocol instance.
In this sense, any dictionary - built in or inherited from UserDict - is an
instance of the dictionary protocol.
2.2) Using this definition, adapt()'s job is to *return a protocol instance*.
It may be the object itself (returning self in the __adapt__ method); it
may be another generic object that handles the adaptation between the
desired interface and the one implemented in the object; or it may be a
special adapter built on the fly.
2.3) The code that I have running today corresponds to the last case
outlined above - the special "adapter/proxy" object.
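To illustrate (2.2), here is a sketch of an __adapt__ method covering the
three cases. The protocols and the ReadOnlyTable helper are invented for
this example, and make_proxy is the factory sketched in section 3:

    class ReadOnlyTable:
        # a generic hand-written adapting object (the second case)
        def __init__(self, target):
            self.target = target
        def get(self, key):
            return self.target.get(key)

    class Table:
        def __init__(self):
            self.data = {}
        def get(self, key):
            return self.data.get(key)
        def put(self, key, value):
            self.data[key] = value
        def __adapt__(self, protocol):
            if protocol == ['get', 'put']:
                return self                   # the object itself conforms
            elif protocol == ['get']:
                return ReadOnlyTable(self)    # generic adapting object
            else:
                return make_proxy(self, protocol)  # adapter built on the fly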
-----
3) What is an "adapter/proxy" object
It is possible to build a new object in Python, on the fly, to act as a
proxy to another object.
3.1) This proxy object does not have methods of its own. Instead of having
methods bound to itself, all the methods it contains are bound to other
objects. So the object ends up acting as a transparent proxy for those objects.
3.2) The transparent proxy is as efficient as possible, because there are
no intermediate calls. All the calls go directly to the target object.
3.3) The resulting proxy object is opaque. It hides implementation details
from the user. This is considered to be good practice in object-oriented
programming.
3.4) A generic adapter factory function can be used to build a new adapter
from the "protocol specification", given in any of the three ways described
in (1.1), (1.2) or (1.3). In fact, this is what my helper function
currently does; a sketch is given after this list.
3.5) I would reserve the name "proxy" for objects of this type, which don't
implement any behavior of their own. "Adapters" may use similar techniques,
but they can implement supplementary methods. In some cases the "adapter"
behavior may be needed, but in *most* cases, I think that the proxy approach
is sufficient.
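Here is a minimal sketch of such a proxy factory (3.4), taking the
specification as a list of method names; the names Proxy and make_proxy
are mine, not part of any proposal:

    class Proxy:
        # deliberately empty: the proxy has no methods of its own (3.1)
        pass

    def make_proxy(target, protocol):
        p = Proxy()
        for name in protocol:
            # each attribute is a method *bound to the target*, so calls
            # go straight through with no intermediate dispatch (3.2)
            setattr(p, name, getattr(target, name))
        return p

Everything not named in the specification is simply absent from the proxy,
which is what makes it opaque (3.3).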
I believe that proxy objects built this way are the best answer for "adapt"
calls, also for security reasons. I'll give an example:
Suppose that you have a class to handle user account profiles. You have two
interfaces - one is the "public" one, and the other is the "administrator"
interface. Your class could return an interface that gives public code
visibility only to the public functions. To get to the administrative
interface, you must "log in" in some meaningful way. However, if you just
return "self" in every case, it is fairly easy for "public authorized" code
to figure out how to access the administrative functions, because they're
all visible to anyone. Worse - this can happen by mistake.
What is the advantage of using interfaces to do that, instead of using, let
us say, two different classes? The main advantage that I see is that we can
keep the data synchronized between the two "protocol instances", because
they both point to the same target object.
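Using the make_proxy sketch above, the account example might look like
this (UserProfile and its methods are invented for illustration):

    class UserProfile:
        def __init__(self, name):
            self.name = name
            self.quota = 0
        def get_name(self):
            return self.name
        def get_quota(self):
            return self.quota
        def set_quota(self, quota):    # administrative interface only
            self.quota = quota

    user = UserProfile('carlos')
    public_view = make_proxy(user, ['get_name', 'get_quota'])
    admin_view = make_proxy(user, ['get_name', 'get_quota', 'set_quota'])

    admin_view.set_quota(100)
    public_view.get_quota()    # returns 100 - both views share one target
                               # public_view has no set_quota attribute at all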
Note that in Python it is possible to follow the pointers through the
method (using im_self) to get to the target object. However, this is not
immediately obvious, and the proxy at least prevents someone from ending
up calling the wrong methods by mistake. If *absolutely* needed, we could
devise a way to deny access to the im_* attributes of a method obtained
through a proxy object, but this is beyond the scope of this discussion.
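For the record, continuing the sketch above, the leak looks like this:

    # the bound method still remembers its target...
    hidden = public_view.get_name.im_self    # the UserProfile instance
    hidden.set_quota(0)                      # administrative access regained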
-----
4) What is a protocol identifier
The first draft of this PEP used strings as a parameter to specify the
protocol in the __adapt__ call. Somewhere along its development, this was
changed to a class parameter. However, it is not clear that this is
sufficiently flexible.
My proposal is to define a unique protocol identifier as a string with a
well-defined format:
4.1) The protocol identifier is a string that uniquely identifies each protocol.
4.2) The protocol identifier should be structured in a way that makes
parsing easier for automated tools. It also must be usable as a means of
documentation inside the program. On the other hand, it must be kept simple
to avoid clutter, keeping the code clear and easy to understand. There are
two main proposals:
4.2.1) Using a DNS-like syntax. For example, org.python.UserDict;
4.2.2) Using a URL-like syntax. For example, python.org/protocol/userdict.xml.
The second approach has one advantage: it makes it possible to use the URL
to actually retrieve some meaningful description of the protocol. Of
course, this can't be mandatory, because not all applications are
web-enabled; but it makes it fairly easy for the developer to locate
and retrieve the documentation.
4.3) When the protocol is specified by means of a class or interface (cases
1.1 and 1.2 above), we should define an attribute called __protocol__ that
holds the protocol identifier.
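For example (the identifier itself is just illustrative):

    # Cases 1.1/1.2: the specification carries its own identifier
    class DictionaryProtocol:
        __protocol__ = 'org.python.UserDict'    # a 4.2.1-style identifier
        def __getitem__(self, key): pass
        def __setitem__(self, key, value): pass
        def keys(self): pass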
4.4) A 'protocol factory' could be implemented as a standard module. This
module would contain:
4.4.1) One dictionary of well-known "protocol specifications", indexed by
the "protocol identifier";
4.4.2) One function to automatically retrieve the XML representation of the
protocol and use it to generate a valid "protocol specification", that
could in turn be supplied to the adapter factory function.
4.4.3) One function to generate the XML representation of any given protocol
specification (OK, this is not exactly a factory function, but it is related
to the job).
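A rough sketch of that module's interface, with every name hypothetical:

    # protocols.py - a hypothetical 'protocol factory' module (4.4)
    well_known = {
        'org.python.UserDict': ['__getitem__', '__setitem__', 'keys'],
    }

    def get_protocol(identifier):
        # 4.4.1: look up a well-known specification first
        try:
            return well_known[identifier]
        except KeyError:
            # 4.4.2: here we would retrieve and parse the XML
            # representation of the protocol (not shown)
            raise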
-----
Efficiency issues
I'm a little bit afraid of handling so much information and processing
inside __adapt__ methods. These methods can be very useful, and there is a
good chance we'll start using them a lot as part of a
'programming-by-contract' paradigm. There are some ways to address this problem:
1) Building helper structures in the __init__ method to allow for a faster
lookup inside __adapt__ (see the sketch after this list). For now I think
that this is the best solution.
2) *If, and when* we get interfaces as part of the language, it is possible
to dream about some optimizations. Namely, interface method tables could be
implemented as direct access vectors instead of dictionaries. This can be
done if we think of interfaces as the "immutable" counterpart to "classes".
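A sketch of idea (1), caching the specifications in __init__ so that
__adapt__ becomes a single dictionary lookup. The identifiers are invented,
and make_proxy is the factory from section 3:

    class Store:
        def __init__(self):
            self.data = {}
            # precompute the protocol table once, so that __adapt__
            # is a cheap lookup instead of repeated introspection
            self._protocols = {
                'org.python.mapping': ['get', 'put'],
                'org.python.mapping.readonly': ['get'],
            }
        def get(self, key):
            return self.data.get(key)
        def put(self, key, value):
            self.data[key] = value
        def __adapt__(self, identifier):
            return make_proxy(self, self._protocols[identifier])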
Carlos Ribeiro