Yet Another PEP: Query Protocol Interface or __query__
Carlos Alberto Reis Ribeiro
cribeiro at mail.inet.com.br
Tue Mar 27 12:19:15 EST 2001
Warning: this is a long message, but I think it's worth the time :-)
It's good to know that my Adapter/Proxy implementation was useful. I'm
also a bit surprised, because I'm relatively new to Python, but I have
found this technique to be extremely flexible while keeping the code
simple and readable ("pythonically correct" :-). I dreamed of similar
things in my Object Pascal/C++ days...
As for your example: the proposal seems to be moving toward a more
intelligent protocol adapter. I think it is better to have 'dumb' adapters
(in fact, as dumb as possible) and helper methods to do the adaptation. In
fact, I remember some thoughts by JvR and Alex Martelli about a related
topic - what to put inside constructors, and why it's good practice to use
class factories. So I'm advocating the following definitions, with a
thorough explanation to follow:
1) the "protocol specification" (which may be given as a class,
interface, or simply as a list of methods);
2) the "protocol instance", which may be any Python object that
conforms to the protocol specification;
3) the "adapter", or "proxy", which is a special Python object
that implements *only* the protocol specification, and
nothing more. The adapter hides all details; the adapted
object is "opaque".
4) the "protocol identifier" is a unique string that can be used
whenever a protocol specification is desired.
At the end I talk a little bit about some efficiency issues I'm worried about.
-----
1) What is the protocol specification
You see that I'm using the word "protocol" in a very broad sense here. A
protocol is an "abstract" beast, and it may be specified in a number of ways.
1.1) We can use a class to specify the protocol. For instance, the UserDict
class can be used as the specification of the generic dictionary protocol.
1.2) We can use an interface to specify the protocol. In fact, I don't see
much difference between an "abstract class" defined with stub methods and an
interface declaration. There are a few points that PEP 245 is already
tackling, such as the different behavior when inheriting from a class versus
implementing an interface. Also note that interfaces are a good place to
check method signatures (parameter types and so on).
1.3) A list of methods is a very simple and effective way to describe an
interface. In fact, while we don't have any syntactic support for
interfaces, I think that a simple list of strings is the best approach for
protocol specification. It allows us to focus on the adaptation layer,
while leaving the rest of the discussion for the types-sig and PEP 245 guys
<wink>.
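To make (1.3) concrete, here is a minimal sketch, where the protocol is
just a list of strings and conformance is a simple hasattr() check. The
names DictProtocol and conforms are mine, purely for illustration:

    # A "protocol specification" as a plain list of method names (case 1.3)
    DictProtocol = ['__getitem__', '__setitem__', 'keys', 'items']

    def conforms(obj, protocol):
        # An object is a "protocol instance" (see section 2) if it
        # supplies every method named in the specification
        for name in protocol:
            if not hasattr(obj, name):
                return 0
        return 1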
It is relatively easy to convert between these three formats. I'm thinking
about ways to support all three approaches in my adapter/proxy module.
I have done some work on it already; I'll post it soon.
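For instance, a class (1.1) can be flattened into the list-of-strings form
(1.3) along these lines - a rough sketch only, ignoring signatures and
special methods:

    # Derive a list-of-methods specification from a class or interface
    def protocol_from_class(klass):
        names = []
        for name in dir(klass):
            if name.startswith('_'):
                continue               # skip private and special names
            if callable(getattr(klass, name)):
                names.append(name)
        return names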
-----
2) What is the protocol instance
2.1) Any Python object that implements a protocol is a protocol instance.
In this sense, any dictionary - built in or inherited from UserDict - is an
instance of the dictionary protocol.
2.2) Using this definition, adapt()'s job is to *return a protocol instance*.
It may be the object itself (returning self in the __adapt__ method); it
may be another generic object that handles the adaptation between the
desired interface and the one implemented in the object; or it may be a
special adapter built on the fly.
2.3) The code that I have running today corresponds to the last case
outlined above - the special "adapter/proxy" object.
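To illustrate (2.2), here is a sketch of an __adapt__ method covering the
three cases. The protocols and the ReadOnlyTable helper are invented for
this example, and make_proxy is the factory sketched in section 3:

    class ReadOnlyTable:
        # a generic hand-written adapting object (the second case)
        def __init__(self, target):
            self.target = target
        def get(self, key):
            return self.target.get(key)

    class Table:
        def __init__(self):
            self.data = {}
        def get(self, key):
            return self.data.get(key)
        def put(self, key, value):
            self.data[key] = value
        def __adapt__(self, protocol):
            if protocol == ['get', 'put']:
                return self                   # the object itself conforms
            elif protocol == ['get']:
                return ReadOnlyTable(self)    # generic adapting object
            else:
                return make_proxy(self, protocol)  # adapter built on the fly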
-----
3) What is an "adapter/proxy" object
It is possible to build a new object in Python, on the fly, to act as a
proxy to another object.
3.1) This proxy object does not have methods of its own. Instead of having
methods bound to itself, all the methods it contains are bound to other
objects. So the object ends up acting as a transparent proxy for those objects.
3.2) The transparent proxy is as efficient as possible, because there are
no intermediate calls. All the calls go directly to the target object.
3.3) The resulting proxy object is opaque. It hides implementation details
from the user. This is considered to be good practice in object-oriented
programming.
3.4) A generic adapter factory function can be used to build a new adapter
from the "protocol specification", given in any of the three ways described
in (1.1), (1.2) or (1.3). In fact, this is what my helper function
currently does; a sketch is given after this list.
3.5) I would reserve the name "proxy" for objects of this type, which don't
implement any behavior of their own. "Adapters" may use similar techniques,
but they can implement supplementary methods. In some cases the "adapter"
behavior may be needed, but in *most* cases, I think that the proxy approach
is sufficient.
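Here is a minimal sketch of such a proxy factory (3.4), taking the
specification as a list of method names; the names Proxy and make_proxy
are mine, not part of any proposal:

    class Proxy:
        # deliberately empty: the proxy has no methods of its own (3.1)
        pass

    def make_proxy(target, protocol):
        p = Proxy()
        for name in protocol:
            # each attribute is a method *bound to the target*, so calls
            # go straight through with no intermediate dispatch (3.2)
            setattr(p, name, getattr(target, name))
        return p

Everything not named in the specification is simply absent from the proxy,
which is what makes it opaque (3.3).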
I believe that proxy objects built this way are the best answer for "adapt"
calls, also for security reasons. I'll give an example:
Suppose that you have a class to handle user account profiles. You have two
interfaces - one is the "public" one, and the other is the "administrator"
interface. Your class could return an interface that gives public code
visibility only to the public functions. To get to the administrative
interface, you must "log in" in some meaningful way. However, if you just
return "self" in every case, it is fairly easy for "public authorized" code
to figure out how to access the administrative functions, because they're
all visible to anyone. Worse - this can happen by mistake.
What is the advantage of using interfaces to do that, instead of using, let
us say, two different classes? The main advantage that I see is that we can
keep the data synchronized between the two "protocol instances", because
they both point to the same target object.
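Using the make_proxy sketch above, the account example might look like
this (UserProfile and its methods are invented for illustration):

    class UserProfile:
        def __init__(self, name):
            self.name = name
            self.quota = 0
        def get_name(self):
            return self.name
        def get_quota(self):
            return self.quota
        def set_quota(self, quota):    # administrative interface only
            self.quota = quota

    user = UserProfile('carlos')
    public_view = make_proxy(user, ['get_name', 'get_quota'])
    admin_view = make_proxy(user, ['get_name', 'get_quota', 'set_quota'])

    admin_view.set_quota(100)
    public_view.get_quota()    # returns 100 - both views share one target
                               # public_view has no set_quota attribute at all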
Note that in Python it is possible to follow the pointers through the
method (using im_self) to get to the target object. However, this is not
immediately obvious, and the proxy at least prevents someone from ending
up calling the wrong methods by mistake. If *absolutely* needed, we could
devise a way to deny access to the im_* attributes of a method obtained
through a proxy object, but this is beyond the scope of this discussion.
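For the record, continuing the sketch above, the leak looks like this:

    # the bound method still remembers its target...
    hidden = public_view.get_name.im_self    # the UserProfile instance
    hidden.set_quota(0)                      # administrative access regained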
-----
4) What is a protocol identifier
The first draft of this PEP used strings as a parameter to specify the
protocol in the __adapt__ call. Somewhere along its development, this was
changed to a class parameter. However, it is not clear that this is
sufficiently flexible.
My proposal is to define a unique protocol identifier as a string with a
well-defined format:
4.1) The protocol identifier is a string that uniquely identifies each protocol.
4.2) The protocol identifier should be structured in a way that makes
parsing easier for automated tools. It also must be usable as a means of
documentation inside the program. On the other hand, it must be kept simple
to avoid clutter, keeping the code clear and easy to understand. There are
two main proposals:
4.2.1) Using a DNS-like syntax. For example, org.python.UserDict;
4.2.2) Using a URL-like syntax. For example, python.org/protocol/userdict.xml.
The second approach has one advantage: it makes it possible to use the URL
to actually retrieve some meaningful description of the protocol. Of
course, this can't be mandatory, because not all applications are
web-enabled; but it makes it fairly easy for the developer to locate
and retrieve the documentation.
4.3) When the protocol is specified by means of a class or interface (cases
1.1 and 1.2 above), we should define an attribute called __protocol__ that
holds the protocol identifier.
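For example (the identifier itself is just illustrative):

    # Cases 1.1/1.2: the specification carries its own identifier
    class DictionaryProtocol:
        __protocol__ = 'org.python.UserDict'    # a 4.2.1-style identifier
        def __getitem__(self, key): pass
        def __setitem__(self, key, value): pass
        def keys(self): pass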
4.4) A 'protocol factory' could be implemented as a standard module. This
module would contain:
4.4.1) One dictionary of well-known "protocol specifications", indexed by
the "protocol identifier";
4.4.2) One function to automatically retrieve the XML representation of the
protocol and use it to generate a valid "protocol specification", that
could in turn be supplied to the adapter factory function.
4.4.3) One function to generate the XML representation of any given protocol
specification (OK, this is not exactly a factory function, but it is related
to the job).
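A rough sketch of that module's interface, with every name hypothetical:

    # protocols.py - a hypothetical 'protocol factory' module (4.4)
    well_known = {
        'org.python.UserDict': ['__getitem__', '__setitem__', 'keys'],
    }

    def get_protocol(identifier):
        # 4.4.1: look up a well-known specification first
        try:
            return well_known[identifier]
        except KeyError:
            # 4.4.2: here we would retrieve and parse the XML
            # representation of the protocol (not shown)
            raise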
-----
Efficiency issues
I'm a little bit afraid of handling so much information and processing
inside __adapt__ methods. These methods can be very useful, and there is a
good chance we'll start using them a lot as part of a
'programming-by-contract' paradigm. There are some ways to address this problem:
1) Building helper structures in the __init__ method to allow for a faster
lookup inside __adapt__ (see the sketch after this list). For now I think
that this is the best solution.
2) *If, and when* we get interfaces as part of the language, it is possible
to dream about some optimizations. Namely, interface method tables could be
implemented as direct access vectors instead of dictionaries. This can be
done if we think of interfaces as the "immutable" counterpart to "classes".
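A sketch of idea (1), caching the specifications in __init__ so that
__adapt__ becomes a single dictionary lookup. The identifiers are invented,
and make_proxy is the factory from section 3:

    class Store:
        def __init__(self):
            self.data = {}
            # precompute the protocol table once, so that __adapt__
            # is a cheap lookup instead of repeated introspection
            self._protocols = {
                'org.python.mapping': ['get', 'put'],
                'org.python.mapping.readonly': ['get'],
            }
        def get(self, key):
            return self.data.get(key)
        def put(self, key, value):
            self.data[key] = value
        def __adapt__(self, identifier):
            return make_proxy(self, self._protocols[identifier])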
Carlos Ribeiro