[Python-Dev] Son of PEP 246, redux
Phillip J. Eby
pje at telecommunity.com
Thu Jan 13 03:57:07 CET 2005
This is a pretty long post; it starts out as discussion of educational
issues highlighted by Clark and Ian, but then it takes the motivation for
PEP 246 in an entirely new direction -- possibly one that could be more
intuitive than interfaces and adapters as they are currently viewed in
Zope/Twisted/PEAK etc., and maybe one that could be a much better fit with
Guido's type declaration ideas. OTOH, everybody may hate the idea and
think it's stupid, or if they like it, then Alex may want to strangle me
for allowing doubt about PEP 246 to re-enter Guido's head. Either way,
somebody's going to be unhappy. <wink>
At 05:54 PM 1/12/05 -0500, Clark C. Evans wrote:
> String -> PathName -> File
> String -> StringIO -> File
Okay, after reading yours and Ian's posts and thinking about them some
more, I've learned some really interesting things.
First, adapter abuse is *extremely* attractive to someone new to the
concept -- so from here on out I'm going to forget about the idea that we
can teach people to avoid this solely by telling them "the right way to do
it" up front.
The second, much subtler point I noticed from your posts was that *adapter
abuse tends sooner or later to result in adapter diamonds*.
And that is particularly interesting because the way I learned how NOT
to abuse adapters was by getting slapped upside the head by PyProtocols
pointing out when adapter diamonds had resulted!
Now, that's not because I'm a genius who put the error in because I
realized that adapter abuse causes diamonds. I didn't really understand
adapter abuse until *after* I got enough errors to be able to have a good
intuition about what "as a" really means.
Now, I'm not claiming that adapter abuse inevitably results in a detectable
ambiguity, and certainly not that it does so instantaneously. I'm also not
claiming that some ambiguities reported by PyProtocols might not be
perfectly harmless. So, adaptation ambiguity is a lot like a PyChecker
warning: it might be a horrible problem, or it might be that you are just
doing something a little unusual.
But the thing I find interesting is that, even with just the diamonds I
ended up creating on my own, I was able to infer an intuitive concept of
"as a", even though I hadn't fully verbalized the concepts prior to this
lengthy debate with Alex forcing me to single-step through my thought
processes.
What that suggests to me is that it might well be safe enough in practice
to let new users of adaptation whack their hand with the mallet now and
then, given that *now* it's possible to give a much better explanation of
"as a" than it was before.
Also, consider this... The larger the adapter network, the *greater* the
probability that adapter abuse will create an ambiguity -- which could
mean faster learning.
If the ambiguity error is easily looked up in documentation that explains
the as-a concept and the intended working of adaptation, so much the
better. But in the worst case of a false alarm (the ambiguity was
harmless), you just resolve the ambiguity and move on.
>Originally, Python may ship with the String->StringIO and
>StringIO->File adapters pre-loaded, and if my code was reliant upon
>this transitive chain, the following will work just wonderfully,
>
>     def parse(file: File):
>         ...
>
>     parse("helloworld")
>
>by parsing "helloworld" content via a StringIO intermediate object. But
>then, let's say a new component "pathutils" registers another adapter pair:
>
> String->PathName and PathName->File
>
>This ambiguity causes a few problems:
>
> - How does one determine which adapter path to use?
> - If a different path is picked, what sort of subtle bugs occur?
> - If the default path isn't what you want, how do you specify
>   the other path?
The *real* problem here isn't the ambiguity, it's that PathName->File is
"adapter abuse". However, the fact that it results in an ambiguity is a
useful clue to fixing the problem. Each time I sat down with one of these
detected ambiguities, I learned better how to define sensible interfaces
and meaningful adaptation. I would not have learned these things by simply
not having transitive adaptation.
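A minimal sketch of how such a diamond can be detected, using Clark's String/StringIO/PathName/File example. The `AdapterGraph` class and `AmbiguousAdaptation` exception are hypothetical names made up for illustration; this is not PyProtocols' actual algorithm or API, just a toy that complains whenever more than one chain connects the same two endpoints.

```python
from collections import defaultdict


class AmbiguousAdaptation(Exception):
    """Raised when more than one adapter chain links the same endpoints."""


class AdapterGraph:
    """Hypothetical registry: records which one-step adapters exist."""

    def __init__(self):
        self._targets = defaultdict(set)  # source name -> set of target names

    def register(self, source, target):
        self._targets[source].add(target)

    def paths(self, source, target, _seen=()):
        """Yield every acyclic chain of type names from source to target."""
        if source == target:
            yield (source,)
            return
        visited = _seen + (source,)
        for step in sorted(self._targets.get(source, ())):
            if step not in visited:
                for rest in self.paths(step, target, visited):
                    yield (source,) + rest

    def find_path(self, source, target):
        found = list(self.paths(source, target))
        if not found:
            raise LookupError(f"no adapter path from {source} to {target}")
        if len(found) > 1:
            raise AmbiguousAdaptation(found)
        return found[0]


graph = AdapterGraph()
graph.register("String", "StringIO")
graph.register("StringIO", "File")
only_path = graph.find_path("String", "File")  # a single chain, no complaint

# A later "pathutils"-style registration closes the diamond.
graph.register("String", "PathName")
graph.register("PathName", "File")
try:
    graph.find_path("String", "File")
except AmbiguousAdaptation as exc:
    competing = exc.args[0]  # both chains, so the user can see the clash
```

The error carries both chains, which is the kind of clue that points back at the questionable registration rather than silently picking one path.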
>| As I think these things through, I'm realizing that registered
>| adapters really should be 100% accurate (i.e., no information loss,
>| complete substitutability), because a registered adapter that seems
>| pragmatically useful in one place could mess up unrelated code, since
>| registered adapters have global effects.
>
>I think this isn't all that useful; it's unrealistic to assume that
>adapters are always perfect. If transitive adaptation is even
>permitted, it should be unambiguous. Demanding that adaptation is
>100% perfect is a matter of perspective. I think String->StringIO
>and StringIO->File are perfectly pure.
The next thing that I realized from your posts is that there's another
education issue for people who haven't used adaptation, and that's just how
precisely interfaces need to be specified.
For example, we've all been talking about StringIO like it means something,
but really we need to talk about whether it's being used to read or write
or both. There's a reason why PEAK and Zope tend to have interface names
like 'IComponentFactory' and 'IStreamSource' and other oddball names you'd
normally not give to a concrete class. An interface has to be really
specific -- in the degenerate case an interface can end up being just one
method. In fact, I think that something like 10-15% of interfaces in PEAK
have only one method; I don't know if it's that high for Zope and Twisted,
although I do know that small interfaces (5 or fewer methods) are pretty
normal.
What this also suggests to me is that maybe adaptation and interfaces are
the wrong solution to the problems we've been trying to solve with them --
adding more objects to solve the problems created by having lots of
objects. :)
As a contrasting example, consider the Dylan language. The Dylan concept
of a "protocol" is a set of generic functions that can be called on any
number of object types. This is just like an interface, but
inside-out... maybe you could call it an "outerface". :)
The basic idea is that a file protocol would consist of functions like
'read(stream,byteCount)'. If you implement a new file-like type, you "add
a method" to the 'read' generic function that implements 'read' for your
type. If a type already exists that you'd like to use 'read' with, you can
implement the new method yourself.
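For a rough feel of this in present-day Python, `functools.singledispatch` gives a single-dispatch approximation of a generic function. The sketch below is only an illustration of the idea: the `Ticker` class is a hypothetical third-party type, and dispatching on the first argument alone is much weaker than Dylan's generic functions.

```python
import io
from functools import singledispatch


@singledispatch
def read(stream, byte_count):
    """Generic 'read' operation; types opt in by registering a method."""
    raise TypeError(f"no 'read' method for {type(stream).__name__}")


@read.register(io.StringIO)
def _(stream, byte_count):
    # StringIO already has a suitable method, so just delegate to it.
    return stream.read(byte_count)


class Ticker:
    """Hypothetical existing type that knows nothing about 'read'."""

    def __init__(self, text):
        self.text = text
        self.pos = 0


@read.register(Ticker)
def _(stream, byte_count):
    # Added from outside: Ticker itself is left untouched.
    chunk = stream.text[stream.pos:stream.pos + byte_count]
    stream.pos += len(chunk)
    return chunk


ticker = Ticker("abcdef")
first = read(ticker, 4)
second = read(ticker, 4)
from_stringio = read(io.StringIO("hello world"), 5)
```

Callers only ever see `read(stream, byte_count)`; whether the method was supplied by the type's author or by a third party makes no difference to them.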
There are some important ramifications there. First, there's no
requirement to implement a complete interface; the system is already
reduced to *operations* rather than interfaces. Second, a different choice
of method names isn't a reason to need more interfaces and adapters. As
more implementations of some basic idea (like stream-ness) exist, it
becomes more and more natural to *share* common generic functions and put
them in the stdlib, even without any concrete implementation for them,
because they now form a standard "meeting point" for other libraries.
Third, Ka-Ping Yee has been arguing that Python should be able to define
interfaces that contain abstract implementation. Well, generic functions
can actually *do* this in a straightforward fashion; just define the
default implementation of that operation as delegating to other
operations. There still needs to be some way to "bottom out" so you don't
end up with endless recursive delegation -- although you could perhaps just
catch the recursion error and inspect the traceback to tell the user, "must
implement one of these operations for type X". (And this could perhaps be
done automatically if you can declare that this delegating implementation
is an "abstract method".)
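A small sketch of an "abstract implementation" in that style, again leaning on `functools.singledispatch` as a stand-in for real generic functions. Here `readline` has a default that delegates to `read`, and `read` is the operation every type must supply. The `Cursor` class is hypothetical, and the bottoming-out is done with a plain `NotImplementedError` rather than by catching a recursion error.

```python
from functools import singledispatch


@singledispatch
def read(stream, byte_count):
    # The "abstract" operation: there is no sensible default.
    raise NotImplementedError(
        f"must implement 'read' for {type(stream).__name__}")


@singledispatch
def readline(stream):
    """Default 'readline', built from 'read' one character at a time."""
    chars = []
    while True:
        ch = read(stream, 1)
        chars.append(ch)
        if ch in ("", "\n"):  # end of data or end of line
            return "".join(chars)


class Cursor:
    """Hypothetical type that supplies only the 'read' operation."""

    def __init__(self, text):
        self.text = text
        self.pos = 0


@read.register(Cursor)
def _(stream, byte_count):
    chunk = stream.text[stream.pos:stream.pos + byte_count]
    stream.pos += len(chunk)
    return chunk


cursor = Cursor("ab\ncd")
lines = [readline(cursor), readline(cursor), readline(cursor)]
```

A type that can do better (say, one with its own buffered line reader) could still register a specialized `readline` and bypass the default.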
Fourth, and this is *really* interesting (but also rather lengthy to
explain)... if all functions are generic (just using a fast-path for the
nominal case of only one implementation), then you can actually construct
adapters automatically, knowing precisely when an operation is "safe".
Let me explain. Suppose that we have a type, SomeType. It doesn't matter
if this type is concrete or an interface, we really don't care. The point
is that this type defines some operations, and there is an outside
operation 'foo' that relies on some set of those operations.
We then have OtherType, a concrete type we want to pass to 'foo'. All we
need to do to make it work is *extend the generic functions in SomeType
with methods that take a different 'self' type*! Then, the operation
'adapt(instOfOtherType,SomeType)' can assemble a simple proxy containing
methods for just the generic functions that have an implementation
available for OtherType.
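One way the proxy assembly might look, sketched with `functools.singledispatch`. Everything here is an assumption for illustration: `FILE_OPERATIONS` stands in for "the operations SomeType defines", `Log` plays the role of OtherType, and this `adapt` takes a table of operations rather than a type, so it is not PEP 246's `adapt`.

```python
from functools import partial, singledispatch


@singledispatch
def file_read(self, byte_count):
    raise NotImplementedError("file.read")


@singledispatch
def file_write(self, data):
    raise NotImplementedError("file.write")


# Hypothetical stand-in for the set of operations a 'file' type defines.
FILE_OPERATIONS = {"read": file_read, "write": file_write}


class Proxy:
    """Exposes only the operations that have a method for the subject's type."""

    def __init__(self, subject, operations):
        for name, op in operations.items():
            # dispatch() falls back to the default registered for 'object'
            # when nothing more specific exists for this type.
            if op.dispatch(type(subject)) is not op.dispatch(object):
                setattr(self, name, partial(op, subject))


def adapt(subject, operations):
    return Proxy(subject, operations)


class Log:
    """Plays the role of OtherType: it only supports writing."""

    def __init__(self):
        self.lines = []


@file_write.register(Log)
def _(self, data):
    self.lines.append(data)
    return len(data)


log = Log()
proxy = adapt(log, FILE_OPERATIONS)
written = proxy.write("hello")
```

The proxy holds no state of its own; it is just a view of the operations that happen to be implemented, so `proxy.read` simply does not exist for `Log`.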
The result of this is that now any type can be the basis for an interface,
which is very intuitive. That is, I can say, "implement file.read()" for
my object, and somebody who has an argument declared as "file" will be able
to use my object as long as they only need the operations I've
implemented. However, unlike using method names alone, we have unambiguous
semantics, because all operations are grounded in some fixed type or
location of definition that specifies the *meaning* of that operation.
Another benefit of this approach is that it lessens the need for transitive
adaptation, because over time people converge towards using common
operations, rather than continually reinventing new ones. In this
approach, all "adaptation" is endpoint to endpoint, but there are rarely
any actual adapters involved, unless a set of related operations actually
requires keeping some state. Instead, you simply define an implementation
of an operation for some concrete type.
I'm running out of time to explore this idea further, alas. Up to this
point, what I'm proposing would work *beautifully* for adaptations that
don't require the adapter to add state to the underlying object, and ought
to be intuitively obvious, given an appropriate syntax. E.g.:
    class StringIO:
        def read(self, bytes) implements file.read:
            # etc...
could be used to indicate the simple case where you are conforming to an
existing operation definition. A third-party definition of the same thing
might look like this:
    def file.read(self: StringIO, bytes):
        return self.read(bytes)
Assuming, of course, that that's the syntax for adding an implementation to
an existing operation.
Hm. You know, I think the stateful adapter problem could be solved too, if
*properties* were also operations. For example, if 'file.fileno' was
implemented as a set of three generic functions (get/set/delete), then you
could maybe do something like:
    class socket:
        # internally declare that our fileno has the semantics
        # of file.fileno:
        fileno: int implements file.fileno
or maybe just:
    class socket implements file:
        ...
could be shorthand for saying that anything with the same name as what's in
'file' has the same semantics. OTOH, that could break between Python
versions if a new operation were added to 'file', so maybe as verbose as
the blow-by-blow declarations are, they'd be safer semantically.
Anyway, if we were a third party externally declaring the correspondence
between socket.fileno and file.fileno, we could say:
    # declare how to get a file.fileno for a socket instance
    def file.fileno.__get__(self: socket):
        return self.fileno
Now, there isn't any need to have a separate "adapter" to store additional
state; with appropriate name mangling it can be stored in the unadapted
object, if you like.
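In runnable form, the "property as operations" idea could be approximated as below. This is a sketch under assumptions: it uses a made-up `position` property rather than `file.fileno`, splits it into hypothetical `get_position`/`set_position` generic functions via `functools.singledispatch`, and uses a prefixed attribute name in place of real name mangling.

```python
from functools import singledispatch


@singledispatch
def get_position(stream):
    raise NotImplementedError(type(stream).__name__)


@singledispatch
def set_position(stream, value):
    raise NotImplementedError(type(stream).__name__)


class Message:
    """Hypothetical third-party type with no notion of a read position."""

    def __init__(self, text):
        self.text = text


# Prefixed attribute name, so the extra state is unlikely to clash with
# anything Message itself defines.
_STATE = "_file_position"


@get_position.register(Message)
def _(stream):
    return getattr(stream, _STATE, 0)


@set_position.register(Message)
def _(stream, value):
    setattr(stream, _STATE, value)


message = Message("abc")
before = get_position(message)
set_position(message, 2)
after = get_position(message)
```

Because the state lives on the original object, it survives however many times the object is handed around, with no wrapper to lose track of.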
This isn't a fully thought-out proposal; it's all a fairly
spur-of-the-moment idea. I've been playing with generic functions for a
while now, but only recently started doing any "heavy lifting" with
them. However, in one instance, I refactored a PEAK module from being 400+
lines of implementation (plus 8 interfaces and lots of adaptation) down to
just 140 lines implementation and one interface -- with the interface being
pure documentation. And the end result was more flexible than the original
code. So since then I've been considering whether adaptation is really the
be-all end-all for this sort of thing, and Clark and Ian's posts made me
start thinking about it even more seriously.
(One interesting data point: the number of languages with some kind of
pattern matching, "guards" or other generic function variants seems to be
growing, while Java (via Eclipse) is the only other system I know of that
has anything remotely like PEP 246.)
So maybe the *real* answer here is that we should be looking at solutions
that might prevent the problems that adapters are meant to solve, from
arising in the first place! Generic functions might be a good place to
look for one, although the downside is that they might make Python look
like a whole new language. OTOH, type declarations might do that anyway.
A big plus, by the way, of the generic function approach is that it does
away with the requirement for interfaces altogether, except as a semantic
grouping of operations. Lots of people dislike interfaces, and after all
this discussion about how perfect interface-to-interface adaptation has to
be, I'm personally becoming a lot less enamored with interfaces too!
In general, Python seems to like to let "natural instinct" prevail. What
could be more natural than saying "this is how to implement a such-and-such
method like what that other guy's got"? It ain't transitive, but if
everybody tends to converge on a common "other guy" to define stuff in
terms of (like 'file' in the stdlib), then you don't *need* transitivity in
the long run, except for fairly specialized situations like pluggable IDEs
(e.g. Eclipse) that need to dynamically connect chains between different
plugins. Even there, the need could be minimized by most operations
grounding in "official" abstract types. And abstract methods -- like a
'file.readline()' implementation for any object that supports 'file.read()'
-- could possibly take care of most of the rest.
Generic functions are undoubtedly more complex to implement than PEP 246
adaptation. My generic function implementation comprises 3323 lines of
Python, and it actually *uses* PEP 246 adaptation internally for many
things, although with more work it could probably do without it.
However, almost half of those lines of code are consumed by a mini-compiler
and mini-interpreter for Python expressions; a built-in implementation of
generic functions might be able to get away without having those parts, or
at least not so many of them. Also, my implementation supports full
predicate dispatch, not just multimethod dispatch, so there's probably even
more code that could be eliminated if it was decided not to do the whole
nine yards.
Back on the downside, this looks like an invitation to another "language
vs. stdlib" debate, since PEP 246 in and of itself is pure library. OTOH,
Guido's changing the language to add type declarations anyway, and generic
functions are an excellent use case for them. Since he's going to be
flamed for changing the language anyway, he might as well be hanged for a
sheep as for a goat. :)
Oh, and back on the upside again, it *might* be easier to implement actual
type checking with this technique than with PEP 246, because if I write a
method expecting a 'file' and somebody calls it with a 'Foo' instance, I
can maybe now look at the file operations actually used by the method, and
then see if there's an implementation for e.g. 'file.read' defined anywhere
for 'Foo'. And comparable type checking algorithms are more likely to
already exist for other languages that include generic functions than to
exist for PEP 246-style adaptation.
Okay, I'm really out of time now. Hate to dump this in as a possible
spoiler on PEP 246, because I was just as excited as Alex about the
possibility of it going in. But this whole debate has made me even less
enamored of adaptation, and more interested in finding a cleaner, more
intuitive way to do it.