[Python-3000] Generic functions

Tue Apr 4 10:02:19 CEST 2006

At 11:03 PM 4/3/2006, Ian Bicking wrote:
>Guido van Rossum wrote:
>>On 4/3/06, Ian Bicking <ianb at colorstudy.com> wrote:
>>>As an alternative to adaptation, I'd like to propose generic functions.
>>>   I think they play much the same role, except they are much simpler to
>>>use and think about.
>>Given that Phillip Eby is another proponent of generic functions I
>>seriously doubt the latter.

Hm.  :)

Rather than branch all over the map, let me focus for a moment on a 
very simple type of generic function - the kind that is essentially 
equivalent to PEP 246-style adaptation.  This will make it easier to 
see the relationship.

In the RuleDispatch package, these simple generic functions are 
defined using dispatch.on and f.when(), like this:

     import dispatch

     @dispatch.on('ob')  # name of the argument to dispatch based on
     def pprint(ob):
         """This is a pretty-print function"""

     @pprint.when(object)
     def pprint(ob):
         print repr(ob)

     @pprint.when(list)
     def pprint(ob):
         # code for the list case
         pass

Now, this is exactly equivalent to the much longer code that one 
would write to define an IPrettyPrintable interface with a pprint() 
method, plus adapter classes to define the implementation 
methods.  Yes, it's a convenient example - but it also corresponds to 
a fairly wide array of problems, which happen to account for a 
significant number of the uses for adaptation.
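
For readers on a current Python, roughly the same shape can be sketched with the standard library's functools.singledispatch.  That decorator arrived long after this post and is not RuleDispatch's API, so treat this as an analogy only.  The function is named pformat and returns a string so the result is easy to check, and the list layout is an invented placeholder:

```python
from functools import singledispatch

@singledispatch
def pformat(ob):
    """Fallback method, used when no more specific type is registered."""
    return repr(ob)

@pformat.register(list)
def _pformat_list(ob):
    # Invented list layout: one recursively formatted item per line.
    return "[\n" + "".join("  %s,\n" % pformat(item) for item in ob) + "]"

print(pformat(42))
print(pformat([1, "two"]))
```

Subclasses of list pick up the list method automatically, because the lookup follows the argument type's inheritance chain.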

As Ian mentioned, I wrote PyProtocols first, and discovered generic 
functions afterward.  In fact, 'dispatch.on()' and friends are 
actually implemented internally using PyProtocols adapters.  For 
operations that do not require any state to be saved across method 
invocations, and for interfaces with a small number of methods (e.g. 
just one!), generic functions of this type are a more compact and 
expressive approach.

The kicker is this: for this subset of generic functions, it doesn't 
matter whether you make generic functions primitive, or you make 
adaptation primitive, because either can be implemented in terms of 
the other.  For example, if you wanted to implement adaptation using 
generic functions, you would just use a function for each interface 
or protocol, and the "methods" of the function would be the adapter factories.
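
A minimal sketch of that direction, assuming functools.singledispatch as the generic function machinery (it is not what PyProtocols or RuleDispatch use).  The names as_pretty_printable, ReprPrinter and ListPrinter are made up for illustration; the generic function stands in for adapt(ob, IPrettyPrintable), and its methods are the adapter factories:

```python
from functools import singledispatch

@singledispatch
def as_pretty_printable(ob):
    """Plays the role of adapt(ob, IPrettyPrintable)."""
    raise TypeError("cannot adapt %s" % type(ob).__name__)

class ReprPrinter:
    """Adapter for objects that are happy with their repr()."""
    def __init__(self, subject):
        self.subject = subject

    def pformat(self):
        return repr(self.subject)

class ListPrinter(ReprPrinter):
    """Adapter for lists."""
    def pformat(self):
        return "list of %d item(s)" % len(self.subject)

# The generic function's "methods" are simply the adapter factories.
as_pretty_printable.register(str, ReprPrinter)
as_pretty_printable.register(list, ListPrinter)

print(as_pretty_printable([1, 2]).pformat())
```

Types with no registered factory fall through to the default body, which here raises TypeError, much as a failed adaptation would.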

Anyway, this type of generic function is essentially similar to the 
per-operation lookup tables Python has today for things like pretty 
printing, pickling, and so on.  It can be viewed as merely a 
syntactic convenience coupled with standardization of how such an 
operation's registry will work, both for registration and lookups.

In essence, Python's standard library already has generic functions; 
they are just implemented differently for each operation that needs 
such a registry.
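
One such registry can be seen in copyreg (the Python 3 name for copy_reg).  The sketch below registers a reducer for a made-up Point class and exercises it through copy.copy(), which consults the same table that pickle does; Point and reduce_point are illustrative names only:

```python
import copy
import copyreg

class Point:
    """Stand-in for a type whose author supplied no copy/pickle support."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

def reduce_point(point):
    # A "method" of the copy/pickle operation for Point: return a callable
    # plus the arguments that rebuild an equivalent object.
    return Point, (point.x, point.y)

# Register the reducer in the operation-specific table.
copyreg.pickle(Point, reduce_point)

clone = copy.copy(Point(1, 2))
print(clone.x, clone.y)
```

The registration call and the table lookup are specific to this one operation, which is the point being made: each such operation in the library invents its own registry.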

>>Watch it though. It may be a great example to explain generic
>>functions. But it may be the only example, and its existence may not
>>be enough of a use case to motivate the introduction of generic
>>functions.

It's sufficiently general to encompass just about any "visitor" 
pattern.  I wrote a short article on this a while back:

http://peak.telecommunity.com/DevCenter/VisitorRevisited

>>Whoa! First of all, my gut reaction is already the same as for
>>adaptation: having a single global registry somehow feels wrong. (Or
>>is it not global? "internal" certainly sounds like that's what you
>>meant; but for methods this seems wrong, one would expect a registry
>>per class, or something like that.)

This confusion is due to Ian mixing methods and functions in the 
example.  Generic functions are *functions*.  If you put them in 
classes, Python makes them methods.  But there's nothing magical 
about that - it's still a function.  The VisitorRevisited article 
explains this better, and in a way that doesn't delve into the 
technicalities.  Also, its examples are actually working ones based 
on a fixed spec (i.e., the RuleDispatch implementation), so there's 
no handwaving to get in the way.

>>Next, I wonder what the purpose of the PrettyPrinter class is. Is it
>>just there because the real pprint module defines a class by that
>>name? Or does it have some special significance?
>
>It's there because it is matching the pprint module.  Also it holds 
>some state which is useful to keep separate from the rest of the 
>arguments, like the current level of indentation.

Note that this isn't really needed, nor necessarily ideal.  If I were 
really writing a pretty printer, I'd put indentation control in an 
output stream argument, which would allow me to reuse an 
IndentedStream class for other purposes.  It would then suffice to 
have a single pprint(ob,stream=IndentedStream(sys.stdout)) generic 
function, and no need for a PrettyPrinter class.
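
As a rough sketch of that design: the IndentedStream below is a guess at what such a class might look like (its line() and indented() methods are invented here), and functools.singledispatch stands in for the generic function machinery.  The stream defaults to None rather than to an IndentedStream instance so that a fresh stream is built on each call:

```python
import io
import sys
from contextlib import contextmanager
from functools import singledispatch

class IndentedStream:
    """Hypothetical reusable stream wrapper that tracks the indent level."""
    def __init__(self, out, step="  "):
        self.out = out
        self.step = step
        self.level = 0

    def line(self, text):
        self.out.write(self.step * self.level + text + "\n")

    @contextmanager
    def indented(self):
        self.level += 1
        try:
            yield self
        finally:
            self.level -= 1

@singledispatch
def pprint(ob, stream=None):
    if stream is None:
        stream = IndentedStream(sys.stdout)
    stream.line(repr(ob))

@pprint.register(list)
def _pprint_list(ob, stream=None):
    if stream is None:
        stream = IndentedStream(sys.stdout)
    stream.line("[")
    with stream.indented():
        for item in ob:
            pprint(item, stream)  # indentation state travels with the stream
    stream.line("]")

buf = io.StringIO()
pprint([1, [2, 3]], IndentedStream(buf))
print(buf.getvalue(), end="")
```

All of the state lives in the stream object, so the generic function itself needs no class around it.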

>>Are generic functions
>>really methods? Can they be either?
>
>They can be either.

They're really honest-to-goodness Python *functions* (with extra 
stuff in the function __dict__).  That means they behave like 
ordinary functions when it comes to being methods.  They can be 
instance, class, or static methods, or not methods at all.

>>Ah, the infamous "when" syntax again, which has an infinite number of
>>alternative calling conventions, each of which is designed to address
>>some "but what if...?" objection that might be raised.

RuleDispatch actually has a fixed number of when() signatures, and 
the one Ian gave isn't one of them.  Single-dispatch functions' 
when() takes a type or an interface, or a sequence of types or 
interfaces.  Predicate-dispatch functions take a predicate object, or 
a string containing a Python expression that will be compiled to 
create a predicate object.

None of RuleDispatch's when() decorators take keyword arguments at 
the moment, at least not in the way of Ian's example.

>>What does when(object=list) mean? Does it do an isinstance() check?
>
>Yes; I think RuleDispatch has a form (though I can't remember what 
>the form is -- it isn't .when()).

For simple single-dispatch, it's @some_function.when(type).  So in 
this case it'd be @pprint.when(list).

>>Is there any significance to the name pformat_list? Could I have
>>called it foobar? Why not just pformat?
>
>Just for tracebacks, and for example to make it greppable.

Actually, there's one other reason you might want a separate name, 
and that's to reuse the code in an explicit upcall.  You could then 
explicitly invoke pformat_list() on something that wasn't an instance 
of the 'list' type.  For example, you could just call 
"pprint.when(SomeListLikeType)(pformat_list)" to register 
pformat_list as the method to be called for SomeListLikeType, as well 
as for 'list' and its subclasses.

If it were just called 'pformat', there'd be no name by which you 
could access the original function.  But you *are* allowed to call it 
whatever you want.  If it's called 'pformat' in this case, you just 
lose access to it; the name 'pformat' remains bound to the overall 
generic function, rather than to the specific implementation.
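
The same idea can be sketched with functools.singledispatch, whose register() also hands back the original function (this is the modern standard library, not RuleDispatch).  SomeListLikeType is given a made-up body here purely so the example runs:

```python
from functools import singledispatch

@singledispatch
def pformat(ob):
    return repr(ob)

@pformat.register(list)
def pformat_list(ob):
    # Works for anything iterable, not just list instances.
    return "[" + "; ".join(pformat(item) for item in ob) + "]"

class SomeListLikeType:
    """Hypothetical sequence type that does not inherit from list."""
    def __init__(self, *items):
        self._items = items

    def __iter__(self):
        return iter(self._items)

# Reuse the existing implementation for an unrelated type...
pformat.register(SomeListLikeType, pformat_list)
print(pformat(SomeListLikeType(1, 2)))

# ...or make an explicit upcall on something that is not a list at all.
print(pformat_list((3, 4)))
```

Because the implementation kept its own name, both the registration and the direct call are possible; had it been named pformat, neither would be.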

>>>* It requires cooperation from the original function (pformat -- I'm
>>>using "function" and "method" interchangeably).
>>Thereby not helping the poor reader who doesn't understand all of this
>>as well as you and Phillip apparently do.

Right.  Don't interchange them, they have different meanings.  A 
generic function is just a function.  It's not a method unless you make it one.

Generic functions can *have* methods, however.  Each *implementation* 
for a generic function, like each reducer in the pickling registry, 
is called a "method" *of* the generic function.  This terminology is 
lifted straight from Lisp, but I'm not attached to it.  If anybody 
has a better terminology, in fact, I'm all for it!

In CLOS, by the way, there are no object methods, only generic 
function methods.  Rather than add methods to classes, you add them 
to the generic functions.  As with adaptation, this is 100% 
equivalent in computational terms.

What's *not* equivalent is the user interface.  If you are writing a 
closed-ended program in any language, it really doesn't matter 
whether the objects have methods or the functions do.  You're just 
writing a program, so how you organize it is a matter of taste.

However, if you are writing an extensible library, then generic 
functions (especially multiple-dispatch ones) have a *tremendous* 
advantage, because you aren't forced to pick only one way to 
expand.  And if you don't have generic functions in your language, 
you'll just reinvent them in your library -- like with pickle and 
copy_reg in the Python stdlib.

The advantage that they offer is twofold:  1) you can add new 
operations that cut across all types, existing or future.  2) you can 
add new types that can work with any operation, existing or future.
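
Both directions can be sketched in a few lines, again assuming functools.singledispatch as a stand-in.  The operations to_json and is_exact are invented for the example, and Fraction plays the part of a type from somebody else's library:

```python
from fractions import Fraction
from functools import singledispatch

# An operation written by some library author.
@singledispatch
def to_json(ob):
    raise TypeError("no to_json method for %s" % type(ob).__name__)

@to_json.register(int)
def _(ob):
    return str(ob)

# (2) A new type plugs into the existing operation.  Neither Fraction
# nor to_json itself is modified.
@to_json.register(Fraction)
def _(ob):
    return '"%d/%d"' % (ob.numerator, ob.denominator)

# (1) A new operation cuts across existing types without touching them.
@singledispatch
def is_exact(ob):
    return False

@is_exact.register(int)
@is_exact.register(Fraction)
def _(ob):
    return True

print(to_json(Fraction(1, 2)), is_exact(Fraction(1, 3)), is_exact(0.5))
```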

Side note: Ruby dodges this problem by making classes 
open-ended.  Since you can add a new __special__ method to any 
existing class (even one you didn't write, including built-in types), 
you don't need to have operation-specific type lookup registries (ala 
pickle/copy_reg), and for that matter you don't need adaptation!  The 
only problem you might run into is namespace clashes, and I'm not 
sure how Ruby addresses this.

Anyway, there are lots of ways to skin these cats; the main 
difference is in how the user sees things and expresses their 
ideas.  If you are creating a closed system used by a single 
developer, you don't need any of this.  But if you need an extensible 
library, you need a mechanism for extension that allows third parties 
to implement operation A for type B, even if they did not create the 
libraries containing A and B.  Adaptation, generic functions, and 
Ruby-style open classes are all computationally-equivalent mechanisms 
for doing this, in that you could translate a given library to use 
any of the three techniques.

>>I'm guessing that you are contrasting it with len(), which could be
>>seen as a special kind of built-in "generic function" if one squints
>>enough, but one that requires the argument to provide the __len__
>>magic method. But since len() *does* require the magic method, doesn't
>>that disqualify it from competing?

You could view this as being a generic function with only one method:

     @len.when(object)
     def len(ob):
         return ob.__len__()

And one which, of course, is not extensible.  :)

>Yes, this is in contrast with len(), which achieves its goal because 
>the people who write the len() function write the entire language, 
>and can put a __len__ on whatever they want ;)  For other cases 
>magic-method based systems tend to look like:
>
>def pprint(object):
>     if isinstance(object, list): ...
>     elif isinstance(object, tuple): ...
>     ...
>     elif hasattr(object, '__pprint__'):
>         object.__pprint__()
>     else:
>         print repr(object)
>
>That is, all the built in objects get special-cased and other 
>objects define a magic method.

Yeah, and then there are all the libraries like pydoc that have these 
huge if-else trees and aren't extensible because they don't even have 
a magic method escape or any kind of registry you can extend.  A 
uniform way to do this kind of dispatching means uniform 
extensibility of libraries.

>>>* The function is mostly self-describing.
>>Perhaps once you've wrapped your head around the when() syntax. To me
>>it's all magic; I feel like I'm back in the situation again where I'm
>>learning a new language and I haven't quite figured out which
>>characters are operators, which are separators, which are part of
>>identifiers, and which have some other magical meaning. IOW it's not
>>describing anything for me, nor (I presume) for most Python users at
>>this point.

I'd be interested to know if you still feel that way after reading 
VisitorRevisited, since it doesn't do any hand-waving around what when() does.

>>Tell us more about the registration machinery. Revealing that (perhaps
>>simplified) could do a lot towards removing the magical feel.

My simple generic function implementation is actually implemented 
using adaptation - I create a dummy interface for the generic 
function, and then define adapters that return the implementation 
functions.  IOW, adapt(argument,dummy_interface) returns the function 
to actually call.  The actual generic function object is a wrapper 
that internally does something like:

     def wrapper(*args, **kw):
         return adapt(some_arg, dummy_interface)(*args, **kw)

However, if I were implementing a generic function like this "from 
scratch" today, I'd probably just make the function __dict__ contain 
a registry dictionary, and the body of the function would be more like:

     def wrapper(*args, **kw):
         for cls in type(some_arg).__mro__:
             if cls in wrapper.registry:
                 return wrapper.registry[cls](*args, **kw)

Voila.  Simple generic functions.  Of course, the actual 
implementation today is complex because of support for classic 
classes and even ExtensionClasses, and having C code to speed it up, etc.
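
Fleshed out into something runnable on a current Python, the registry approach might look like the sketch below.  The decorator name generic, dispatching on the first positional argument, and the rule for what when() returns are all assumptions made for the example; this is not RuleDispatch's code:

```python
def generic(func):
    """Make func a single-dispatch generic function keyed on its first
    positional argument."""
    def wrapper(*args, **kw):
        # Walk the argument type's MRO, most specific class first.
        for cls in type(args[0]).__mro__:
            if cls in wrapper.registry:
                return wrapper.registry[cls](*args, **kw)
        raise TypeError("no method for %s" % type(args[0]).__name__)

    def when(cls):
        def decorate(method):
            wrapper.registry[cls] = method
            # Assumed convention: a same-named method leaves the name bound
            # to the generic function; any other name keeps the method.
            if method.__name__ == wrapper.__name__:
                return wrapper
            return method
        return decorate

    wrapper.__name__ = func.__name__
    wrapper.__doc__ = func.__doc__
    wrapper.registry = {}  # stored in the function's own __dict__
    wrapper.when = when
    return wrapper

@generic
def pformat(ob):
    """This is a pretty-print function"""

@pformat.when(object)
def pformat(ob):
    return repr(ob)

@pformat.when(list)
def pformat_list(ob):
    return "[" + ", ".join(pformat(item) for item in ob) + "]"

print(pformat([1, "two", [3]]))
```

The result is still an ordinary function object; the registry and the when() decorator are just attributes hung on it.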

>>>* Magic methods do *not* have this import problem, because once you have
>>>an object you have all its methods, including magic methods.
>>Well, of course that works only until you need a new magic method.
>
>Yes, pluses and minuses ;)

Ruby of course solves this by letting you add the magic methods to 
existing classes, and Python allows this for user-defined types, 
although we look down on it as "monkey patching".
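
A small illustration of that difference, assuming CPython on a current version.  The __pformat__ hook and the Account class are invented for the example:

```python
class Account:
    """Stands in for a user-defined class from somebody else's library."""
    def __init__(self, balance):
        self.balance = balance

def account_pformat(self):
    return "Account(balance=%d)" % self.balance

# Monkey patching: attach the hypothetical __pformat__ hook after the fact.
Account.__pformat__ = account_pformat
print(Account(5).__pformat__())

# Built-in types are closed in CPython, so the same trick fails for list.
try:
    list.__pformat__ = lambda self: "a list"
    builtin_patched = True
except TypeError:
    builtin_patched = False
print(builtin_patched)
```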

>>>Type-based generic functions and adaptation are more-or-less equivalent.
>>>   That is, you can express one in terms of the other, at least
>>>functionally if not syntactically.
>>Could you elaborate this with a concrete example?

Here's pprint() written as an interface and adaptation:

     class IPPrintable(Interface):
         def pprint():
             """pretty print the object"""

     def pprint(ob):
         adapt(ob, IPPrintable).pprint()

In reality, this would be much more code, because I've left out the 
adapter classes to adapt each type and give it a pprint() method.
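
To give a feel for the omitted part, here is a self-contained sketch with the adapter classes written out.  The adapt() and register_adapter() functions below are tiny hand-rolled stand-ins, not the PyProtocols API, and the methods return strings (pformat) rather than printing so the result can be checked:

```python
class IPPrintable:
    """Marker standing in for a real Interface class."""

_adapters = {}  # (type, interface) -> adapter factory

def register_adapter(factory, for_type, provides):
    _adapters[for_type, provides] = factory

def adapt(ob, interface):
    # Find the adapter factory for the most specific registered type.
    for cls in type(ob).__mro__:
        factory = _adapters.get((cls, interface))
        if factory is not None:
            return factory(ob)
    raise TypeError("cannot adapt %s" % type(ob).__name__)

class ObjectPPrinter:
    """Adapter for arbitrary objects."""
    def __init__(self, subject):
        self.subject = subject

    def pformat(self):
        return repr(self.subject)

class ListPPrinter(ObjectPPrinter):
    """Adapter for lists; items are adapted recursively."""
    def pformat(self):
        items = (adapt(item, IPPrintable).pformat() for item in self.subject)
        return "[" + ", ".join(items) + "]"

register_adapter(ObjectPPrinter, object, IPPrintable)
register_adapter(ListPPrinter, list, IPPrintable)

def pformat(ob):
    return adapt(ob, IPPrintable).pformat()

print(pformat([1, "a", [2]]))
```

Each supported type costs a whole adapter class plus a registration call, where the generic function version needs one decorated function.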

>>>Anyway, I think generic functions are very compatible with Python syntax
>>>and style, and Python's greater emphasis on what an object or function
>>>can *do*, as opposed to what an object *is*, as well as the use of
>>>functions instead of methods for many operations.  People sometimes see
>>>the use of functions instead of methods in Python as a weakness; I think
>>>generic functions turns that into a real strength.
>>Perhaps. The import-for-side-effect requirement sounds like a
>>showstopper though.

In practice, it doesn't happen that often, because you're either 
adding an operation to some type of yours, or adding a type to some 
operation of yours.  And in the case where you're adding operation A 
to type B and you own neither of the two -- you're still not 
importing for side-effect except in the sense that the code that 
defines the A(B) operation is part of your application anyway.  The 
case where there is some *third-party* module that implements A(B) is 
quite rare in my experience.

However, you could have such a scenario *now*, if "operation A" is 
"pickle", for example, and B is some type that isn't ordinarily 
picklable.  Such a situation requires importing for side effects now, 
if you're not the one who wrote the pickling operation for it.

In fact, it doesn't matter *what* approach you use to provide 
extensibility; if it allows user C to define operation A on type B, 
it requires user D to import C's code for side-effects if they want 
to use it.  This applies to adaptation and to Ruby's open classes 
just as much as it does to generic functions.  It's simply the 
logical consequence of having third-party extensibility!