[Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc.

Wed May 9 21:54:46 CEST 2007

On 5/1/07, Phillip J. Eby <pje at telecommunity.com> wrote:
> Comments and questions appreciated, as it'll help drive better explanations
> of both the design and rationales.  I'm usually not that good at guessing
> what other people will want to know (or are likely to misunderstand) until
> I get actual questions.

I haven't read it all yet, but my first comment is: "This PEP is HUGE!"
922 lines. Is there any way you could shorten it or split it up into
more manageable chunks? My second comment is that there are too few
examples in the PEP.

> The API will be implemented in pure Python with no C, but may have
> some dependency on CPython-specific features such as ``sys._getframe``
> and the ``func_code`` attribute of functions.  It is expected that
> e.g. Jython and IronPython will have other ways of implementing
> similar functionality (perhaps using Java or C#).
>
>
> Rationale and Goals
> ===================
>
> Python has always provided a variety of built-in and standard-library
> generic functions, such as ``len()``, ``iter()``, ``pprint.pprint()``,
> and most of the functions in the ``operator`` module.  However, it
> currently:
>
> 1. does not have a simple or straightforward way for developers to
>     create new generic functions,

I think there is a very straightforward way. For example, a generic
function for token handling could be written like this:

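    # ANY, BRANCH and CATEGORY stand for token-type constants; the values
    # and the empty handler bodies are placeholders for illustration.
    ANY, BRANCH, CATEGORY = "any", "branch", "category"

    def handle_any(val):
        pass

    def handle_branch(val):
        pass

    def handle_category(val):
        pass

    def handle_tok(tok, val):
        handlers = {
            ANY:      handle_any,
            BRANCH:   handle_branch,
            CATEGORY: handle_category,
        }
        # Look the handler up first, so that a KeyError raised inside a
        # handler is not misreported as an unsupported token type.
        try:
            handler = handlers[tok]
        except KeyError:
            raise ValueError("Unsupported token type: %s" % tok)
        return handler(val)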

This is an idiom I have used hundreds of times. The handle_tok
function is generic because it dispatches to the correct handler based
on the token type passed in tok.

> 2. does not have a standard way for methods to be added to existing
>     generic functions (i.e., some are added using registration
>     functions, others require defining ``__special__`` methods,
>     possibly by monkeypatching), and

When does "external" code want to add to a generic function? In the
above example, you add to the generic function by inserting a new
key-value pair into the handlers dict. If needed, it wouldn't be very
hard to make the handle_tok function extensible: just make the
handlers object global.
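A minimal sketch of the "make the handlers object global" variant, assuming a module-level dict plus a small decorator so that other modules can add entries without editing handle_tok. The names HANDLERS and register_handler, the token values, and the handler bodies are all hypothetical, chosen only for illustration.

    # Hypothetical token-type constants, for illustration only.
    ANY, BRANCH = "any", "branch"

    # Module-level registry; any module that imports it can add entries.
    HANDLERS = {}

    def register_handler(tok):
        """Decorator that records func as the handler for token type tok."""
        def decorator(func):
            HANDLERS[tok] = func
            return func
        return decorator

    def handle_tok(tok, val):
        # Look the handler up separately from calling it, so a KeyError
        # raised inside a handler is not reported as an unknown token.
        try:
            handler = HANDLERS[tok]
        except KeyError:
            raise ValueError("Unsupported token type: %s" % tok) from None
        return handler(val)

    @register_handler(ANY)
    def handle_any(val):
        return None

    @register_handler(BRANCH)
    def handle_branch(val):
        return "branch:%s" % val

Extending it from another module is then just another @register_handler("category") on a new function, or a direct assignment into HANDLERS.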

> 3. does not allow dispatching on multiple argument types (except in
>     a limited form for arithmetic operators, where "right-hand"
>     (``__r*__``) methods can be used to do two-argument dispatch.

Why would you want that?
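For reference, two-argument dispatch can also be hand-rolled with the same dictionary idiom by keying on a tuple of both argument types. This is a sketch only: the combine function and its handlers are hypothetical, and this simple version matches exact types, so subclasses (for example bool, a subclass of int) are not found.

    # Hypothetical implementations for each (left type, right type) pair.
    def _int_int(a, b):
        return a + b

    def _str_int(a, b):
        return a * b

    def _int_str(a, b):
        return b * a

    # Dispatch table keyed on the exact types of both arguments.
    COMBINE = {
        (int, int): _int_int,
        (str, int): _str_int,
        (int, str): _int_str,
    }

    def combine(a, b):
        try:
            impl = COMBINE[type(a), type(b)]
        except KeyError:
            raise TypeError("unsupported types: %s, %s"
                            % (type(a).__name__, type(b).__name__)) from None
        return impl(a, b)

For example, combine(2, 3) returns 5 and combine("ab", 2) returns "abab", while combine(1.5, 2) raises TypeError because (float, int) has no entry.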

> The ``@overload`` decorator allows you to define alternate
> implementations of a function, specialized by argument type(s).  A
> function with the same name must already exist in the local namespace.
> The existing function is modified in-place by the decorator to add
> the new implementation, and the modified function is returned by the
> decorator.  Thus, the following code::
>
>      from overloading import overload
>      from collections import Iterable
>
>      def flatten(ob):
>          """Flatten an object to its component iterables"""
>          yield ob
>
>      @overload
>      def flatten(ob: Iterable):
>          for o in ob:
>              for ob in flatten(o):
>                  yield ob
>
>      @overload
>      def flatten(ob: basestring):
>          yield ob
>
> creates a single ``flatten()`` function whose implementation roughly
> equates to::
>
>      def flatten(ob):
>          if isinstance(ob, basestring) or not isinstance(ob, Iterable):
>              yield ob
>          else:
>              for o in ob:
>                  for ob in flatten(o):
>                      yield ob
>
> **except** that the ``flatten()`` function defined by overloading
> remains open to extension by adding more overloads, while the
> hardcoded version cannot be extended.

I very much prefer the latter version, because the "locality of
reference" is much worse in the overloaded version and because I have
found overloaded code very hard to read and understand in practice.

Let's say you find some code that looks like this:

    def do_stuff(ob):
        yield ob

    @overload
    def do_stuff(ob : ClassA):
        for o in ob:
            for ob in do_stuff(o):
                yield ob

    @overload
    def do_stuff(ob : classb):
        yield ob

Or this:

    def do_stuff(ob):
        if isinstance(ob, classb) or not isinstance(ob, ClassA):
            yield ob
        else:
            for o in ob:
                for ob in do_stuff(o):
                    yield ob

With the overloaded code, you have to read EVERY definition of
"do_stuff" to understand what the code does. Not just every definition
in the same module, but every definition in the whole program because
someone might have extended the do_stuff generic function.

What if someone has defined a do_stuff that dispatches on ClassC, a
subclass of ClassA? Good luck figuring out what the code does.

With the non-overloaded version you also have the ability to insert
debug print statements to figure out what happens.

> For example, if someone wants to use ``flatten()`` with a string-like
> type that doesn't subclass ``basestring``, they would be out of luck
> with the second implementation.  With the overloaded implementation,
> however, they can either write this::
>
>      @overload
>      def flatten(ob: MyString):
>          yield ob
> or this (to avoid copying the implementation)::
>
>      from overloading import RuleSet
>      RuleSet(flatten).copy_rules((basestring,), (MyString,))
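The overloading module proposed here is not part of the standard library. As a rough present-day analogue (not the PEP's API), functools.singledispatch (Python 3.4+, annotation-based register since 3.7) dispatches on the type of the first argument only, and can be extended from outside the defining module. A sketch, assuming Python 3 names (str in place of basestring, collections.abc.Iterable) and a hypothetical MyString class that does not subclass str:

    from collections.abc import Iterable
    from functools import singledispatch

    @singledispatch
    def flatten(ob):
        # Default: anything without a more specific registration is an atom.
        yield ob

    @flatten.register
    def _(ob: Iterable):
        # Iterables, including ABC "virtual" subclasses such as list,
        # are flattened recursively.
        for o in ob:
            yield from flatten(o)

    @flatten.register
    def _(ob: str):
        # Strings are iterable, but are treated as atoms.
        yield ob

    # Hypothetical string-like type that does not subclass str.
    class MyString:
        def __init__(self, s):
            self.s = s

        def __iter__(self):
            return iter(self.s)

    # Reuse the str implementation for MyString, similar in spirit to the
    # PEP's copy_rules() call.
    flatten.register(MyString, flatten.dispatch(str))

Without the last line, MyString would match the Iterable case (it defines __iter__) and be split into characters; with it, a MyString instance is yielded whole.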

That may be great for flexibility, but I contend that it is awful in
practice. In practice, it would be much simpler and more readable to
just rewrite the flatten function:

    def flatten(ob):
        flat = (isinstance(ob, (basestring, MyString)) or
                not isinstance(ob, Iterable))
        if flat:
            yield ob
        else:
            for o in ob:
                for ob in flatten(o):
                    yield ob

Or change MyString so that it derives from basestring.

> Most of the functionality described in this PEP is already implemented
> in the in-development version of the PEAK-Rules framework.  In
> particular, the basic overloading and method combination framework
> (minus the ``@overload`` decorator) already exists there.  The
> implementation of all of these features in ``peak.rules.core`` is 656
> lines of Python at this writing.

I think PEAK is a great framework and that generic functions are great
for those who like them. But I'm not convinced that writing multiple
dispatch functions the way PEAK prescribes is better than any of the
currently used idioms.

I first encountered them when I tried to fix a bug in the jsonify.py
module in TurboGears (now relocated to the TurboJSON package). It took
me about 30 minutes to figure out how it worked (including reading the
manual). Had PEAK-style generic functions not been used, it would
have taken me two minutes, tops.

So IMHO, generic functions certainly are useful for some things, but
not useful enough. Using them as a replacement for ordinary multiple
dispatch techniques is a bad idea.

-- 
mvh Björn