[Python-3000] pep 3124 plans
Phillip J. Eby
pje at telecommunity.com
Mon Jul 30 21:45:33 CEST 2007
At 02:20 PM 7/30/2007 -0400, Jim Jewett wrote:
>On 7/21/07, Phillip J. Eby <pje at telecommunity.com> wrote:
>
> >... If you have to use @somegeneric.before and
> > @somegeneric.after, you can't decide on your own to add
> > @somegeneric.debug.
>
> > However, if it's @before(somegeneric...), then you can add
> > @debug and @authorize and @discount and whatever else
> > you need for your
> > application, without needing to monkeypatch them in.
>
>I honestly don't see any difference here. @somegeneric.method implies
>that somegeneric is an existing object, and even that it already has
>rules for combining .before and .after; it can just as easily have a
>rule for combining arbitrary methods.
I don't understand what you're saying or how it relates to what I said above.
If you define a new kind of method qualifier (e.g. @discount), then
all existing generic functions aren't suddenly going to grow a
'.discount' attribute. That's what the above discussion is about --
how you *access* qualifier decorators.
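To make the contrast concrete, here's a deliberately dumbed-down sketch (not
the PEP 3124 implementation; the flat 'registry' list and all the names here
are just illustration). The point is that a standalone qualifier is simply
another function anyone can write, without the generic function's type having
to grow a new attribute:

# Illustrative toy only -- a flat list standing in for the real engine.
registry = []

def before(gf, signature):
    """A standalone 'before' qualifier: records a rule for gf."""
    def register(body):
        registry.append(('before', gf, signature, body))
        return body
    return register

def debug(gf, signature):
    """A third-party qualifier, defined exactly the same way -- no
    monkeypatching of the generic function (or its type) required."""
    def register(body):
        registry.append(('debug', gf, signature, body))
        return body
    return register

def render(ob):
    """Some hypothetical generic function."""

@debug(render, (int,))
def log_int_render(ob):
    print("rendering an int:", ob)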
>If you're saying that @discount could include its own combination
>rules, then each method needs to repeat the boilerplate to pick apart
>the current decision tree.
Still don't understand you. Method combination is done with a
generic function called "combine_actions" which takes two arbitrary
"method" objects and returns a new "method" representing their
combination. There is no boilerplate or picking anything apart.
> The only compensating "advantage" I see is
>that the decision tree could be changed arbitrarily from anywhere,
>even as "good practice." (Since my new @thumpit decorator would takes
>the generic as an argument, you won't see the name of the generic in
>my file; you might never see it there was iteration involved.)
Decision trees are generated from a flat collection of rules; they're
not directly manipulated. In the default implementation (based on
Guido's prototype), the "tree" is just a big dictionary mapping
tuples of types to "method" objects created by combining all the
methods whose signatures are implied by that tuple of types. It's
also sparse, in that it doesn't contain type combinations that
haven't been looked up yet. So there isn't really any tree that you
could "change" here.
There's just a collection of rules, where a rule consists of a
predicate, a definition order, a "body" (function), and a method
factory. A predicate is a collection of possible signatures (e.g.
the sequence of applicable types) -- i.e., an OR of ANDs.
To actually build a tree, rules are turned into a set of "cases",
where each case consists of one signature from the rule's predicate,
plus a method instance created using the signature, body, and
definition order. (Not all methods care about definition order, just
ones like before/after.)
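As a rough sketch of those data shapes (the names are illustrative, not taken
from the implementation):

from collections import namedtuple

# A rule: a predicate (an OR of signatures), a definition order, a body
# function, and a factory that builds method objects of the right type.
Rule = namedtuple('Rule', 'predicate order body factory')

def rule_to_cases(rule):
    # One case per signature in the predicate: the signature, plus a
    # method instance built from the signature, body, and definition order.
    return [
        (signature, rule.factory(signature, rule.body, rule.order))
        for signature in rule.predicate
    ]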
In the default engine (loosely based on Guido's prototype), these
cases are merged by using combine_actions() on any cases with the
same signature, and stored in a dictionary called the
"registry". The registry is built up incrementally as you add methods.
When you call the function, a type tuple is built and looked up in
the cache. If nothing is found in the cache, we loop over the
*entire* registry, and build up a derived method, like this (actual
code excerpt):
try:
    f = cache[types]
except KeyError:
    # guard against re-entrancy looking for the same thing...
    action = cache[types] = self.rules.default_action
    for sig in self.registry:
        if sig==types or implies(types, sig):
            action = combine_actions(action, self.registry[sig])
    f = cache[types] = action
return f(*args)
The 'self.rules.default_action' is to method objects what zero is to
numbers -- the start of the summing. Ordinarily, the default action
is a NoMethodFound object -- a perfectly valid "method"
implementation whose behavior is to raise an error. All other method
types have higher combination precedence than NoMethodFound, so it
always sinks to the end of any combination of methods.
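In sketch form (an assumed shape, not the actual class), that "zero" is
something like:

class NoApplicableMethods(Exception):
    """Raised when no registered method applies to the arguments."""

class NoMethodFound:
    # A perfectly valid "method": calling it raises.  Because every other
    # method type overrides it, it always ends up at the tail of any
    # combination, so it only runs when nothing else applied.
    def __call__(self, *args, **kw):
        raise NoApplicableMethods(args, kw)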
The relevant generic functions here are implies(), combine_actions(),
and overrides() -- where combine_actions() calls overrides() to find
out which action should override the other, and then returns
overriding_action.override(overridden_action).
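Roughly, then, combine_actions() looks something like this (a hedged
reconstruction, not the actual code; it assumes the overrides() generic
function and the .override()/.merge() protocol described here):

def combine_actions(a1, a2):
    # overrides() decides which action takes precedence...
    if overrides(a1, a2):
        return a1.override(a2)      # ...and the winner wraps the loser
    elif overrides(a2, a1):
        return a2.override(a1)
    else:
        return a1.merge(a2)         # same precedence: merge instead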
The overrides() relationship of two actions of the same type (e.g.
two Around methods) is defined by the implies() relationship of the
action signatures. For Before/After methods, the definition order is
used to resolve any ambiguity in the implies().
The .override() of a method is usually a new instance of the same
method type, but with a "tail" that points to the overridden method,
so that next_method will do the right thing.
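In sketch form (illustrative only; the real method types carry more state),
that usual .override() shape is:

class Method:
    def __init__(self, body, tail=None):
        self.body = body        # the user-supplied function
        self.tail = tail        # the overridden method, if any

    def override(self, other):
        # A new instance of the same type, chained onto the overridden
        # method so that next_method resolves to it.
        return self.__class__(self.body, tail=other)

    def __call__(self, *args, **kw):
        # The body receives the tail as its next_method argument.
        return self.body(self.tail, *args, **kw)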
There are more details than this, of course, but the point is that
method combination is 100% orthogonal to the dispatch tree
mechanism. You can build any kind of dispatch engine you want, just
by using combine_actions to combine the actions. The action types
themselves only need to know how to .override() a lower precedence
method and .merge() with a same-precedence method. And there needs
to be an overrides() relationship defined between all pairs of method
types, but in my current version of the implementation, overrides()
is automatically transitive for any type-level relationship.
So if you define a type that overrides Around, then it also overrides
anything that Around overrides. So, for the most part you just say
what types you want to override (and/or be overridden by), and maybe
add a rule for how to compare two methods of your type (if the
default of comparing by the implies() of signatures isn't sufficient).
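As a purely illustrative toy (not the actual registration API), the
transitive part works out like this:

class Method: pass
class Before(Method): pass
class Around(Method): pass
class Discount(Method):
    """A hypothetical application-defined method type."""

# A toy type-level "overrides" table; the real relationship is a generic
# function, but the transitive walk is the same idea.
_overrides = {Around: {Before}}

def declares_override(winner, loser):
    _overrides.setdefault(winner, set()).add(loser)

def type_overrides(winner, loser):
    # Walk the declared relation transitively.
    seen, stack = set(), [winner]
    while stack:
        t = stack.pop()
        for lower in _overrides.get(t, ()):
            if lower is loser:
                return True
            if lower not in seen:
                seen.add(lower)
                stack.append(lower)
    return False

declares_override(Discount, Around)         # say only this much...
assert type_overrides(Discount, Before)     # ...and Before is overridden too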
The way that generic functions make this incredible orthogonality and
flexibility possible is itself an argument for generic functions,
IMO. Certainly, it's a hell of an argument for implementing generic
functions in terms of other generic functions, which is why I did
it. It beats the crap out of my previous implementation approaches,
which had way too much coupling between method combination and
tree-building and rules and cases and whatnot.
Separating these ideas into different functional/conceptual domains
makes the whole thing easier to understand -- as long as you're not
locked into procedural-implementation thinking. If you want to think
step-by-step, it's potentially a vast increase in complication. On
the other hand, it's like thinking about reference counting while
writing Python code. Sure, you need to drop down to that level every
now and then, but it's a waste of time to think about it 90% of the
time. Being able to have a class of things that you *don't* think
about is what makes Python a higher-level language than the C it's
implemented with.
In the same way, generic functions are a higher-level version of OO
-- you get to think in terms of a domain's abstract operations, like
implication, overriding, and combination in this example.
The domain abstractions are not an "interface", nor are they methods
or object types. They're more like "concepts", except that the term
"concept" has been abused to refer to much lower-level things that
can attach to only one object within an operation.
The concept of implication is that there are imply-ers and imply-ees
-- a role for each argument, each of which is an implicit interface
or abstract object type.
In traditional OO, and even with interfaces, there are considerable
limits on your ability to specify such partial interfaces and the
relationships between them, forcing you to choose an arbitrary,
implementation-defined organization to put them in. You then have to
force-fit objects to have the right methods, because you didn't
define an x.is_implied_by(y) relationship, only an x.implies(y) relationship.
Thing is, a *relationship* doesn't belong to one side or the other --
it's a *relationship*. A third, independent thing. Like a GF method.
In any program, these relationships already exist, and you still have
to understand them. They're just forced into whatever pattern the
designer chose, or had thrust upon them, to make them fit the
at-best-binary nature of OO methods, instead of being called out as
explicit relationships that follow the form of the problem domain.
>I realize that subclasses are theoretically just as arbitrary, but
>they aren't in practice.
Right -- and neither are generic functions in normal usage. The only
reason you think that subclasses aren't arbitrary is because you're
used to the ways that things get force-fitted into those
relationships. Whereas, with GF's, the program can simply model the
application domain relationships, and you're going to know what
patterns will follow because they'll reflect the application domain.
For example, if you see implies() and combine_actions() and
overrides(), are you going to have any problems knowing, when you see
a type, whether these GF's might have methods for that type? You'll
know when to *look* for such a method, because you know what roles
the arguments play in each GF. If the type might play such a role,
then you'll want to know *how* it plays that role in connection with
specific collaborators or circumstances -- and you'll know what
method implementations to look for.
It's ridiculously simple in practice, even though it sounds hard in
theory. That's the very problem in fact -- in neither subclassing
nor GF's can you solve such problems *in theory*. You can only solve
them in *practice*, because it's only in the context of a specific
program that you have any domain knowledge to apply -- i.e.,
knowledge about what general kinds of things the program is supposed
to do and what general kinds of things it does them with.
If you have that general knowledge, it's just as easy to handle one
organization as the other -- but the GF-based version gives you the
option of having a module that defines lots of basic "kinds of things
it's supposed to do" up front, so that you have an idea of how to
understand the "things it does them with" when you encounter them.
>You can certainly say now that configuration specialization should be
>in one place, and that dispatching on parameter patterns like
>
>(* # ignored
>, :int # actual int subclass
>, :Container # meets the Container ABC
>, 4<val<17.3 # value-specific rule
>)
>
>is a bad idea
But I *don't* say that. What I say is that in practice, there are
only a few natural places to *put* such a definition:
* near the definition of Container (or int, but that's a builtin in this case)
* near the definition of the generic function being overloaded
* in a "concern-based" grouping, e.g. an appropriate module that
groups together matters for some application-domain concept. (For
example, an "ordering_policy" module might contain overrides for a
variety of generic functions that relate to inventory, shipping, and
billing, within the context of placing orders.)
* in an application-designated catchall location
Which of these locations is "best" depends on the overall size of the
program. A one-module program is certainly small enough to not need
to pick one. As a system gets bigger, some of the other usage
patterns become more applicable.
>-- but whenever I look at an application from the
>outside, well-organized configuration data is a rare exception.
That may be -- but one enormous advantage of generic functions is
that you can always relocate your method definitions to a different
module or different part of the same module without affecting the
meaning of the program, as long as all the destination modules are
imported by the time you execute any of the functions.
In other words, if a program is messy, you can clean it up -- heck,
it's potentially safer to do with an automatic refactoring tool than
other types of refactorings in Python (e.g., changing the signature
of a 'foo()' method is difficult to do safely because you don't
necessarily know whether two arbitrary methods *named* 'foo' are
semantically the same, whereas generic functions are objects, not names.)