Method Underscores?

Alex Martelli aleaxit at yahoo.com
Thu Oct 21 12:11:04 CEST 2004


Chris S. <chrisks at NOSPAM.udel.edu> wrote:
   ...
> I realize that. My point is why?

I think both I and Andrew answered that: by stropping specialnames, any
new version of Python can add a specialname without breaking backwards
compatibility towards existing programs written normally and decently.

> Why is the default not object.len()? 
> The traditional object oriented way to access an object's attribute is
> as object.attribute.

It's one widespread way, dating from Simula, but there are others --
Smalltalk, arguably the first fully OO language, uses juxtaposition
("object message"), as does Objective-C; Perl currently uses -> (though
Perl 6 will use dots); and so on.  But this lexical/syntactical issue
isn't really very relevant, as long as we're talking *single-dispatch*
OO languages (a crucial distinction, of which, more later).

The main reason <builtin-name>(*objects) constructs exist is quite
different: in the general case, such constructs can try _several_ ways
to perform the desired operation, depending on various special methods
that the objects' classes might define.  The same applies to different
(infix or prefix) syntax sugar like, say, "a + b", which has exactly the
same semantics as operator.add(a, b).

Consider the latter case, for example.  operator.add(a, b) is NOT the
same thing as a.__add__(b).  It does first try exactly that, _if_
type(a) defines a specialmethod __add__ [[net of coercion issues, which
are a complication we can blissfully ignore, as they're slowly fading
away in the background, thanks be]].  But if type(a) does not define
__add__, or if the call to type(a).__add__(a, b) returns NotImplemented,
then operator.add(a, b) continues with a second possibility: if type(b)
defines __radd__, then type(b).__radd__(b, a) is tried next.
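
That two-step protocol is easy to sketch in pure Python (a simplified
sketch only: the real interpreter adds a refinement, trying __radd__
first when type(b) is a proper subclass of type(a), which I omit here;
'sketch_add' and 'OnlyRadd' are names invented for the example):

```python
def sketch_add(a, b):
    # Step 1: try type(a).__add__(a, b), if that type defines it.
    meth = getattr(type(a), '__add__', None)
    if meth is not None:
        result = meth(a, b)
        if result is not NotImplemented:
            return result
    # Step 2: type(a) refused (or has no __add__), so give type(b)
    # a chance via its reflected method __radd__.
    rmeth = getattr(type(b), '__radd__', None)
    if rmeth is not None:
        result = rmeth(b, a)
        if result is not NotImplemented:
            return result
    raise TypeError("unsupported operand types for +")

class OnlyRadd:
    # a type that only knows how to be added on the RIGHT of +
    def __radd__(self, other):
        return ('radd', other)

sketch_add(2, 3)            # 5: int.__add__ succeeds at step 1
sketch_add(1, OnlyRadd())   # ('radd', 1): int refuses, step 2 kicks in
```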

If only type(a) were consulted, there would be either restrictions or
_very_ strange semantics.  The normal approach would result in
restrictions: I could not define a new type X that knows what it means
to "add itself to an int" on either side of the + sign.  Say that N was
an instance of said new type X: then, trying 23+N (or operator.add(23,
N), same thing) would call int.__add__(23, N), but int being an existing
type and knowing nothing whatsoever about X would have to refuse the
responsibility, so 23+N would fail.  The alternative would be to make
EVERY implementation of __add__ all over the whole world responsible for
delegating to "the other guy's __radd__" in case it doesn't know what to
do -- besides the boilerplate of all those 'return
other.__radd__(self)', this also means hardwiring every single detail of
the semantics of addition forevermore -- no chance to add some other
possibility or enhancement in the future, ever, without breaking
backwards compatibility each and every time.  If that had been the path
taken from day one, we could NOT "blissfully ignore" coercion issues,
because that was the original approach -- every single implementation of
__add__ would have to know about coercion and apply it, and it would
never be possible to remove or even de-emphasize coercion in future
versions of the language without a major backwards-incompatible jump
breaking just about all code existing out there (tens of millions of
lines of good working Python).
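
Concretely, here is such a new type X (an invented example) whose
instances know how to add themselves to an int on either side of the +
sign, precisely because int's refusal hands control over to X.__radd__:

```python
class X:
    def __init__(self, value):
        self.value = value
    def __add__(self, other):
        # handles N + 23
        if isinstance(other, int):
            return X(self.value + other)
        return NotImplemented
    def __radd__(self, other):
        # handles 23 + N: int.__add__(23, N) returns NotImplemented,
        # so Python then tries type(N).__radd__(N, 23)
        if isinstance(other, int):
            return X(other + self.value)
        return NotImplemented

N = X(10)
(N + 23).value   # 33, via X.__add__
(23 + N).value   # 33, via X.__radd__ -- impossible if only type(a) counted
```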

The issue is even stronger for some other operations, such as
comparisons.  It used to be that (net of coercion etc) a<b (or
equivalently operator.lt(a, b)) meant essentially:
    a.__cmp__(b) < 0

But then more specific specialmethods were introduced, such as __lt__
and __gt__ -- so, now, a<b means something like:
    if hasattr(type(a), '__lt__'):
        result = type(a).__lt__(a, b)
        if result is not NotImplemented: return result
    if hasattr(type(b), '__gt__'):
        result = type(b).__gt__(b, a)
        if result is not NotImplemented: return result
    if hasattr(type(a), '__cmp__'):
and so on (the actual implementation is faster, relying on 'method
slots' computed only when a type is built or modified, but, net of
dynamic lookups, this is basically an outline of the semantics).

This is a much better factoring, since types get a chance to define some
_specific_ comparisons without implying they're able to do _all_ kinds
of comparisons.  And the migration from the previous semantics to the
current one, without breaking backwards compatibility, was enabled only
by the idea that specialnames are stropped as such.  An existing type
might happen to define a method 'lt', say, having nothing to do with
comparisons but meaning "amount of liters" or something like that, and
that would be perfectly legitimate since there was nothing special or
reserved about the identifier 'lt'.  If the special method for 'less
than' comparison was looked for with the 'lt' identifier, though, there
would be an accidental collision with the 'lt' method meaning something
quite unrelated -- and suddenly all comparisons involving instances of
that type would break, *SILENTLY!!!* (the worst kind of breakage),
returning results quite unrelated to the semantics of comparison.
*SHUDDER*.
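
A short illustration of both points (Meters and its liters-returning
'lt' method are invented for the example): the plain identifier 'lt'
coexists safely with the stropped __lt__, and defining __lt__ alone
even gives a working '>' via the reflected dispatch sketched above:

```python
class Meters:
    def __init__(self, n):
        self.n = n
    def __lt__(self, other):
        # the ONE comparison this type defines
        return self.n < other.n
    def lt(self):
        # "amount of liters", say -- nothing to do with comparison,
        # and perfectly legitimate since 'lt' is not a reserved name
        return self.n * 1000

a, b = Meters(1), Meters(2)
a < b      # True: dispatches to __lt__, never to the unrelated 'lt'
b > a      # True too: Meters defines no __gt__, so after object's
           # default refuses, Python tries the reflected
           # type(a).__lt__(a, b)
a.lt()     # 1000: the innocent 'lt' method keeps working undisturbed
```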

Many of these issues have to do with operations between two objects,
often ones that can be indicated by special syntax (infix) or by calling
functions in module operator, but not always (consider 'divmod', for
example; there's no direct infix-operator syntax for that; or
three-argument 'pow', ditto).  Indeed, in good part one could see the
problem as due to the fact that Python, like most OO languages, does
SINGLE dispatching: the FIRST argument (often written to the left of a
dot) plays a special and unique role, the other arguments "just go along
for the ride" unless special precautions are taken.  An OO language
based on multiple dispatching (generic functions and multimethods, for
example, like Dylan) would consider and try to match the types of ALL
arguments in the attempt to find the right specific multimethod to call
in order to compute a given generic function call [[don't confuse that
with C++'s "overloads" of functions, which are solved at compiletime,
and thus don't do _dispatching_ stricto sensu; here, we _are_ talking
about dispatching, which happens at runtime based on the runtime types
of objects, just like the single-dispatching of a C++'s "virtual"
method, the single-dispatching of Java, Python, etc etc]].
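
The same two-step dispatch governs these function-only operations too:
divmod, for instance, consults __divmod__ and then the reflected
__rdivmod__ (MinSec below is an invented example):

```python
class MinSec:
    # converts a count of seconds into (minutes, seconds)
    def __rdivmod__(self, other):
        # reached because int.__divmod__(125, MinSec()) returns
        # NotImplemented, so divmod then tries type(b).__rdivmod__
        return (other // 60, other % 60)

divmod(125, MinSec())   # (2, 5): two minutes, five seconds
```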

So, one might say that much of the substance of Python's approach is a
slightly ad-hoc way to introduce a modest amount of multiple dispatch in
what is basically a single-dispatch language.  To some extent, there is
truth in this.  However, it does not exhaust the advantages of Python's
approach.  Consider, for example, the unary function copy.copy.  Since
it only takes one argument, it's not an issue of single versus multiple
dispatch.  Yet, it can and does try multiple possibilities, in much the
same spirit as the above sketch for operator.lt!  Since copy.py is
implemented in Python, I strongly suggest you read its sources in
Python's standard library and see all it does -- out of which, calling
type(theobject).__copy__(theobject) is just one of many possibilities.
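
For instance, supplying the __copy__ specialmethod is enough to hook
into that machinery (Point is an invented example; as copy.py's source
shows, copy.copy also falls back to __reduce_ex__ and other strategies
when no __copy__ is present):

```python
import copy

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __copy__(self):
        # one of the several hooks copy.copy tries, in order
        return Point(self.x, self.y)

p = Point(1, 2)
q = copy.copy(p)   # finds and calls the type-level __copy__ hook
q is p             # False: a distinct object with equal attributes
```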

Again, not all of these possibilities existed from day one.  You can
download and study just about all Python versions since 1.5.2 and maybe
earlier and see how copy.py evolved over the years -- always without
breaking backwards compatibility.  I think it would be instructive.  If
the names involved had not been specially stropped ones, the smooth and
backwards compatible functional enrichment could not have happened.

Of course, one MIGHT single out some specific functionality and say
"this one is and will forever remain unary (1-argument), never needed
the multiple-possibilities idea and never will, so in THIS one special
case we don't need the usual pattern of 'foo(x) tries
type(x).__foo__(x)' and we'll use x.foo() instead".  Besides the risk of
misjudging ("ooops it DOES need an optional second argument or different
attempts, now what? we're hosed!"), there is the issue of introducing an
exception to the general rule -- one has to learn by rote which
operations are to be invoked as x.foo() and which ones as foo(x) or
other special syntax, prefix or infix.  There is one example of a
special method (of relatively recent vintage, too) implicitly invoked
but not _named_ as such, namely 'next'; Guido has publicly repented of
his design choice in that case -- it's a _wart_, a minor irregularity in
a generally very regular language, and as such may hopefully be remedied
next time backwards incompatibilities may be introduced (in the
transition, a few years from now, from 2.* to 3.0).
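
(Python 3 did eventually remedy it: the hook became the stropped
__next__, with a next() built-in doing the dispatching.)  A minimal
iterator in that repaired spelling (Countdown is an invented example):

```python
class Countdown:
    def __init__(self, n):
        self.n = n
    def __iter__(self):
        return self
    def __next__(self):
        # the once-unstropped 'next' wart, now properly stropped
        if self.n <= 0:
            raise StopIteration
        self.n -= 1
        return self.n + 1

list(Countdown(3))   # [3, 2, 1]
```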

> For those that are truly attached to the antiquated 
> C-style notation, I see no reason why method(object) and object.method()
> cannot exist side-by-side if need be.

The implementation of a builtin named 'method' (or sometimes some
equivalent special syntax) is up to Python itself: what exactly it does
or tries can be changed, carefully, to enhance the language without
backwards incompatibility.  The implementation of a method named
'method' (or equivalently '__method__') in type(obj) is up to whoever
(normally a Python user) codes type(obj).  It cannot be changed
retroactively throughout all types while leaving existing user code
backwards-compatible.

It's hard to make predictions, especially about the future, but it's
always possible that we may want to tweak the semantics of a builtin
method (or equivalent special syntax) in the future.  Say it's
determined by carefully double-blind empirical studies that a common
error in Python 2.8 is for programmers to ask for len(x) where x is an
iterator (often from a built-in generator expression &c) which _does_
expose a __len__ for the sole purpose of allowing acceleration of
list(x) and similar operations; we'd like to raise a LenNotAvailable
exception to help programmers diagnose such errors.  Thanks to the
existence of built-in len, it's easy; the 'len' built-in becomes:
    if hasattr(type(x), '__length_not_measurable__'):
        raise LenNotAvailable
    if not hasattr(type(x), '__len__'):
        raise TypeError
    return type(x).__len__(x)
or the like.  All existing user-coded types don't define the new flag
__length_not_measurable__ and thus are unaffected and stay backwards
compatible; generator expressions or whatever we want to forbid taking
the len(...) of sprout that flag, so len(x) raises when we need it to.
((or we could have a new specialmethod __len_explicitly_taken__ to call
on explicit len(x) if available, preempting normal __len__, for even
greater flexibility -- always WITH backwards compatibility; in the
needed case that specialmethod would raise LenNotAvailable itself)).

Most likely len(x) will need no such semantics change, but why preclude
the possibility?  AND at the cost of introducing a gratuitous divergence
between specialmethods which will never need enhancements (or at least
will be unable to get them smoothly if they DO need them;-) and ones
which may ("richer" operations such as copying, hashing, serializing...
ones it would be definitely hubristic to tag as "will never need any
enhancement whatsoever").


> Using method(object) instead of 
> object.method() is a throwback from Python's earlier non-object oriented
> days,

Python has never had any "non-object oriented days": it's been OO from
day one.  There have been changes to the object model (all of my above
discourse is predicated on the new-style OM, where the implied lookups
for specialmethods are always on type(x), while the classic OM had the
problematic feature of actual lookups on x, for example), but typical
application-level code defining and using classes and objects would look
just about the same today as in Python 1.0 (I never used that, but at
least I _did_ bother studying some history before making assertions that
may be historically unsupportable).
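
That difference between the two object models is easy to demonstrate
(Bag is an invented example): with new-style classes, len(x) looks
__len__ up on type(x), so a same-named attribute on the instance is
simply ignored by the implied lookup:

```python
class Bag:
    def __len__(self):
        return 3

b = Bag()
b.__len__ = lambda: 99   # an instance attribute, not a type attribute

len(b)       # 3: the implied lookup goes to type(b), not to b itself
b.__len__()  # 99: explicit attribute access still sees the instance
```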

> and something which should be phased out by p3k. Personally, I'd 

Don't hold your breath: it's absolutely certain that it will not be
phased out.  It's a widespread application of the "template method"
design pattern in the wider sense, a brilliant design idea, and, even
were Python to acquire multiple dispatch (not in the cards, alas), guess
what syntax sugar IS most suited to multiple-dispatch OO...?  Right:
func(a, b, c).  The syntax sugar typical of single-dispatch operation
gives too much prominence to the first argument, in cases in which all
arguments cooperate in determining which implementation the operation
gets dispatched to.

Now that Python has acquired a "common base for all objects" (class
object itself), it would be feasible (once backwards compatibility can
be removed) to move unary built-ins there and out of the built-in
namespace.  While this would have some advantage in freeing up the
built-in namespace, there are serious issues to consider, too.  Built-in
names are not 'reserved' in any way, and new ones may always be
introduced.  In the general case a built-in name performs a "Template
Method" DP, so objects would not and should not _override_ that method,
but rather define the auxiliary methods that the Template Method calls.  For all
reasons already explained, the auxiliary methods' names should be
stropped (to avoid losing future possibilities of backwards compatible
language enhancement).  So what would the net advantage be?  Syntax
sugar that costs you one more character, obj.hash() rather than
hash(obj), while creating a new burden of explaining to all and sundry
that they _shouldn't_ 'def hash' in their own classes but rather 'def
do_hash' and the like...?  Add to that the sudden divergence between
one-argument operations (which might sensibly be treated this way) and
two- and three- argument ones (which should not migrate to object for
the already-mentioned issue of single vs multiple dispatch), and it
seems to me the balance tilts overwhelmingly into NOT doing it.

> like to see the use of underscores in name-mangling thrown out 
> altogether, as they "uglify" certain code and have no practical use in a
> truly OO language, but I'm sure that's a point many will disagree with
> me on.

No doubt.  Opinions are strongest on the issues closest to syntax sugar
and farthest away from real depth and importance; it's an application of
one of Parkinson's Laws (the amount of time devoted to debating an issue
at a board meeting is inversely proportional to the amount of money
depending on that issue, if I correctly recall the original
formulation).  For me, as long as there's stropping where there SHOULD
be stropping, exactly what sugar is used for the stropping is quite a
secondary issue.  If you want to name all intended-as-private attributes
private_foo rather than _foo, all hooks-for-TMDP-operations as do_hash
rather than __hash__, and so on, I may think it's rather silly, but not
an issue of life and death, as long as all the stropping kinds that can
ever possibly be needed are clearly identified.  _Removing_ the
stroppings altogether, OTOH, would IMHO be a serious technical mistake.

Seriously, I don't think there's any chance of this issue changing in
Python, including not just Python 3.0, which _will_ happen one day a few
years from now and focus on simplifying things by removing historically
accumulated redundant ways to perform some operations, but also the
mythical "Python 3000" which might or might not one day eventuate.

If you really think it's important, I suggest you consider other good
languages that may be closer to your taste, including for example Ruby
(which uses stropping, via punctuation, for totally different purposes,
such as denoting global variables vs local ones), and languages that
claim some derivation from Python (at least in syntax), such as Boo.


Alex


