Proposal: default __init__

Tue Nov 14 02:03:48 EST 2000

Hmm.  Okay, let me restate things this way.

Suppose I'm doing XML programming and write a class which implements
both the ContentHandler and ErrorHandler interfaces.

from xml.sax import handler

class MyHandler(handler.ContentHandler, handler.ErrorHandler):
    def __init__(self):
        handler.ContentHandler.__init__(self)
    def ...

By symmetry, it looks like there should also be a call to
        handler.ErrorHandler.__init__(self)

but this is incorrect: ErrorHandler does not have an __init__, so the
call fails with an AttributeError.

At some point the code goes through code review.  The reviewer sees that
only one of the base class constructors is explicitly called, so must check
the ErrorHandler class to confirm that it indeed does not have an __init__.

It would be nice if all base classes implemented an __init__, but they
do not (e.g., ErrorHandler).  Thus, to remove possible confusion, a
conscientious developer using a library might write the class with
a comment, like:

class MyHandler(handler.ContentHandler, handler.ErrorHandler):
    def __init__(self):
        handler.ContentHandler.__init__(self)
        # ErrorHandler doesn't have an __init__

    def ...

That solves part of the code review problem, because it removes the
work of remembering why the symmetry might not be present.  It doesn't
completely solve the review problem, because a reviewer might wonder whether
an __init__ was added in a newer version of the library.

Changing a base class is not unusual, even for well-designed
code.  However, in most cases the API can be extended without breaking
old code that depends on the library.  The easiest example is
adding new parameters after the old parameters and
giving them appropriate default values.
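As a minimal sketch of that kind of compatible extension (the function names `format_error_v1`/`format_error_v2` and the `line` parameter are hypothetical, not from any library), a parameter added after the old ones, with a default that reproduces the old behaviour, leaves existing calls working:

```python
# Version 1 of a hypothetical library function.
def format_error_v1(message):
    return "error: " + message

# Version 2 adds a new parameter after the old one, with a default value
# that reproduces the old behaviour, so existing callers keep working.
def format_error_v2(message, line=None):
    if line is None:
        return "error: " + message
    return "error (line %d): %s" % (line, message)

# An old-style call gives the same result with either version.
assert format_error_v1("bad tag") == format_error_v2("bad tag")

# New callers can opt in to the extra information.
print(format_error_v2("bad tag", line=3))   # error (line 3): bad tag
```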

Suppose I want to add new functionality to the ErrorHandler class.
For argument's sake, suppose I wanted to attach a Locator to it, just
like the ContentHandler does (so the default _locator is None).

The obvious way to do this is to modify the ErrorHandler class to
look like:

class ErrorHandler:
    def __init__(self):
        self._locator = None
    def setDocumentLocator(self, locator):
        self._locator = locator
    # ... rest of the ErrorHandler definition ...

Anyone creating an ErrorHandler will be unaffected by this change.
However, any classes derived from ErrorHandler will not work properly
unless they are modified to call ErrorHandler.__init__(self).
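The failure mode can be reproduced with a small stand-in.  In this sketch `ErrorHandlerV2` is a hypothetical updated base class (not the real `xml.sax.handler.ErrorHandler`), `describe_location` is a hypothetical new method, and `OldDerived` is a subclass written before the change that never calls the base `__init__`:

```python
class ErrorHandlerV2:
    """Hypothetical new version of a base class that gained an __init__."""
    def __init__(self):
        self._locator = None

    def setDocumentLocator(self, locator):
        self._locator = locator

    def describe_location(self):
        # New functionality that relies on the attribute set in __init__.
        if self._locator is None:
            return "unknown location"
        return "at %s" % (self._locator,)


class OldDerived(ErrorHandlerV2):
    """Written against the old base class, so it never calls its __init__."""
    def __init__(self):
        self.errors = []


class FixedDerived(ErrorHandlerV2):
    """Calls the base class __init__, so the new attribute exists."""
    def __init__(self):
        ErrorHandlerV2.__init__(self)
        self.errors = []


# Creating the base class directly is unaffected by the change.
print(ErrorHandlerV2().describe_location())   # unknown location

# The old subclass is created without complaint and only fails later,
# when the new method looks for the attribute that was never set.
try:
    OldDerived().describe_location()
except AttributeError as exc:
    print("AttributeError:", exc)
```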

Now suppose that all Python classes have a default __init__ method
which does nothing.  Then the MyHandler implementation could be

class MyHandler(handler.ContentHandler, handler.ErrorHandler):
    def __init__(self):
        handler.ContentHandler.__init__(self)
        handler.ErrorHandler.__init__(self)

    def ...

(unless there was a good reason not to call the base class's constructor,
and I do know some good reasons).  The call to ErrorHandler.__init__ is
unneeded for current code because it doesn't do anything.  It *is* needed
for future proofing, as it allows the underlying APIs to expand so long
as they remain backwards compatible.
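For comparison, present-day Python 3 behaves much like this proposal: every class ultimately inherits from `object`, whose `__init__` does nothing when called with no extra arguments, so the symmetric version of `MyHandler` runs whether or not `ErrorHandler` defines its own `__init__`.  A minimal sketch (the `errors` list and the `error` override are illustrative additions):

```python
from xml.sax import handler

class MyHandler(handler.ContentHandler, handler.ErrorHandler):
    def __init__(self):
        handler.ContentHandler.__init__(self)
        # Resolves to object.__init__ if ErrorHandler has no __init__ of
        # its own, and to ErrorHandler's own __init__ if a later version adds one.
        handler.ErrorHandler.__init__(self)
        self.errors = []

    def error(self, exception):
        # Collect recoverable parse errors instead of raising them.
        self.errors.append(exception)

h = MyHandler()
```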

So the advantages of my proposal are:
  1) it's easier to see/review that constructors are written correctly
  2) it allows for more types of backwards compatible API changes

Hmm, there's also a 3).  Suppose ErrorHandler gains an __init__ which
sets up some variables.  Because MyHandler never calls it, those variables
are never set.  It might be a while before someone uses one of the new
methods of the new ErrorHandler base class, which then fails because of
the missing variable.  The error message will be unexpected and a bit
harder to track down, because it is really the result of another error:
not calling the __init__.  So 3) is that the proposal more readily catches
non-backwards-compatible API changes.

Oh, and another solution is for everyone to always include an explicit
__init__ in base classes, even when it is empty and not needed.  Not everyone
does this (I found about 5 base classes in the 2.0 distribution without an
__init__), so the practice needs to be better documented and disseminated.
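That kind of survey can be sketched in modern Python.  Because every Python 3 class inherits `__init__` from `object`, `hasattr` no longer tells the cases apart, so this sketch looks in each class's own `__dict__` instead.  The function name `bases_without_init` is hypothetical, and "base class" is taken here to mean a class whose only base is `object`:

```python
import inspect
import xml.sax.handler

def bases_without_init(module):
    """Return sorted names of true base classes defined in `module` with no own __init__."""
    found = []
    for name, obj in vars(module).items():
        if (inspect.isclass(obj)
                and obj.__module__ == module.__name__   # defined here, not imported
                and obj.__bases__ == (object,)          # not derived from another class
                and "__init__" not in vars(obj)):       # no __init__ of its own
            found.append(name)
    return sorted(found)

# Example: survey one standard-library module.
print(bases_without_init(xml.sax.handler))
```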

Alex Martelli wrote:
>class Eggs(Spam):
>    def __init__(self):
>        self.scrambled = 0
>        superinit = getattr(Spam,'__init__',None)
>        if callable(superinit): superinit(self)

Hmmm.  If I did this (and I agree this isn't something people should do)
then I would write it as:

        if hasattr(Spam, '__init__'):
            Spam.__init__(self)

This gives the expected behaviour (an error, instead of a silently skipped
call) for admittedly bizarre errors like

class Spam:
  __init__ = 9
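The difference between the two checks can be seen with that class.  A minimal sketch in modern Python, where `EggsCallableCheck` and `EggsHasattrCheck` are hypothetical names for the two versions: the `callable()` check silently skips the broken `__init__`, while the `hasattr()` check calls it and gets a `TypeError`.  (In Python 3 `hasattr(Spam, '__init__')` is true even for an ordinary class, since `object` supplies one.)

```python
class Spam:
    __init__ = 9   # a bizarre, broken "constructor"

class EggsCallableCheck(Spam):
    def __init__(self):
        self.scrambled = 0
        superinit = getattr(Spam, '__init__', None)
        if callable(superinit):   # 9 is not callable, so nothing happens
            superinit(self)

class EggsHasattrCheck(Spam):
    def __init__(self):
        self.scrambled = 0
        if hasattr(Spam, '__init__'):
            Spam.__init__(self)   # tries to call 9, raising TypeError

EggsCallableCheck()               # succeeds; the bug goes unnoticed
try:
    EggsHasattrCheck()
except TypeError as exc:          # the bug is reported
    print("TypeError:", exc)
```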

>One hardly expects derived-classes to stay unchanged when the
>base-class's structure changes, after all.

If it is possible to modify the base class in a backwards compatible
manner (e.g., with default arguments), then the library is more useful
overall, because the dependent libraries don't need to be changed as
often.

I don't expect class libraries to remain unchanged.  Instead, I'm
trying to promote a way to minimize the effects of those changes on
existing code.

>> However, in some sense the API for Spam did not change.  Suppose
>
>In *what* sense?  Only that of a call to Spam(), I guess --
>which internally does the call-if-exists on __init__.  But,
>clearly, not for derived-classes of Spam -- for them, Spam's
>API did change, and how!

I mean it in that sense.  You can create a Spam whether or not it has
an __init__.  I believe it would be useful if derived classes
had the same ability to call Spam's constructor, even if a user-defined
one does not exist.

>> all classes have a built-in method called __init__ which takes no
>> parameters and does nothing but is called if there is no user defined
>> call.  Then I could have written
>
>You mean, something equivalent to:
>    def __init__(self): pass
>?

Yes.

>That would break currently-working code if an __init__ existed
>in any _other_ base-class:
[example removed: it defined classes A, B, C(A, B) and D(C), where
D's __init__ calls C.__init__, which really resolves to B's __init__]

Yes, you are right.  I hadn't thought of that problem.  But that just
emphasises my concern.  Is it useful that adding an __init__ to A
completely changes which initializer is called?  I feel it's more
detrimental than useful.
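The situation can be sketched as follows.  This is a reconstruction from the one-line description of the removed example, not Alex Martelli's original code; `make_classes` and the `who` attribute are hypothetical, added only to record which initializer ran:

```python
def make_classes(a_has_init):
    """Build A, B, C(A, B) and D(C); optionally give A its own __init__."""
    class A:
        if a_has_init:
            def __init__(self):
                self.who = "A"

    class B:
        def __init__(self):
            self.who = "B"

    class C(A, B):
        pass

    class D(C):
        def __init__(self):
            # Looked up through C's bases in order: A first, then B.
            C.__init__(self)

    return D

print(make_classes(a_has_init=False)().who)   # B
print(make_classes(a_has_init=True)().who)    # A
```

Adding an `__init__` to `A` silently changes which initializer `D` ends up running.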

Suppose instead that base classes (true base classes, not derived
ones like 'C' or 'D' in this example) always had a do-nothing __init__.
Then D's __init__ would always call A's __init__, unless C defined
its own __init__ to use B's instead.  You can still choose one
constructor over the other, but you have to ask for it.

Explicit over implicit.
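Under that rule, preferring B's initializer becomes something C has to request.  A minimal sketch, assuming `A` carries the do-nothing `__init__` the proposal would guarantee (written out explicitly here, since no Python adds it automatically):

```python
class A:
    def __init__(self):
        # The do-nothing __init__ the proposal would supply.
        pass

class B:
    def __init__(self):
        self.who = "B"

class C(A, B):
    def __init__(self):
        # Explicitly choose B's initializer; A's would be found first otherwise.
        B.__init__(self)

class D(C):
    def __init__(self):
        C.__init__(self)

print(D().who)   # B
```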

Still, that would break existing code.  As another possibility,
getattr(klass, '__init__') could always succeed, and return the
dummy function if it isn't otherwise defined.  This would mean that

  ErrorHandler.__init__(self)

always works, and that D's call of C's __init__ would still get B's
__init__.  Only if there isn't an __init__ in the class tree would
this empty function be returned.

But that isn't clean since it's hard to reproduce the behaviour in
Python.  The following would always report that C uses A's __init__
even though it really uses B's.

# find where the __init__ is located
for base in klass.__bases__:
    if hasattr(base, "__init__"):
        print "Using the one in", base
        break
else:
    print "No __init__"

To work, the code would have to become

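import sys

for base in klass.__bases__:
    if hasattr(base, "__init__"):
        x = base.__init__
        # sys.default_init is the proposed built-in do-nothing __init__
        # (hypothetical; it does not exist in any current Python)
        if x is not sys.default_init:
            print "Using the one in", base
            break
else:
    print "Using the default __init__"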
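In present-day Python 3, `object.__init__` plays roughly the role of the hypothetical `sys.default_init`: every class has an `__init__`, and only the one inherited from `object` is the do-nothing default.  A sketch of the reporting loop under that assumption (the name `init_owner` is hypothetical) walks the whole method resolution order rather than just `__bases__`, and checks each class's own `__dict__`, so it reports the class that really supplies `__init__`:

```python
def init_owner(klass):
    """Return the class whose own __init__ klass uses, or None for the default."""
    for cls in klass.__mro__:
        if "__init__" in vars(cls):
            if cls is object:
                return None   # only the built-in default was found
            return cls
    return None

class A:
    pass

class B:
    def __init__(self):
        pass

class C(A, B):
    pass

for k in (A, B, C):
    owner = init_owner(k)
    if owner is None:
        print(k.__name__, "uses the default __init__")
    else:
        print(k.__name__, "uses the one in", owner.__name__)
```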

I also don't like the special case it puts on __init__ (although
it would only be in attribute lookups of classes).

>So, if nothing else, the "if there is no user
>defined call" _has_ to be specified *much* more
>precisely -- the "phantom do-nothing __init__"
>should "appear" under exactly _what_ cases...?

Since I don't like my new proposal, I'll go back to the old one.
Base classes which are not derived from any other class and which
do not contain a user defined __init__ will have an __init__ added
to them, of the form

   def __init__(self): pass
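The synthesis rule can be approximated today with a class decorator.  This is only an illustration: `with_default_init` is a hypothetical name, and in Python 3 a class "not derived from any other class" is taken to be one whose only base is `object`.  Applying it to a base `A` without an `__init__`, next to a parallel base `B` that has one, also reproduces Alex Martelli's objection: `C(A, B)` stops picking up `B.__init__`.

```python
def with_default_init(cls):
    """Give a true base class a do-nothing __init__ if it lacks its own."""
    if cls.__bases__ == (object,) and "__init__" not in vars(cls):
        def __init__(self):
            pass
        cls.__init__ = __init__
    return cls

@with_default_init
class A:
    pass

class B:
    def __init__(self):
        self.who = "B"

class C(A, B):
    pass

c = C()
print(hasattr(c, "who"))   # False: A's synthesized __init__ now shadows B's
```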

>"not hasattr(X, '__init__')" does not seem to
>be a sufficient condition for a do-nothing
>__init__ to be synthesized for class X -- note
>that class A, above, satisfies this, yet doing
>the synthesis on it *would* break currently
>working code for class D.

Yes, it would break code.  So I'll push this proposal for Py3K.  I
still believe that code written to take advantage of that behaviour is
poorly written and deserves to be broken.

Until then, all of my base classes will have an __init__ even
if not explicitly needed, and I will advocate that others do the same.

>Maybe the determination would have to be context
>dependent -- and unless we're willing to shatter
>a lot of current implication relationships among
>hasattr, getattr, ..., that seems a biggie.  I.e.,
>hasattr(A,'__init__'), for *one and the same*
>class object A, would have to be true in certain
>cases, false in others... seems pretty thorny to
>sort out.

I don't follow.  I would like hasattr(A, '__init__') to always
be true.

>>   class Eggs(Spam):
>>     def __init__(self):
>>       Spam.__init__(self)
>>       self.scrambled = 0
>
>Why not always code like this, if you want to 'never'
>have to (so modestly!) refactor derived-classes?

I think you've switched to hyperbole in your rhetoric.  My original
post did not use the word 'never' in it.  My goal is to foster the
ability for libraries to have API changes without breaking dependent
code.

>Or check as above, if you insist on coding base-classes as:
>
>>   class Spam:
>>     pass

I don't insist on it.  In my original post I said "I updated one of
my libraries".  What I should have said was, "I updated my local
copy of someone else's library."  I can insist on whatever sort of
local coding style I want, but I can't force that on everyone else.

Doing the explicit check is inelegant.  It would be nice if the language
supported another approach.

>> having a different interface than the functionally identical
>>   class Spam:
>>     def __init__(self):
>>       pass
>
>These are FAR from "functionally identical", BTW.  See
>above...!  The latter stops __init__ from being looked
>for in other, parallel bases-classes; the former, of
>course, doesn't.

Now I know you've switched rhetorical styles, since you're repeating
yourself.  You already said you can see at least some similarity when you
stated:  "In *what* sense?  Only that of a call to Spam(), I guess"

So I'll repeat myself :)  I believe that code which depends on that
behaviour is error prone and should not be encouraged.  Breaking
code, even poor code, is a bad thing, so I won't advocate changes
for 2.x.

>Even apart from such lookup differences, claiming these
>classes are functionally identical is like making the
>same claim for
>
>class Foo1:
>    def bar(self): pass
>
>vs
>
>class Foo2:
>    pass
>
>They simply *aren't*.  You can, if you wish, call .bar
>on instances of class Foo1 (without effect); you can't
>do it on instances of class Foo2.

That argument is only true for the current implementation of Python.
What I'm suggesting (now that you've pointed out that it breaks existing
code) is that an alternate Python, with slightly different behaviours,
would be more useful.  Suppose every base class did have an __init__,
either user-defined or a default do-nothing version.  Then

class Foo:
  pass

and

class Foo:
  def __init__(self):
    pass

really would be identical.

There is even precedent for Python adding data to existing data structures.
The exec statement takes a dictionary to use as its global namespace.
If the dict doesn't have a __builtins__ entry, one is added to it.

>>> a = {}
>>> exec "1" in a
>>> a
{'__builtins__': {'cmp': <built-in function cmp>, 'dir': <built-in function dir>,
 ...
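The same behaviour survives in Python 3, where `exec` became a function: if the globals dictionary passed to it has no `__builtins__` key, a reference to the builtins is inserted under that key.

```python
namespace = {}
exec("1", namespace)
print("__builtins__" in namespace)   # True
```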

>Why is __init__ any different?  Because, in certain
>cases, it's called 'on your behalf' by the class itself,
>when it's used as a callable to instantiate it?

That's my point.  The list of "certain cases" is too small and should be
broadened to include explicit calls to __init__ in addition to implicit
ones via instantiation.  I know that's not the way it currently works.
That's why this is a proposal for change.

>But
>that is no different from, say, __add__ -- *that*, too,
>is called 'indirectly' (by using + as an operator on
>an instance of the class)...

There's no default __add__, so this isn't a good example.  A better
one is __cmp__.  In any case, it is different because there's no need
to make an explicit call to __add__.  Instead, it can be hidden behind
a +, or operator.add(), just as the call to __cmp__ can be hidden
in a cmp().  In addition, all of those operations are unary or binary.
The __init__ function is the only one which can require an arbitrary,
user-specified number of arguments and still be doing the right thing.

(That is, you can write "def __cmp__(self, a, b, c):", but it will fail
because cmp() only ever passes it one other argument.  You can, on the
other hand, define an __init__ to take 3 arguments.)
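The arity point can be checked directly.  Python 3 dropped `__cmp__`, so this sketch swaps in `__eq__`, which `==` always calls with exactly one other operand; the class names `Point` and `OddEq` are hypothetical:

```python
class Point:
    def __init__(self, x, y, z):
        # Any number of parameters is fine: the caller of Point() supplies them.
        self.x, self.y, self.z = x, y, z

class OddEq:
    def __eq__(self, a, b, c):
        # Legal to define, but == will only ever pass one argument.
        return True

p = Point(1, 2, 3)   # works

try:
    OddEq() == OddEq()
except TypeError as exc:
    print("TypeError:", exc)
```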

In addition, if you believe in programming with invariants, then __init__
is special because it's the only external method which is called while
the object is not yet in an invariant state.  From this viewpoint, calling
a base's __init__ is also special, because that's the only way to put
the base class's part of the object into its invariant form.  Indeed, in
some languages (like C++) the base class's constructors are called
automatically before the derived ones, while the various operator methods
(like operator++, operator[], etc.) are not.

So there are some natural differences between __init__ and the other
special methods.

Also, most of the special methods in Python do not modify the object
they are called on, while __init__ definitely does.  Perhaps the best
comparison would be to something like __iadd__, which can be implemented
in the generic sense (I think; I'm still inexperienced with the 2.0
additions) as:

  class C(A,B):
    def __iadd__(self, obj):
        x = A.__iadd__(self, obj)
        return B.__iadd__(x, obj)

In which case I agree, the __i*__ methods will have the same problems
as __init__.  But I'll say again that __init__ is special because it
is so widely used.
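A runnable version of that generic `__iadd__` chaining, with hypothetical `A` and `B` classes that each record the added value, shows both that the pattern works and that it shares the `__init__` problem: a class with no `__iadd__` of its own has nothing to call explicitly, because `object` provides no default.

```python
class A:
    def __init__(self):
        self.a_total = 0

    def __iadd__(self, obj):
        self.a_total += obj
        return self

class B:
    def __init__(self):
        self.b_items = []

    def __iadd__(self, obj):
        self.b_items.append(obj)
        return self

class C(A, B):
    def __init__(self):
        A.__init__(self)
        B.__init__(self)

    def __iadd__(self, obj):
        # Let each base update the object in turn, as in the sketch above.
        x = A.__iadd__(self, obj)
        return B.__iadd__(x, obj)

c = C()
c += 5
print(c.a_total, c.b_items)   # 5 [5]

class Plain:
    pass

print(hasattr(Plain, "__iadd__"))   # False: no default to call explicitly
```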

>Python's rules for inheritance are crystal-simple: their
>utter, refreshing simplicity is exactly what makes it all
>so awesomely *powerful*.

I am not asking to change how inheritance works.  (Okay, I did propose
one such change above, but I didn't like it.)  I'm proposing that all base
classes contain an __init__ even if there isn't a user-defined one.
This has an effect on how lookup works, but I'll bet there isn't any
good-quality, non-trivial code which depends on that fact.  (And if
there is, it's easy to fix.)

>> Comments?  Any reason why this would be a bad thing?
>
>Convenience, Simplicity, Power: pick any two.
>
>Python is one of the few software environments I know
>that has _mostly_ managed to eschew the misleading lure
>of "Convenience", in favour of the far-preferable
>beacons of Simplicity and Power.

And I argue that my proposal makes things simpler to understand
with no loss of power.  It does remove some convenience: the lookup
rules, as applied to __init__, would find an implementation in the
first base class and so would never search the parallel base classes,
so you have to use explicit mechanisms to say which __init__ should
be used.  That is a small cost, because if there are multiple parent
classes, the derived class will most likely call all of their
__init__ methods and not just one.

>Thus, for example,
>no automatisms (which would be VERY convenient indeed,
>far more than your more limited proposal...) in the
>calls of such methods as __init__ and __del__ on base
>classes -- you must do it yourself, explicitly, if and
>when you want;

C++, for example, does have such an automatism: it calls base class
constructors before the derived class's constructor.  But (to repeat the
point above) even there it only does that for the constructor-like
operations.  There is no default operator+= which calls all of the base
class +=s.  (Or maybe there is?  It's been a few years.)  So there is
precedent for constructors being special.

Besides, I'm not arguing for a change in how this works.  You still
need explicit calls to the parents' __init__ methods.  The difference
is that each base class is guaranteed to have one.

>Python itself only deals with the 'first'
>call to such methods, found with the same getattr rules
>that always apply to _all_ method lookups.  A trifle
>less convenience, a ton more simplicity & the resulting
>power...

I'm not suggesting that the getattr rules be changed, only that each
base class always define an __init__.

Why is the current behaviour simpler?  I've already pointed out two
downsides (harder to verify using code inspection and more fragile
with modifications to the API).  What are the advantages of keeping
the implementation the way it is, in terms of simplicity?  (Other than
that changing it would break existing code.)

                    Andrew
                    dalke at acm.org
