[AstroPy] Coding Guidelines draft (comments encouraged)

Tue Aug 2 10:15:44 EDT 2011

On 07/25/2011 04:45 PM, Erik Bray wrote:
> On 07/25/2011 08:09 AM, Erik Tollerud wrote:
>> I have made a new version of the coding guidelines and posted them to:
>>    http://astropy.org/development/codeguide.html
>>
>> This version incorporates most of the feedback we've gotten so far on
>> this thread, so if you've suggested something, you might want to look
>> to make sure what's in there is what you had in mind.  In particular,
>> Michael, you may want to look at the new phrasing for the data file
>> limit.  Additionally, a few of you may want to look over the new
>> phrasing for when C extensions are acceptable.
>>
>>
>> One thing I have *not* yet altered is the section on super() and
>> multiple inheritance, as the discussion does not seem to have
>> concluded.
>
> Allow me to add a little more fuel to the discussion then :)
>
> First, for some additional motivating background, here's a post by Guido
> on the history of class method resolution orders (MROs) in Python, and
> the rationale for the current solution:
> http://python-history.blogspot.com/2010/06/method-resolution-order.html
>
> It's easier reading than any PEP on the topic, so I would consider it
> worth reading for anyone who would like more background.
>
> The recommendation given in the current Coding Guidelines completely
> breaks the intended MRO for __init__().  In this example A.__init__() is
> called twice (as well as object.__init__(), though this does nothing)!
> The effective MRO is D->B->A->object->C->A->object.  You can see this
> for yourself if you take the original example and add some print statements:
>
>   >>>  class A(object):
> ...     inits = 0
> ...     def __init__(self):
> ...             self.inits += 1
> ...             if self.inits>  1:
> ...                     print 'Calling A.__init__() again!'
> ...             else:
> ...                     print 'Calling A.__init__() for the first time.'
> ...
>   >>>  class B(A):
> ...     def __init__(self):
> ...             print 'Calling B.__init__().'
> ...             A.__init__(self)
> ...
>   >>>  class C(A):
> ...     def __init__(self):
> ...             print 'Calling C.__init__().'
> ...             A.__init__(self)
> ...
>   >>>  class D(C, B):
> ...     def __init__(self):
> ...             print 'Calling D.__init__().'
> ...             B.__init__(self)
> ...             C.__init__(self)
>   >>>  D()
> Calling D.__init__().
> Calling B.__init__().
> Calling A.__init__() for the first time.
> Calling C.__init__().
> Calling A.__init__() again!
> <__main__.D object at 0x2afb22dd7750>
>
> While this may be harmless in some cases, it's most likely not what the
> developer really wants.  And it's completely different from what Python
> would otherwise do which is D->C->B->A->object.  This is a much more
> sensible MRO (and based on a well thought out algorithm that handles
> numerous corner cases that you might not think of immediately).  This is
> also very difficult to do without cooperative super() calls.
>
> Having some classes *not* using super() will break any code that does
> try to use them as part of a cooperative super() call.  So the better
> policy is to *require* the use of super(), especially in __init__()
> which is the most common case--I can't emphasize this enough :).
>
> The problems with this, I think, are overstated:  In the simplest and
> most common case of single-inheritance using super(Foo, self).__init__()
> is equivalent to Foo.__init__(self).  In the case of inheriting from two
> classes with no overlapping functionality--orthogonal bases as James put
> it--again it's a moot point.
>
> super() *only* adds complexity and/or confusion in diamond-like or even
> more complex class hierarchies.  But in these cases super() will always
> make sure that the "right" thing is done.  What Python's algorithm
> considers "right" may not always be exactly what the developer wants,
> but in the case of multiple inheritance Python follows its Zen by having
> One Right Way to do it.  And for a complex and confusing subject like
> multiple inheritance that's a good thing.
>
> Admittedly, the semantics for using super() in Python 2.x are
> cumbersome.  Python 3 at least allows calling super() with no arguments,
> which is the same as super(__class__, self)--the most common way of
> calling it.  But that's just Python 3 :)

I know people have been busy, myself included, but I wonder if anyone 
has had time to consider the use of super() and the defense thereof.

I would change the coding guidelines read opposite of how they do 
currently: Use of super() should be encouraged except where one knows 
for sure that it's not going to be appropriate, due to the need for a 
customized method resolution order.  What *should* be discouraged is the 
use of multiple inheritance except in the cases of orthogonal bases, or 
if one is really sure what they're doing.

On an unrelated matter, I have one other suggestion for the coding 
guidelines:

A top level package's __init__.py should be importable under all 
circumstances, and should not too much "stuff" in it, for lack of a 
better word.  To explain: Several of our own packages have an 
__init__.py that contains a lot of functionality, such that when I try 
to import it, it might crash (due to some prerequisite not being met), 
take a very long time (due to it importing many submodules 
unnecessarily), or it might have actual side effects like writing to the 
filesystem.

All of these behaviors should be discouraged.  In particular it can 
cause problems for installation and test frameworks which use the Python 
import mechanism to discover packages and modules on the path.  Merely 
importing the package should not fail due to, for example, an 
environment variable not being set.

There are valid reasons for the __init__.py containing functionality, in 
particular for ease of use at the Python command line.  But they should 
be designed so that they can at least be imported without side-effects. 
  This would mean cleanly disabling certain functionality if any 
prerequisites are not bet, and delaying any side-effects until it's 
certain that they are necessary.

Just to give a concrete example (that's long since been fixed, actually) 
matplotlib used to try to write out a default cache/config directory 
upon import, which could fail if the process it was running in didn't 
have permission to create that directory.  That's the kind of thing that 
should be avoided.  I think that a note to that effect should be added 
to the coding guidelines.  I'd be happy to write up a draft if people 
agree with this.

Thanks,
Erik