Does Python really follow its philosophy of "Readability counts"?

Sun Jan 25 14:56:31 EST 2009

Steven D'Aprano <steve at REMOVE-THIS-cybersource.com.au> writes:

> On Fri, 23 Jan 2009 21:36:59 -0500, Luis Zarrabeitia wrote:
>> It makes sense... if the original author is an egotist who believes he
>> must control how I use that library.
>
> Then I guess Guido must be such an egotist, because there's plenty of 
> internals in Python that you can't (easy) mess with.

Time for some reflection.  (Apposite word, as it turns out.)

For the avoidance of doubt, I shall grant (and not grudgingly):

  * Abstraction is a useful tool in building complex systems.

  * Separating an interface from its implementation reduces the
    cognitive burden on people trying to reason about the system
    (including when doing design, developing clients, or trying to do
    more formal kinds of reasoning).

  * It also makes maintenance of the implementation easier: in the cases
    where this it's possible to improve the implementation without
    changing the interface, clients can benefit without having to be
    changed.

I think that one of the reasons this conversation is going on for so
long is that we haven't really talked much about what kinds of `messing'
we're talking about.

I think that, most of the time when I'm inconvenienced by some
abstraction, it's because it's hiding something that I wanted to see --
in a read-only fashion.  The implementation knows some fact that, for
whatever reason, it's unwilling to reveal to me.  I understand that, in
some future version, the implementation might change and this fact might
not be available then, or that it's an artifact of the way the
implementation works in some environment -- but for whatever reason
(debugging is a typical one as was pointed out upthread) it turns out
that I'm actually interested in this fact.  Revealing it to me can't
actually hurt the invariants of the system, though I need to be somewhat
careful about how long I assume it's correct.  Of course, that should be
entirely my responsibility.

It's this common problem of wanting to dig out some piece of information
which I'm really worried about.  And `enforced data hiding' just slams
the door in my face.  I'm not best pleased by the idea.

Anyway, in this regard, the CPython implementation is pretty much a
paragon of virtue.  It lets one get at almost everything one could want
and a whole lot else besides.

> Yes you could, and you could hack the OS to manipulate data behind the
> scenes, and you could build a hardware device to inject whatever data
> you want directly into the memory. You can do any of those things. So
> what?
>
> Data hiding isn't about some sort of mythical 100% certainty against
> any imaginable failure mode. Data hiding is a tool like any other, and
> like all tools, it has uses and misuses, and it works under some
> circumstances and not others.
>
> If you don't get 100% certainty that there will never be a failure no 
> matter what, what do you get? Just off the top of my head, it:

How much of these do you /lose/ by having a somehat more porous
interface?

> * makes it easier for an optimising compiler to give fast code if it 
> doesn't have to assume internal details can be changed;

Irrelevant for read-only inspection.  For making modifications, this
might be a valid point, though (a) I'm unaware of any compilers
sufficiently aggressive to make very effective use of this, and (b) I'm
probably willing to accommodate the compiler by being sufficiently
careful about my hacking.  That is: go ahead, use a fancy compiler, and
I'll cope as best I can.

> * makes it easier to separate interface from implementation when you
> can trust that the implementation actually isn't being used;

Irrelevant for read-only inspection.  For making modifications: you
carry on assuming that the interface is being used as you expect, and
I'll take on the job of reasoning about invariants and making sure that
everything continues to work.

> * gives the developer more freedom to change the implementation;

For read-only inspection, I might lose if you stop providing the
information I want; I'll need to change my code, but you don't need to
care.  Probably if your implementation has changed that much, the
information isn't relevant any more anyway.

Besides, if your implementation changes break my code, I get to keep
both pieces, and you get to laugh.  What's the big deal?

> * makes it possible for meaningful correctness proofs;

Irrelevant for read-only inspection.  For making modifications, I'll
take on the responsibility for amending the proofs as necessary.

> * reduces the amount of interconnections between different parts of your 
> program by ensuring that all interaction goes through the interface 
> instead of the implementation;

For read-only inspection, I'm not sure this matter much -- if your
implementation knows a fact that I want, then either I'll get it through
your interface or dredge it out of your implementation's guts, but the
module coupling's there either way.  (If there was a better way to
obtain that fact, then I should just have used the better way instead --
but in the case where it's a fact about your implementation's state
there probably isn't a better way.)  Similarly for modifications,
actually: if I have a need to change your implementation's state
somehow, I can do that through the interface or under the table, but
there's a coupling either way.

> * which in turn reduces the amount of testing you need to do;

See above.

> Well, you tell me: does it make sense for Guido to have decided to
> make it hard for pure Python developers to cause buffer overflows?

Yes.

That said, I'm glad that it's /possible/ to write unsafe programs in
Python.  It means that the right escape-hatches are present.

What I'm really complaining about are the kinds of interfaces -- which I
see all to often in languages where people have embraced this kind of
mandatory `information hiding' overenthusiastically -- where (a) the
right features aren't there to begin with, and (b) the escape hatches
are either messing or /extremely/ inconvenient.  Java programs often
seem to be like this.

But CPython bends over backwards to provide useful information about its
state: all those wacky attributes on functions and code objects and so
on.  Without this kind of care, I'm pretty sure that mandatory hiding is
far worse as a cure than people diddling about inside other modules'
implementation details is as a disease.

I'm expecting you to argue that programmers would be too sensible to
hide interesting information behind their mandatory-data-hiding, and I
should just give them some credit.  Maybe: but the situation is
different.  Firstly, we wouldn't be asking for this feature if we were
willing to gave programmers some credit for acting responsibly when they
dig about in another module's innards.  Secondly, while it's certainly
possible to mess up when poking about, the damage is fairly localized;
if I'm overprotective about my mandatory hiding, I can screw other
people.

I guess that if overriding the controls was as easy as

        with naughty_hacking:
          ## stuff ...

I wouldn't complain.  (But I think the effect ought to be scoped
/lexically/ rather than dynamically, so that grep works properly.)

-- [mdw]