Does Python really follow its philosophy of "Readability counts"?

Fri Jan 23 07:30:51 EST 2009

Steven D'Aprano <steve at REMOVE-THIS-cybersource.com.au> writes:

> On Thu, 22 Jan 2009 19:10:05 +0000, Mark Wooding wrote:
>> Well, your claim /was/ just wrong.  But if you want to play dumb: the
>> interface is what's documented as being the interface.
>
> But you miss my point.

Evidently.

> We're told Python doesn't have private attributes. We're told that
> we're allowed to "mess with the internals", we're *encouraged* to do
> so: Python gives you the freedom to do so, and any suggestion that
> freedom might be reduced even a tiny bit is fought passionately.

Your deduction skills are faulty.

  * Python gives us the freedom to do so, and we fight to protect that
    freedom -- yes.

  * But interpreting that as encouragement is wrong.  It's permission,
    not encouragement.  If you don't want to, that's fine, and we won't
    think less of you.

Many things are possible which aren't, as a general rule, good ideas.
Misinterpreting permission as encouragement will lead you to doing many
stupid things.

> When people ask how to implement private attributes, they're often
> told not to bother even using single-underscore names. When it is
> suggested that Python should become stricter, with enforced data
> hiding, the objections come thick and fast: people vehemently say that
> they like Python just the way it is, that they want the ability to
> mess with the internals.

> You even argued that you disliked data structures implemented in C and
> preferred those written in Python because you have more ability to
> mess with the private attributes. In context, I had just mentioned
> that lists' internals were inaccessible from Python code. I neglected
> to give an example at the time, but a good example is the current
> length of the list.

Umm... I'm pretty sure that that's available via the `len' function,
which is tied to list.__len__ (via the magic C-implemented-type mangler,
in C).  Though it's read-only -- and this is a shame, 'cos it'd be nice
to be able to adjust the length of a list in ways which are more
convenient than

  * deleting or assigning to a trailing slice, or
  * augmenting or assigning to a trailing zero-width slice

(Perl has supported assigning to $#ARRAY for a long time.  Maybe that's
a good argument against it.)
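
For reference, the slice idioms above look like this in practice.  This
is only a minimal sketch: `set_length' is a hypothetical helper written
for illustration, not something Python provides.

```python
def set_length(lst, n, fill=None):
    """Truncate or pad lst in place so that len(lst) == n.

    Hypothetical helper for illustration; not part of Python.
    """
    if n < 0:
        raise ValueError("length must be non-negative")
    if n < len(lst):
        # Shrink: delete a trailing slice.
        del lst[n:]
    else:
        # Grow: assign to the zero-width slice at the end.
        lst[len(lst):] = [fill] * (n - len(lst))


items = [0, 1, 2, 3, 4, 5]

set_length(items, 3)          # shrink to three elements
shrunk = list(items)

set_length(items, 5)          # grow again, padding with None
grown = list(items)

items[3:] = []                # assigning to a trailing slice also truncates
items += ["x"]                # augmenting extends the list
```

None of this touches the list's internal length field directly; the
length changes only as a side effect of the slice operations.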

> Consider the experience of Microsoft and Apple. No matter how often
> they tell people not to mess with the internals, people do it anyway,
> and always believe that their reason is a good reason.

And Microsoft and Apple can either bend over backwards to preserve
compatibility anyway (which effectively rewards the misbehaviour) or
change the internals.  I'd prefer that they did the latter.

There are times when messing with internals is the only way to get
things done; but there's a price to be paid for doing that, and the
price is compatibility.  The internals will change in later versions,
and your code will break, in subtle and complex ways.  It's not always
an easy decision to make -- but I'm glad it's me that gets to decide,
and not some random who neither knows nor cares much about the problem
I'm trying to solve.

It's also important to bear in mind that programs' lifetimes vary.  Some
programs are expected to live for years; some programs only for a week
or so; and some for just long enough to be typed and executed once
(e.g., at the interactive prompt).  That Python is useful for all these
kinds of program lifetimes is testament to its designers' skill.
Programmers can (and should!) make different tradeoffs depending on the
expected lifetime of the program they're writing.

If I type some hacky thing at ipython, I know it's going to be executed
there and then, and if the implementation changes tomorrow, I just don't
care.

If I'm writing a thing to solve an immediate problem, I won't need it
much past next week, and I'll still probably get away with any awful
hacking -- but there's a chance I might reuse the program in a year or
so, so I ought to put a comment in warning the reader of a possible
bitrot site.

If I'm writing a thing that's meant to last for years, I need to plan
accordingly, and it's probably not appropriate to hack with internals
without a very good reason.

Making these kinds of decisions isn't easy.  It requires experience,
subtle knowledge of how the systems one's using work, and occasionally a
little low cunning.  And sometimes one screws up.

> And Python culture encourages that behaviour (albeit the consequences
> are milder: no buffer overflows or core dumps).
>
> Add to that the culture of Open Source that encourages reading the source 
> code. You don't need to buy a book called "Undocumented Tips and Tricks 
> for Python" to discover the internals. You just need to read the source 
> code.

Indeed.  Very useful.

Example: for my cryptographic library bindings, I needed to be able to
convert between Python's `long's and my library's `mp's.  I have a
choice between doing it very slowly (using shift and masking operators
on the `long') or fast (by including Python/longintrepr.h and digging
about by hand).  I chose to do it the fast way.  I'm quite prepared to
rewrite my conversion code (64 lines of it) if the internals change;
that I haven't had to yet indicates that my judgement of the stability
of the internal representation was about right.  The most important
point is that, /had/ I turned out to be wrong, I'd only have myself to
blame.
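
To make the `slow way' concrete, here is a rough sketch of a conversion
that uses only shifts and masks, in pure Python.  The 32-bit limb width
and the little-endian list of limbs are assumptions made for
illustration; they are not the actual `mp' representation, and the
function names are hypothetical.

```python
def long_to_limbs(n, bits=32):
    """Split a non-negative integer into little-endian limbs.

    The limb width and the list representation are illustrative
    assumptions, not any particular library's format.
    """
    if n < 0:
        raise ValueError("only non-negative integers are handled here")
    mask = (1 << bits) - 1
    limbs = []
    while n:
        limbs.append(n & mask)   # take the low `bits` bits
        n >>= bits               # shifting builds a new integer each time
    return limbs or [0]


def limbs_to_long(limbs, bits=32):
    """Inverse of long_to_limbs: rebuild the integer from its limbs."""
    n = 0
    for limb in reversed(limbs):
        n = (n << bits) | limb
    return n


value = (1 << 200) + 12345
limbs = long_to_limbs(value)
round_tripped = limbs_to_long(limbs)
```

Each shift produces a fresh integer object, so the work grows roughly
quadratically with the size of the number.  That is the cost that
reading the internal digit array directly avoids.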

> And then you have at least two places in the standard library where 
> _attributes are *explicitly* public:

And documented as being so.  It's a convention, with explicitly
documented exceptions.  That's a slight shame because it weakens the
convention, but it's not a disaster.

> Given this permissive culture, any responsible library writer must
> assume that if he changes his so-called "private" attributes, he will
> break other people's code.

He will, but he can also assume that the maintainers of that code are
/willing/ to see it break.

This is tough on people who depend on internals by accident.  Maybe
they'll learn to be more careful.  It's not a pleasant way to learn, but
it's a lesson worth learning anyway.

> In principle it could break just as much code as if he didn't even
> bother flagging them with a leading underscore, which is probably why
> many people don't even bother with _names.

This comes down to documentation.  The Python standard library is
largely quite well documented, and is clear about what assumptions one
can make and what one can't.  In the absence of such clear
documentation, we're left with conventions -- _things are likely to
change in future so avoid messing with them if you don't want stuff to
break.
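
A small sketch of the convention, using a hypothetical `Counter' class
(every name here is made up for illustration):

```python
class Counter:
    """Documented interface: increment() and the read-only `value`."""

    def __init__(self):
        self._count = 0          # leading underscore: internal, may change

    def increment(self):
        self._count += 1

    @property
    def value(self):
        return self._count


c = Counter()
c.increment()
via_interface = c.value          # supported: uses the documented interface

c._count = 10                    # permitted, but at the caller's own risk
via_internals = c.value
```

Nothing stops the assignment to `_count'.  The underscore only records
that a later version may rename or remove the attribute without notice.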

> In other words, if you make it easy for people to mess with your
> internals, if you have a culture that allows and even encourages them
> to mess with your internals, then you don't have internals. Everything
> is de facto public.

And here you've made a semantic leap that I'm afraid I just can't
follow.

> No, cmp() can return an infinite number of values. It just never does,
> at least not yet, but it might. But when Guido himself says that cmp()
> can return three values, can you blame people for acting as if cmp()
> can return three values?

Possibly not!  It's worth thinking about codifying the existing practice
and documenting the more constrained behaviour.  Note that the C
interface -- the tp_compare slot (Python/C API Reference Manual 10.3) --
/is/ defined to return -1, 0, or +1; so presumably the performance
issues have already been considered.
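
The difference between the two readings of the contract shows up
clearly in code.  This is a minimal sketch: `cmp' is reimplemented here
because Python 3 removed the builtin, and `subtracting_cmp' and the
`describe_*' functions are hypothetical names for illustration.

```python
def cmp(a, b):
    """Stand-in for the Python 2 builtin, which Python 3 removed."""
    return (a > b) - (a < b)


def subtracting_cmp(a, b):
    """A comparison that promises only the sign of its result."""
    return a - b


NAMES = {-1: "less", 0: "equal", 1: "greater"}


def describe_strict(result):
    """Assumes the result is exactly -1, 0 or 1."""
    return NAMES[result]


def describe_by_sign(result):
    """Assumes only that the sign of the result is meaningful."""
    if result < 0:
        return "less"
    if result > 0:
        return "greater"
    return "equal"


strict_ok = describe_strict(cmp(7, 2))
sign_ok = describe_by_sign(subtracting_cmp(7, 2))

try:
    describe_strict(subtracting_cmp(7, 2))   # the result is 5, not 1
    strict_failed = False
except KeyError:
    strict_failed = True
```

Code written like `describe_by_sign' keeps working under either
contract; code written like `describe_strict' depends on the narrower
one being documented.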

I think all of this comes down to issues of trust and responsibility.
Python, through its `we're all consenting adults' approach, encourages a
culture where we trust one another to make decisions for ourselves, and
to take responsibility for the consequences of those decisions.

Language features such as attribute (or member) visibility or access
control, on the other hand, imply a culture without trust, and with a
built-in assumption of irresponsibility.  That seems rather unpleasant
to me.

Suppose that you write a Python library module and release it.  I find
that it's /almost/ the right thing for some program of mine, but it
doesn't quite work properly unless I hack about like so... perfect!  I'm
a happy bunny; you've gained a user (maybe that's a good thing, maybe it
isn't!).  Now, I've hacked about in your module's internal stuff: how
has this affected you?  Answer: not at all; you probably didn't feel a
thing.  You release a new version with improved internal structure and
my program breaks: how has this affected you?  Answer: still not at all.
How did it affect me?  Quite a bit, but then again, I knew what I was
getting into.  I gambled and lost; oh, well, that happens sometimes.

I've not dealt with granularity much yet; but that's easy.  Basically,
decisions should be made at the level at which the consequences of those
decisions are felt.  This isn't directly practical, but there are
mechanisms to manage it: users generally delegate technical decisions to
the development team; maybe there's a hierarchy in the dev team.  And
the team members need to be trusted not to make decisions at the wrong
level.  If you can't manage that, then Python probably isn't a good
match for the team; replace one or the other.

Finally, I notice that you completely snipped the part of my reply which
dealt with your ConfigParser module.  I'm going to assume that this
means that you accepted that part of my response.

-- [mdw]