[Python-Dev] Purpose of Doctests [Was: Best practices for Enum]

Mon May 20 01:27:33 CEST 2013

On Sat, May 18, 2013 at 11:41 PM, Raymond Hettinger <
raymond.hettinger at gmail.com> wrote:

>
> On May 14, 2013, at 9:39 AM, Gregory P. Smith <greg at krypto.org> wrote:
>
> Bad: doctests.
>
>
> I'm hoping that core developers don't get caught-up in the "doctests are
> bad meme".
>

So long as comparing the repr of things remains the number one practice
people use when writing doctests, there is no other position I can hold on
the matter.  reprs are not stable and never have been.  Ordering changes,
hashes change, ids change, pointer values change, and the wording and
presentation of things change.  None of those side-effect behaviors were
ever part of the public API to be depended on.

That one can write doctests that don't depend on such things as the repr
doesn't ultimately matter.  The easiest thing to do, as encouraged by
examples pasted from an interactive interpreter session into docs, is to
let the interactive interpreter show the repr rather than add code that
checks things in an accurate-for-testing manner, since that code would make
the documentation harder for a human to read.
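
To make the difference concrete, here is a minimal sketch. The function
names are hypothetical and exist only for illustration; the first doctest
is what pasting from the prompt produces, the second states what is actually
being claimed.

```python
import doctest


def tags_fragile():
    """Pasted straight from the prompt: the doctest compares the set's repr.

    >>> tags_fragile()
    {'a', 'b', 'c'}
    """
    return {"a", "b", "c"}


def tags_robust():
    """Same behavior, but the doctest compares values, not reprs.

    >>> tags_robust() == {'a', 'b', 'c'}
    True
    """
    return {"a", "b", "c"}
```

Run under `python -m doctest`, the first example can pass on one run and
fail on the next, because the iteration order of a set of strings is not
part of any API (and string hashing is randomized by default on 3.3+).  The
second is stable, at the cost of reading less like an interpreter session.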

> Instead, we should be clear about their primary purpose which is to test
> the examples given in docstrings.   In many cases, there is a great deal
> of benefit to docstrings that have worked-out examples (see the docstrings
> in the decimal module for example).  In such cases it is also worthwhile
> to make sure those examples continue to match reality. Doctests are
> a vehicle for such assurance.  In other words, doctests have a perfectly
> legitimate use case.
>

I really do applaud the goal of keeping examples in documentation up to
date.  But doctest as it is today is the wrong approach to that. A repr
mismatch does not mean the example is out of date.

> We should continue to encourage users to make thorough unit tests
> and to leave doctests for documentation.  That said, it should be
> recognized that some testing is better than no testing.  And doctests
> may be attractive in that regard because it is almost effortless to
> cut-and-paste a snippet from the interactive prompt.  That isn't a
> best practice, but it isn't a worst practice either.
>

Not quite, they at least tested something (yay!) but it is uncomfortably
close to a worst practice.

It means someone else has to come and understand the body of code containing
this doctest when they make an unrelated change that alters some behavior as
a side effect.  The doctested code may or may not actually depend on that
behavior, but because it was written to be a readable example rather than an
accurate test, it never declares its intent one way or the other.

bikeshed colors: If doctest had never been called a test but had instead
been called docchecker, so as not to imply any testing aspect, that might
have helped (too late? the cat's out of the bag).  Or if it never compared
anything but simply ran the example code and regenerated the doc examples
from the statements with the current actual results of execution instead of
doing string comparisons...  (ie: more of a documentation example "keep up
to date" tool)
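
A rough sketch of that second idea, under stated assumptions:
`refresh_examples` is a hypothetical helper, not an existing doctest
feature.  It borrows doctest's own parser to find the `>>>` statements,
re-runs them, and reports what they print now; the recorded output in the
docs is ignored rather than compared.

```python
import contextlib
import doctest
import io


def refresh_examples(text):
    """Re-run each ``>>>`` example in *text*.

    Returns a list of (source, fresh_output) pairs. The expected output
    already present in *text* is ignored, never compared.
    """
    namespace = {}  # shared across examples, like one interpreter session
    refreshed = []
    for example in doctest.DocTestParser().get_examples(text):
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            # "single" mode echoes expression results the way the prompt does.
            exec(compile(example.source, "<doc-example>", "single"), namespace)
        refreshed.append((example.source, buffer.getvalue()))
    return refreshed
```

A real tool would also have to handle examples that raise exceptions and
write the fresh output back into the document; this only shows the "run,
don't compare" part.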

> Another meme that I hope to dispel is the notion that the core developers
> are free to break user code (such as doctests) if they believe the
> users aren't coding in accordance with best practices.   Our goal is to
> improve their lives with our modifications, not to make their lives
> more difficult.
>

Educating users how to apply best practices and making that easier for them
every step of the way is a primary goal. Occasionally we'll have to do
something annoying in the process but we do try to limit that.

In my earlier message I suggested that someone improve doctest to not do
dumb string comparisons of reprs. I still think that is a good goal if
doctest is going to continue to be promoted. It would help alleviate many
of the issues with doctests and bring them more in line with the issues
many people's regular unittests have. As Tres already showed in an example,
individual doctest-using projects jump through hoops to do some of that
today; centralizing saner repr comparisons, for fewer false failures, as an
actual doctest feature just makes sense.
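
As an illustration of the kind of hoop involved, here is a minimal sketch
built on doctest's existing `OutputChecker` hook.  `LiteralChecker` is a
hypothetical name, and the approach only helps when both the expected and
actual output happen to be Python literals.

```python
import ast
import doctest


class LiteralChecker(doctest.OutputChecker):
    """Fall back to comparing parsed values when the strings differ."""

    def check_output(self, want, got, optionflags):
        # Keep doctest's normal string comparison as the first check.
        if super().check_output(want, got, optionflags):
            return True
        try:
            # Compare the values the two reprs denote, not their spelling.
            return ast.literal_eval(want.strip()) == ast.literal_eval(got.strip())
        except (ValueError, SyntaxError, TypeError):
            # Not literals (e.g. "<Foo object at 0x...>"): still a mismatch.
            return False
```

It would be passed in as `doctest.DocTestRunner(checker=LiteralChecker())`.
Set and dict ordering stop mattering; ids and pointer values in a repr
remain unsolved, which is part of why this belongs in doctest itself.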

Successful example: We added a bunch of new comparison methods to unittest
in 2.7 that make it much easier to write tests that don't depend on
implementation details such as ordering. Many users prefer to use those new
features, even with older Pythons via unittest2 on pypi. It doesn't mean
users always write good tests, but a higher percentage of tests written are
more future proof than they were before because it became easier.
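
For example (a sketch; `active_users` is a made-up function, and note that
2.7's `assertItemsEqual` is spelled `assertCountEqual` in Python 3):

```python
import unittest


def active_users():
    # Hypothetical function under test; a set has no defined ordering.
    return {"bob", "alice"}


class ActiveUsersTest(unittest.TestCase):
    def test_members(self):
        # Same elements, any order.
        self.assertCountEqual(list(active_users()), ["alice", "bob"])
        # Set equality, with a readable diff on failure.
        self.assertSetEqual(active_users(), {"alice", "bob"})
```

Neither assertion cares how the set happens to print, so an unrelated change
to hashing or repr formatting cannot break the test.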

> Currently, we face an adoption problem with Python 3.  At PyCon,
> an audience of nearly 2500 people said they had tried Python 3
> but weren't planning to convert to it in production code.  All of the
> coredevs are working to make Python 3 more attractive than Python 2,
> but we also have to be careful to not introduce obstacles to conversion.
> Breaking tests makes it much harder to convert (especially because
> people need to rely on their tests to see if the conversion was
> successful).
>

Idea: I don't believe anybody has written a fixer for lib2to3 that applies
fixers to doctests.  That'd be an interesting project for someone.

Now you've got me wondering what Python would be like if repr, `` and
__repr__ had never existed as language features. On first thought, I
actually don't see much downside (no, I'm not advocating making that
change).  Something to ponder.

-gps