[Python-Dev] Purpose of Doctests [Was: Best practices for Enum]

Steven D'Aprano steve at pearwood.info
Mon May 20 04:33:44 CEST 2013

On 20/05/13 09:27, Gregory P. Smith wrote:
> On Sat, May 18, 2013 at 11:41 PM, Raymond Hettinger <
> raymond.hettinger at gmail.com> wrote:
>> On May 14, 2013, at 9:39 AM, Gregory P. Smith <greg at krypto.org> wrote:
>> Bad: doctests.
>> I'm hoping that core developers don't get caught-up in the "doctests are
>> bad meme".
> So long as doctests insist on comparing the repr of things being the number
> one practice that people use when writing them there is no other position I
> can hold on the matter.  reprs are not stable and never have been.

I think this *massively* exaggerates the "problem" with doc tests. I make heavy use of them, and have no problem writing doc tests that work in code running over multiple versions, including from 2.4 through 3.3. Objects that I write myself, I control the repr and can make it as stable as I wish. Many built-in types also have stable reprs. The repr for small ints is not going to change, the repr for floats like 0.5, 0.25, 0.125 etc. are stable and predictable, lists and tuples and strings all have stable well-defined reprs. Dicts are a conspicuous counter-example, but there are trivial work-arounds.

Doc tests are not limited to a simple-minded "compare the object's repr". You can write as much, or as little, scaffolding around the test as you need. If the scaffolding becomes too large, that's a sign that the test doesn't belong in documentation and should be moved out, perhaps into a unit test, or perhaps into a separate "literate testing" document that can be as big as necessary without overwhelming the doc string.

>   ordering changes, hashes change, ids change, pointer values change,
> wording and presentation of things change.  none of those side effect
> behaviors were ever part of the public API to be depended on.

Then don't write doctests that depend on those things. It really is that simple. There's no rule that says doctests have to test the entire API. Doctests in docstrings are *documentation first*, so you write tests that make good documentation.

The fact that things that are not stable parts of the API can be tested is independent of the framework you use to do the testing. If I, as an ignorant and foolish developer, wrote a unit test like this:

class MyDumbTest(unittest.TestCase):
     def testSpamRepr(self):
         x = Spam(arg)
         self.assertEquals(repr(x), "<Spam object at 0x123ab>")

we shouldn't conclude that "unit tests are bad", but that MyDumbTest is bad and needs to be fixed. Perhaps the fix is to re-write the test to care less about the exact repr. (Doctest's ellipsis directive is excellent for that.) Perhaps the fix is to give the Spam object a stable repr that doesn't suck. Or perhaps the fix is to just say, this doesn't need to be a test at all. (And doctest has a directive for that too.) They are all good solutions to the "problem" of unit testing things that aren't part of the API, and they are also good solutions to the same problem when it comes to doctests.

> I really do applaud the goal of keeping examples in documentation up to
> date.  But doctest as it is today is the wrong approach to that. A repr
> mismatch does not mean the example is out of date.

No, it means that either the test was buggy, or the test has failed.

I must admit that I don't understand what you think happens with doc testing in practice. You give the impression that there are masses of doc tests being written that look like this:

>>> x = Spam(arg)
>>> print(x)
<Spam object at 0xb7cf5d70>

and therefore the use of doc tests are bad because it leads to broken tests. But I don't understand why you think that nobody has noticed that this test will have failed right from the start, and will have fixed it immediately. I suppose it is possible that some people write doc tests but never run them, not even once, but that's no different from those who write unit tests but never run them. They're hardly representative of the average developer, who either doesn't write tests at all, or who both writes and runs them and will notice if they fail.

> In my earlier message I suggested that someone improve doctest to not do
> dumb string comparisons of reprs. I still think that is a good goal if
> doctest is going to continue to be promoted. It would help alleviate many
> of the issues with doctests and bring them more in line with the issues
> many people's regular unittests have. As Tres already showed in an example,
> individual doctest using projects jump through hoops to do some of that
> today; centralizing saner repr comparisons for less false failures as an
> actual doctest feature just makes sense.

If a test needs to jump through hoops to work, then it doesn't belong as a test in the doc string. It should be a unit test, or possibly a separate test file that can be as big and complicated as needed. If you want to keep it as an example, but not actually run it, doctest has a skip directive. There's no need to complicate doctest by making it "smarter" (and therefore more likely to be buggy, harder to use, or both).

> Successful example: We added a bunch of new comparison methods to unittest
> in 2.7 that make it much easier to write tests that don't depend on
> implementation details such as ordering. Many users prefer to use those new
> features; even with older Python's via unittest2 on pypi.

And that's great, it really is, I'm not being sarcastic. But unit testing is not in competition to doc testing, they are complimentary, not alternatives. If you're not using both, then you're probably missing out on something.


More information about the Python-Dev mailing list