Re: [Python-ideas] doctest
On Feb 17, 2012 4:12 PM, "Nick Coghlan" wrote:
On Sat, Feb 18, 2012 at 7:57 AM, Mark Janssen wrote:
Anyway... of course patches welcome, yes... ;^)
Not really. doctest is for *testing code examples in docs*. If you try to use it for more than that, it's likely to drive you up the wall, so proposals to make it more than it is usually don't get a great reception (docs patches to make its limitations clearer are generally welcome, though). The stdlib solution for test driven development is unittest (the vast majority of our own regression suite is written that way - only a small proportion uses doctest).
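That intended use - a documented example that doctest itself verifies - can be sketched like this (a minimal sketch; the `add` function is illustrative, not from the thread):

```python
import doctest

def add(a, b):
    """Return the sum of a and b.

    The example below is documentation first; doctest just checks that
    it still works:

    >>> add(2, 3)
    5
    """
    return a + b

# doctest.testmod() would scan a whole module; here we run the one
# docstring directly so the result is easy to inspect.
test = doctest.DocTestParser().get_doctest(
    add.__doc__, {"add": add}, "add", None, 0)
runner = doctest.DocTestRunner(verbose=False)
runner.run(test)
print(runner.tries, runner.failures)  # 1 0 - one example, no failures
```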
This pessimistic attitude is why doctest is challenging to work with at times, not anything to do with doctest's actual model. The constant criticisms of doctest keep contributors away, and keep its many resolvable problems from being resolved.
An interesting third party alternative that has been created recently is behave: http://crate.io/packages/behave/
This style of test is why it's so sad that doctest is ignored and unmaintained. It's based on testing patterns developed by people who care to promote what they are doing, but I'm of the strong opinion that they are inferior to doctest.

Ian
On 18 February 2012 04:24, Ian Bicking wrote:
Personally I think there are several fundamental problems with doctest *as a unit testing tool*. doctest is *awesome* for testing documentation examples, but for unit testing it has problems - in particular this one:

* Every line becomes an assertion - in a unit test you typically follow the arrange -> act -> assert pattern. Only the results of the *assertion* are relevant to the test. (Obviously unexpected exceptions at any stage are relevant....) With doctest you have to take care to ensure that the exact output of *every line* of your arrange and act steps also matches, even if it is irrelevant to your assertion. (The arrange and act steps will often include lines where you are creating state, and their output is irrelevant so long as they put the right things in place.)

The particular implementation of doctest means that there are additional, potentially resolvable problems that are also a damn nuisance in a unit testing file:

* Execution of an individual testing section continues after a failure. So a single failure results in the *reporting* of potentially many failures.
* The problem of being dependent on the order of unorderable types (actually very difficult to solve).
* Things like shared fixtures and mocking become *harder* (although by no means impossible) in a doctest environment.

Another thing I dislike is that it encourages a "test last" approach, as by far the easiest way of generating doctests is to copy and paste from the interactive interpreter. The alternative is lots of annoying typing of '>>>' and '...', and as you're editing text and not code, IDE support tends to be worse (although this is a tooling issue and not a problem with doctest itself).

So whilst I'm not against improving doctest, I don't promote it as a unit testing tool and disagree that it is suited to that task.

All the best,

Michael Foord
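The "every line becomes an assertion" point can be shown with a small sketch (the dict and names here are illustrative): the arrange step that only inspects intermediate state is checked just as strictly as the real assertion.

```python
import doctest

# Three examples: the first is pure arrange (no output expected), but
# the second, which merely inspects intermediate state, must still
# match character-for-character before the real assertion is reached.
source = '''
>>> conn = {"state": "open"}
>>> conn["state"]
'open'
>>> len(conn)
1
'''

test = doctest.DocTestParser().get_doctest(source, {}, "example", None, 0)
runner = doctest.DocTestRunner(verbose=False)
runner.run(test)
print(runner.tries, runner.failures)  # 3 0 - all three lines were checked
```

Change `'open'` to `"open"` in the expected output and the second line fails, even though the final assertion is untouched.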
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
http://mail.python.org/mailman/listinfo/python-ideas
--
http://www.voidspace.org.uk/

May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing http://www.sqlite.org/different.html
On 27 February 2012 20:35, Michael Foord wrote:
The particular implementation of doctest means that there are additional, potentially resolvable problems that are also a damn nuisance in a unit testing fail:
Jeepers, I changed direction mid-sentence there. It should have read something along the lines of:

As well as fundamental problems, the particular implementation of doctest suffers from these potentially resolvable problems:
Execution of an individual testing section continues after a failure. So a single failure results in the *reporting* of potentially many failures.
The problem of being dependent on order of unorderable types (actually very difficult to solve).
Things like shared fixtures and mocking become *harder* (although by no means impossible) in a doctest environment.
Another thing I dislike is that it encourages a "test last" approach, as by far the easiest way of generating doctests is to copy and paste from the interactive interpreter. The alternative is lots of annoying typing of '>>>' and '...', and as you're editing text and not code IDE support tends to be worse (although this is a tooling issue and not a problem with doctest itself).
More fundamental-ish problems:

* Putting debugging prints into a function can break a myriad of tests (because they're output based).
* With multiple doctest blocks in a test file, running an individual test can be difficult (impossible?).
* I may be misremembering, but I think debugging support is also problematic because of the stdout redirection.

So yeah. Not a huge fan.

All the best,

Michael
On Feb 27, 2012, at 10:59 PM, Michael Foord wrote:
I may be misremembering, but I think debugging support is also problematic because of the stdout redirection.
This one is largely solved too, but the trick is to put the pdb entry on the same line as the doctest line you care about, e.g.:

    >>> import pdb; pdb.set_trace(); command.process(None)
    GNU Mailman 3...

When the debugger drops me into command.process(), everything Just Works.

Cheers,
-Barry
On Mon, Feb 27, 2012 at 3:59 PM, Michael Foord wrote:
The problem of being dependent on order of unorderable types (actually very difficult to solve).
I just thought of something: isn't the obvious solution for doctest to test the type of an expression's output and, if it's in the set of unordered types {set, dict}, to run sorted() on it? Then the docstring author can just put the (sorted) output of what's expected....

Perhaps I'm hashing a dead horse, but I really would like to see this added to the issue tracker as a requested feature. I may code up the patch myself, but it helps my brain to have it "on the dev stack".

mark
On Thu, Mar 1, 2012 at 9:31 PM, Mark Janssen wrote:
I just thought of something: isn't the obvious solution for doctest to test the type of an expression's output and, if it's in the set of unordered types {set, dict}, to run sorted() on it? Then the docstring author can just put the (sorted) output of what's expected....
It's the solution people seem to think of first, so it's definitely fairly obvious. It's not a big improvement, though.

The original problem is that dict order shouldn't matter, but in doctest it does, making dicts unusable in the normal doctest style. Making a specific dict order the prominent one lets you use dicts in doctest, but you have to sort the dicts and rewrite the doctest to take that into account. And even so, some dicts cannot be represented this way. For example, the dict {1: 2, "hello": "world"} cannot be sorted in Python 3, so it won't work in this scheme.

The solution I used was to call ast.literal_eval on both sides every time you compare output. This way you don't have to care about dict order, or whitespace, or any of these things for Python objects. (There is a flag for not caring about whitespace, but this would inappropriately collapse e.g. the whitespace inside string literals inside a dict. So imo this kills two birds with one stone.)

-- Devin
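Devin's literal_eval approach can be sketched as an OutputChecker subclass (the class name is made up here; his doctest2 implementation may differ):

```python
import ast
import doctest

class LiteralEvalChecker(doctest.OutputChecker):
    """Sketch: compare want/got as Python literals when both parse with
    ast.literal_eval, so dict/set ordering stops mattering; otherwise
    fall back to doctest's normal string comparison."""

    def check_output(self, want, got, optionflags):
        try:
            if ast.literal_eval(want.strip()) == ast.literal_eval(got.strip()):
                return True
        except (SyntaxError, ValueError):
            pass  # not both literals: use the usual string rules
        return doctest.OutputChecker.check_output(self, want, got, optionflags)

checker = LiteralEvalChecker()
# The same dict, printed in two different orders, now compares equal:
print(checker.check_output("{'a': 1, 'b': 2}\n", "{'b': 2, 'a': 1}\n", 0))  # True
```

A checker like this is wired in by passing it to DocTestRunner(checker=...); non-literal output (tracebacks, plain prints) falls through to the standard comparison untouched.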
On Thu, Mar 1, 2012 at 8:49 PM, Devin Jeanpierre wrote:
On Thu, Mar 1, 2012 at 9:31 PM, Mark Janssen wrote:
I just thought of something: isn't the obvious solution for doctest to test the type of an expression's output and, if it's in the set of unordered types {set, dict}, to run sorted() on it? Then the docstring author can just put the (sorted) output of what's expected....
It's the solution people seem to think of first, so it's definitely fairly obvious. It's not a big improvement, though. The original problem is that dict order shouldn't matter, but in doctest it does, making dicts unusable in the normal doctest style. Making a specific dict order the prominent one lets you use dicts in doctest, but you have to sort the dicts and rewrite the doctest to take that into account.
Personally I never copy from the interactive prompt, but instead write my doctest and, if I'm in the mood to copy and paste, copy from the failed doctest error message (which is nicely indented just like my tests). So it would work fine if things were sorted.
And even so, some dicts cannot be represented this way. For example, the dict {1: 2, "hello": "world"} cannot be sorted in Python 3, so it won't work in this scheme.
You could always use a heuristic sort, e.g.:

    sorted((str(key), key) for key in d)

To make this work you have to write a repr() replacement that is somewhat sophisticated. Though it still wouldn't save you from:

    class Something:
        def __repr__(self):
            return '<Something attr=%r>' % (self.attr,)

where attr is a dict or some other object with an awkward repr. That's the part I'm unsure of. Of course no eval will help you there either. I don't know if there's any way to really replace repr's implementation everywhere; I'm guessing there isn't.

Ian
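A rough illustration of Ian's heuristic, simplified here to key=str (a sketch, not his exact decorate-and-sort form): ordering keys by their string form gives a mixed-type dict, unsortable directly in Python 3, a deterministic display order.

```python
# Mixed key types: sorted(d) would raise TypeError in Python 3,
# but sorting on the keys' string form always succeeds.
d = {1: 2, "hello": "world"}
ordered = sorted(d, key=str)
print(ordered)  # [1, 'hello'] - '1' sorts before 'hello' lexicographically
```

The heuristic is only stable for display; two distinct keys with equal string forms would still tie, which is why Ian keeps the original key as a tie-breaker in his version.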
On Thu, Mar 1, 2012 at 10:51 PM, Ian Bicking wrote:
Personally I never copy from the interactive prompt, but instead write my doctest and, if I'm in the mood to copy and paste, copy from the failed doctest error message (which is nicely indented just like my tests). So it would work fine if things were sorted.
I both copy from the interactive prompt and write myself by hand. Either way, I don't want to care about order where order doesn't matter. The less thought required to write a test or example, the easier it is to write a test or example, and therefore the more tests/examples people will write. Why should we ever care about the order of a dict when writing an example or test case?
where attr is a dict or some other object with an awkward repr. That's the part I'm unsure of. Of course no eval will help you there either. I don't know if there's any way to really replace repr's implementation everywhere; I'm guessing there isn't.
There isn't any (non-insane?) way to replace the repr that %r uses (which is not builtins.repr).

-- Devin
On Thu, Mar 1, 2012 at 6:31 PM, Mark Janssen wrote:
I just thought of something: isn't the obvious solution for doctest to test the type of an expression's output and, if it's in the set of unordered types {set, dict}, to run sorted() on it? Then the docstring author can just put the (sorted) output of what's expected....

What if the output is a list of dicts?

--Guido van Rossum (python.org/~guido)
On Thu, Mar 1, 2012 at 8:31 PM, Guido van Rossum wrote:
What if the output is a list of dicts?
Right, thanks. Although I suppose in theory one could go deep -- take the deepcopy code and, instead of an exact copy, replace any unordered types with their sorted copies.

mark
Mark Janssen wrote:
I just thought of something: isn't the obvious solution for doctest to test the type of an expression's output and, if it's in the set of unordered types {set, dict}, to run sorted() on it? Then the docstring author can just put the (sorted) output of what's expected....
{set, dict} is not the set of unordered types. The set of unordered types is without bound: anyone can create their own unordered types. Even if you limit yourself to the builtins, you forgot frozenset. And then there are non-builtins in the standard library, like OrderedDict, and other types like dict_proxy. Sorting the output of an OrderedDict is the wrong thing to do, because the order is significant. So doctest would need to not just recognise mappings and sets, and sort them, but *also* recognise mappings and sets which should *not* be sorted.

Remember too, that by the time doctest.OutputChecker sees the output, it only sees it as a string. I don't know how much work it would take to introduce actual type-checks into doctest, but I expect it would be a lot.

And one last problem for you to consider: what happens if the output is unsortable? Try this dict for size:

    {2+1j: None, 2-1j: None, float('NAN'): None}
Perhaps I'm hashing a dead horse, but I really would like to see this added to the issue tracker as a requested feature. I may code up the patch myself, but it helps my brain to have it "on the dev stack".
Feel free to add it.

For what it's worth, I am a very strong -1 on any suggestion to give doctest "Do What I Mean" powers when it comes to unordered objects. But I would support a "Do What I Say" doctest directive, like NORMALIZE_WHITESPACE, ELLIPSIS and IGNORE_EXCEPTION_DETAIL, e.g. a directive that tells doctest to split both the expected and actual output strings on whitespace, then lexicographically sort them before comparing. This approach doesn't try to be too clever: it's a dumb, understandable test which should fit in nicely with the other tests in doctest.OutputChecker.check_output, perhaps something like this:

    if optionflags & IGNORE_WORD_ORDER:
        if sorted(got.split()) == sorted(want.split()):
            return True

It won't solve every doctest ordering problem, but doctest has other heuristics which can be fooled too. It is nice and simple, it solves the first 90% of the problem, and it is under the control of the coder. If you feel like submitting a patch, feel free to use my idea.

-- Steven
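Wired into doctest, Steven's hypothetical directive might look like this sketch (IGNORE_WORD_ORDER is not a real doctest flag; register_optionflag is the supported way to mint one, and the checker would then be passed to a DocTestRunner):

```python
import doctest

# Hypothetical directive, following Steven's sketch.
IGNORE_WORD_ORDER = doctest.register_optionflag('IGNORE_WORD_ORDER')

class WordOrderChecker(doctest.OutputChecker):
    def check_output(self, want, got, optionflags):
        # Dumb on purpose: split on whitespace, compare the sorted words,
        # and only when the directive is switched on for this example.
        if optionflags & IGNORE_WORD_ORDER:
            if sorted(got.split()) == sorted(want.split()):
                return True
        return doctest.OutputChecker.check_output(self, want, got, optionflags)

checker = WordOrderChecker()
print(checker.check_output("b a c\n", "a b c\n", IGNORE_WORD_ORDER))  # True
```

As Steven notes later in the thread, delimiters defeat the plain split: the words of "{1, 2, 3}" carry braces and commas with them, so "{1," and "{2," never compare equal.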
Steven D'Aprano wrote:
This approach doesn't try to be too clever: it's a dumb, understandable test which should fit in nicely with the other tests in doctest.OutputChecker.check_output, perhaps something like this:

    if optionflags & IGNORE_WORD_ORDER:
        if sorted(got.split()) == sorted(want.split()):
            return True
Ah, buggarit, too simple. I neglected to take into account the delimiters. Getting this right is harder than I thought, particularly with nested sets/dicts. Still, I reckon a directive is the right approach.

-- Steven
It seems that the problem with any solution based on interpreting repr (especially when nothing is known about the object) is that there are just too many exceptions. Another approach might be to allow for a custom compare function to be defined on doctest. E.g., in the module to be tested:

    import doctest

    def _compare(got, expected):
        return (sorted(eval(got)) == sorted(eval(expected)) or
                doctest.compare(got, expected))

    doctest.usercompare = _compare

The compare function would only need to deal with the idiosyncrasies of types actually used in doctests in that module.

I don't know how practical this idea is in terms of implementation - from my brief look through the code I think it should be fairly easy to slot this into OutputChecker, but it was a very brief look!

David
On Fri, Mar 2, 2012 at 7:36 AM, Steven D'Aprano wrote:
Still, I reckon a directive is the right approach.
Why? That's how I do it because I am/was paranoid about compatibility, but surely fixing dicts is important enough that, done right, nobody would object if the semantics of comparison change subtly to allow for unordered container comparison in the "natural" way?
(Is replying to a quoted quote acceptable on mailing lists?)
On Fri, Mar 2, 2012 at 12:56 AM, David Townshend wrote:
It seems that the problem with any solution based on interpreting repr (especially when nothing is known about the object) is that there are just too many exceptions. Another approach might be to allow for a custom compare function to be defined on doctest. E.g., in the module to be tested:
The definition/use of an alternate comparison function needs to be inside the doctests. Two issues:

Suppose we're running the doctests on module A, which defines a different compare function. Module B also defines a different comparison function, and is imported but not run as a doctest. Since both of them did a global set-attribute to set the comparison function, but B did it later, B wins and A's doctests are run under the rules for module B.

Also, don't forget that doctests are quite often run in things that are _not_ Python source files. In particular, tutorial-like documentation these days is frequently in the form of reStructuredText.
    import doctest

    def _compare(got, expected):
        return (sorted(eval(got)) == sorted(eval(expected)) or
                doctest.compare(got, expected))

    doctest.usercompare = _compare
This function is wrong in the context of the above discussion. Why sort a dict or set? Worse, why sort a list or tuple?
The compare function would only need to deal with the idiosyncrasies of types actually used in doctests in that module.
Punting it to a user-defined function is nice for _really_ crazy situations, but dicts and sets are not idiosyncratic or in any way exceptional. doctest itself should handle them the way a naive user would expect.

-- Devin
On 02/03/2012 07:55, Devin Jeanpierre wrote:
[...] dicts and sets are not idiosyncratic or in any way exceptional. doctest itself should handle them the way a naive user would expect.
This discussion seems to forget a core issue with doctest: The output lines can be *anything* that gets printed. eval-able reprs of Python objects are only a part of the possibilities. That’s why doctest cannot “just call sorted” on the output lines. Regards
On Fri, Mar 2, 2012 at 2:01 AM, Éric Araujo wrote:
This discussion seems to forget a core issue with doctest: The output lines can be *anything* that gets printed. eval-able reprs of Python objects are only a part of the possibilities. That’s why doctest cannot “just call sorted” on the output lines.
Ah, well, it sort of can. You can treat eval-able reprs of Python objects specially. But it gets messy. Simple treatments make the treatment of e.g. dicts irreparably inconsistent. In doctest2:

This is totally fine:

    >>> {-1:1, -2:1} # doctest: +LITERAL_EVAL
    {-1:1, -2:1}

So is this (because doctest2 doesn't care that it was _printed_, just that it was output):

    >>> print({-1:1, -2:1}) # doctest: +LITERAL_EVAL
    {-1:1, -2:1}

This is not; don't do it, bad dog, bad bad bad dog (doctest2 has no idea how to separate the repr from the printed text):

    >>> print(".", {-1:1, -2:1}) # doctest: +LITERAL_EVAL
    . {-1:1, -2:1}

I think maybe this behaviour is a little surprising, or at least a little dumb. The solution I've had in mind is to only do object comparison if the thing in the interpreter window is an expression, rather than a statement. In such a scheme, only the first example would be safe. But aside from not being a very useful distinction anyway, this doesn't agree with the Python interpreter: displaying the last evaluated expression even happens inside a statement. ">>> a;" is totally fine and will display the repr of a. In fact:

    >>> for d in [{}, {-1:1}, {-2:1}, {-1:1, -2:1}]: d
    ...
    {}
    {-1: 1}
    {-2: 1}
    {-2: 1, -1: 1}

I can't really think of a way that makes sense and works everywhere, except specifically marking up doctests with things like, "this is a repr'd string; compare by object equality", and "this is a printed string", and so on. That is no small change, but it's tempting. Of course Sphinx would need to be able to turn this into a viewable example. My only worry is that nobody would use it because it's dumb or something, and it's hard to make it not dumb.

    >>> for d in [{}, {-1:1}, {-2:1}, {-1:1, -2:1}]: # doctest: +schematic
    ...     print("---"); d
    ...
    <output>
    <text>---</text> <eval>{}</eval>
    <text>---</text> <eval>{-1: 1}</eval>
    <text>---</text> <eval>{-2: 1}</eval>
    <text>---</text> <eval>{-2: 1, -1: 1}</eval>
    </output>

(Except seriously.)

-- Devin
On Fri, Mar 2, 2012 at 8:05 AM, Devin Jeanpierre wrote:
This is totally fine:

    >>> {-1:1, -2:1} # doctest: +LITERAL_EVAL
    {-1:1, -2:1}
Doing an ast.literal_eval seems like a great feature (and maybe even a sensible default after a string comparison fails). Having a real eval as an explicit doctest option would also make doctest more powerful.

Mike
On Mar 2, 2012 8:55 AM, "Devin Jeanpierre" wrote:
On Fri, Mar 2, 2012 at 7:36 AM, Steven D'Aprano wrote:
Still, I reckon a directive is the right approach.
Why? That's how I do it because I am/was paranoid about compatibility, but surely fixing dicts is important enough that, done right, nobody would object if the semantics of comparison change subtly to allow for unordered container comparison in the "natural" way?
(Is replying to a quoted quote acceptable on mailing lists?)
On Fri, Mar 2, 2012 at 12:56 AM, David Townshend wrote:
It seems that the problem with any solution based on interpreting repr (especially when nothing is known about the object) is that there are just too many exceptions. Another approach might be to allow for a custom compare function to be defined on doctest. E.g., in the module to be tested:
The definition/use of an alternate comparison function needs to be inside the doctests. Two issues:
Suppose we're running the doctests on module A, which defines a different compare function. Module B also defines a different comparison function, and is imported but not run as a doctest. Since both of them did a global set-attribute to set the comparison function, but B did it later, B wins and A's doctests are run under the rules for module B.
That was just a quick example of another approach to the problem. Sure, there are some issues to work out, but I don't believe this is an insurmountable problem.
Also, don't forget that doctests are quite often run in things that are _not_ python source files. In particular, tutorial-like documentation these days is frequently in the form of reStructuredText.
Once again, I'm sure we could find a way around this. Perhaps it would also be acceptable to define the compare function inside the docstring, or in this case inside a rst comment.
    import doctest

    def _compare(got, expected):
        return (sorted(eval(got)) == sorted(eval(expected)) or
                doctest.compare(got, expected))

    doctest.usercompare = _compare
This function is wrong in the context of the above discussion. Why sort a dict or set? Worse, why sort a list or tuple?

Like I said, this is just a quick example. Obviously the function body could look very different.
The compare function would only need to deal with the idiosyncrasies of types actually used in doctests in that module.
Punting it to a user-defined function is nice for _really_ crazy situations, but dicts and sets are not idiosyncratic or in any way exceptional. doctest itself should handle them the way a naive user would expect.
So what about, say, a defaultdict or a WeakSet? What exactly would a naive user expect?
On Fri, Mar 2, 2012 at 6:07 AM, David Townshend wrote:
That was just a quick example of another approach to the problem. Sure, there are some issues to work out, but I don't believe this is an insurmountable problem.
Nor do I. I was attempting to offer constructive criticism on the basis that this is a serious suggestion, and deserves attention. Sorry that I gave the wrong impression.
Once again, I'm sure we could find a way around this. Perhaps it would also be acceptable to define the compare function inside the docstring, or in this case inside a rst comment.
I mentioned it earlier, but I think you missed it: I was actually thinking inside the doctest itself. Your system of global assignment works as-is if you do it inside the doctests themselves (except for threads, but who runs doctests in multiple threads? gah!)
So what about, say, a defaultdict or a WeakSet? What exactly would a naive user expect?
WeakSets shouldn't be tested like this, their contents are nondeterministic. Any expectations can be broken by unfortunate race conditions, and there is no way around this. Sometimes users might expect the impossible, but what they expect is irrelevant in such a case. So forget WeakSets.

I think that a naive user would expect, w.r.t. all these things, that if he copy-pasted a shell session, it would "just work". Where we can fix doctest to align with such lofty expectations, at least in the common cases -- such as with dicts and defaultdicts and so on -- is it really so bad?

Or, for a different argument -- surely, if the natural way to write an example in a tutorial is to show a dict as a return value, then we should be able to test that with minimal fuss? Doctest is supposed to serve the documentation, not the other way around; and the harder things are, the higher the barrier to entry is, and the fewer people actually do it. Because testing is important, it's important that testing be as easy as reasonable and possible.

-- Devin
I mentioned it earlier, but I think you missed it: I was actually thinking inside the doctest itself. Your system of global assignment works as-is if you do it inside the doctests themselves (except for threads, but who runs doctests in multiple threads? gah!)
My only concern with that is muddying the documentation with long instructions on how to run the doctests. I'm not saying it's not the best way though, because I do think the global assignments are a problem which would need to be resolved. Another way, though, would be to store compare functions by the module they were defined in and assign them to the tests accordingly. This might end up quite complicated though.

I think that a naive user would expect, w.r.t. all these things, that if he copy-pasted a shell session, it would "just work". Where we can fix doctest to align with such lofty expectations, at least in the common cases -- such as with dicts and defaultdicts and so on -- is it really so bad?
This would be convenient, but, to play devil's advocate, how about testing this class?

    class NotADict:
        def __repr__(self):
            return "{'a': 1}"

Bearing in mind that the purpose of doctests is documentation rather than testing, maybe we're trying to do too much. The problem is that {1, 2, 3} doesn't match {2, 1, 3}, not that {1, 2, 3} might match something else that looks like {1, 2, 3}. My point is that maybe it's better to err on the side of bad tests passing rather than good tests failing. After all, they should ideally all be repeated in a more controlled unit test environment anyway.
Devin Jeanpierre wrote:
On Fri, Mar 2, 2012 at 7:36 AM, Steven D'Aprano
wrote: Still, I reckon a directive is the right approach.
Why? That's how I do it because I am/was paranoid about compatibility, but surely fixing dicts is important enough that, done right, nobody would object if the semantics of comparison change subtly to allow for unordered container comparison in the "natural" way?
I would. doctest doesn't compare dicts. If you want to compare dicts, write your doctest like this:
>>> d = make_dict(...)
>>> d == {'a': 42, 'b': 23, 'c': None}
True
The dicts will be compared directly, and doctest need only care that the result looks like "True". doctest compares *strings*. Under no circumstances do I want the default doctest comparison to try to be "clever" by guessing when I want strings to match using string equality and when I want strings to match using something else. doctest should Do What I Say, and not try to Do What It Guesses I Mean. [...]
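The pattern Steven recommends can be seen in a small runnable sketch (make_dict is the thread's placeholder name; its body here is assumed):

```python
import doctest

def make_dict():
    """Return a small mapping.

    Comparing the dict directly keeps the expected output a plain
    True, so doctest's string comparison never sees an
    order-dependent repr:

    >>> d = make_dict()
    >>> d == {'a': 42, 'b': 23, 'c': None}
    True
    """
    return {'c': None, 'a': 42, 'b': 23}

if __name__ == '__main__':
    doctest.testmod()
```

Running the module executes the docstring example; because the expected output is only the literal string "True", key order in the dict never enters into the comparison.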
Punting it to a user-defined function is nice for _really_ crazy situations, but dicts and sets are not idiosyncratic or in any way exceptional. doctest itself should handle them the way a naive user would expect.
No it shouldn't. doctest should handle them the way they actually are. A naive user might expect that {'None': 1} == {None: 1}. A naive user might expect that {2: 'spam'} == {2: 'SPAM'}. The problem with trying to satisfy naive users is that their expectations are often wrong or poorly thought-out. You, the author of the package being tested, are the best person to judge what your tests should be, not some arbitrary naive user who may know nothing about Python and even less about your package. -- Steven
On Fri, Mar 2, 2012 at 9:57 AM, Steven D'Aprano
doctest doesn't compare dicts. If you want to compare dicts, write your doctest like this:
>>> d = make_dict(...)
>>> d == {'a': 42, 'b': 23, 'c': None}
True
Of course doctest doesn't compare dicts; if it did, nobody would be objecting to its lack of unordered dict comparison. That aside, your example is not how people do it in the shell, why should I do it that way in my documentation? Just because doctest makes me? Pff. My preferred solution is to change doctest to compare dicts. :)
doctest compares *strings*. Under no circumstances do I want the default doctest comparison to try to be "clever" by guessing when I want strings to match using string equality and when I want strings to match using something else.
You keep saying what doctest does now, as if that should affect what it does in the future. :/ By the way, doctest doesn't do that now. ;) With regards to exception handling, doctest doesn't compare traceback strings to traceback strings, it compares an exception object to a (badly) parsed traceback. doctest isn't string comparison everywhere, just most places. (Of course, it does the comparison by doing a string comparison on the exception message.) As it happens, as a result, doctest exceptions are very hard to screw up (except for SyntaxErrors). The biggest benefit is that you can copy-paste a traceback, and doctest doesn't care when the stack frames differ in details (like line numbers, for example). Or you can use "..." without enabling ELLIPSIS ;) Not to mention that it lets you use two different forms of exception header that come up in different versions of Python, and it still works in other versions of Python without problems. So many benefits from doctest not trying to be a strict "do what I say" string comparison! :p
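The traceback behavior described above can be seen in a small runnable sketch (the function and exception message are illustrative): the body of the stack can be elided with "..." even without the ELLIPSIS flag, because doctest matches the traceback header and the final exception line rather than comparing frames.

```python
import doctest

def fail():
    """Raise deliberately.

    The stack portion of the expected traceback is elided with
    '...', which doctest accepts without any option flag:

    >>> fail()
    Traceback (most recent call last):
        ...
    ValueError: boom
    """
    raise ValueError('boom')

if __name__ == '__main__':
    doctest.testmod()
```

The same docstring keeps passing even though the real traceback contains file paths and line numbers that vary between runs and Python versions.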
Punting it to a user-defined function is nice for _really_ crazy situations, but dicts and sets are not idiosyncratic or in any way exceptional. doctest itself should handle them the way a naive user would expect.
No it shouldn't. doctest should handle them the way they actually are.
Yeah, that's what I think. Except I think that they "actually are" dicts, and you think they're strings. Your opinion doesn't make sense to me. They are only strings because that's what doctest turned the dicts into for convenience. There is no reason in particular that it has to ever turn them into strings at all -- the only thing making alternatives inconvenient is the syntax for specifying doctests, not the internal mechanisms of doctest itself. There's nothing holy about these string comparisons. They are only a means to an end. Also, could you give something more concrete about why you believe everything must be based on strings? I couldn't find any reasoning to that effect in your post. Also keep in mind that I'm not fond of literal_eval-ing _both_ sides, I'd much rather only the doctest be eval'd. (In case that affects your answer any.) -- Devin
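One way the eval idea could be wired into doctest's existing extension point is an OutputChecker subclass that falls back to comparing parsed literals when the string comparison fails. This is a sketch under my own assumptions (the class name and the both-sides-eval policy are mine); note that Devin says he would rather eval only the expected side, which the string-only check_output interface can't express.

```python
import ast
import doctest

class LiteralEvalChecker(doctest.OutputChecker):
    """Fall back to comparing parsed Python literals when the plain
    string comparison fails, so unordered containers compare by value."""

    def check_output(self, want, got, optionflags):
        if super().check_output(want, got, optionflags):
            return True
        try:
            return ast.literal_eval(want) == ast.literal_eval(got)
        except (SyntaxError, ValueError):
            # One side is not a literal: no special handling, no match.
            return False

checker = LiteralEvalChecker()
# Different key order as strings, but equal as dicts:
print(checker.check_output("{'a': 1, 'b': 2}\n", "{'b': 2, 'a': 1}\n", 0))  # True
```

A checker like this can be passed to doctest.DocTestRunner(checker=...), leaving default behavior untouched for everyone who doesn't opt in.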
Devin, You need to start writing real code rather than continue to tell us that the problems are minor and easily fixable, and the solutions are uncontroversial. To those who have tried and thought about it, the problems are *not* easy to solve, except for some superficial edge cases that you and other critics of doctest keep focusing on. And please don't propose that we change the behavior of dict or other data types themselves, or add new APIs to objects just for the purpose of "fixing" doctest's issues. -- --Guido van Rossum (python.org/~guido)
On Fri, Mar 2, 2012 at 12:30 PM, Guido van Rossum
Devin,
You need to start writing real code rather than continue to tell us that the problems are minor and easily fixable, and the solutions are uncontroversial. To those who have tried and thought about it, the problems are *not* easy to solve , except for some superficial edge cases that you and other critics of doctest keep focusing on.
I already did write real code. In the context of this discussion, I implemented a +LITERAL_EVAL flag. Was there something else I was supposed to write, other than the solution I advocated? ;) https://bitbucket.org/devin.jeanpierre/doctest2/src/e084a682ccbc/doctest2/co...
And please don't propose that we change the behavior of dict or other data types itself, or add new APIs to objects just for the purpose of "fixing" doctest's issues.
I would never dream of it. That's pretty obscene. -- Devin
On Fri, Mar 2, 2012 at 9:59 AM, Devin Jeanpierre
On Fri, Mar 2, 2012 at 12:30 PM, Guido van Rossum
wrote: Devin,
You need to start writing real code rather than continue to tell us that the problems are minor and easily fixable, and the solutions are uncontroversial. To those who have tried and thought about it, the problems are *not* easy to solve , except for some superficial edge cases that you and other critics of doctest keep focusing on.
I already did write real code. In the context of this discussion, I implemented a +LITERAL_EVAL flag.
Was there something else I was supposed to write, other than the solution I advocated? ;)
https://bitbucket.org/devin.jeanpierre/doctest2/src/e084a682ccbc/doctest2/co...
It's not a solution. It's a hack that only works in the simplest cases -- it requires the output to look like a Python expression (that can be evaluated in a limited environment). What if the output were something like
And please don't propose that we change the behavior of dict or other data types itself, or add new APIs to objects just for the purpose of "fixing" doctest's issues.
I would never dream of it. That's pretty obscene.
Good. I wasn't sure what you meant when you used the phrase "fix dict" -- I presume that was shorthand for "fix the problem that doctest has with dict". -- --Guido van Rossum (python.org/~guido)
On Fri, Mar 2, 2012 at 1:14 PM, Guido van Rossum
https://bitbucket.org/devin.jeanpierre/doctest2/src/e084a682ccbc/doctest2/co...
It's not a solution. It's a hack that only works in the simplest cases
The simplest cases are also the most common, IME. But yes, I'd like to expand it to work in a few more cases and be less insane (I'd rather not interpret printed output as Python code).
-- it requires the output to look like a Python expression (that can be evaluated in a limited environment). What if the output were something like
??? There's a dict in there but the whole thing is not parseable.
Yeah. I don't know of any solution to that either. Even if pure-Python code generates that repr, it isn't possible to even replace the repr with a sorted dict-repr in any sane way, because %r doesn't defer to repr. It's just an intractable case. Nothing at all mentioned in this thread would work there. And I never called it easy, by the way. -- Devin
Can I ask a possibly silly question?
As I understand it, doctest takes a small snippet of code, runs it,
and compares the resulting string with a string in the document.
This thread seems to be centered around making comparisons of results
with indeterminate ordering (dicts being the prime example) work
properly. In fact, one proposal was to have doctest call sorted on the
output to make sure it's right, which was shot down because that's not
always the correct thing to do.
So the question is - why isn't dealing with this the responsibility of
the test writer? Yeah, it's not quite the spirit of documentation to
turn a dictionary into a sorted list in the output, but neither is
littering the documentation with +LITERAL_EVAL and the like.
On Fri, Mar 2, 2012 at 1:27 PM, Mike Meyer
This thread seems to be centered around making comparisons of results with indeterminate ordering (dicts being the prime example) work properly. In fact, one proposal was to have doctest call sorted on the output to make sure it's right, which was shot down because that's not always the correct thing to do.
Aahhh, that was me, and I didn't mean to shoot it down. The right modification would have been to typecheck for dict/set before you sort, and then format it like a string, IIRC. But at the time the function just returned a sorted object. In principle you can absolutely sort dict literals etc., but I don't think it's any easier to implement than just parsing them into dict objects and doing a direct dict comparison, so that's why I object to it. In addition, it's harder for the test writer, who now has to pay attention to ordering.
So the question is - why isn't dealing with this the responsibility of the test writer? Yeah, it's not quite the spirit of documentation to turn a dictionary into a sorted list in the output, but neither is littering the documentation with +LITERAL_EVAL and the like.
The thing about +LITERAL_EVAL and the other flags is that modern doctest-displaying tools like Sphinx hide the comments, so that you just see what looks like a regular interpreter session, without any # doctest: directives. Because of this, it's in principle possible to have "natural looking" things, in some cases. But yes, the status quo is that, somehow, you have to handle this explicitly yourself. -- Devin
On Fri, Mar 2, 2012 at 1:27 PM, Mike Meyer
So the question is - why isn't dealing with this the responsibility of the test writer? Yeah, it's not quite the spirit of documentation to turn a dictionary into a sorted list in the output, but neither is littering the documentation with +LITERAL_EVAL and the like.
Currently the test-writer is not empowered to exercise this responsibility most effectively. Earlier, an example was presented that one should write
>>> d = make_dict(...)
>>> d == {'a': 42, 'b': 23, 'c': None}
True
rather than the implied
>>> d = make_dict(...)
>>> d
{'a': 42, 'b': 23, 'c': None}
Using the former is certainly necessary today, but it's far from ideal. The documenter is stuck writing the documentation either clearly (the latter case) or testably (the former). The hope is to find a way to let people more easily write documentation that is both clear and testable. One thing that covers some--but far from all--cases is to ast.literal_eval or eval the output, since very often that output line is a repr of a Python expression. To be reasonable, this might require a directive in both cases, and certainly requires one in the latter case. Though there are still tons of situations this cannot cover, it would allow people writing documentation to avoid ugly constructs like the first code snippet in a not-tiny set of cases. As for littering your code with "+doctest BLAH_BLAH", I don't think this is all that harmful. It lets the documentation writer get features she wants without their being displayed to the user in the processed documentation. There are already directives like this today, and though ugly, they are conventional. Mike
On Mar 02, 2012, at 01:27 PM, Mike Meyer wrote:
So the question is - why isn't dealing with this the responsibility of the test writer? Yeah, it's not quite the spirit of documentation to turn a dictionary into a sorted list in the output, but neither is littering the documentation with +LITERAL_EVAL and the like.
Yeah, it basically is. My contention, based on years of experience (though of course YMMV) is that the best way to solve this is to write better documentation. I personally don't find bare dict prints to be very useful or readable, but a nice little sort/loop/print works fine. -Barry
On Fri, Mar 2, 2012 at 11:14 AM, Guido van Rossum
On Fri, Mar 2, 2012 at 9:59 AM, Devin Jeanpierre
wrote: On Fri, Mar 2, 2012 at 12:30 PM, Guido van Rossum
wrote:> Was there something else I was supposed to write, other than the solution I advocated? ;) https://bitbucket.org/devin.jeanpierre/doctest2/src/e084a682ccbc/doctest2/co...
It's not a solution. It's a hack that only works in the simplest cases
With all respect to Guido, who has mentioned probably the best solution so far (using sys.displayhook()), on Devin's behalf I must say that for those of us dedicated to TDD using doctest, we tend to be writing code in tandem with an ideal that doctest engenders. Generally speaking, everything being written is so fine-grained that the problems most people are speaking of never arise. E.g., the tests are small and uncomplicated because they are written right at the core of the source; there is a semi-rigid protocol for __repr__ in line with being eval'able; and code is broken apart so that when the tests get too complicated, it's a sign that the code is not modular or fine-grained enough. Just as OOP eventually evolved toward the ideal of the "abstract base class" as a way to match the notion of physical objects with programmatic ones, Python + doctest approach a different apex, catching two birds with one stone: documentation + TDD. Just my small input... mark
On Feb 27, 2012, at 08:35 PM, Michael Foord wrote:
The problem of being dependent on order of unorderable types (actually very difficult to solve).
Actually, not so much, only because IME, I find that I rarely want to just dump the repr of such objects. That's usually going to be hard to read even if the output were sorted. Instead, I very often iterate over the items (in sorted order of course), and use ellipses to ignore the lines (i.e. items) I don't care about. In practice, I haven't found this one to be so bad.
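The sorted-iteration-plus-ellipsis pattern Barry describes might look like this (the function and data are illustrative; the ELLIPSIS directive lets the example skip the items it doesn't care about):

```python
import doctest

def inventory():
    """Return stock counts.

    Iterating in sorted order makes the output deterministic, and
    ELLIPSIS lets the example elide the middle items:

    >>> for name, count in sorted(inventory().items()):
    ...     print(name, count)  # doctest: +ELLIPSIS
    apples 3
    ...
    pears 7
    """
    return {'pears': 7, 'apples': 3, 'oranges': 5}

if __name__ == '__main__':
    doctest.testmod()
```

Here the "..." line in the expected output matches the "oranges 5" line that the example chooses to ignore.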
Things like shared fixtures and mocking become *harder* (although by no means impossible) in a doctest environment.
Not if you use separate DocFileSuites.
Another thing I dislike is that it encourages a "test last" approach, as by far the easiest way of generating doctests is to copy and paste from the interactive interpreter. The alternative is lots of annoying typing of '>>>' and '...', and as you're editing text and not code IDE support tends to be worse (although this is a tooling issue and not a problem with doctest itself).
Actually, Emacs users should use rst-mode, which has not-so-bad support for separate-file doctests. Of course, the mode is useful for reST documentation even if your documentation is untested <wink>. -Barry
Barry Warsaw
Actually, Emacs users should use rst-mode, which has no so bad support for separate file doctests. Of course, the mode is useful for reST documentation even if your documentation is untested <wink>.
Any idea where I should send bug reports for ‘rst-mode’? It's not clear to me who develops it. -- \ “Welchen Teil von ‘Gestalt’ verstehen Sie nicht? [What part of | `\ ‘gestalt’ don't you understand?]” —Karsten M. Self | _o__) | Ben Finney
On Feb 29, 2012, at 12:23 PM, Ben Finney wrote:
Barry Warsaw
writes: Actually, Emacs users should use rst-mode, which has no so bad support for separate file doctests. Of course, the mode is useful for reST documentation even if your documentation is untested <wink>.
Any idea where I should send bug reports for ‘rst-mode’? It's not clear to me who develops it.
From the head of the file that I have in my personal elisp:
;;; rst.el --- Mode for viewing and editing reStructuredText-documents.
;; Copyright (C) 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010
;; Free Software Foundation, Inc.
;; Maintainer: Stefan Merten
Ian Bicking
On Feb 17, 2012 4:12 PM, "Nick Coghlan"
wrote:
An interesting third party alternative that has been created recently is behave: http://crate.io/packages/behave/
This style of test is why it's so sad that doctest is ignored and unmaintained.
I don't see why you draw a connection. There doesn't, to me, seem any need to expand the capabilities of ‘doctest’: it does what it says on the tin, and does it well. Other tasks require other tools.
[the ‘behave’ library is] based on testing patterns developed by people who care to promote what they are doing, but I'm of the strong opinion that they are inferior to doctest.
I think the code-examples-in-documentation is a good thing to have and it's what ‘doctest’ excels at. I don't think distorting behaviour-driven specifications, of the kind ‘behave’ is designed to read, to fit the doctest model would be a good thing. Can you present an argument why you think it would? -- \ “Now Maggie, I’ll be watching you too, in case God is busy | `\ creating tornadoes or not existing.” —Homer, _The Simpsons_ | _o__) | Ben Finney
participants (12)
- Barry Warsaw
- Ben Finney
- David Townshend
- Devin Jeanpierre
- Guido van Rossum
- Ian Bicking
- Mark Janssen
- Michael Foord
- Mike Graham
- Mike Meyer
- Steven D'Aprano
- Éric Araujo