
Johan Harjano ran into an interesting problem when trying to run the Django test suite under Python 3.1. Django has doctests of the form
    >>> a6.headline
    u'Default headline'
Even when converting the doctest with 2to3, the expected output is unmodified. However, in 3.x, the expected output will change (i.e. not produce an u"" prefix anymore).
Now, it might be possible to reformulate the test case (e.g. use print() instead of relying on repr), however, this is undesirable as a) the test should continue to test in 2.x that the result object is a unicode string, and b) it makes the test less readable.
I would like to find a solution where this gets automatically corrected, e.g. through 2to3, or through changes to doctest, or through changes of str.__repr__.
Any proposal appreciated.
Regards,
Martin

On Mar 05, 2010, at 05:11 AM, Martin v. Löwis wrote:
Johan Harjano ran into an interesting problem when trying to run the Django test suite under Python 3.1.
Django has doctests of the form
    >>> a6.headline
    u'Default headline'
Even when converting the doctest with 2to3, the expected output is unmodified. However, in 3.x, the expected output will change (i.e. not produce an u"" prefix anymore).
For this reason, I always recommend using print, even though...
Now, it might be possible to reformulate the test case (e.g. use print() instead of relying on repr), however, this is undesirable as a) the test should continue to test in 2.x that the result object is a unicode string, and b) it makes the test less readable.
If you really want to test that it's a unicode, shouldn't you actually test its type? (I'm not sure what would happen with that under 2to3.) Besides, the type of the string is very rarely important, so I think the u-prefix and quotes is mostly just noise.
I would like to find a solution where this gets automatically corrected, e.g. through 2to3, or through changes to doctest, or through changes of str.__repr__.
I think Michael was also talking about changes to doctest that would automatically sort dictionaries and sets. Again, it's not hard to write doctests correctly, but it's surprisingly common to implicitly rely on sort order. I think the right place to change these is in doctest. -Barry
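A minimal sketch of a doctest written so that it relies on neither dict iteration order nor the repr of a string. The function `headline_counts` is hypothetical and exists only for illustration; it is not part of Django or doctest.

```python
def headline_counts(headlines):
    """Count how often each headline occurs.

    Sorting the items makes the expected output independent of dict
    iteration order, and print() shows the text without repr quoting
    or a u prefix.

    >>> counts = headline_counts(['b', 'a', 'b'])
    >>> sorted(counts.items())
    [('a', 1), ('b', 2)]
    >>> print(sorted(counts)[0])
    a
    """
    counts = {}
    for headline in headlines:
        counts[headline] = counts.get(headline, 0) + 1
    return counts
```

Running `doctest.testmod()` in the module that defines the function would exercise the examples in the docstring.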

On Mar 4, 2010, at 11:30 PM, Barry Warsaw wrote:
If you really want to test that it's a unicode, shouldn't you actually test its type? (I'm not sure what would happen with that under 2to3.)
Presumably 2to3 will be smart enough to translate 'unicode' to 'str' and 'bytes' to... 'bytes'. Just don't use 'str' in 2.x and you should be okay :).

Glyph Lefkowitz wrote:
On Mar 4, 2010, at 11:30 PM, Barry Warsaw wrote:
If you really want to test that it's a unicode, shouldn't you actually test its type? (I'm not sure what would happen with that under 2to3.)
Presumably 2to3 will be smart enough to translate 'unicode' to 'str' and 'bytes' to... 'bytes'. Just don't use 'str' in 2.x and you should be okay :).
In code (including doctest input lines) 2to3 can make those translations, but I believe the doctest output lines are a different story (since they're just arbitrary strings as far as the translator is concerned). I would expect 2to3 to be able to translate the following tests correctly because the expected output stays the same even though the commands change:
    >>> print a6.headline
    Default headline
    >>> type(a6.headline) is type(u'')
    True
But I don't see a ready way to support doctests where the expected *output* changes between versions.
Cheers,
Nick.
--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Le Thu, 4 Mar 2010 23:30:12 -0500, Barry Warsaw <barry@python.org> a écrit :
If you really want to test that it's a unicode, shouldn't you actually test its type? (I'm not sure what would happen with that under 2to3.) Besides, the type of the string is very rarely important, so I think the u-prefix and quotes is mostly just noise.
String type is actually very important, if you don't want your application/library to fail in the face of non-ASCII data. That's why we did all this thing in py3k, after all :) Regards Antoine.

On 05/03/2010 15:56, Antoine Pitrou wrote:
Le Thu, 4 Mar 2010 23:30:12 -0500, Barry Warsaw<barry@python.org> a écrit :
If you really want to test that it's a unicode, shouldn't you actually test its type? (I'm not sure what would happen with that under 2to3.) Besides, the type of the string is very rarely important, so I think the u-prefix and quotes is mostly just noise.
String type is actually very important, if you don't want your application/library to fail in the face of non-ASCII data.
That's why we did all this thing in py3k, after all :)
Well, I'd like to see a 'unicode agnostic' mode for different reasons, although given that it all changes in Python 3, and that it is almost certainly too late to get changes to doctest in before the 2.7 release, the point is probably moot.
IronPython has very different unicode behaviour to Python 2. IronPython is like Python 3 in this regard: all strings are unicode (str is unicode). This means that it rarely puts the u prefix on strings, so doctests that use string repr also fail on IronPython. The Django doctest suite is a culprit in this regard too.
Dictionary iteration order is also different in IronPython, so dict reprs also fail. For dict reprs it would be nice if doctest compared in an order-agnostic way. (Internally reconstruct the dict as a dict of strings from the repr and then compare using dict equality.)
I know it is easily *possible* to construct doctests that are robust against these differences, but the fact that the "obvious thing" often fails in subtle ways on other platforms is *one* of the reasons I don't consider doctest to be a suitable tool for unit testing [1]. :-)
Michael
[1] It isn't the main reason, and I realise that is entirely orthogonal anyway... but doctest makes every line an assertion, and you have to jump through hoops if you don't want that behaviour. All beside the point. Doctest simply *rocks* for testing documentation examples, especially in conjunction with Sphinx.
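The order-agnostic comparison suggested above could be sketched as a custom output checker. This is only an illustration under stated assumptions: the class name `LiteralEqualityChecker` is hypothetical, and it uses `ast.literal_eval` to rebuild values from their repr rather than the "dict of strings" approach described in the message.

```python
import ast
import doctest


class LiteralEqualityChecker(doctest.OutputChecker):
    """Hypothetical checker: compare literal reprs by value, not by text."""

    def check_output(self, want, got, optionflags):
        try:
            # literal_eval only accepts Python literals (dicts, sets, lists,
            # strings, numbers, ...), so ordinary text output raises here
            # and falls through to the normal textual comparison.
            if ast.literal_eval(want.strip()) == ast.literal_eval(got.strip()):
                return True
        except (ValueError, SyntaxError, TypeError):
            pass
        return doctest.OutputChecker.check_output(self, want, got, optionflags)
```

It would be passed to a runner as `doctest.DocTestRunner(checker=LiteralEqualityChecker())`. One caveat of this design: value equality is looser than text equality (for example `1 == 1.0`), so the checker no longer verifies the exact type shown in the expected output.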

On Mar 05, 2010, at 10:56 AM, Antoine Pitrou wrote:
String type is actually very important, if you don't want your application/library to fail in the face of non-ASCII data.
That's why we did all this thing in py3k, after all :)
That's not actually what I meant. I meant that in doctests, you probably don't need to be confronted by the string type every single time. It's just not that interesting for documentation, most of the time. I have no problem adding (unit)tests that ensure your outputs and values are of the expected type though!
-Barry

On 3/4/2010 11:11 PM, "Martin v. Löwis" wrote:
Johan Harjano ran into an interesting problem when trying to run the Django test suite under Python 3.1.
Django has doctests of the form
    >>> a6.headline
    u'Default headline'
Even when converting the doctest with 2to3, the expected output is unmodified. However, in 3.x, the expected output will change (i.e. not produce an u"" prefix anymore).
Now, it might be possible to reformulate the test case (e.g. use print() instead of relying on repr), however, this is undesirable as a) the test should continue to test in 2.x that the result object is a unicode string, and b) it makes the test less readable.
I would like to find a solution where this gets automatically corrected, e.g. through 2to3, or through changes to doctest, or through changes of str.__repr__.
Any proposal appreciated.
What is the easiest thing that works?
If 2to3 can fix string literals and fix code within doc strings, would it be difficult to fix expected strings within doc strings?
On the other hand, as Foord pointed out, the 'u' prefix is something of a CPythonism not required by the language def, so an 'ignore leading u on expected output' flag would be useful. But 'correcting'
    >>> a,b
    (u'ah', u'hah')
is too much to expect. Doctest is both useful and fragile because it *almost* imitates human testing methods.
If tests like the above were bunched, and my primary goal were to make Django tests work, and work now, without changing the style, I would be tempted to use something like the following:
    def u(s):  # import as needed from a testutil module
        if type(s) is unicode:  # str in 3.x
            print(s)
        else:
            print('')  # or any 'nevermatch' string
    >>> u(a6.headline)
    Default headline
which both retains the type test and readability. This would be a project-specific solution, but then, the other comments suggest that wanting to combine a type and value test this way seems to be somewhat project specific.
Terry Jan Reedy

Am 05.03.2010 20:37, schrieb Terry Reedy:
On 3/4/2010 11:11 PM, "Martin v. Löwis" wrote:
Johan Harjano ran into an interesting problem when trying to run the Django test suite under Python 3.1.
Django has doctests of the form
    >>> a6.headline
    u'Default headline'
Even when converting the doctest with 2to3, the expected output is unmodified. However, in 3.x, the expected output will change (i.e. not produce an u"" prefix anymore).
Now, it might be possible to reformulate the test case (e.g. use print() instead of relying on repr), however, this is undesirable as a) the test should continue to test in 2.x that the result object is a unicode string, and b) it makes the test less readable.
I would like to find a solution where this gets automatically corrected, e.g. through 2to3, or through changes to doctest, or through changes of str.__repr__.
Any proposal appreciated.
What is the easiest thing that works?
If 2to3 can fix string literals and fix code within doc strings, would it be difficult to fix expected strings within doc strings?
Yes. Expected output from doctests can be anything. Doing conversions on it that are correct only for Python code is potentially producing many false positives. Heuristics need to be applied that can get very intricate.
On the other hand, as Foord pointed out,
Prefect?
the 'u' prefix is something of a CPythonism not required by the language def, so an 'ignore leading u on expected output' flag would be useful. But 'correcting'
    >>> a,b
    (u'ah', u'hah')
is too much to expect. Doctest is both useful and fragile because it *almost* imitates human testing methods.
Yes, it is documentation and tests at the same time :) Georg

On Thu, Mar 4, 2010 at 8:11 PM, "Martin v. Löwis" <martin@v.loewis.de> wrote:
Johan Harjano ran into an interesting problem when trying to run the Django test suite under Python 3.1.
Django has doctests of the form
    >>> a6.headline
    u'Default headline'
Even when converting the doctest with 2to3, the expected output is unmodified. However, in 3.x, the expected output will change (i.e. not produce an u"" prefix anymore).
Now, it might be possible to reformulate the test case (e.g. use print() instead of relying on repr), however, this is undesirable as a) the test should continue to test in 2.x that the result object is a unicode string, and b) it makes the test less readable.
I would like to find a solution where this gets automatically corrected, e.g. through 2to3, or through changes to doctest, or through changes of str.__repr__.
Any proposal appreciated.
How about a heuristic rule (which you have to explicitly select) that changes u'XXX' into 'XXX' inside triply-quoted strings given certain contexts, e.g. only at the start of the line, only if there is a nearby preceding line starting with '>>>'? Exactly what context is of the right strength will have to be determined experimentally; if there are a lot of tests outputting things like [u'...'] or {u'...': u'...'} the context may have to be made more liberal. Possibly \bu('.*'|".*") would do it?
The issue shows (yet again) a general problem with doctests being overspecified -- the test shouldn't care that the output starts with 'u', it should only care that the value is unicode, but there's no easy way to express this in doctests.
But since these doctests exist I suggest that the practical way forward is to stick with them rather than trying to reformulate all the tests.
--
--Guido van Rossum (python.org/~guido)
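A rough sketch of the heuristic described above, as a standalone text filter. The function name `strip_u_prefixes` is hypothetical, the pattern is the suggested one made non-greedy (an assumption, so that several literals on one line are handled separately), and the "context" rule used here (lines following a '>>>' prompt, up to the next blank line) is just one possible choice.

```python
import re

# The suggested pattern, with non-greedy quantifiers (an assumption).
U_LITERAL = re.compile(r"""\bu('.*?'|".*?")""")


def strip_u_prefixes(doctest_text):
    """Drop the u prefix from string reprs in expected-output lines only."""
    result = []
    in_output = False
    for line in doctest_text.splitlines(True):
        stripped = line.lstrip()
        if stripped.startswith((">>>", "...")):
            # A prompt line is source code (left for 2to3 to convert);
            # expected output may follow it.
            in_output = True
        elif not stripped:
            # A blank line ends the expected output.
            in_output = False
        elif in_output:
            line = U_LITERAL.sub(r"\1", line)
        result.append(line)
    return "".join(result)
```

Prose outside doctest examples is left alone, and the `\b` anchor keeps words that merely end in "u" from matching. Expected output that itself starts with "..." is treated as a continuation line and is not converted, which is one of the false negatives such a heuristic can produce.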

The issue shows (yet again) a general problem with doctests being overspecified -- the test shouldn't care that the output starts with 'u', it should only care that the value is unicode, but there's no easy way to express this in doctests. But since these doctests exist I suggest that the practical way forward is to stick with them rather than trying to reformulate all the tests.
Ok, I think I would prefer that. One approach I considered was to override sys.displayhook, so that I can determine what parts of the output are "single" repr results. Of course, this wouldn't catch repr strings that were ultimately print()ed (not sure whether that restriction would affect Django).
The other (more aggressive) approach is the heuristics you propose, which may end up with false negatives. doctest already has a wildcard matching flag, so it would grow another one.
Regards,
Martin
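For illustration, this is roughly what an additional doctest flag could look like when built outside the standard library. `doctest.register_optionflag` and `doctest.OutputChecker` are existing doctest APIs; the flag name `IGNORE_U_PREFIX`, the class `UPrefixChecker` and the helper `run_with_flag` are hypothetical, and this is not a change that exists in doctest itself.

```python
import doctest
import re

# Hypothetical flag name, registered through doctest's documented hook.
IGNORE_U_PREFIX = doctest.register_optionflag("IGNORE_U_PREFIX")

_U_LITERAL = re.compile(r"""\bu('.*?'|".*?")""")


class UPrefixChecker(doctest.OutputChecker):
    """Ignore a leading u on string reprs when the flag is set."""

    def check_output(self, want, got, optionflags):
        if optionflags & IGNORE_U_PREFIX:
            # Normalise both sides so u'x' and 'x' compare equal.
            want = _U_LITERAL.sub(r"\1", want)
            got = _U_LITERAL.sub(r"\1", got)
        return doctest.OutputChecker.check_output(self, want, got, optionflags)


def run_with_flag(source):
    """Run doctest examples given as a string, with the flag enabled."""
    test = doctest.DocTestParser().get_doctest(source, {}, "sample", None, 0)
    runner = doctest.DocTestRunner(
        checker=UPrefixChecker(), verbose=False, optionflags=IGNORE_U_PREFIX
    )
    return runner.run(test)
```

With the flag off the checker behaves exactly like the default one, so existing tests are unaffected unless they opt in.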

Martin v. Löwis wrote:
Johan Harjano ran into an interesting problem when trying to run the Django test suite under Python 3.1.
Django has doctests of the form
    >>> a6.headline
    u'Default headline'
Even when converting the doctest with 2to3, the expected output is unmodified. However, in 3.x, the expected output will change (i.e. not produce an u"" prefix anymore).
Now, it might be possible to reformulate the test case (e.g. use print() instead of relying on repr), however, this is undesirable as a) the test should continue to test in 2.x that the result object is a unicode string, and b) it makes the test less readable.
I would like to find a solution where this gets automatically corrected, e.g. through 2to3, or through changes to doctest, or through changes of str.__repr__.
Any proposal appreciated.
You can use a custom DocTestRunner that replaces sys.displayhook in its run() method and records the changed output. Something like the attached seems to do the trick.
Regards,
Ziga

    # Python 2 script: rewrites u'...' expected output in its own doctests.
    import sys
    import doctest
    import linecache
    import __builtin__


    def greet():
        """
        The standard greeting, in unicode.

        >>> greet()
        u'Hello, world!'
        """
        return u'Hello, world!'


    orig_displayhook = sys.displayhook

    def py3_displayhook(value):
        # Non-unicode values are displayed as usual.
        if not isinstance(value, unicode):
            return orig_displayhook(value)
        __builtin__._ = value
        s = repr(value)
        # Drop the u prefix so the output matches what 3.x would show.
        if s.startswith(("u'", 'u"')):
            s = s[1:]
        print >> sys.stdout, s


    class Runner(doctest.DocTestRunner):

        # Maps file name to its list of (possibly rewritten) source lines.
        converted_files = {}

        def __init__(self, checker=None, verbose=None, optionflags=0):
            doctest.DocTestRunner.__init__(self, checker, False, optionflags)

        def run(self, test, compileflags=None, out=None, clear_globs=True):
            fn = test.filename
            if fn not in self.converted_files:
                self.converted_files[fn] = linecache.getlines(fn)
            sys.displayhook = py3_displayhook
            try:
                return doctest.DocTestRunner.run(self, test, compileflags,
                                                 out, clear_globs)
            finally:
                sys.displayhook = orig_displayhook

        def report_failure(self, out, test, example, got):
            # Replace the expected output in the cached source with the
            # output actually produced; it starts one line after the
            # example's source line.
            lines = [" " * example.indent + line
                     for line in got.splitlines(True)]
            pos = test.lineno + example.lineno + 1
            self.converted_files[test.filename][pos:pos+len(lines)] = lines


    def _test():
        import __main__
        finder = doctest.DocTestFinder()
        runner = Runner()
        for test in finder.find(__main__):
            runner.run(test)
        # Print the converted source of this file.
        print "".join(runner.converted_files[__main__.__file__])

    if __name__ == "__main__":
        _test()
participants (11)
- "Martin v. Löwis"
- Andrew Bennetts
- Antoine Pitrou
- Barry Warsaw
- Georg Brandl
- Glyph Lefkowitz
- Guido van Rossum
- Michael Foord
- Nick Coghlan
- Terry Reedy
- Žiga Seilnacht