
There are some issues regarding the use of unicode in doctests. Consider the following three tests.
>>> foo = u'föö'
>>> foo
u'f\xf6\xf6'
>>> foo
u'f\u00f6\u00f6'
>>> foo
u'föö'
To me, those are identical. At least string comparison shows that u'f\xf6\xf6' == u'f\u00f6\u00f6' == u'föö'. Yet only the first of the tests passes; the other two fail. And that's if the tests are within a docstring, where I can specify the encoding used. If DocFileSuite is being used, there's no way to specify the encoding, so all the tests will fail.
Is it supposed to be like this, or have I missed something? If I could specify the encoding for DocFileSuite to use, I would at least be partially happy.
Regards,
Bjorn

Bjorn Tillenius wrote:
> There are some issues regarding the use of unicode in doctests. Consider the following three tests.
> >>> foo = u'föö'
> >>> foo
> u'f\xf6\xf6'
> >>> foo
> u'f\u00f6\u00f6'
> >>> foo
> u'föö'
> To me, those are identical.
really? if a function is expected to print "öl", is it alright to print "\u00f6l" instead? wouldn't you complain if your newspaper used Unicode escapes in headlines instead of Swedish characters?
> Is it supposed to be like this, or have I missed something? If I could specify the encoding for DocFileSuite to use, I would at least be partially happy.
repr() always generates the same output, no matter what encoding you use. just use repr, and you're done.
</F>
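
The point about repr() can be checked with a small script. This is only a sketch, and it assumes Python 3, where ascii() produces the escaped form that Python 2's repr() produced in the examples above. The helper name `passes` is made up for illustration. It shows that doctest compares the printed text, not the string value, so a different spelling of the same string fails.

```python
import doctest


def passes(source):
    """Return True if every example in the doctest text `source` passes."""
    test = doctest.DocTestParser().get_doctest(source, {}, "sketch", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    # Swallow the failure report; only the pass/fail counts are of interest.
    results = runner.run(test, out=lambda text: None)
    return results.failed == 0


# Assumption: ascii() stands in for Python 2's repr(), escaping non-ASCII text.
setup = r">>> foo = 'f\xf6\xf6'" + "\n"
call = r">>> print(ascii(foo))" + "\n"

same_spelling = setup + call + r"'f\xf6\xf6'" + "\n"
other_spelling = setup + call + r"'f\u00f6\u00f6'" + "\n"

print(passes(same_spelling))   # True: expected text matches the output exactly
print(passes(other_spelling))  # False: same value, different spelling
```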

Fredrik Lundh <fredrik <at> pythonware.com> writes:
> Bjorn Tillenius wrote:
> > There are some issues regarding the use of unicode in doctests. Consider the following three tests.
> > >>> foo = u'föö'
> > >>> foo
> > u'f\xf6\xf6'
> > >>> foo
> > u'f\u00f6\u00f6'
> > >>> foo
> > u'föö'
> > To me, those are identical.
> really? if a function is expected to print "öl", is it alright to print "\u00f6l" instead? wouldn't you complain if your newspaper used Unicode escapes in headlines instead of Swedish characters?
No, I wouldn't like the newspaper to use Unicode escapes. For the same reason, I don't want my documentation to contain Unicode escapes. That's why I would like the latter test to pass.
But I understand: it tries to match the output of repr(foo). I guess I can live with that. I can always do:
>>> foo == u'föö'
True
On the other hand, since there already are some flags to modify the matching algorithm, one could argue for adding another flag... or at least provide the possibility for the user to alter the matching himself. Although it's not that important for me.
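
One way a user could alter the matching is to pass a custom checker to the doctest runner. The following is a rough sketch, not an existing doctest flag: `LiteralChecker` and `passes` are hypothetical names, and Python 3 is assumed, with ascii() standing in for the escaped repr() output discussed here. The checker falls back to comparing literal values when the texts differ.

```python
import ast
import doctest


class LiteralChecker(doctest.OutputChecker):
    """Hypothetical checker: also accept output whose literal value matches."""

    def check_output(self, want, got, optionflags):
        # The normal textual comparison comes first.
        if super().check_output(want, got, optionflags):
            return True
        # Otherwise compare the values that both texts evaluate to.
        try:
            return ast.literal_eval(want.strip()) == ast.literal_eval(got.strip())
        except (ValueError, TypeError, SyntaxError):
            return False


def passes(source):
    """Run the doctest text `source` with LiteralChecker and report success."""
    test = doctest.DocTestParser().get_doctest(source, {}, "sketch", None, 0)
    runner = doctest.DocTestRunner(checker=LiteralChecker(), verbose=False)
    return runner.run(test, out=lambda text: None).failed == 0


# The output is spelled with \xf6, the expectation with \u00f6.
source = r">>> print(ascii('f\xf6\xf6'))" + "\n" + r"'f\u00f6\u00f6'" + "\n"
print(passes(source))  # True
```

The trade-off is that such a checker hides differences in spelling, which is exactly what the default matching is designed to catch.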
> > Is it supposed to be like this, or have I missed something? If I could specify the encoding for DocFileSuite to use, I would at least be partially happy.
> repr() always generates the same output, no matter what encoding you use. just use repr, and you're done.
What is important for me, though, is to be able to specify an encoding to DocFileSuite. As you said, one doesn't want to read Unicode escapes. At the moment none of the tests I've given as example will pass in a DocFileSuite (given that the text file is encoded using UTF-8). I do find it a bit strange that I can't just copy a doctest within a docstring to a text file. I have to Unicode escape all non-ASCII characters, which produces ugly documentation.
Regards,
Bjorn
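
For reference, later doctest releases accept an `encoding` argument on DocFileSuite (and on doctest.testfile). A minimal sketch of how that could be used, assuming Python 3, where repr() shows 'föö' unescaped; the file name and the helper `run_doc_file` are made up for illustration:

```python
import doctest
import io
import os
import tempfile
import unittest

# A small doctest document containing non-ASCII characters.
DOC = (
    "Non-ASCII text in a doctest file.\n"
    "\n"
    "    >>> foo = 'föö'\n"
    "    >>> foo\n"
    "    'föö'\n"
)


def run_doc_file(text, encoding="utf-8"):
    """Write `text` to a temporary file and run it through DocFileSuite."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "unicode_doctest.txt")
        with open(path, "w", encoding=encoding) as handle:
            handle.write(text)
        # module_relative=False lets us pass an ordinary filesystem path.
        suite = doctest.DocFileSuite(path, module_relative=False, encoding=encoding)
        runner = unittest.TextTestRunner(stream=io.StringIO())
        return runner.run(suite)


result = run_doc_file(DOC)
print(result.wasSuccessful())  # True
```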