[Python-Dev] Re: Unicode in doctests

Fri Dec 3 18:02:52 CET 2004

Fredrik Lundh <fredrik <at> pythonware.com> writes:
> Bjorn Tillenius wrote:
> > There are some issues regarding the use of unicode in doctests. Consider
> > the following three tests.
> >
> >    >>> foo = u'föö'
> >    >>> foo
> >    u'f\xf6\xf6'
> >
> >    >>> foo
> >    u'f\u00f6\u00f6'
> >
> >    >>> foo
> >    u'föö'
> >
> > To me, those are identical.
> 
> really?  if a function is expected to print "öl", is it alright to print
> "\u00f6l" instead?  wouldn't you complain if your newspaper used
> Unicode escapes in headlines instead of Swedish characters?

No, I wouldn't like the newspaper to use Unicode escapes. For the same
reason, I don't want my documentation to contain Unicode escapes. That's
why I would like the latter test to pass.

But I understand: doctest tries to match the output of repr(foo), and I
guess I can live with that. I can always do:

    >>> foo == u'föö'
    True

On the other hand, since there already are some flags that modify the
matching algorithm, one could argue for adding another flag, or at least
for letting the user alter the matching. It's not that important for me,
though.
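
To make the idea of user-altered matching concrete, here is a rough sketch
rather than a proposal for the real implementation. It assumes a Python that
provides ast.literal_eval and that the checker is a doctest.OutputChecker
subclass. The class name LiteralOutputChecker is hypothetical. It accepts an
example when the expected and actual output are equal as Python literals, so
all three spellings of the string above would match:

```python
import ast
import doctest


class LiteralOutputChecker(doctest.OutputChecker):
    """Hypothetical checker: also accept outputs that are equal as literals."""

    def check_output(self, want, got, optionflags):
        # Try the standard comparison first, honouring the usual flags.
        if doctest.OutputChecker.check_output(self, want, got, optionflags):
            return True
        # Otherwise evaluate both sides as Python literals and compare values.
        try:
            return ast.literal_eval(want.strip()) == ast.literal_eval(got.strip())
        except (ValueError, SyntaxError, TypeError):
            # At least one side is not a plain literal: keep the failure.
            return False
```

Such a checker could be handed to a doctest.DocTestRunner through its
checker argument. Output that is not a plain literal still goes through the
normal comparison only.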

> > Is it supposed to be like this, or have I missed something? If I could
> > specify the encoding for DocFileSuite to use, I would at least be
> > partially happy.
> 
> repr() always generates the same output, no matter what encoding
> you use.  just use repr, and you're done.

What is important for me, though, is to be able to specify an encoding for
DocFileSuite. As you said, one doesn't want to read Unicode escapes. At the
moment none of the tests I gave as examples pass in a DocFileSuite (given
that the text file is encoded in UTF-8). I find it a bit strange that I
can't just copy a doctest from a docstring into a text file. Instead I have
to Unicode-escape all non-ASCII characters, which produces ugly
documentation.
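
What I have in mind is roughly the following. This is only a sketch, and it
assumes that DocFileSuite takes an encoding keyword argument (later versions
of doctest do accept one). The helper run_utf8_docfile and the sample text
are made up for illustration:

```python
import doctest
import os
import tempfile
import unittest

# Sample doctest file content with a non-ASCII literal (illustrative only).
DOC = u"""\
Non-ASCII text in a doctest file:

    >>> foo = u'föö'
    >>> foo == u'f\\xf6\\xf6'
    True
"""


def run_utf8_docfile(text):
    """Write text to a UTF-8 file, run it as a DocFileSuite, return the result."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "wb") as handle:
        handle.write(text.encode("utf-8"))
    try:
        # encoding tells doctest how to decode the file before parsing it.
        suite = doctest.DocFileSuite(
            path, module_relative=False, encoding="utf-8")
        result = unittest.TestResult()
        suite.run(result)
        return result
    finally:
        os.remove(path)
```

With something like this, the text file could keep the readable u'föö'
spelling instead of escapes.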

Regards,

Bjorn