[Doc-SIG] non-ascii docstrings
Edward Loper
edloper at gradient.cis.upenn.edu
Fri Mar 17 22:10:17 CET 2006
I've been working on epydoc, and the question has come up of how I
should treat non-unicode docstrings that contain non-ascii characters.
An example of such a file is "python2.4/encodings/string_escape.py",
whose module docstring contains an 'o' with an umlaut.
In particular, the question is whether I should assume that the
docstring is encoded with the encoding specified by the "-*- coding -*-"
directive at the top of the file.
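To make that option concrete, here is a minimal sketch of what "assume the declared encoding" could look like. It is written for Python 3, where bytes objects stand in for the 8-bit strings discussed here; the helper names declared_encoding and decode_docstring are hypothetical and are not part of epydoc.

```python
import io
import tokenize


def declared_encoding(source_bytes):
    """Return the encoding named by the file's coding directive.

    tokenize.detect_encoding falls back to UTF-8 when no directive is
    present; that is the Python 3 default, not the Python 2 one.
    """
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


def decode_docstring(raw_docstring, source_bytes):
    """Hypothetical helper: decode an 8-bit docstring using the
    encoding declared at the top of the module's source."""
    return raw_docstring.decode(declared_encoding(source_bytes))


# A tiny module whose docstring contains an 'o' with an umlaut (byte 0xF6).
source = b'# -*- coding: latin-1 -*-\n"""K\xf6ln"""\n'
doc = decode_docstring(b"K\xf6ln", source)
```

Whether a tool should do this at all is exactly the question raised here; the sketch only shows that the mechanics are simple.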
The reason why we *wouldn't* use the encoding is that PEP 263 [1], which
defines the coding directive, says that it does *not* apply to
non-unicode string literals. In particular, PEP 263 says that the
entire file should be read & tokenized using the specified coding, but
once string objects are created, they should be reencoded back into
8-bit strings using the file encoding.
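The round trip described above can be sketched as follows. This is a simplified Python 3 illustration, not the compiler's actual code: bytes objects stand in for 8-bit strings, and compile_plain_literal is a hypothetical name.

```python
def compile_plain_literal(literal_bytes, file_encoding):
    """Simplified model of how a plain (non-unicode) literal is handled."""
    # The source is decoded with the declared encoding for tokenizing.
    text = literal_bytes.decode(file_encoding)
    # The plain literal is then re-encoded into the file encoding, so the
    # resulting 8-bit string holds the same bytes as the source file.
    return text.encode(file_encoding)


raw = compile_plain_literal(b"K\xf6ln", "latin-1")

# The resulting object carries no record of its encoding, so a consumer
# that guesses a different encoding can fail outright.
try:
    raw.decode("utf-8")
    wrong_guess_failed = False
except UnicodeDecodeError:
    wrong_guess_failed = True
```

The point is that the bytes survive unchanged but the knowledge of which encoding they are in does not travel with the string.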
So the "correct" fix is for the author of the module to use unicode
literals instead of string literals for docstrings that contain
non-ascii characters. This has the advantage that if a user tries to
look at the docstring via introspection, it will be correct.
On the other hand, epydoc is often used by people other than the author
of a module, and requiring them to go through and replace all string
literal docstrings with unicode literals seems a bit unreasonable.
In a way, this is similar to a mistake I've seen many times: using
unescaped backslashes inside docstrings. For example:

    def wc(filename):
        """
        Count the number of words in the given file. E.g.:
        >>> wc("c:\test\new.txt")
        100
        """

This looks fine in the source file, but looks quite broken if you print
its __doc__:

    >>> print wc.__doc__

        Count the number of words in the given file. E.g.:
        >>> wc("c:      est
    ew.txt")
        100

(The right fix in that case is probably to use a raw-string.)
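A minimal sketch of the difference, using two hypothetical variants of the function above (wc_plain and wc_raw are names made up for this illustration):

```python
def wc_plain(filename):
    """
    >>> wc("c:\test\new.txt")
    100
    """


def wc_raw(filename):
    r"""
    >>> wc("c:\test\new.txt")
    100
    """


# In the plain docstring, \t and \n were turned into a tab and a newline.
plain_is_mangled = "\t" in wc_plain.__doc__ and "\\" not in wc_plain.__doc__

# In the raw docstring, the backslashes survive exactly as written.
raw_is_intact = r'wc("c:\test\new.txt")' in wc_raw.__doc__
```

The only change is the r prefix on the docstring, so the source file reads the same either way.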
So the question is: should epydoc (and other tools like it) be
compliant with PEP 263 (and consistent with Python), or should they "do
what I mean, not what I say" and treat non-ascii docstrings as if they
were encoded using the module's encoding?
-Edward
[1] http://www.python.org/doc/peps/pep-0263/