PEP 287: reStructuredText Standard Docstring Format

Here's a serious proposal, safe to post now that April Fool's is over. Please read & comment.

--
David Goodger    goodger@users.sourceforge.net
Open-source projects:
- Python Docstring Processing System: http://docstring.sourceforge.net
- reStructuredText: http://structuredtext.sourceforge.net
- The Go Tools Project: http://gotools.sourceforge.net

PEP: 287
Title: reStructuredText Standard Docstring Format
Version: $Revision: 1.3 $
Last-Modified: $Date: 2002/04/02 03:50:38 $
Author: goodger@users.sourceforge.net (David Goodger)
Discussions-To: doc-sig@python.org
Status: Draft
Type: Informational
Created: 25-Mar-2002
Post-History: 02-Apr-2002
Replaces: 216


Abstract

    When plaintext hasn't been expressive enough for inline documentation, Python programmers have sought out a format for docstrings. This PEP proposes that the reStructuredText markup [1]_ be adopted as the standard markup format for structured plaintext documentation in Python docstrings, and for PEPs and ancillary documents as well. reStructuredText is a rich and extensible yet easy-to-read, what-you-see-is-what-you-get plaintext markup syntax.

    Only the low-level syntax of docstrings is addressed here. This PEP is not concerned with docstring semantics or processing at all. Nor is it an attempt to deprecate pure plaintext docstrings, which are always going to be legitimate. The reStructuredText markup is an alternative for those who want more expressive docstrings.


Benefits

    Programmers are by nature a lazy breed. We reuse code with functions, classes, modules, and subsystems. Through its docstring syntax, Python allows us to document our code from within. The "holy grail" of the Python Documentation Special Interest Group (Doc-SIG) [2]_ has been a markup syntax and toolset to allow auto-documentation, where the docstrings of Python systems can be extracted in context and processed into useful, high-quality documentation for multiple purposes.

    The proposed format (reStructuredText) is entirely readable in plaintext format, and many of the markup forms match common usage (e.g., ``*emphasis*``), so it reads quite naturally. Yet it is rich enough to produce complex documents, and extensible so that there are few limits.

    The reStructuredText parser is available now. The Docutils project is at the point where standalone reStructuredText documents can be converted to HTML; other output format writers will become available over time. Work is progressing on a Python source "Reader" which will implement auto-documentation. Authors of existing auto-documentation tools are encouraged to integrate the reStructuredText parser into their projects, or better yet, to join forces to produce a world-class toolset for the Python standard library.

    Tools will become available in the near future, which will allow programmers to generate HTML for online help, XML for multiple purposes, and perhaps eventually PDF/DocBook/LaTeX for printed documentation, essentially "for free" from the existing docstrings. The adoption of a standard will, at the very least, benefit docstring processing tools by preventing further "reinventing the wheel".

    Eventually PyDoc, the one existing standard auto-documentation tool, could have reStructuredText support added. In the interim it will have no problem with reStructuredText markup, since it treats all docstrings as plaintext.


Goals

    These are the generally accepted goals for a docstring format, as discussed in the Doc-SIG:

    1. It must be readable in source form by the casual observer.

    2. It must be easy to type with any standard text editor.

    3. It must not need to contain information which can be deduced from parsing the module.

    4. It must contain sufficient information (structure) so it can be converted to any reasonable markup format.

    5. It must be possible to write a module's entire documentation in docstrings, without feeling hampered by the markup language.

    reStructuredText meets and exceeds all of these goals, and sets its own goals as well, even more stringent. See "Features" below.

    The goals of this PEP are as follows:

    1. To establish reStructuredText as a standard docstring format by attaining "accepted" status (Python community consensus; BDFL pronouncement). Once reStructuredText is a Python standard, effort can be focused on tools instead of arguing for a standard. Python needs a standard set of documentation tools.

    2. To address any related concerns raised by the Python community.

    3. To encourage community support. As long as multiple competing markups are out there, the development community remains fractured. Once a standard exists, people will start to use it, and momentum will inevitably gather.

    4. To consolidate efforts from related auto-documentation projects. It is hoped that interested developers will join forces and work on a joint/merged/common implementation.

    5. To adopt reStructuredText as the standard markup for PEPs. One or both of the following strategies may be applied:

       a) Keep the existing PEP section structure constructs (one-line section headers, indented body text). Subsections can either be forbidden or supported with underlined headers in the indented body text.

       b) Replace the PEP section structure constructs with the reStructuredText syntax. Section headers will require underlines, subsections will be supported out of the box, and body text need not be indented (except for block quotes).

       Support for RFC 2822 headers will be added to the reStructuredText parser (unambiguous given a specific context: the first contiguous block of a PEP document). It may be desired to concretely specify what over/underline styles are allowed for PEP section headers, for uniformity.

    6. To adopt reStructuredText as the standard markup for README-type files and other standalone documents in the Python distribution.
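
    Goal 5 relies on the RFC 2822 header block being unambiguous because it is the first contiguous block of a PEP. A minimal sketch of that idea follows, using Python's standard `email.parser.HeaderParser`. This is an illustration only, not how the reStructuredText parser or pep2html.py handles headers; `split_pep` is a hypothetical helper name, and the code is written for present-day Python 3.

```python
from email.parser import HeaderParser


def split_pep(text):
    """Split PEP source into (headers, body).

    Assumption for this sketch: the header block is everything up to
    the first blank line, i.e. the first contiguous block of the file.
    """
    head, _, body = text.partition("\n\n")
    message = HeaderParser().parsestr(head)
    return dict(message.items()), body


sample = """\
PEP: 287
Title: reStructuredText Standard Docstring Format
Status: Draft
Type: Informational

Abstract
"""

headers, body = split_pep(sample)
print(headers["Title"])
```

    Everything after the first blank line is handed back untouched, so the body could then go to whichever markup parser is chosen for the section structure.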


Rationale

    The lack of a standard syntax for docstrings has hampered the development of standard tools for extracting and converting docstrings into documentation in standard formats (e.g., HTML, DocBook, TeX). There have been a number of proposed markup formats and variations, and many tools tied to these proposals, but without a standard docstring format they have failed to gain a strong following and/or floundered half-finished.

    Throughout the existence of the Doc-SIG, consensus on a single standard docstring format has never been reached. A lightweight, implicit markup has been sought, for the following reasons (among others):

    1. Docstrings written within Python code are available from within the interactive interpreter, and can be 'print'ed. Thus the use of plaintext for easy readability.

    2. Programmers want to add structure to their docstrings, without sacrificing raw docstring readability. Unadorned plaintext cannot be transformed ('up-translated') into useful structured formats.

    3. Explicit markup (like XML or TeX) is widely considered unreadable by the uninitiated.

    4. Implicit markup is aesthetically compatible with the clean and minimalist Python syntax.

    Proposed alternatives have included:

    - XML [3]_, SGML [4]_, DocBook [5]_, HTML [6]_, XHTML [7]_

      XML and SGML are explicit, well-formed meta-languages suitable for all kinds of documentation. XML is a variant of SGML. They are best used behind the scenes, because they are verbose, difficult to type, and too cluttered to read comfortably as source. DocBook, HTML, and XHTML are all applications of SGML and/or XML, and all share the same basic syntax and the same shortcomings.

    - TeX [8]_

      TeX is similar to XML/SGML in that it's explicit, not very easy to write, and not easy for the uninitiated to read.

    - Perl POD [9]_

      Most Perl modules are documented in a format called POD -- Plain Old Documentation. This is an easy-to-type, very low level format with strong integration with the Perl parser.
      Many tools exist to turn POD documentation into other formats: info, HTML and man pages, among others. However, the POD syntax takes after Perl itself in terms of readability.

    - JavaDoc [10]_

      Special comments before Java classes and functions serve to document the code. A program to extract these, and turn them into HTML documentation is called javadoc, and is part of the standard Java distribution. However, the only output format that is supported is HTML, and JavaDoc has a very intimate relationship with HTML, using HTML tags for most markup. Thus it shares the readability problems of HTML.

    - Setext [11]_, StructuredText [12]_

      Early on, variants of Setext (Structure Enhanced Text), including Zope Corp's StructuredText, were proposed for Python docstring formatting. Hereafter these variants will collectively be called 'STexts'. STexts have the advantage of being easy to read without special knowledge, and relatively easy to write.

      Although used by some (including in most existing Python auto-documentation tools), until now STexts have failed to become standard because:

      - STexts have been incomplete. Lacking "essential" constructs that people want to use in their docstrings, STexts are rendered less than ideal. Note that these "essential" constructs are not universal; everyone has their own requirements.

      - STexts have been sometimes surprising. Bits of text are unexpectedly interpreted as being marked up, leading to user frustration.

      - SText implementations have been buggy.

      - Most STexts have had no formal specification except for the implementation itself. A buggy implementation meant a buggy spec, and vice-versa.

      - There has been no mechanism to get around the SText markup rules when a markup character is used in a non-markup context.

    Proponents of implicit STexts have vigorously opposed proposals for explicit markup (XML, HTML, TeX, POD, etc.), and the debates have continued off and on since 1996 or earlier.

    reStructuredText is a complete revision and reinterpretation of the SText idea, addressing all of the problems listed above.


Features

    Rather than repeating or summarizing the extensive reStructuredText spec, please read the originals available from http://structuredtext.sourceforge.net/spec/ (.txt & .html files). Reading the documents in the following order is recommended:

    - An Introduction to reStructuredText [13]_

    - Problems With StructuredText [14]_ (optional for those who have used StructuredText; it explains many markup decisions made)

    - reStructuredText Markup Specification [15]_

    - A Record of reStructuredText Syntax Alternatives [16]_ (explains markup decisions made independently of StructuredText)

    - reStructuredText Directives [17]_

    There is also a "Quick reStructuredText" user reference [18]_.

    A summary of features addressing often-raised docstring markup concerns follows:

    - A markup escaping mechanism.

      Backslashes (``\``) are used to escape markup characters when needed for non-markup purposes. However, the inline markup recognition rules have been constructed in order to minimize the need for backslash-escapes. For example, although asterisks are used for *emphasis*, in non-markup contexts such as "*" or "(*)" or "x * y", the asterisks are not interpreted as markup and are left unchanged. For many non-markup uses of backslashes (e.g., describing regular expressions), inline literals or literal blocks are applicable; see the next item.

    - Markup to include Python source code and Python interactive sessions: inline literals, literal blocks, and doctest blocks.

      Inline literals use ``double-backquotes`` to indicate program I/O or code snippets. No markup interpretation (including backslash-escape [``\``] interpretation) is done within inline literals.

      Literal blocks (block-level literal text, such as code excerpts or ASCII graphics) are indented, and indicated with a double-colon ("::") at the end of the preceding paragraph (right here -->)::

          if literal_block:
              text = 'is left as-is'
              spaces_and_linebreaks = 'are preserved'
              markup_processing = None

      Doctest blocks begin with ">>> " and end with a blank line. Neither indentation nor literal block double-colons are required. For example::

          Here's a doctest block:

          >>> print 'Python-specific usage examples; begun with ">>>"'
          Python-specific usage examples; begun with ">>>"
          >>> print '(cut and pasted from interactive sessions)'
          (cut and pasted from interactive sessions)

    - Markup that isolates a Python identifier: interpreted text.

      Text enclosed in single backquotes is recognized as "interpreted text", whose interpretation is application-dependent. In the context of a Python docstring, the default interpretation of interpreted text is as Python identifiers. The text will be marked up with a hyperlink connected to the documentation for the identifier given. Lookup rules are the same as in Python itself: LGB namespace lookups (local, global, builtin). The "role" of the interpreted text (identifying a class, module, function, etc.) is determined implicitly from the namespace lookup. For example::

          class Keeper(Storer):

              """
              Keep data fresher longer.

              Extend `Storer`.  Class attribute `instances` keeps track
              of the number of `Keeper` objects instantiated.
              """

              instances = 0
              """How many `Keeper` objects are there?"""

              def __init__(self):
                  """
                  Extend `Storer.__init__()` to keep track of
                  instances.  Keep count in `Keeper.instances` and data
                  in `self.data`.
                  """
                  Storer.__init__(self)
                  # Update the class attribute, not a per-instance copy.
                  Keeper.instances += 1

                  self.data = []
                  """Store data in a list, most recent last."""

              def storedata(self, data):
                  """
                  Extend `Storer.storedata()`; append new `data` to a
                  list (in `self.data`).
                  """
                  self.data.append(data)

      Each piece of interpreted text is looked up according to the local namespace of the block containing its docstring.

    - Markup that isolates a Python identifier and specifies its type: interpreted text with roles.

      Although the Python source context reader is designed not to require explicit roles, they may be used. To classify identifiers explicitly, the role is given along with the identifier in either prefix or suffix form::

          Use :method:`Keeper.storedata` to store the object's data in
          `Keeper.data`:instance_attribute:.

      The syntax chosen for roles is verbose, but necessarily so (if anyone has a better alternative, please post it to the Doc-SIG). The intention of the markup is that there should be little need to use explicit roles; their use is to be kept to an absolute minimum.

    - Markup for "tagged lists" or "label lists": field lists.

      Field lists represent a mapping from field name to field body. These are mostly used for extension syntax, such as "bibliographic field lists" (representing document metadata such as author, date, and version) and extension attributes for directives (see below). They may be used to implement docstring semantics, such as identifying parameters, exceptions raised, etc.; such usage is beyond the scope of this PEP.

      A modified RFC 2822 syntax is used, with a colon *before* as well as *after* the field name. Field bodies are more versatile as well; they may contain multiple body elements (even nested field lists). For example::

          :Date: 2002-03-22
          :Version: 1
          :Authors:
              - Me
              - Myself
              - I

      Standard RFC 2822 header syntax cannot be used for this construct because it is ambiguous. A word followed by a colon at the beginning of a line is common in written text.

    - Markup extensibility: directives and substitutions.

      Directives are used as an extension mechanism for reStructuredText, a way of adding support for new block-level constructs without adding new syntax.
      Directives for images, admonitions (note, caution, etc.), and tables of contents generation (among others) have been implemented. For example, here's how to place an image::

          .. image:: mylogo.png

      Substitution definitions allow the power and flexibility of block-level directives to be shared by inline text. For example::

          The |biohazard| symbol must be used on containers used to
          dispose of medical waste.

          .. |biohazard| image:: biohazard.png

    - Section structure markup.

      Section headers in reStructuredText use adornment via underlines (and possibly overlines) rather than indentation. For example::

          This is a Section Title
          =======================

          This is a Subsection Title
          --------------------------

          This paragraph is in the subsection.

          This is Another Section Title
          =============================

          This paragraph is in the second section.


Questions & Answers

    Q1: Is reStructuredText rich enough?

    A1: Yes, it is for most people. If it lacks some construct that is required for a specific application, it can be added via the directive mechanism. If a common construct has been overlooked and a suitably readable syntax can be found, it can be added to the specification and parser.

    Q2: Is reStructuredText *too* rich?

    A2: For specific applications or individuals, perhaps. In general, no.

        Since the very beginning, whenever a markup syntax has been proposed on the Doc-SIG, someone has complained about the lack of support for some construct or other. The reply was often something like, "These are docstrings we're talking about, and docstrings shouldn't have complex markup." The problem is that a construct that seems superfluous to one person may be absolutely essential to another.

        reStructuredText takes the opposite approach: it provides a rich set of implicit markup constructs (plus a generic extension mechanism for explicit markup), allowing for all kinds of documents.
        If the set of constructs is too rich for a particular application, the unused constructs can either be removed from the parser (via application-specific overrides) or simply omitted by convention.

    Q3: Why not use indentation for section structure, like StructuredText does? Isn't it more "Pythonic"?

    A3: Guido van Rossum wrote the following in a 2001-06-13 Doc-SIG post:

            I still think that using indentation to indicate sectioning is wrong. If you look at how real books and other print publications are laid out, you'll notice that indentation is used frequently, but mostly at the intra-section level. Indentation can be used to offset lists, tables, quotations, examples, and the like. (The argument that docstrings are different because they are input for a text formatter is wrong: the whole point is that they are also readable without processing.)

            I reject the argument that using indentation is Pythonic: text is not code, and different traditions and conventions hold. People have been presenting text for readability for over 30 centuries. Let's not innovate needlessly.

        See "Section Structure via Indentation" in "Problems With StructuredText" [14]_ for further elaboration.

    Q4: Why use reStructuredText for PEPs? What's wrong with the existing standard?

    A4: The existing standard for PEPs is very limited in terms of general expressibility, and referencing is especially lacking for such a reference-rich document type. PEPs are currently converted into HTML, but the results (mostly monospaced text) are less than attractive, and most of the value-added potential of HTML is untapped.

        Making reStructuredText the standard markup for PEPs will enable much richer expression, including support for section structure, inline markup, graphics, and tables. In several PEPs there are ASCII graphics diagrams, which are all that plaintext documents can support. Since PEPs are made available in HTML form, the ability to include proper diagrams would be immediately useful.

        Current PEP practices allow for reference markers in the form "[1]" in the text, and the footnotes/references themselves are listed in a section toward the end of the document. There is currently no hyperlinking between the reference marker and the footnote/reference itself (it would be possible to add this to pep2html.py, but the "markup" as it stands is ambiguous and mistakes would be inevitable). A PEP with many references (such as this one ;-) requires a lot of flipping back and forth. When revising a PEP, often new references are added or unused references deleted. It is painful to renumber the references, since it has to be done in two places and can have a cascading effect (insert a single new reference 1, and every other reference has to be renumbered; always adding new references to the end is suboptimal). It is easy for references to go out of sync.

        PEPs use references for two purposes: simple URL references and footnotes. reStructuredText differentiates between the two. A PEP might contain references like this::

            Abstract

                This PEP proposes adding frungible doodads [1] to the
                core.  It extends PEP 9876 [2] via the BCA [3]
                mechanism.

            References and Footnotes

                [1] http://www.example.org/

                [2] PEP 9876, Let's Hope We Never Get Here
                    http://www.python.org/peps/pep-9876.html

                [3] "Bogus Complexity Addition"

        Reference 1 is a simple URL reference. Reference 2 is a footnote containing text and a URL. Reference 3 is a footnote containing text only. Rewritten using reStructuredText, this PEP could look like this::

            Abstract
            ========

            This PEP proposes adding `frungible doodads`_ to the core.
            It extends PEP 9876 [#pep9876]_ via the BCA [#]_ mechanism.

            .. _frungible doodads: http://www.example.org/

            .. [#pep9876] `PEP 9876`__, Let's Hope We Never Get Here

            __ http://www.python.org/peps/pep-9876.html

            .. [#] "Bogus Complexity Addition"

        URLs and footnotes can be defined close to their references if desired, making them easier to read in the source text, and making the PEPs easier to revise. The "References and Footnotes" section can be auto-generated with a document tree transform. Footnotes from throughout the PEP would be gathered and displayed under a standard header. If URL references should likewise be written out explicitly (in citation form), another tree transform could be used.

        URL references can be named ("frungible doodads"), and can be referenced from multiple places in the document without additional definitions. When converted to HTML, references will be replaced with inline hyperlinks (HTML <A> tags). The two footnotes are automatically numbered, so they will always stay in sync. The first footnote also contains an internal reference name, "pep9876", so it's easier to see the connection between reference and footnote in the source text. Named footnotes can be referenced multiple times, maintaining consistent numbering.

        The "#pep9876" footnote could also be written in the form of a citation::

            It extends PEP 9876 [PEP9876]_ ...

            .. [PEP9876] `PEP 9876`_, Let's Hope We Never Get Here

        Footnotes are numbered, whereas citations use text for their references.

    Q5: Wouldn't it be better to keep the docstring and PEP proposals separate?

    A5: The PEP markup proposal may be removed if it is deemed that there is no need for PEP markup, or it could be made into a separate PEP. If accepted, PEP 1, PEP Purpose and Guidelines [19]_, and PEP 9, Sample PEP Template [20]_ will be updated.

        It seems natural to adopt a single consistent markup standard for all uses of structured plaintext in Python, and to propose it all in one place.

    Q6: The existing pep2html.py script converts the existing PEP format to HTML. How will the new-format PEPs be converted to HTML?

    A6: One of the deliverables of the Docutils project [21]_ will be a new version of pep2html.py with integrated reStructuredText parsing. The Docutils project will support PEPs with a "PEP Reader" component, including all functionality currently in pep2html.py (auto-recognition of PEP & RFC references).

    Q7: Who's going to convert the existing PEPs to reStructuredText?

    A7: A call for volunteers will be put out to the Doc-SIG and greater Python communities. If insufficient volunteers are forthcoming, I (David Goodger) will convert the documents myself, perhaps with some level of automation. A transitional system whereby both old and new standards can coexist will be easy to implement (and I pledge to implement it if necessary).

    Q8: Why use reStructuredText for README and other ancillary files?

    A8: The reasoning given for PEPs in A4 above also applies to README and other ancillary files. By adopting a standard markup, these files can be converted to attractive cross-referenced HTML and put up on python.org. Developers of Python projects can also take advantage of this facility for their own documentation.

    Q9: Won't the superficial similarity to existing markup conventions cause problems, and result in people writing invalid markup (and not noticing, because the plaintext looks natural)? How forgiving is reStructuredText of "not quite right" markup?

    A9: There will be some mis-steps, as there would be when moving from one programming language to another. As with any language, proficiency grows with experience. Luckily, reStructuredText is a very little language indeed.

        As with any syntax, there is the possibility of syntax errors. It is expected that a user will run the processing system over their input and check the output for correctness.

        In a strict sense, the reStructuredText parser is very unforgiving (as it should be; "In the face of ambiguity, refuse the temptation to guess" [22]_ applies to parsing markup as well as computer languages).
        Here's a design goal from "An Introduction to reStructuredText" [13]_:

            3. Unambiguous. The rules for markup must not be open for interpretation. For any given input, there should be one and only one possible output (including error output).

        While unforgiving, at the same time the parser does try to be helpful by producing useful diagnostic output ("system messages"). The parser reports problems, indicating their level of severity (from least to most: debug, info, warning, error, severe). The user or the client software can decide on reporting thresholds; they can ignore low-level problems or cause high-level problems to bring processing to an immediate halt. Problems are reported during the parse as well as included in the output, often with two-way links between the source of the problem and the system message explaining it.

    Q10: Will the docstrings in the Python standard library modules be converted to reStructuredText?

    A10: Over time, with the help of the developer community, many modules will be converted. Some modules may never be converted. A future toolset will have to allow for incompleteness.
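
    The reporting thresholds described in A9 can be pictured with a small, self-contained sketch. It is not the Docutils implementation: the `Reporter` class, the `HaltProcessing` exception, and the parameter names are assumptions made for illustration. Only the five severity names and their order come from the text above.

```python
# Severity names from least to most severe, as listed in A9.
LEVELS = ["debug", "info", "warning", "error", "severe"]


class HaltProcessing(Exception):
    """Raised when a message reaches the halt threshold."""


class Reporter:
    """Hypothetical reporter with a reporting and a halting threshold."""

    def __init__(self, report_level="warning", halt_level="severe"):
        self.report_level = LEVELS.index(report_level)
        self.halt_level = LEVELS.index(halt_level)
        self.messages = []

    def report(self, level, text):
        rank = LEVELS.index(level)
        if rank >= self.report_level:
            # At or above the reporting threshold: keep the message.
            self.messages.append((level, text))
        if rank >= self.halt_level:
            # At or above the halt threshold: stop processing.
            raise HaltProcessing(text)


reporter = Reporter(report_level="warning", halt_level="error")
reporter.report("info", "implicit target created")      # ignored
reporter.report("warning", "title underline too short")  # recorded
try:
    reporter.report("error", "unknown directive type")   # recorded, halts
except HaltProcessing as exc:
    halted_on = str(exc)
```

    Keeping the two thresholds separate lets a client ignore low-level noise while still stopping on serious problems, which is the behaviour A9 describes.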


References & Footnotes

    [1] http://structuredtext.sourceforge.net/

    [2] http://www.python.org/sigs/doc-sig/

    [3] http://www.w3.org/XML/

    [4] http://www.oasis-open.org/cover/general.html

    [5] http://docbook.org/tdg/en/html/docbook.html

    [6] http://www.w3.org/MarkUp/

    [7] http://www.w3.org/MarkUp/#xhtml1

    [8] http://www.tug.org/interest.html

    [9] http://www.perldoc.com/perl5.6/pod/perlpod.html

    [10] http://java.sun.com/j2se/javadoc/

    [11] http://docutils.sourceforge.net/mirror/setext.html

    [12] http://dev.zope.org/Members/jim/StructuredTextWiki/FrontPage

    [13] An Introduction to reStructuredText
         http://structuredtext.sourceforge.net/spec/introduction.txt

    [14] Problems with StructuredText
         http://structuredtext.sourceforge.net/spec/problems.txt

    [15] reStructuredText Markup Specification
         http://structuredtext.sourceforge.net/spec/reStructuredText.txt

    [16] A Record of reStructuredText Syntax Alternatives
         http://structuredtext.sourceforge.net/spec/alternatives.txt

    [17] reStructuredText Directives
         http://structuredtext.sourceforge.net/spec/directives.txt

    [18] Quick reStructuredText
         http://structuredtext.sourceforge.net/docs/quickref.html

    [19] PEP 1, PEP Guidelines, Warsaw, Hylton
         http://www.python.org/peps/pep-0001.html

    [20] PEP 9, Sample PEP Template, Warsaw
         http://www.python.org/peps/pep-0009.html

    [21] http://docutils.sourceforge.net/

    [22] From "The Zen of Python (by Tim Peters)",
         http://www.python.org/doc/Humor.html#zen

    [23] PEP 216, Docstring Format, Zadka
         http://www.python.org/peps/pep-0216.html


Copyright

    This document has been placed in the public domain.


Acknowledgements

    Some text is borrowed from PEP 216, Docstring Format [23]_, by Moshe Zadka.

    Special thanks to all members past & present of the Python Doc-SIG.


Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End:

Here's a serious proposal, safe to post now that April Fool's is over. Please read & comment.
Good PEP, David! What's the next step? Should the processing code be incorporated in the standard library? Should we start converting the standard library docs to reStructuredText? --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
Good PEP, David!
Thanks! I'll take the generally positive tone of your reply as a good sign.
What's the next step?
Mark the PEP as "Accepted"? But first, I'd like to address any issues that may be raised. I'd like to give people a few days at least to respond. Does a lack of responses mean that there *are* no issues? I came here for a good argument! Seriously though, just accepting the PEP would be great progress.
Should the processing code be incorporated in the standard library?
It's not ready for that yet. Here's a summary of the state of the code:

- The project is split in two at present: the parser and everything else (the DPS proper). They are to be merged & renamed to "Docutils". This will remove some artificial complexity, reduce some redundancy especially with the test code, and make the whole thing much easier to install.

- The parser is functionally complete for standalone documents, and works quite well (passes all 300 unit tests). However,

  - The code needs some serious refactoring in places.

  - The internal documentation needs to be completed.

- The non-parser part (current DPS) is still in its infancy. It's currently only able to process standalone documents into simple HTML.

- The docstring extraction & processing part of Docutils (what I call the "Python source reader" component) is nowhere near ready. Implementing the Python roles for "interpreted text" (links based on namespace context) will need a significant effort. But that has more to do with PEP 258. BTW, expect PEP revisions soon, Barry!

- There's no support for PEP processing yet. Assuming that this part of the PEP is accepted,

  - The PEP strategy for section headers must first be decided (as-is plus reStructuredText in the indented body text, or replace with underline syntax and drop the indents, or allow both).

  - The parser needs support for PEP-specific constructs (RFC 2822 headers; recognize "PEP \d+" and "RFC[- ]?\d+" as links; Q&A).

  - pep2html.py will need some work. It would become a front-end to a "PEP reader" component.

I think the Docutils code should continue to be developed separately from the stdlib, until it is worthy. It may or may not be ready in time for 2.3, depending on the usual factors: my time, getting more interested developers on board, etc. The PEP processing part could be installed before the full docstring processing part is ready; that should be doable for 2.3.
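
The two patterns quoted in the list above, "PEP \d+" and "RFC[- ]?\d+", can be tried out directly with Python's `re` module. The following is only a sketch of the recognition step, not code from pep2html.py or Docutils; the name `find_references` is made up for illustration, and turning the matches into links is left out.

```python
import re

# The two patterns from the message, combined into one expression.
REF_PATTERN = re.compile(r"\b(?:(PEP) (\d+)|(RFC)[- ]?(\d+))\b")


def find_references(text):
    """Return (kind, number) pairs for each PEP or RFC reference."""
    refs = []
    for match in REF_PATTERN.finditer(text):
        kind = match.group(1) or match.group(3)
        number = int(match.group(2) or match.group(4))
        refs.append((kind, number))
    return refs


print(find_references("See PEP 258, RFC 2822 and RFC-822."))
```

A PEP reader could then map each pair onto a URL, for instance following the http://www.python.org/peps/pep-0009.html form used in the PEP's own reference list.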
Should we start converting the standard library docs to reStructuredText?
I assume you mean "docstrings"? Otherwise Fred may take exception, at least until LaTeX support is in place. ;-)

There's no rush to convert library docstrings. There won't be a benefit for a while, but there's no danger either.

-- David Goodger goodger@users.sourceforge.net

I spoke too soon:
Does a lack of responses mean that there *are* no issues?
I came here for a good argument!
And I got one, on comp.lang.python, flamebait and all.

BTW, Barry Warsaw wrote this comment for CVS revision 1.7 of python/nondist/peps/pep-0009.txt:

    "In David Goodger's PEP 287, he has a better Emacs turd section."

I hope a new revision gets checked in soon. I don't want to be Google-associated with "a better Emacs turd". Drat, that's done it. Now my fate is sealed.

-- David Goodger goodger@users.sourceforge.net

david wrote:
Does a lack of responses mean that there *are* no issues?
the first time, I got to the following complete nonsense: "XML and SGML /.../ are verbose, difficult to type, and too cluttered to read comfortably as source." and having written several books in SGML and XML without noticing any of those "widely known" problems, I decided that it wasn't meaningful to continue. ::: after a second attempt to read it, I got stuck in the Q&A section. I've never seen such an arrogant PEP before; the authors clearly have very little experience from the problem domain (not only writing and maintaining documentation with markup, but also what makes Python so incredibly usable), yet they want to force their new invention down everyone's throat (hey, we spent a lot of time designing this, so of course you shall have to use it): want to contribute a PEP? sorry, you have to learn a new markup language. want to fix something in the README? sorry, you have to learn a new markup language. want to fix a module in the standard library? sorry, you have to learn a new markup language. it's easy. there are only a couple of 100ks of specifications to read and understand. (that's only slightly larger than the Ruby language reference, and we're convinced that you'd rather learn another markup language than another programming language, right?) (and while you're at it, get a new keyboard; we don't care much about people using non-US keyboards...) ::: -1. the world doesn't need another markup language. there is only one markup language that everyone knows, and it's called HTML. the javadoc folks got it right. this one's all wrong. </F>

On Wed, 3 Apr 2002, Fredrik Lundh wrote:
-1. the world doesn't need another markup language. there is only one markup language that everyone knows, and it's called HTML. the javadoc folks got it right. this one's all wrong.
I'm not against a new markup language, but i do feel that the specification of the language is just too big. What's with the 32 different section title adornment characters, the optional overline, and unspecified title styles (order "as encountered")? And that's just section titles. Do we really need five kinds of lists? How about the 15 different ways to number lists? The five ways to do hyperlink targets? And that's not all... I can appreciate the desire to come up with something flexible, but this goes too far for my taste. The current specification is about 10000 words; get it down to about 1000 and i might go for it. -- ?!ng

Ka-Ping Yee wrote: [Ping]
I'm not against a new markup language, but i do feel that the specification of the language is just too big. ... The current specification is about 10000 words; get it down to about 1000 and i might go for it.
I think this is the crux of the issue. Please realize that the project isn't finished yet. The verbose, comprehensive specification docs are there, because that's what was coded against, that's what was used as material for debate on the Doc-SIG and project mailing lists. Work has already begun on an introductory user document (thanks to Richard Jones; see http://structuredtext.sourceforge.net/docs/quickstart.txt). I apologize that this wasn't available before (and I didn't expect the Spanish Inquisition!).
What's with the 32 different section title adornment characters, the optional overline, and unspecified title styles (order "as encountered")? And that's just section titles. Do we really need five kinds of lists? How about the 15 different ways to number lists? The five ways to do hyperlink targets? And that's not all...
Different strokes for different folks. Based on observation of actual usage, there's a great variety of implicit markup out there. As long as supporting the variations does not introduce ambiguity, I don't see the problem. With any markup, few users make use of all the features.
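For readers puzzled by title styles being ordered "as encountered", the idea can be sketched in a few lines. This is a simplified illustration, not the parser's actual code: the function name `assign_title_levels` and the representation of a style as an (underline character, has-overline) pair are assumptions chosen for the example.

```python
def assign_title_levels(titles):
    """Map each (text, style) pair to a section level.

    A style is assumed to be a tuple of (adornment_char, has_overline).
    The first distinct style seen becomes level 1, the second level 2,
    and so on; reusing an earlier style returns to that level.
    """
    seen = []  # styles, in the order they were first encountered
    levels = []
    for text, style in titles:
        if style not in seen:
            seen.append(style)
        levels.append((text, seen.index(style) + 1))
    return levels


if __name__ == "__main__":
    document = [
        ("Document Title", ("=", True)),   # overlined "=" seen first: level 1
        ("Introduction", ("=", False)),    # plain "=" seen second: level 2
        ("Background", ("-", False)),      # "-" seen third: level 3
        ("Goals", ("=", False)),           # plain "=" again: back to level 2
    ]
    for text, level in assign_title_levels(document):
        print(level, text)
```

The point of the design is that no adornment character has a fixed meaning; an author picks whichever characters look right, and the hierarchy follows from the order of use. A real parser would presumably also have to reject inconsistent nesting, which this sketch does not attempt.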
I can appreciate the desire to come up with something flexible, but this goes too far for my taste.
The markup was developed with the philosophy of "better a rich set of choices than artificial limits". Which features should be omitted? If deemed too rich for Python docstrings (& PEPs etc.), it can be pared back. But since the full spec is out there, I doubt if anyone could come up with a generally acceptable subset. -- David Goodger

Fredrik Lundh wrote:
the first time, I got to the following complete nonsense:
"XML and SGML /.../ are verbose, difficult to type, and too cluttered to read comfortably as source."
and having written several books in SGML and XML without noticing any of those "widely known" problems, I decided that it wasn't meaningful to continue.
Your ellipsis omits all the good things I said about XML/SGML. The rest of the quote is supposed to be read in the context of: 3. Explicit markup (like XML or TeX) is widely considered unreadable by the uninitiated. Where "by the uninitiated" is key. Having written books in SGML/XML, written the "sre" engine, and made many other wonderful contributions, you are obviously not among the uninitiated. I'll make that more explicit.
I've never seen such an arrogant PEP before
Is this intentional flamebait? Pot calling the kettle black? I will try to restrain myself. ;-)
the authors clearly have very little experience from the problem domain (not only writing and maintaining documentation with markup, but also what makes Python so incredibly usable),
You are mistaken. Personally, I have over six years of experience with SGML & XML, doing document analysis, writing DTDs, and writing document processing systems, for English, Japanese, Chinese, and Korean data. Four years of experience with Python. Member of the Doc-SIG for two years, but I've read over much of the six-year archive. I still have much to learn, no doubt. If you can point out any specific areas where the PEP is lacking with regards to the above, I'd be happy to do the research.
yet they want to force their new invention down everyone's throat (hey, we spent a lot of time designing this, so of course you shall have to use it)
Perhaps you missed the second paragraph of the Abstract? [This PEP is not] an attempt to deprecate pure plaintext docstrings, which are always going to be legitimate. The reStructuredText markup is an alternative for those who want more expressive docstrings.
want to contribute a PEP? sorry, you have to learn a new markup language.
The current standard could easily coexist with the proposed. Whether or not to use the new markup could be left to PEP authors, or decreed. That's a policy decision.
want to fix something in the README? sorry, you have to learn a new markup language. want to fix a module in the standard library? sorry, you have to learn a new markup language. it's easy.
How about: "Want to add to Python's standard documentation? Sorry, you have to learn a new markup language: TeX." Same thing. But hacking on reStructuredText is much easier than hacking on TeX. In any case, there is no *requirement* to use reStructuredText. If the PEP is accepted, it becomes *a* standard for docstrings, but not *the only* standard. At minimum, plaintext docstrings will remain.
there are only a couple of 100ks of specifications to read and understand.
There's a need for a short user introduction, true. I apologize that it wasn't already in place. The beginning of such an introduction is at http://structuredtext.sourceforge.net/docs/quickstart.txt. This is the first time I've seen a proposal blasted for having *too much* documentation! ;-) How many Python programmers read the entire Language Reference, let alone the Library Reference? Few, I imagine. So what? Heck, had I been told I had to read O'Reilly's "Python Standard Library" cover to cover before I could write my first line of code, I probably never would have begun.
(and while you're at it, get a new keyboard; we don't care much about people using non-US keyboards...)
Can you be specific?
-1. the world doesn't need another markup language. there is only one markup language that everyone knows, and it's called HTML. the javadoc folks got it right. this one's all wrong.
You don't like it, fine. I don't see much substantive in your posts here or on comp.lang.python; paraphrased, you're saying "this sucks". Care to give any reasoning behind your conclusion? I must have stepped on some sensitive toes to warrant such a reaction! -- David Goodger

David Goodger writes: [...]
Proposed alternatives have included:
- XML [3]_, SGML [4]_, DocBook [5]_, HTML [6]_, XHTML [7]_
XML and SGML are explicit, well-formed meta-languages suitable for all kinds of documentation. XML is a variant of SGML. They are best used behind the scenes, because they are verbose, difficult to type, and too cluttered to read comfortably as source. DocBook, HTML, and XHTML are all applications of SGML and/or XML, and all share the same basic syntax and the same shortcomings.
And how much of HTML is required for marking up documentation? JavaDoc doesn't appear to suffer from this problem. I've never heard this argument given by programmers who are writing documentation. I use HTML in my JavaDoc comments. I use HTML in my Doxygen comments (in C++). Why not use HTML in my Python comments? DocBook is much more verbose, as it concentrates almost exclusively on semantics, not display. It is overkill for this. XML and SGML are by themselves not an alternative: they are a means to the end. XML is an SGML application. DocBook is an XML and/or SGML application. HTML is an SGML application. XHTML is an XML application.
- JavaDoc [10]_
Special comments before Java classes and functions serve to document the code. A program to extract these, and turn them into HTML documentation is called javadoc, and is part of the standard Java distribution. However, the only output format that is supported is HTML, and JavaDoc has a very intimate relationship with HTML, using HTML tags for most markup.
[...] This is patently false: there are Doclets available that convert to a wide variety of formats. Sun provides a MIF doclet, and third parties have provided doclets for RTF, TexInfo, LaTeX, and DocBook. There is very little that can be marked up in HTML that cannot be converted to other formats in a straightforward way. I'll raise Doxygen as another example: the comments utilize HTML (though you can escape for specific processor features (e.g., TeX equations) if necessary) and documentation can be generated in HTML, RTF, PostScript, Hyperlinked PDF, compressed HTML, and Unix man page format. You also forget to mention TexInfo, one of the older and more widely used documentation formats for programmer docs. I don't have a vote, but if I did, -1. -tree -- Tom Emerson Basis Technology Corp. Sr. Computational Linguist http://www.basistech.com "Beware the lollipop of mediocrity: lick it once and you suck forever"

On Wed, Apr 03, 2002 at 11:14:00AM -0500, Tom Emerson wrote:
And how much of HTML is required for marking up documentation? JavaDoc
There's also IBTWSH, a DTD describing a small subset of HTML: <URL:http://home.ccil.org/~cowan/XML/ibtwsh6.dtd> It's intended for use in other DTDs that want slightly richer text than just plain text, but not all of HTML; for example, I use it in my quotation DTD. I keep meaning to write some sort of general IBTWSH processing code as a Python module. --amk (www.amk.ca) "Dorset Street -- the most evil in London, I'm told." -- Hinton, in FROM HELL #2

I've never participated in the doc-sig, and I haven't given a lot of thought to writing complete documentation in docstrings. Given my limited interest, I have always been puzzled by why Python has taken so long to come up with some simple conventions for structuring docstrings. The only other example I was familiar with was JavaDoc. I've never written any JavaDoc, but I've browsed lots of HTML generated by JavaDoc. The one time I looked at JavaDoc, it struck me as quite simple; I felt confident that I could write it given my limited knowledge of and interest in HTML. It also appeared that JavaDoc had a limited feature set, which also seemed like a strength: Let's not write fancy, formatted reports in docstrings. I don't have the same confidence in reStructuredText. It looks like I can mostly read it, but it seems hard to learn how to write. It seems burdensome to require module authors to learn Python and reStructuredText just to contribute to the std library. Jeremy

David, I just posted a squishy comment on PEP 287. I realized that I can be more specific: PEP 287 describes a general-purpose structured text format "allowing for all kinds of documents." It doesn't say much of anything about docstrings specifically: "This PEP is not concerned with docstring semantics or processing at all." I think this is backwards. The PEP proposes to solve a specific problem -- docstring formats -- without any discussion of the problem domain or its requirements. Moreover, I disagree with goal 5. I think it is a non-goal to write a module's entire documentation in docstrings. (Perhaps it should be a goal that the docstring format is *not* rich enough to write a module's entire documentation <0.6 wink>.) I think there are two missing goals that are essential for a PEP that will make people happy: The first goal should be to keep the markup as simple as possible. The second goal should be to be targeted specifically at the needs of people who write docstrings for the standard library. I think it would also be productive to see an example or two of how this new format would be used in the standard library. Take a module that already has some decent docstrings and re-write it in the new format. Then we can see what benefit results from the effort. And take a module without docstrings and write new ones. (Some candidates: string, random, and unittest have good docstrings. getopt and hmac aren't bad. weakref is relatively small and doesn't have any docstrings.) If the primary goal is to keep the markup simple, I think it's impossible to judge a candidate without knowing what markup is required for docstrings. I am uncomfortable with the PEP's argument that "The problem is that a construct that seems superfluous to one person may be absolutely essential to another." A good design for a docstring format makes some hard decisions about what actually is essential and what is bloat.
To paraphrase Aristotle, wisdom comes from choosing wisely in the particular situation. Jeremy

One more response. I just chatted with Guido, and he helped me see a different purpose for the PEP. It sounds like reStructuredText (reST) is intended for people who do want to write all the documentation in docstrings. If that's the goal, then it's fine if the doc-sig wants to settle on reST as the answer for those people. I wouldn't object to seeing this PEP approved as an informational PEP that described reST as an optional format for docstrings. (I'm assuming that there is consensus in the doc-sig that reST is the right solution.) As such, the PEP shouldn't be trying to convince people to use reST so much as it should describe the reST format. If the PEP is just going to be an advocacy document, there's not much point to a PEP. Or maybe the PEP could just say "reST is documented elsewhere. The doc-sig has agreed on this as the standard all-things-to-all-people format for the following reasons: ..." As an optional format, I think it would be helpful to explicitly note that it will not be used for the Python standard library. We've already got pretty good documentation for the library in LaTeX, and I can't think of any reason to move all that text into the source code of the modules. Jeremy

Thanks Jeremy, for raising good points. Jeremy Hylton wrote in his first message:
I have always been puzzled by why Python has taken so long to come up with some simple conventions for structuring docstrings.
I think there are many factors. People have different requirements, and often insist that their minimal set is the best set (thus reStructuredText is more flexible). We've seen what else is out there, and we aren't satisfied. Python has shown us how readable code can be, and we strive for readable inline docs as well. Nobody has been as committed, driven, and nuts as me before.
The one time I looked at JavaDoc, it struck me as quite simple
Simple, yes, but ugly as sin. I wouldn't want JavaDoc in *my* modules. Many others concur.
It also appeared that JavaDoc had a limited feature set, which also seemed like a strength: Let's not write fancy, formatted reports in docstrings.
This is a common argument, but I think a flawed one. It assumes that rich syntax inevitably begets over-complex docs, and at the same time assumes that no docs ever need to use sophisticated features. Fine, don't write fancy docstrings. But what if you need a single little table? Or a definition list (like in string.py; see my message to Guido)? Eventually you'll need something more sophisticated, but the markup is limited. You're stuck, you get frustrated, throw the tool out and revert to plaintext with ad-hoc conventions. The Python equivalent would be to limit the feature set by eliminating, say, floats. Ints ought to be enough for anyone! Those floats just cause problems...
It looks like I can mostly read it, but it seems hard to learn how to write.
I think that's because I only gave the spec, without the primer. Now that we have a primer, what do you think?
It seems burdensome to require module authors to learn Python and reStructuredText just to contribute to the std library.
I'm not advocating that, never have. I'll make it clear in the PEP. Jeremy wrote in his second message:
If the primary goal is to keep the markup simple... ... A good design for a docstring format makes some hard decisions about what actually is essential and what is bloat.
The primary goal isn't just "simple", it's "readable, simple, rich, easy, and extensible plaintext" (among other things; see http://structuredtext.sf.net/spec/introduction.html#goals). These goals obviously conflict, and the design of the markup has been an exercise in compromise. A successful one, I think. I tried to make simple and common constructs simple. Rarer constructs are more verbose, but since they're rare, it's not such a big deal. Jeremy wrote in his third message:
I wouldn't object to seeing this PEP approved as an informational PEP that described reST as an optional format for docstrings.
That's all I ever intended; sorry if it seemed otherwise.
(I'm assuming that there is consensus in the doc-sig that reST is the right solution.)
More than ever before, I think. 100% consensus is rarely attained in any forum (this one included). -- David Goodger

[Jeremy]
To paraphrase Aristotle, wisdom comes from choosing wisely in the particular situation.
I thought that wisdom came from experience, and experience from lack of wisdom. (Or is that common sense? There's a lot of wisdom in common sense.) I believe that reStructuredText comes from a lot of experience: long discussions about requirements in the doc-sig, and experience with e.g. Zope's StructuredText (which definitely represents the "lack of experience" position IMO). I think that reStructuredText is a good format for marking up docstrings; it's probably as good as it gets given the requirements (a fairly elaborate feature set, yet more readable "in the raw" than HTML). But if you ask me "should we use this for the standard library" I think I'll have to say no. Python's library reference documentation is written using LaTeX, like it or not, and it's not going to change any time soon. (*If* and *when* it changes, it's probably going to be something XMLish. But since XML is so much more verbose than LaTeX, I'm not sure there's much of a point to this, at least unless bitrot takes the LaTeX toolchain away from us.) Given this status quo, docstrings in the Python standard library should not try to duplicate the library reference documentation; instead, they should be no more than concise hints. For such docstrings, a markup language, even reStructuredText, is inappropriate. IOW, reStructuredText is not even an option for new standard library modules. I agree with Jeremy that the PEP needs to be clear and explicit about this. --Guido van Rossum (home page: http://www.python.org/~guido/)

"GvR" == Guido van Rossum <guido@python.org> writes:
GvR> [Jeremy]
To paraphrase Aristotle, wisdom comes from choosing wisely in the particular situation.
GvR> I thought that wisdom came from experience, and experience from lack of wisdom. (Or is that common sense? There's a lot of wisdom in common sense.)
Oh, yes, I remember that one, too :-). Don't think it was Aristotle, but the saying goes: Good decisions come from experience; experience comes from bad decisions.
GvR> I believe that reStructuredText comes from a lot of experience: long discussions about requirements in the doc-sig, and experience with e.g. Zope's StructuredText (which definitely represents the "lack of experience" position IMO).
I believe that reST is a good structured text design, and I think it would be good to use it, e.g., to replace Zope's StructuredText. Jeremy

Jeremy Hylton writes:
I believe that reST is a good structured text design, and I think it would be good to use it, e.g., to replace Zope's StructuredText.
Why does the community need a new structured text design? What is wrong with existing markup methodologies? The PEP didn't answer these with any cogent examples, IMHO. The only rationale that I saw was existing use: the people that already make use of it find it natural, and everyone else has to learn it. -- Tom Emerson

Why does the community need a new structured text design? What is wrong with existing markup methodologies? The PEP didn't answer these with any cogent examples, IMHO.
Maybe that's because it's obvious to anyone who has hung out on doc-sig long enough. Structured text is really a great idea for certain situations; reST is a much better implementation of the idea than any versions I've seen before. E.g. the ST code in Zope stinks (this is not a secret).
The only rationale that I saw was existing use: the people that already make use of it find it natural, and everyone else has to learn it.
Maybe David needs to write up the rationale better. But I can assure you there is one. --Guido van Rossum

Guido van Rossum writes:
Why does the community need a new structured text design? What is wrong with existing markup methodologies? The PEP didn't answer these with any cogent examples, IMHO.
Maybe that's because it's obvious to anyone who has hung out on doc-sig long enough.
So in order to understand a PEP one needs to subscribe to all applicable mailing lists?
Structured text is really a great idea for certain situations; reST is a much better implementation of the idea than any versions I've seen before. E.g. the ST code in Zope stinks (this is not a secret).
I have no idea what is better. I ju
The only rationale that I saw was existing use: the people that already make use of it find it natural, and everyone else has to learn it.
Maybe David needs to write up the rationale better. But I can assure you there is one.
Obviously so, else he wouldn't have written the PEP in the first place. But the rationale in the PEP didn't convince me that it was worth my time to blindly adopt a new markup scheme that I've never used to document classes in the Python libraries and application I'm writing merely to make use of the documentation tools that are provided. Instead the greater motivation is to adopt JavaDoc/Doxygen and write appropriate tools, since most developers already speak enough HTML to write the reference documentation. Of course there is a lot of current practice which has the momentum. Given how carefully rationalized the other PEPs are, there is no reason to not make 287 equally rationalized. Devil's-advocately-yours, tree -- Tom Emerson

Guido van Rossum wrote:
But if you ask me "should we use this for the standard library" I think I'll have to say no.
If the question is "Should we convert all library documentation and stuff it into docstrings in the source?", I would agree with you wholeheartedly. There are some people who would like it that way, but I'm not one of them. People seem to have gotten the impression that I'm advocating taking the library docs out of LaTeX and stuffing them into module docstrings, for later extraction & processing. That is *not* the case! I don't know *how* anyone got that idea! <ahem> <ahem> The docs are safe from me. ;-)
Given this status quo, docstrings in the Python standard library should not try to duplicate the library reference documentation; instead, they should be no more than concise hints.
I agree completely.
For such docstrings, a markup language, even reStructuredText, is inappropriate.
IOW, reStructuredText is not even an option for new standard library modules.
What about existing docstrings? There is plenty of informal markup in there already. For the standard library I would suggest that, once the tools are up to it (e.g., once there's reStructuredText support in pydoc), *existing* docstrings could be *minimally* converted to formalize the implicit markup that's *already there*. For example, here's the module docstring for the string.py module:

    """A collection of string operations (most are no longer used in Python 1.6).

    Warning: most of the code you see here isn't normally used nowadays.
    With Python 1.6, many of these functions are implemented as methods on
    the standard string object. They used to be implemented by a built-in
    module called strop, but strop is now obsolete itself.

    Public module variables:

    whitespace -- a string containing all characters considered whitespace
    lowercase -- a string containing all characters considered lowercase l
    uppercase -- a string containing all characters considered uppercase l
    letters -- a string containing all characters considered letters
    digits -- a string containing all characters considered decimal digits
    hexdigits -- a string containing all characters considered hexadecimal
    octdigits -- a string containing all characters considered octal digit
    punctuation -- a string containing all characters considered punctuati
    printable -- a string containing all characters considered printable
    """

(I wrapped the first two paragraphs and truncated the list so email wouldn't wreck it.)

As it stands, this is almost valid reStructuredText (strictly speaking it is already valid, but the list would get wrapped and wouldn't be very useful). The list of variables needs a bit of work; it could be turned into a bullet list or a definition list. The variable identifiers themselves could be marked up as "interpreted text" (e.g. rendered in a different face, with links to each identifier's docstring if it exists). The warning could be left as-is, or spruced up.
Here is the fully converted docstring:

    """A collection of string operations (most are no longer used in Python 1.6).

    .. Warning:: most of the code you see here isn't normally used nowadays.
       With Python 1.6, many of these functions are implemented as methods on
       the standard string object. They used to be implemented by a built-in
       module called strop, but strop is now obsolete itself.

    Public module variables:

    `whitespace`
        a string containing all characters considered whitespace
    `lowercase`
        a string containing all characters considered lowercase letters
    `uppercase`
        a string containing all characters considered uppercase letters
    `letters`
        a string containing all characters considered letters
    `digits`
        a string containing all characters considered decimal digits
    `hexdigits`
        a string containing all characters considered hexadecimal digits
    `octdigits`
        a string containing all characters considered octal digits
    `punctuation`
        a string containing all characters considered punctuation
    `printable`
        a string containing all characters considered printable
    """

The conversion is minimal (it could be even less), it's still perfectly readable, and the difference in the converted output is significant. Please take a look at the converted output (1 or 2) and compare to the output for vanilla pydoc (3).

1. http://structuredtext.sf.net/spec/string.html
2. http://structuredtext.sf.net/spec/string2.html (bullet list instead of definition list)
3. the first section of http://web.pydoc.org/2.2/string.html

(Note that the HTML uses a CSS1 stylesheet, so a recent browser is required. A writer for HTML for older browsers is on the to-do list.)

In any case, nothing needs to be done any time soon. What do you think?
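The "minimal conversion" described above is mechanical enough to sketch. The helper below is a hypothetical illustration, not part of Docutils or pydoc: it rewrites lines of the form "name -- description" as a reStructuredText definition list with the name marked as interpreted text, and passes any other line through unchanged. The function name `to_definition_list` and the four-space indent are assumptions.

```python
def to_definition_list(lines, indent="    "):
    """Rewrite 'name -- description' lines as a reST definition list.

    Lines without the ' -- ' separator are passed through untouched, so
    surrounding prose in a docstring is not disturbed.
    """
    out = []
    for line in lines:
        name, separator, description = line.partition(" -- ")
        if not separator:
            out.append(line)
            continue
        out.append("`%s`" % name.strip())          # term, as interpreted text
        out.append(indent + description.strip())   # indented definition
    return "\n".join(out)


if __name__ == "__main__":
    variables = [
        "Public module variables:",
        "",
        "whitespace -- a string containing all characters considered whitespace",
        "digits -- a string containing all characters considered decimal digits",
    ]
    print(to_definition_list(variables))
```

A tool like this would only ever be a starting point; anything less regular than the string.py list would still need a human to decide what the implicit markup was meant to be.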
I agree with Jeremy that the PEP needs to be clear and explicit about this.
Will do. -- David Goodger

[I'm hoping David Goodger is reading this on python-dev. Replying to <user>@users.sf.net doesn't work from my home box. It's my fault but I have no idea how to fix it.]
IOW, reStructuredText is not even an option for new standard library modules.
What about existing docstrings? There is plenty of informal markup in there already.
Yeah, but they're not using any particular formal markup. You'd still have to do a massive, *massive* cleanup if you wanted reST to apply. And it's not clear what the advantage would be -- stdlib docstrings are mostly intended as hints, summarizing the info from the library reference, and as such are intended for presentation *without* additional processing. Minimal though reST is, it's still markup.
For the standard library I would suggest that, once the tools are up to it (e.g., once there's reStructuredText support in pydoc), *existing* docstrings could be *minimally* converted to formalize the implicit markup that's *already there*. For example, here's the module docstring for the string.py module:
"""A collection of string operations (most are no longer used in Python 1.6).
Warning: most of the code you see here isn't normally used nowadays. With Python 1.6, many of these functions are implemented as methods on the standard string object. They used to be implemented by a built-in module called strop, but strop is now obsolete itself.
Public module variables:
whitespace -- a string containing all characters considered whitespace
lowercase -- a string containing all characters considered lowercase l
uppercase -- a string containing all characters considered uppercase l
letters -- a string containing all characters considered letters
digits -- a string containing all characters considered decimal digits
hexdigits -- a string containing all characters considered hexadecimal
octdigits -- a string containing all characters considered octal digit
punctuation -- a string containing all characters considered punctuati
printable -- a string containing all characters considered printable
"""
(I wrapped the first two paragraphs and truncated the list so email wouldn't wreck it.)
As it stands, this is almost valid reStructuredText (strictly speaking it is already valid, but the list would get wrapped and wouldn't be very useful). The list of variables needs a bit of work; it could be turned into a bullet list or a definition list. The variable identifiers themselves could be marked up as "interpreted text" (e.g. rendered in a different face, with links to each identifier's docstring if it exists). The warning could be left as-is, or spruced up. Here is the fully converted docstring:
"""A collection of string operations (most are no longer used in Python 1.6).
.. Warning:: most of the code you see here isn't normally used nowadays. With Python 1.6, many of these functions are implemented as methods on the standard string object. They used to be implemented by a built-in module called strop, but strop is now obsolete itself.
Public module variables:
`whitespace`
    a string containing all characters considered whitespace
`lowercase`
    a string containing all characters considered lowercase letters
`uppercase`
    a string containing all characters considered uppercase letters
`letters`
    a string containing all characters considered letters
`digits`
    a string containing all characters considered decimal digits
`hexdigits`
    a string containing all characters considered hexadecimal digits
`octdigits`
    a string containing all characters considered octal digits
`punctuation`
    a string containing all characters considered punctuation
`printable`
    a string containing all characters considered printable
"""
The conversion is minimal (it could be even less), it's still perfectly readable, and the difference in the converted output is significant. Please take a look at the converted output (1 or 2) and compare to the output for vanilla pydoc (3).
1. http://structuredtext.sf.net/spec/string.html
2. http://structuredtext.sf.net/spec/string2.html (bullet list instead of definition list)
3. the first section of http://web.pydoc.org/2.2/string.html
(Note that the HTML uses a CSS1 stylesheet, so a recent browser is required. A writer for HTML for older browsers is on the to-do list.)
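As a rough illustration of how small the conversion of the variable list is, the "name -- description" lines could be rewritten mechanically into a reST definition list. This is only a sketch: the helper name `to_definition_list` is hypothetical and not part of Docutils or pydoc, and it assumes every entry uses the " -- " separator seen in the string.py docstring.

```python
def to_definition_list(lines):
    """Turn 'name -- description' lines into a reST definition list.

    Hypothetical helper for illustration only; lines without the
    ' -- ' separator are passed through unchanged.
    """
    out = []
    for line in lines:
        name, sep, description = line.partition(" -- ")
        if not sep:
            out.append(line)  # not a 'name -- description' entry
            continue
        out.append(f"`{name.strip()}`")          # term, as interpreted text
        out.append(f"    {description.strip()}")  # indented definition
    return "\n".join(out)


original = [
    "whitespace -- a string containing all characters considered whitespace",
    "digits -- a string containing all characters considered decimal digits",
]
converted = to_definition_list(original)
print(converted)
```

The output has the same shape as the converted docstring above: each identifier in backquotes on its own line, with its description indented beneath it.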
In any case, nothing needs to be done any time soon. What do you think?
Alas, I find both the input and the output of the reST-ized version worse than the original. The original takes up much less vertical space, which (given the goal of being a relatively terse hint rather than formal reference docs) counts for more than bullets, boxes and color. The situation would be different if the goal was to replace the reference docs, but since it isn't, I think the informal markup is just fine. --Guido van Rossum (home page: http://www.python.org/~guido/)

From: "Guido van Rossum" <guido@python.org>
Alas, I find both the input and the output of the reST-ized version worse than the original. The original takes up much less vertical space, which (given the goal of being a relatively terse hint rather than formal reference docs) counts for more than bullets, boxes and color. The situation would be different if the goal was to replace the reference docs, but since it isn't, I think the informal markup is just fine.
I have to agree with Guido, for the same reasons. I wonder if it's the input markup that's "wrong", or if it just needs a better backend for generating HTML from the markup.
Incidentally, I'm really excited about reST. I've been looking for a tolerable markup for C++ comments, and reST looks like it might fit the bill.
Other musings:
A reST-based Wiki would give people an easy way to see how they like the format.
Is it just me, or are docstrings less-convenient than comments? Any thought given to reST-processing a module's comments?
-Dave

Is it just me, or are docstrings less-convenient than comments? Any thought given to reST-processing a module's comments?
Maybe it's just me, but I would be happier to optionally be able to write reST inside comments instead of docstrings and have it extracted from there. That means the comment would take the place of the secondary docstring. regards.

David Abrahams wrote:
Is it just me, or are docstrings less-convenient than comments?
How do you mean?
Any thought given to reST-processing a module's comments?
HappyDoc does this, in addition to docstrings. Docutils hasn't gotten that far yet, and before it does I intend to analyze HappyDoc thoroughly. -- David Goodger

----- Original Message ----- From: "David Goodger" <goodger@users.sourceforge.net>
David Abrahams wrote:
Is it just me, or are docstrings less-convenient than comments?
How do you mean?
I don't know, it's hard to put into words, but I'll try:
1. 5 more characters just to get started. Probably a shift key too, if I'm going to be stylistically conformant with other work I've seen. ("""...""").
2. The docstring separates the function signature from its body, which tends to obscure the code a bit. I prefer prefix documentation.
3. Weird indentation when the docstring spans multiple lines
    def foo(bar, baz):
        """Start of doc
    rest of doc
    and some more doc"""
        function_body()
Documentation is really hard to start with, and every additional barrier to entry is unacceptable to me. Every time I write a doc string, I think of all 3 of the above things, which causes a little "cognitive dissonance", and makes it less likely that I'll write another.
-Dave

David Abrahams wrote:
Is it just me, or are docstrings less-convenient than comments?
How do you mean?
I don't know, it's hard to put into words, but I'll try:
1. 5 more characters just to get started. Probably a shift key too, if I'm going to be stylistically conformant with other work I've seen. ("""...""").
If the docstring is more than a few lines long, the quotes average out to less than the "#" in comments. Easier to edit docstrings; no per-line prefix.
2. The docstring separates the function signature from its body, which tends to obscure the code a bit. I prefer prefix documentation.
I suppose that's valid. It comes down to what you're used to. You could put triple-quoted multi-line strings before the "def", but they wouldn't be accessible or particularly useful. Triple-quotes can be used as multi-line comments, at least for temporary ones.
3. Weird indentation when the docstring spans multiple lines
    def foo(bar, baz):
        """Start of doc
    rest of doc
    and some more doc"""
        function_body()
The convention is to indent all lines of the docstring equally, and to put the closing triple-quotes on a line by themselves. I prefer to put the opening triple-quotes on a line by themselves too. It would end up looking like this::
    def foo(bar, baz):
        """
        Start of doc
        rest of doc
        and some more doc
        """
        function_body()
Documentation is really hard to start with, and every additional barrier to entry is unacceptable to me. Every time I write a doc string, I think of all 3 of the above things, which causes a little "cognitive dissonance", and makes it less likely that I'll write another.
You can still write prefix comments. HappyDoc extracts them, but they're not accessible from the running program or from the interactive interpreter. Better in the long run (for you and others who use your code) to embrace the features Python gives you. Confusing otherwise. Nobody's forcing you though. -- David Goodger
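The accessibility difference mentioned here can be seen directly at runtime. The following is a minimal sketch with made-up function names; it assumes Python is not run with `-OO`, which strips docstrings.

```python
# Prefix comment: documents the function for readers of the source file,
# but is discarded when the module is compiled.
def with_comment(x):
    return x


"""A string placed before the def is just an expression statement;
it is not attached to the function that follows."""
def with_prefix_string(x):
    return x


def with_docstring(x):
    """Return x unchanged."""
    return x


# Only the real docstring is reachable from the running program.
print(with_comment.__doc__)        # None
print(with_prefix_string.__doc__)  # None
print(with_docstring.__doc__)      # Return x unchanged.
```

In the interactive interpreter, `help(with_docstring)` shows the docstring text, while the other two functions have no `__doc__` for it to display.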

1. 5 more characters just to get started. Probably a shift key too, if I'm going to be stylistically conformant with other work I've seen. ("""...""").
Depends on how long your comment is going to be. For long comments, a # on each line gets boring quickly, and not all editors know how to reformat such comment blocks right.
2. The docstring separates the function signature from its body, which tends to obscure the code a bit. I prefer prefix documentation.
You have a small point there.
3. Weird indentation when the docstring spans multiple lines
    def foo(bar, baz):
        """Start of doc
    rest of doc
    and some more doc"""
        function_body()
Don't do this! There's an explicit rule that doc strings should be indented like the code containing them, and all docstring processors are supposed to compensate for this. Please write this, like everybody else does:
    def foo(bar, baz):
        """Start of doc

        rest of doc
        and some more doc"""
        function_body()
Also note the blank line after the first line -- the first line is supposed to be a one-line summary of the function (for use in abbreviated help balloons, overviews, and so on).
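To illustrate what "compensate" means in practice, here is a small sketch using the standard library's `inspect.getdoc`, which removes the indentation that comes from nesting the docstring inside the function body. The function `foo` and its return value are placeholders; this shows one docstring processor's behaviour, not a rule every tool is guaranteed to follow.

```python
import inspect


def foo(bar, baz):
    """Start of doc

    rest of doc
    and some more doc"""
    return bar, baz


# inspect.getdoc() strips the indentation shared by the continuation
# lines, so the docstring can be indented like the surrounding code.
cleaned = inspect.getdoc(foo)

# The first line doubles as the one-line summary.
summary = cleaned.splitlines()[0]

print(cleaned)
print("summary:", summary)
```

The cleaned text starts every line at column zero, and the summary is simply the first line before the blank line.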
Documentation is really hard to start with, and every additional barrier to entry is unacceptable to me. Every time I write a doc string, I think of all 3 of the above things, which causes a little "cognitive dissonance", and makes it less likely that I'll write another.
Time for you to start reading more Python code -- I've never heard this excuse before. --Guido van Rossum (home page: http://www.python.org/~guido/)

----- Original Message -----
From: "Guido van Rossum" <guido@python.org>
To: "David Abrahams" <david.abrahams@rcn.com>
Cc: <python-dev@python.org>
Sent: Saturday, April 06, 2002 1:47 PM
Subject: Re: [Python-Dev] PEP 287: reStructuredText Standard DocstringFormat
1. 5 more characters just to get started. Probably a shift key too, if I'm going to be stylistically conformant with other work I've seen. ("""...""").
Depends on how long your comment is going to be. For long comments, a # on each line gets boring quickly, and not all editors know how to reformat such comment blocks right.
Ah, but /mine/ does <.002wink>. Anyway, I presume docstrings are supposed to be relatively short? Note that I am not comfortable just opening the docstring, because then the editor colors the rest of my program green until I add the closing quotes, so I have to type six quotes and three "move back"s just to get started. It takes nine boring lines of documentation to even approach the labor implied by opening comment characters. Any single-character-to-end-of-line syntax is going to make things easier.
2. The docstring separates the function signature from its body, which tends to obscure the code a bit. I prefer prefix documentation.
You have a small point there.
nay, it's bigger than a breadbox.
Don't do this! There's an explicit rule that doc strings should be indented like the code containing them, and all docstring processors are supposed to compensate for this. Please write this, like everybody else does:
def foo(bar, baz): """Start of doc
rest of doc and some more doc""" function_body()
I assume what I'm seeing above is due to the fact that you used the hated tab key to indent part of the docstring? That wasn't what you intended, was it? Also, isn't the closing quote supposed to occupy a separate line?
Also note the blank line after the first line -- the first line is supposed to be a one-line summary of the function (for use in abbreviated help balloons, overviews, and so on).
That's what I thought. Is there a guideline for how to write those, BTW?
Documentation is really hard to start with, and every additional barrier to entry is unacceptable to me. Every time I write a doc string, I think of all 3 of the above things, which causes a little "cognitive dissonance", and makes it less likely that I'll write another.
Time for you to start reading more Python code -- I've never heard this excuse before.
How is reading more Python code going to help? Not trying to be difficult, really. I'm willing to invest a little to train my stupid brain to stop complaining. Still wish I didn't have to, though. -Dave

From: Samuele Pedroni <pedroni@inf.ethz.ch>
If I channel correctly that tradition, Common Lisp has docstrings too, but long docstrings are considered bad style.
My point is not against docstrings, it is against long primary/secondary docstrings, and at least IMHO the attitude of Guido wrt to leaving things as they are for the std lib seems to rhyme with this.
Here is an example from Common Lisp:
A docstring for the GETHASH function
    CL-USER 13 > (documentation 'gethash 'function)
    "Finds the entry in Hash-Table whose key is Key and returns the associated value and T as multiple values, or returns Default and Nil if there is no such entry."
the full fledged doc of it: <http://www.xanalys.com/software_tools/reference/HyperSpec/Body/f_gethas.htm#... thash>
putting the markup for the second in the first would be an abuse.
The primary/secondary splitting (not possible in CL), does not change this because *in the code*, they end up in the same place and count visually anyway as one *long* docstring.
regards.

My point is not against docstrings, it is against long primary/secondary docstrings, and at least IMHO the attitude of Guido wrt to leaving things as they are for the std lib seems to rhyme with this.
Here is an example from Common Lisp:
A docstring for the GETHASH function
    CL-USER 13 > (documentation 'gethash 'function)
    "Finds the entry in Hash-Table whose key is Key and returns the associated value and T as multiple values, or returns Default and Nil if there is no such entry."
the full fledged doc of it: <http://www.xanalys.com/software_tools/reference/HyperSpec/Body/f_gethas.htm#... thash>
Indeed. This is how I originally envisioned docstrings, and this is how I still like to see them: short hints that mean you don't have to look it up in the manual in most cases (if you have some common sense and experience and aren't looking for how hairs are split in edge cases).
putting the markup for the second in the first would be an abuse.
Agreed.
The primary/secondary splitting (not possible in CL), does not change this because *in the code*, they end up in the same place and count visually anyway as one *long* docstring.
But note that there are other points of view. I don't mind if some package author wants to keep all the docs together with the code and wants to stick it all in the docstrings, and run some tool that extracts the docs and formats them as a reference manual. That's just not how I want to manage the standard library docs. --Guido van Rossum (home page: http://www.python.org/~guido/)

[Guido]
The primary/secondary splitting (not possible in CL), does not change this because *in the code*, they end up in the same place and count visually anyway as one *long* docstring.
But note that there are other points of view. I don't mind if some package author wants to keep all the docs together with the code and wants to stick it all in the docstrings, and run some tool that extracts the docs and formats them as a reference manual. That's just not how I want to manage the standard library docs.
Yup, but the final point is that it is easy to also make happy the people who want to split the doc between the short informative docstring and a longish comment in front of the definition, and to have the auto-doc extraction support this with reasonable options (extracting both things or just one). You can dislike this different approach, because the comment/docstring can be redundant or because, trying to avoid this, the reading vs. spatial orders of the two will not match. Personally I can see myself using this approach sometimes; it does not hurt my sensibility too much. It's very hard to come up with strong arguments wrt these issues. working-for-a-peaceful-cohabitation-expressing-the-point-of-view-of-a-(maybe-1-person)-minority-ly y'rs, Samuele.
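The split described here, a short docstring plus a longer comment in front of the definition, could be supported by an extractor along the following lines. This is only a sketch under stated assumptions: `extract_docs` and the sample `SOURCE` are hypothetical, and this is not how HappyDoc or Docutils is implemented. It uses the standard `ast` module to find the function and its docstring, then walks upward over the "#" lines directly above the `def`.

```python
import ast

# Sample module text (made up for this illustration).
SOURCE = '''\
# Add two numbers.
# Longer notes live here, in front of the definition.
def add(a, b):
    """Return a + b."""
    return a + b
'''


def extract_docs(source, func_name):
    """Return the docstring and the prefix comment of one function.

    Hypothetical helper: returns None if the function is not found.
    """
    lines = source.splitlines()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef) and node.name == func_name:
            comments = []
            row = node.lineno - 2  # index of the line just above "def"
            while row >= 0 and lines[row].lstrip().startswith("#"):
                comments.append(lines[row].lstrip()[1:].strip())
                row -= 1
            comments.reverse()  # collected bottom-up, restore reading order
            return {
                "docstring": ast.get_docstring(node),
                "comment": "\n".join(comments),
            }
    return None


docs = extract_docs(SOURCE, "add")
print(docs["docstring"])
print(docs["comment"])
```

A caller can then choose to present the docstring, the comment, or both, which is the kind of option discussed above.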

Guido van Rossum wrote:
[I'm hoping David Goodger is reading this on python-dev.
Yes, I do. I read it off the web. I know I can now subscribe without going through the initiation ceremony and learning the secret handshake, but old habits die hard.
What about existing docstrings? There is plenty of informal markup in there already.
Yeah, but they're not using any particular formal markup. You'd still have to do a massive, *massive* cleanup if you wanted reST to apply. And it's not clear what the advantage would be ... The situation would be different if the goal was to replace the reference docs, but since it isn't, I think the informal markup is just fine.
Fair enough. This is a long-term project. Plenty of time to change your mind later, when the advantages become clear. :-) -- David Goodger

Tom Emerson wrote:
- JavaDoc [10]_
... However, the only output format that is supported is HTML
This is patently false: there are Doclets available that convert to a wide variety of formats.
I was unaware. I've modified the text to remove the false statement. Thanks for the heads-up. -- David Goodger
participants (9)
- Andrew Kuchling
- David Abrahams
- David Goodger
- Fredrik Lundh
- Guido van Rossum
- Jeremy Hylton
- Ka-Ping Yee
- Samuele Pedroni
- Tom Emerson