PEP 258: DPS Generic Implementation Details

David Goodger dgoodger at bigfoot.com
Wed Jun 13 00:42:54 EDT 2001


I am posting this PEP to comp.lang.python for greatest community exposure.
Please direct replies to the Python Documentation SIG's mailing list:
mailto:doc-sig at python.org.

In addition to the master copy at http://python.sf.net/peps/pep-0258.txt
(HTML at http://python.sf.net/peps/pep-0258.html), a working copy is kept at
the project web site, http://docstring.sf.net/.

-- 
David Goodger    dgoodger at bigfoot.com    Open-source projects:
 - Python Docstring Processing System: http://docstring.sf.net
 - reStructuredText: http://structuredtext.sf.net
 - The Go Tools Project: http://gotools.sf.net


PEP: 258
Title: DPS Generic Implementation Details
Version: $Revision: 1.1 $
Last-Modified: $Date: 1935/06/06 05:54:49 $
Author: dgoodger at bigfoot.com (David Goodger)
Discussions-To: doc-sig at python.org
Status: Draft
Type: Standards Track
Created: 31-May-2001
Post-History:


Abstract

    This PEP documents generic implementation details for a Python
    Docstring Processing System (DPS).  The rationale and high-level
    concepts of the DPS are documented in PEP 256, "Docstring
    Processing System Framework" [1].


Specification

    Docstring Extraction Rules
    ==========================

    1. If the '__all__' variable is present in the module being
       documented, only identifiers listed in '__all__' are examined
       for docstrings.  In the absence of '__all__', all identifiers
       are examined, except those whose names are private (names begin
       with '_' but don't begin and end with '__').

    2. Docstrings are string literal expressions, and are recognized
       in the following places within Python modules:

       a) At the beginning of a module, class definition, or function
          definition, after any comments.  This is the standard for
          Python __doc__ attributes.

       b) Immediately following a simple assignment at the top level
          of a module, class definition, or __init__ method
          definition, after any comments.  See "Attribute Docstrings"
          below.

       c) Additional string literals found immediately after the
          docstrings in (a) and (b) will be recognized, extracted, and
          concatenated.  See "Additional Docstrings" below.

    3. Python modules must be parsed by the docstring processing
       system, not imported.  There are security reasons for not
       importing untrusted code.  Also, docstrings are to be
       recognized in places where the bytecode compiler ignores string
       literal expressions (2b and 2c above), meaning importing the
       module will lose these docstrings.  Of course, standard Python
       parsing tools such as the 'parser' library module should be
       used.

    Since attribute docstrings and additional docstrings are not
    recognized by the Python bytecode compiler, no namespace pollution
    or performance degradation will result from their use.  (The
    initial parsing of a module may take a slight performance hit.)
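
    Rule 1 above can be sketched as a small name filter.  This is only
    an illustration, not part of the specification; the function names
    'is_private' and 'names_to_examine' are hypothetical.

```python
def is_private(name):
    """Return True for names that begin with '_' but are not '__dunder__' names."""
    is_dunder = name.startswith('__') and name.endswith('__')
    return name.startswith('_') and not is_dunder


def names_to_examine(identifiers, all_names=None):
    """Apply rule 1: honor '__all__' when present, otherwise skip private names."""
    if all_names is not None:
        # '__all__' is present: only the identifiers it lists are examined.
        return [name for name in identifiers if name in all_names]
    # No '__all__': examine everything except private names.
    return [name for name in identifiers if not is_private(name)]
```

    Note that when '__all__' is present it takes precedence, so a
    private name listed there is still examined.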

    Attribute Docstrings
    --------------------

    XXX A description of attribute docstrings would be appropriate in
    PEP 257, "Docstring Conventions" [4].

    (This is a simplified version of PEP 224 [3] by Marc-Andre Lemburg.)

    A string literal immediately following an assignment statement is
    interpreted by the docstring extraction machinery as the docstring
    of the target of the assignment statement, under the following
    conditions:

    1. The assignment must be in one of the following contexts:

       a) At the top level of a module (i.e., not inside a loop or
          conditional): a module attribute.

       b) At the top level of a class definition: a class attribute.

       c) At the top level of a class' '__init__' method definition:
          an instance attribute.

       Since each of the above contexts is at the top level (i.e.,
       just inside the outermost suite of a definition), it may be
       necessary to place dummy assignments for attributes assigned
       conditionally or in a loop.  Blank lines may be used after
       attribute docstrings to emphasize the connection between the
       assignment and the docstring.

    2. The assignment must be to a single target, not to a list or a
       tuple of targets.

    3. The form of the target:

       a) For contexts 1a and 1b above, the target must be a simple
          identifier (not a dotted identifier, a subscripted
          expression, or a sliced expression).

       b) For context 1c above, the target must be of the form
          'self.attrib', where 'self' matches the '__init__' method's
          first parameter (the instance parameter) and 'attrib' is a
          simple identifier as in 3a.

    Examples::

        g = 'module attribute (global variable)'
        """This is g's docstring."""

        class AClass:

            c = 'class attribute'
            """This is AClass.c's docstring."""

            def __init__(self):
                self.i = 'instance attribute'
                """This is self.i's docstring."""

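    The conditions above could be checked by walking a parsed syntax
    tree.  The sketch below is one possible approach, not the DPS
    implementation: it assumes the modern 'ast' standard library
    module rather than the 'parser' module, and the helper names
    '_string_value' and 'attribute_docstrings' are hypothetical.  It
    also assumes that chained assignments ('a = b = 1') do not qualify.

```python
import ast


def _string_value(node):
    """Return the text of a bare string-literal statement, else None."""
    if (isinstance(node, ast.Expr)
            and isinstance(node.value, ast.Constant)
            and isinstance(node.value.value, str)):
        return node.value.value
    return None


def attribute_docstrings(body, instance_name=None):
    """Map attribute names to docstrings for one top-level suite.

    With instance_name=None, targets must be simple identifiers
    (module or class level).  Otherwise targets must have the form
    '<instance_name>.attrib' (the '__init__' case).
    """
    found = {}
    # Pair each statement with the statement that immediately follows it.
    for stmt, following in zip(body, body[1:]):
        if not (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1):
            continue
        target = stmt.targets[0]
        if instance_name is None:
            if not isinstance(target, ast.Name):
                continue
            name = target.id
        else:
            if not (isinstance(target, ast.Attribute)
                    and isinstance(target.value, ast.Name)
                    and target.value.id == instance_name):
                continue
            name = target.attr
        doc = _string_value(following)
        if doc is not None:
            found[name] = doc
    return found


SOURCE = '''
g = 'module attribute (global variable)'
"""This is g's docstring."""

class AClass:

    c = 'class attribute'
    """This is AClass.c's docstring."""

    def __init__(self):
        self.i = 'instance attribute'
        """This is self.i's docstring."""
'''

module = ast.parse(SOURCE)
class_node = module.body[2]       # the 'class AClass' statement
init_node = class_node.body[2]    # the '__init__' definition
instance = init_node.args.args[0].arg
```

    Calling 'attribute_docstrings(module.body)',
    'attribute_docstrings(class_node.body)', and
    'attribute_docstrings(init_node.body, instance)' handles contexts
    1a, 1b, and 1c respectively.  Because only the outermost suite is
    scanned, assignments inside loops or conditionals are never seen.
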
    Additional Docstrings
    ---------------------

    XXX A description of additional docstrings would be appropriate in
    PEP 257, "Docstring Conventions" [4].

    Many programmers would like to make extensive use of docstrings
    for API documentation.  However, docstrings do take up space in
    the running program, so some of these programmers are reluctant to
    'bloat up' their code.  Also, not all API documentation is
    applicable to interactive environments, where __doc__ would be
    displayed.

    The docstring processing system's extraction tools will
    concatenate all string literal expressions which appear at the
    beginning of a definition or after a simple assignment.  Only the
    first strings in definitions will be available as __doc__, and can
    be used for brief usage text suitable for interactive sessions;
    subsequent string literals and all attribute docstrings are
    ignored by the Python bytecode compiler and may contain more
    extensive API information.

    Example::

        def function(arg):
            """This is __doc__, function's docstring."""
            """
            This is an additional docstring, ignored by the bytecode
            compiler, but extracted by the docstring processing system.
            """
            pass
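
    Collecting the strings at the start of a definition might look
    like the sketch below.  It assumes the 'ast' and 'inspect'
    standard library modules; the function name
    'leading_string_literals' and the choice to join the pieces with a
    blank line are assumptions, since this PEP says only that the
    strings are concatenated.

```python
import ast
import inspect


def leading_string_literals(body):
    """Return the consecutive string literals that open a suite."""
    strings = []
    for stmt in body:
        if (isinstance(stmt, ast.Expr)
                and isinstance(stmt.value, ast.Constant)
                and isinstance(stmt.value.value, str)):
            # cleandoc() strips the common indentation and outer blank lines.
            strings.append(inspect.cleandoc(stmt.value.value))
        else:
            break  # the first non-string statement ends the run
    return strings


SOURCE = '''
def function(arg):
    """This is __doc__, function's docstring."""
    """
    This is an additional docstring.
    """
    pass
'''

function_node = ast.parse(SOURCE).body[0]
parts = leading_string_literals(function_node.body)
combined = '\n\n'.join(parts)
```

    Only 'parts[0]' corresponds to what Python itself stores in
    '__doc__'; the remaining entries exist only in the source text.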

    Issue: Multiple module docstrings break 'from __future__ import'
    statements in Python 2.1, which may be preceded only by the module
    docstring, comments, and blank lines.  Resolution?

    1. Should we search for docstrings after a __future__ statement?
       Very ugly.

    2. Redefine __future__ statements to allow multiple preceding
       string literals?

    3. Or should we not even worry about this?  There shouldn't be
       __future__ statements in production code, after all.  Modules
       with __future__ statements will have to put up with the
       single-docstring limitation.

    Choice of Docstring Format
    ==========================

    Rather than force everyone to use a single docstring format,
    multiple input formats are allowed by the processing system.  A
    special variable, __docformat__, may appear at the top level of a
    module before any function or class definitions.  Over time or
    through decree, a standard format or set of formats should emerge.

    The __docformat__ variable is a string containing the name of the
    format being used.  The name is case-insensitive and must match
    either the input parser's module or package name (i.e., the same
    name as required to 'import' the module or package) or a
    registered alias.  If no
    __docformat__ is specified, the default format is 'plaintext' for
    now; this may be changed to the standard format once determined.

    The __docformat__ string may contain an optional second field,
    separated from the format name (first field) by a single space: a
    case-insensitive language identifier as defined in RFC 1766 [5].
    A typical language identifier consists of a 2-letter language code
    from ISO 639 [6] (3-letter codes used only if no 2-letter code
    exists; RFC 1766 is currently being revised to allow 3-letter
    codes).  If no language identifier is specified, the default is
    'en' for English.  The language identifier is passed to the parser
    and can be used for language-dependent markup features.
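
    Splitting a __docformat__ value into its two fields could be
    sketched as follows.  The function name 'parse_docformat' is
    hypothetical, and normalizing both fields to lower case is an
    assumption based on their being case-insensitive.  The sketch is
    also lenient about whitespace, whereas the text above specifies a
    single space.

```python
DEFAULT_FORMAT = 'plaintext'
DEFAULT_LANGUAGE = 'en'


def parse_docformat(value=None):
    """Split a __docformat__ string into (format_name, language)."""
    fields = (value or '').split()
    # Both fields are case-insensitive, so normalize them to lower case.
    format_name = fields[0].lower() if fields else DEFAULT_FORMAT
    language = fields[1].lower() if len(fields) > 1 else DEFAULT_LANGUAGE
    return format_name, language
```

    For example, a module that sets no __docformat__ yields
    ('plaintext', 'en'), and 'reStructuredText fr-CA' yields
    ('restructuredtext', 'fr-ca').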

    DPS Structure
    =============

    - package 'dps'

      - function 'dps.main()' (in 'dps/__init__.py')

      - package 'dps.parsers'

        - module 'dps.parsers.model'; see 'Input Parser API' below.

      - package 'dps.formatters'

        - module 'dps.formatters.model'; see 'Output Formatter API' below.

      - package 'dps.languages'

        - module 'dps.languages.en' (English)

        - others to be added

      - utility modules: 'dps.statemachine'

    Command-Line Interface
    ======================

    XXX To be determined.

    System Python API
    =================

    XXX To be determined.

    Input Parser API
    ================

    Each input parser is a module or package exporting a 'Parser' class,
    with the following interface:

        class Parser:

            def __init__(self, inputstring, errors='warn', language='en'):
                """Initialize the Parser instance."""

            def parse(self):
                """Return a DOM tree, the parsed input string."""

    XXX This needs a lot of work.  What is required for this API?

    A model 'Parser' class implementing the full interface along with
    utility functions can be found in the 'dps.parsers.model' module.
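
    As a rough illustration of the interface only, a trivial plaintext
    parser might look like the sketch below.  It is not the model
    class from 'dps.parsers.model'.  The use of 'xml.dom.minidom' and
    the element names 'document' and 'paragraph' are assumptions,
    since the DOM schema is not fixed here.

```python
from xml.dom import minidom


class Parser:
    """Hypothetical plaintext parser following the interface above."""

    def __init__(self, inputstring, errors='warn', language='en'):
        self.inputstring = inputstring
        self.errors = errors        # unused in this sketch
        self.language = language    # unused in this sketch

    def parse(self):
        """Return a DOM tree with one <paragraph> per text block."""
        document = minidom.Document()
        root = document.createElement('document')
        document.appendChild(root)
        # Blocks are separated by blank lines.
        for block in self.inputstring.strip().split('\n\n'):
            text = ' '.join(block.split())  # collapse internal whitespace
            if text:
                paragraph = document.createElement('paragraph')
                paragraph.appendChild(document.createTextNode(text))
                root.appendChild(paragraph)
        return document


tree = Parser('First paragraph,\nstill going.\n\nSecond paragraph.').parse()
paragraphs = tree.getElementsByTagName('paragraph')
```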

    Output Formatter API
    ====================

    Each output formatter is a module or package exporting a
    'Formatter' class, with the following interface:

        class Formatter:

            def __init__(self, domtree, language='en', showwarnings=0):
                """Initialize the Formatter instance."""

            def format(self):
                """
                Return a formatted string representation of the DOM tree.
                """

    XXX This also needs a lot of work.  What is required for this API?

    A model 'Formatter' class implementing the full interface along
    with utility functions can be found in the 'dps.formatters.model'
    module.
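
    Likewise, a trivial plain-text formatter could be sketched as
    below.  It is not the model class from 'dps.formatters.model'.
    The 'paragraph' element name and the use of 'xml.dom.minidom' are
    assumptions made for the sake of a runnable example.

```python
from xml.dom import minidom


class Formatter:
    """Hypothetical plain-text formatter following the interface above."""

    def __init__(self, domtree, language='en', showwarnings=0):
        self.domtree = domtree
        self.language = language          # unused in this sketch
        self.showwarnings = showwarnings  # unused in this sketch

    def format(self):
        """Return the text of each <paragraph>, separated by blank lines."""
        paragraphs = []
        for node in self.domtree.getElementsByTagName('paragraph'):
            text = ''.join(child.data for child in node.childNodes
                           if child.nodeType == child.TEXT_NODE)
            paragraphs.append(text)
        return '\n\n'.join(paragraphs) + '\n'


domtree = minidom.parseString(
    '<document><paragraph>One.</paragraph>'
    '<paragraph>Two.</paragraph></document>')
output = Formatter(domtree).format()
```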

    Language Module API
    ===================

    Language modules will contain language-dependent strings and
    mappings.  They will be named for their language identifier (as
    defined in 'Choice of Docstring Format' above), converting dashes
    to underscores.
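
    The naming rule could be sketched as a one-line mapping.  The
    function name 'language_module_name' is hypothetical, and
    lower-casing the identifier is an assumption based on language
    identifiers being case-insensitive.

```python
def language_module_name(language_id):
    """Map a language identifier to a dotted module name under 'dps.languages'."""
    # Dashes are not valid in Python module names, so use underscores.
    return 'dps.languages.' + language_id.lower().replace('-', '_')
```

    Under these assumptions 'en' maps to 'dps.languages.en' and
    'en-US' maps to 'dps.languages.en_us'.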

    XXX Specifics to be determined.

    Intermediate Data Structure
    ===========================

    A single intermediate data structure is used internally by the
    docstring processing system.  This data structure is a DOM tree
    whose schema is documented in an XML DTD (eXtensible Markup
    Language Document Type Definition), which comes in three parts:

    - the Python Plaintext Document Interface DTD, ppdi.dtd [7],

    - the Generic Plaintext Document Interface DTD, gpdi.dtd [8],

    - and the OASIS Exchange Table Model, soextbl.dtd [9].

    The DTD defines a rich set of elements, suitable for any input
    syntax or output format.  The input parser and the output
    formatter share the same intermediate data structure.  The
    processing system may do transformations on the data from the
    input parser before passing it on to the output formatter.  The
    DTD retains all information necessary to reconstruct the original
    input text, or a reasonable facsimile thereof.

    XXX Specifics (about the DOM tree) to be determined.

    Output Management
    =================

    XXX To be determined.

    Type of output: filesystem only, or in-memory data structure too?
    File/directory naming & structure conventions.  In-memory data
    structure should follow filesystem naming; file/directory ==
    leaf/node.  Use a directory hierarchy rather than long file names
    (long file names were one of the reasons pythondoc couldn't run on
    MacOS).


References and Footnotes

    [1] http://python.sf.net/peps/pep-0256.html

    [2] http://www.python.org/sigs/doc-sig/

    [3] http://python.sf.net/peps/pep-0224.html

    [4] http://python.sf.net/peps/pep-0257.html

    [5] http://www.rfc-editor.org/rfc/rfc1766.txt

    [6] http://lcweb.loc.gov/standards/iso639-2/englangn.html

    [7] http://docstring.sf.net/spec/ppdi.dtd

    [8] http://docstring.sf.net/spec/gpdi.dtd

    [9] http://docstring.sf.net/spec/soextblx.dtd


Project Web Site

    A SourceForge project has been set up for this work at
    http://docstring.sf.net.


Copyright

    This document has been placed in the public domain.


Acknowledgements

    This document borrows ideas from the archives of the Python Doc-SIG
    [2]. Thanks to all members past & present.



Local Variables:
mode: indented-text
indent-tabs-mode: nil
End:



