[Doc-SIG] DocStrings 0.3: Suggestion for documentation syntax (very long)

Moshe Zadka <mzadka@geocities.com>
Fri, 4 Feb 2000 12:41:18 +0200 (IST)


Special tokens:
	@ -- escapes any character following it (i.e., @c is always translated
	to c)
	
	[ -- a short tag opener. Closed with a matching ]

	::(newline) -- begins a long tag. Everything until the
	indent level returns to that of the line which started the long
	tag is part of the contents.

	(newline)(newline) -- new paragraph.
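
A minimal sketch of how a processor might handle the two simplest tokens,
the @ escape and the paragraph break. The function names are hypothetical,
and leaving a trailing lone @ untouched is an assumption the token list does
not settle.

```python
import re


def unescape(text):
    """Resolve '@' escapes: '@c' becomes 'c' for any character c."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "@" and i + 1 < len(text):
            out.append(text[i + 1])  # keep the escaped character literally
            i += 2
        else:
            out.append(text[i])  # ordinary character (or a trailing lone '@')
            i += 1
    return "".join(out)


def split_paragraphs(text):
    """Split a docstring into paragraphs at runs of blank lines."""
    return [p for p in re.split(r"\n(?:[ \t]*\n)+", text.strip()) if p]
```

Note that a real processor would have to recognise tags before dropping the
escapes, since @[ must not open a short tag.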

The syntax of short tags is '[' tagname ' ' contents ']' where contents is
any valid snippet of docstring, with the exception of a long tag.
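
The short tag grammar can be sketched as a small recursive parser. This is
only an illustration: parse_inline and its output shape (strings for plain
text, (tagname, children) tuples for tags) are hypothetical. It assumes every
unescaped [ opens a tag, so under this reading a literal bracket has to be
written as @[ or @].

```python
def parse_inline(text):
    """Parse text with short tags into a list of str and (tagname, children)."""
    pos = 0

    def parse_until_close(top_level):
        nonlocal pos
        items, buf = [], []

        def flush():
            # move any pending plain text into the result list
            if buf:
                items.append("".join(buf))
                buf.clear()

        while pos < len(text):
            ch = text[pos]
            if ch == "@" and pos + 1 < len(text):
                buf.append(text[pos + 1])  # escaped character, taken literally
                pos += 2
            elif ch == "[":
                flush()
                pos += 1
                end = pos
                while end < len(text) and text[end] not in " ]":
                    end += 1
                if end == len(text) or text[end] != " ":
                    raise ValueError("short tag needs a name followed by a space")
                name = text[pos:end]
                pos = end + 1
                items.append((name, parse_until_close(False)))
            elif ch == "]" and not top_level:
                pos += 1
                flush()
                return items
            else:
                buf.append(ch)
                pos += 1
        if not top_level:
            raise ValueError("unclosed short tag")
        flush()
        return items

    return parse_until_close(True)
```

For example, parse_inline("if [arg use_statcache] is true") gives
["if ", ("arg", ["use_statcache"]), " is true"].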

The syntax of long tags is

tagname (attr '=' value)*'::'
	contents

Where contents is any valid snippet of docstring.
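
Reading a long tag could look roughly like the sketch below. The helper names
are hypothetical, and two things are assumptions rather than part of the
syntax above: indentation is measured in spaces only, and a word without '='
is taken to continue the previous value (so that a header such as
"type=list or None" yields one value).

```python
def parse_header(line):
    """Split 'tagname attr=value ...::' into (tagname, attrs)."""
    stripped = line.strip()
    if not stripped.endswith("::"):
        raise ValueError("a long tag header must end with '::'")
    words = stripped[:-2].split()
    if not words:
        raise ValueError("missing tag name")
    name, attrs, last = words[0], {}, None
    for word in words[1:]:
        if "=" in word:
            last, value = word.split("=", 1)
            attrs[last] = value
        elif last is not None:
            attrs[last] += " " + word  # assumption: continues previous value
        else:
            raise ValueError("expected attr=value, got %r" % word)
    return name, attrs


def indent_of(line):
    """Number of leading spaces (tabs are not handled in this sketch)."""
    return len(line) - len(line.lstrip(" "))


def read_long_tag(lines, start):
    """Return (tagname, attrs, body_lines, next_index) for lines[start]."""
    name, attrs = parse_header(lines[start])
    base = indent_of(lines[start])
    body = []
    i = start + 1
    while i < len(lines):
        # the tag ends when a non-blank line returns to the header's indent
        if lines[i].strip() and indent_of(lines[i]) <= base:
            break
        body.append(lines[i])
        i += 1
    while body and not body[-1].strip():
        body.pop()  # trailing blank lines are not part of the contents
    return name, attrs, body, i
```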

List of tags, and where they are valid:
	code (both as long and short) -- anywhere
	example (long, no attributes) -- anywhere
	arg, return (attributes: name, type, default (optional) ) -- 
		function docstring, never inside another long tag.
	rest-arg, kw-arg (long, no attributes) -- 
		function docstring, never inside another long tag.
	data (attributes: name, type) -- 
		class/module doc-string, never inside another long tag.
	exception (attributes: name) -- 
		class/module doc-string, never inside another long tag.
	member (attributes: name, type) -- 
		class doc-string, never inside another long tag.
	function, module, class, exception, var, method, keyword, member,
	pytype, file, url, arg (short tag) -- anywhere

	(these are for general markup syntax, so I've probably forgotten
	some):
		list (long tag) -- anywhere
		item (long tag) -- anywhere
		emph (short tag) -- anywhere 

(Note: there are some long tags with the same name as short tags. This
poses no problems: the tags are different tags!)
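
One way a checker might encode the table above is as plain data. This is a
sketch under assumptions: the names ANYWHERE, LONG_TAGS, TOP_LEVEL_ONLY,
SHORT_TAGS and long_tag_allowed are hypothetical, and "anywhere" is read as
"in any kind of docstring, at any nesting depth".

```python
ANYWHERE = frozenset(["module", "class", "function"])

# long tag name -> kinds of docstring in which it may appear
LONG_TAGS = {
    "code": ANYWHERE,
    "example": ANYWHERE,
    "list": ANYWHERE,
    "item": ANYWHERE,
    "arg": frozenset(["function"]),
    "return": frozenset(["function"]),
    "rest-arg": frozenset(["function"]),
    "kw-arg": frozenset(["function"]),
    "data": frozenset(["class", "module"]),
    "exception": frozenset(["class", "module"]),
    "member": frozenset(["class"]),
}

# long tags that may never appear inside another long tag
TOP_LEVEL_ONLY = frozenset(
    ["arg", "return", "rest-arg", "kw-arg", "data", "exception", "member"])

# short tags, all of which are valid anywhere
SHORT_TAGS = frozenset([
    "code", "function", "module", "class", "exception", "var", "method",
    "keyword", "member", "pytype", "file", "url", "arg", "emph"])


def long_tag_allowed(tag, docstring_kind, nested):
    """True if long tag `tag` may appear in this kind of docstring."""
    if tag not in LONG_TAGS or docstring_kind not in LONG_TAGS[tag]:
        return False
    return not (nested and tag in TOP_LEVEL_ONLY)
```

Keeping the long and short tables separate reflects the note above: a name
such as "member" appears in both, but as two different tags.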

This was brief on purpose: I believe hardly anyone will actually read the
spec; most people will just start writing doc-strings based on what they
have seen. So, here is an example of a marked up module. I want to thank
Gordon McMillan for providing a well-documented module: I just modified the
syntax.

(search for ======= if you want to skip the example module and get to the
description of the intermediary format)

"""Utilities for comparing files and directories."""

import os
import stat
import statcache

_cache = {}
BUFSIZE=8*1024

def cmp(f1, f2, shallow=1,use_statcache=0):
    """Compare two files.

    arg name=f1 type=string::
        First file name

    arg name=f2 type=string::
        Second file name

    arg name=shallow type=bool default=1::
        Just check stat signature (do not read the files).

    arg name=use_statcache type=bool default=0::
        Do not stat() each file directly: go through
        the statcache module for more efficiency.

    return type=bool::
        1 if the files are the same, 0 otherwise.

    This function uses a cache for past comparisons and the results,
    with a cache invalidation mechanism relying on stale signatures.
    Of course, if [arg use_statcache] is true, this mechanism is defeated,
    and the cache will never grow stale.

    """
    stat_function = (os.stat, statcache.stat)[use_statcache]
    s1, s2 = _sig(stat_function(f1)), _sig(stat_function(f2))
    if s1[0]!=stat.S_IFREG or s2[0]!=stat.S_IFREG: return 0
    if shallow and s1 == s2: return 1
    if s1[1]!=s2[1]:         return 0

    result = _cache.get((f1, f2))
    if result and (s1, s2)==result[:2]:
        return result[2]
    outcome = _do_cmp(f1, f2)
    _cache[f1, f2] = s1, s2, outcome
    return outcome

def _sig(st):
    return (stat.S_IFMT(st[stat.ST_MODE]),
            st[stat.ST_SIZE],
            st[stat.ST_MTIME])

def _do_cmp(f1, f2):
    bufsize = BUFSIZE
    fp1 , fp2 = open(f1, 'rb'), open(f2, 'rb')
    while 1:
        b1, b2 = fp1.read(bufsize), fp2.read(bufsize)
        if b1!=b2: return 0
        if not b1: return 1

# Directory comparison class.
#
class dircmp:
    """A class that manages the comparison of 2 directories.

    High level usage:

    list::

        item::
            [code x = dircmp(dir1, dir2)]
        item::
            [code x.report()] -> prints a report on the differences between 
            dir1 and dir2
        item::
            [code x.report_partial_closure()] -> prints report on 
            differences between dir1 and dir2, and reports on common 
            immediate subdirectories.
        item::
            [code x.report_full_closure()] -> like report_partial_closure,
            but fully recursive.

    member name=left_list type=list::
        files in [var dir1], except for the ones in [var hide] and 
        [var ignore] lists.

    member name=right_list type=list::
        files in [var dir2], except for the ones in [var hide] and 
        [var ignore] lists.

    member name=common type=list::
        names in both [var dir1] and [var dir2]

    member name=left_only type=list::
        names in [var dir1] but not in [var dir2]

    member name=right_only type=list::
        names in [var dir2] but not in [var dir1]

    member name=common_dirs type=list::
        subdirectories in both [var dir1] and [var dir2].

    member name=common_files type=list::
        files in both [var dir1] and [var dir2].

    member name=common_funny type=list::
        names in both [var dir1] and [var dir2] where the type differs between
        [var dir1] and [var dir2], or the name is not [function stat]-able.

    member name=same_files type=list::
        list of identical files.

    member name=diff_files type=list::
        list of filenames which differ.

    member name=funny_files type=list::
        list of files which could not be compared.

    member name=subdirs type=dictionary::
        values are [class dircmp] objects, keyed by names in 
        [member common_dirs].
    """

    def __init__(self, a, b, ignore=None, hide=None): # Initialize
        '''\
        Initialize a directory comparison.

        arg name=a type=string::
            first directory

        arg name=b type=string::
            second directory

        arg name=ignore type=list or None::
            list of directories to ignore when comparing. None means to
            ignore [code ['RCS', 'CVS', 'tags']].

        arg name=hide type=list or None::
            list of directories to hide when comparing. None means to hide
            [code [os.curdir, os.pardir]].
        '''

        self.left = a
        self.right = b
        if hide is None:
            self.hide = [os.curdir, os.pardir] # Names never to be shown
        else:
            self.hide = hide
        if ignore is None:
            self.ignore = ['RCS', 'CVS', 'tags'] # Names ignored in comparison
        else:
            self.ignore = ignore

    def phase0(self): # Compare everything except common subdirectories
        self.left_list = _filter(os.listdir(self.left),
                                 self.hide+self.ignore)
        self.right_list = _filter(os.listdir(self.right),
                                  self.hide+self.ignore)
        self.left_list.sort()
        self.right_list.sort()

    __p4_attrs = ('subdirs',)
    __p3_attrs = ('same_files', 'diff_files', 'funny_files')
    __p2_attrs = ('common_dirs', 'common_files', 'common_funny')
    __p1_attrs = ('common', 'left_only', 'right_only')
    __p0_attrs = ('left_list', 'right_list')

    def __getattr__(self, attr):
        if attr in self.__p4_attrs:
            self.phase4()
        elif attr in self.__p3_attrs:
            self.phase3()
        elif attr in self.__p2_attrs:
            self.phase2()
        elif attr in self.__p1_attrs:
            self.phase1()
        elif attr in self.__p0_attrs:
            self.phase0()
        else:
            raise AttributeError, attr
        return getattr(self, attr)

    def phase1(self): # Compute common names
        a_only, b_only = [], []
        common = {}
        b = {}
        for fnm in self.right_list:
            b[fnm] = 1
        for x in self.left_list:
            if b.get(x, 0):
                common[x] = 1
            else:
                a_only.append(x)
        for x in self.right_list:
            if common.get(x, 0):
                pass
            else:
                b_only.append(x)
        self.common = common.keys()
        self.left_only = a_only
        self.right_only = b_only

    def phase2(self): # Distinguish files, directories, funnies
        self.common_dirs = []
        self.common_files = []
        self.common_funny = []

        for x in self.common:
            a_path = os.path.join(self.left, x)
            b_path = os.path.join(self.right, x)

            ok = 1
            try:
                a_stat = statcache.stat(a_path)
            except os.error, why:
                # print 'Can\'t stat', a_path, ':', why[1]
                ok = 0
            try:
                b_stat = statcache.stat(b_path)
            except os.error, why:
                # print 'Can\'t stat', b_path, ':', why[1]
                ok = 0

            if ok:
                a_type = stat.S_IFMT(a_stat[stat.ST_MODE])
                b_type = stat.S_IFMT(b_stat[stat.ST_MODE])
                if a_type <> b_type:
                    self.common_funny.append(x)
                elif stat.S_ISDIR(a_type):
                    self.common_dirs.append(x)
                elif stat.S_ISREG(a_type):
                    self.common_files.append(x)
                else:
                    self.common_funny.append(x)
            else:
                self.common_funny.append(x)

    def phase3(self): # Find out differences between common files
        xx = cmpfiles(self.left, self.right, self.common_files)
        self.same_files, self.diff_files, self.funny_files = xx

    def phase4(self): # Find out differences between common subdirectories
        # A new dircmp object is created for each common subdirectory,
        # these are stored in a dictionary indexed by filename.
        # The hide and ignore properties are inherited from the parent
        self.subdirs = {}
        for x in self.common_dirs:
            a_x = os.path.join(self.left, x)
            b_x = os.path.join(self.right, x)
            self.subdirs[x]  = dircmp(a_x, b_x, self.ignore, self.hide)

    def phase4_closure(self): # Recursively call phase4() on subdirectories
        self.phase4()
        for x in self.subdirs.keys():
            self.subdirs[x].phase4_closure()

    def report(self):
        '''\
        Print a report on the differences between [member left] and
        [member right]. Output format is purposely lousy.
        '''
        print 'diff', self.left, self.right
        if self.left_only:
            self.left_only.sort()
            print 'Only in', self.left, ':', self.left_only
        if self.right_only:
            self.right_only.sort()
            print 'Only in', self.right, ':', self.right_only
        if self.same_files:
            self.same_files.sort()
            print 'Identical files :', self.same_files
        if self.diff_files:
            self.diff_files.sort()
            print 'Differing files :', self.diff_files
        if self.funny_files:
            self.funny_files.sort()
            print 'Trouble with common files :', self.funny_files
        if self.common_dirs:
            self.common_dirs.sort()
            print 'Common subdirectories :', self.common_dirs
        if self.common_funny:
            self.common_funny.sort()
            print 'Common funny cases :', self.common_funny

    def report_partial_closure(self):
        '''Print reports on [arg self] and on [member subdirs]'''
        self.report()
        for x in self.subdirs.keys():
            print
            self.subdirs[x].report()

    def report_full_closure(self):
        '''Report on [var self] and [member subdirs] recursively.'''
        self.report()
        for x in self.subdirs.keys():
            print
            self.subdirs[x].report_full_closure()


# Compare common files in two directories.
# Return:
#	- files that compare equal
#	- files that compare different
#	- funny cases (can't stat etc.)
#
def cmpfiles(a, b, common):
    """Compare common files in two directories.

    arg name=a type=string::
        name of first directory.

    arg name=b type=string::
        name of second directory.

    arg name=common type=list::
        names of common files to be compared

    return type=tuple::
        list::
            item::
                files that compare equal
            item::
                files that are different
            item::
                filenames that aren't regular files.
    """
    res = ([], [], [])
    for x in common:
        res[_cmp(os.path.join(a, x), os.path.join(b, x))].append(x)
    return res


# Compare two files.
# Return:
#	0 for equal
#	1 for different
#	2 for funny cases (can't stat, etc.)
#
def _cmp(a, b):
    try:
        return not abs(cmp(a, b))
    except os.error:
        return 2


# Return a copy with items that occur in skip removed.
#
def _filter(list, skip):
    result = []
    for item in list:
        if item not in skip: result.append(item)
    return result


# Demonstration and testing.
#
def demo():
    import sys
    import getopt
    options, args = getopt.getopt(sys.argv[1:], 'r')
    if len(args) <> 2: raise getopt.error, 'need exactly two args'
    dd = dircmp(args[0], args[1])
    if ('-r', '') in options:
        dd.report_full_closure()
    else:
        dd.report()

if __name__ == '__main__':
    demo()


======= 

The intermediary format is XML.
I have not yet written a DTD, but this is a general sketch:
The root element is "module".
Inside "module", you can have "description", "class", "function", "exception"
and "data".
Inside "class", you can have "description", "class", "function", "exception",
"data" and "member".
Inside "function", there are "description", "arg", "kw-arg", "rest-arg"
and "return" elements.
Inside "exception", there is a "description".
Inside "data", there is a "description".
Inside "member", there is a "description".
Inside "arg", "kw-arg", "rest-arg" and "return" there is a "description".

The following elements have a "type" attribute:
"member", "data", "arg", "return".

The following elements have a "name" attribute:
"class", "function", "exception", "data", "member", "arg".

Inside a "description" there are "p" elements.
Inside "p" element, there is PCDATA, "code" element, "example" element,
"module-ref", "class-ref", "exception-ref", "member-ref", "data-ref", 
"function-ref", "arg-ref", "var", "keyword", "pytype", "file", "url", "emph" 
and "list".

Inside the *-ref elements, "code", "example", "var", "keyword", "pytype", 
"file" and "url" there is PCDATA.

Inside "list" there are "item" elements.

The content model of "item" and "emph" is the same as that for "p".
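
To make the sketch concrete, here is roughly what the intermediary form of
the cmp function from the example module might look like, built with
Python's xml.etree.ElementTree. The helper name described is hypothetical,
and since there is no DTD yet the exact nesting is only my reading of the
element list above.

```python
import xml.etree.ElementTree as ET


def described(parent, tag, text, **attrs):
    """Add <tag ...><description><p>text</p></description></tag> to parent."""
    element = ET.SubElement(parent, tag, attrs)
    paragraph = ET.SubElement(ET.SubElement(element, "description"), "p")
    paragraph.text = text
    return element


module = ET.Element("module")
ET.SubElement(ET.SubElement(module, "description"), "p").text = (
    "Utilities for comparing files and directories.")

function = ET.SubElement(module, "function", {"name": "cmp"})
p = ET.SubElement(ET.SubElement(function, "description"), "p")
p.text = "Compare two files. If "
ref = ET.SubElement(p, "arg-ref")  # the short tag [arg use_statcache]
ref.text = "use_statcache"
ref.tail = " is true, the cache will never grow stale."

described(function, "arg", "First file name", name="f1", type="string")
described(function, "arg", "Second file name", name="f2", type="string")
described(function, "return", "1 if the files are the same, 0 otherwise.",
          type="bool")

xml_text = ET.tostring(module, encoding="unicode")
```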


=============

The OOL format will be similar to the intermediary format, with two important
changes:

1. It will have "link" elements, which allow it to pull data out of a
module's doc-strings. These link elements should be powerful enough so that
for most modules, a canonical OOL file will suffice.

2. It will have some SGML minimization to make writing it less painful.


Please comment!
--
Moshe Zadka <mzadka@geocities.com>. 
INTERNET: Learn what you know.
Share what you don't.