DocStrings 0.3: Suggestion for documentation syntax (very long)
Special tokens:

@ -- escapes any character following it (i.e., @c is always translated to c)
[ -- a short tag opener. Closed with a matching ]
::(newline) -- begins a long tag. Everything until the indent level returns to that of the line which started the long tag is part of the contents.
(newline)(newline) -- new paragraph.

The syntax of short tags is

    '[' tagname ' ' contents ']'

where contents is any valid snippet of docstring, with the exception of a long tag.

The syntax of long tags is

    tagname (attr '=' value)*'::' contents

where contents is any valid snippet of docstring.

List of tags, and where they are valid:

code (both as long and short) -- anywhere
example (long, no attributes) -- anywhere
arg, return (attributes: name, type, default (optional)) -- function docstring, never inside another long tag.
rest-arg, kw-arg (long, no attributes) -- function docstring, never inside another long tag.
data (attributes: name, type) -- class/module doc-string, never inside another long tag.
exception (attributes: name) -- class/module doc-string, never inside another long tag.
member (attributes: name, type) -- class doc-string, never inside another long tag.
function, module, class, exception, var, method, keyword, member, pytype, file, url, arg (short tag) -- anywhere

(these are for general markup syntax, so I've probably forgotten some):

list (long tag) -- anywhere
item (long tag) -- anywhere
emph (short tag) -- anywhere

(Note: there are some long tags with the same name as short tags. This poses no problems: the tags are different tags!)

This was brief on purpose: I don't believe many people will actually read the spec; most people will just start writing doc-strings based on what they have seen. So, here is an example of a marked up module. I want to thank Gordon McMillan for providing a well-documented module: I just modified the syntax.
(search for ======= if you want to skip the example module and get to the description of the intermediary format)

"""Utilities for comparing files and directories."""

import os
import stat
import statcache

_cache = {}
BUFSIZE=8*1024

def cmp(f1, f2, shallow=1,use_statcache=0):
    """Compare two files.

    arg name=f1 type=string::
        First file name

    arg name=f2 type=string::
        Second file name

    arg name=shallow type=bool default=1::
        Just check stat signature (do not read the files).

    arg name=use_statcache type=bool default=0::
        Do not stat() each file directly: go through
        the statcache module for more efficiency.

    return type=bool::
        1 if the files are the same, 0 otherwise.

    This function uses a cache for past comparisons and the results,
    with a cache invalidation mechanism relying on stale signatures.
    Of course, if [arg use_statcache] is true, this mechanism is defeated,
    and the cache will never grow stale.

    """
    stat_function = (os.stat, statcache.stat)[use_statcache]
    s1, s2 = _sig(stat_function(f1)), _sig(stat_function(f2))
    if s1[0]!=stat.S_IFREG or s2[0]!=stat.S_IFREG: return 0
    if shallow and s1 == s2: return 1
    if s1[1]!=s2[1]: return 0

    result = _cache.get((f1, f2))
    if result and (s1, s2)==result[:2]:
        return result[2]
    outcome = _do_cmp(f1, f2)
    _cache[f1, f2] = s1, s2, outcome
    return outcome

def _sig(st):
    return (stat.S_IFMT(st[stat.ST_MODE]),
            st[stat.ST_SIZE],
            st[stat.ST_MTIME])

def _do_cmp(f1, f2):
    bufsize = BUFSIZE
    fp1 , fp2 = open(f1, 'rb'), open(f2, 'rb')
    while 1:
        b1, b2 = fp1.read(bufsize), fp2.read(bufsize)
        if b1!=b2: return 0
        if not b1: return 1

# Directory comparison class.
#
class dircmp:
    """A class that manages the comparison of 2 directories.

    High level usage:

    list::
        item::
            [code x = dircmp(dir1, dir2)]
        item::
            [code x.report()] -> prints a report on the differences
            between dir1 and dir2
        item::
            [code x.report_partial_closure()] -> prints report on
            differences between dir1 and dir2, and reports on common
            immediate subdirectories.
        item::
            [code x.report_full_closure()] -> like
            report_partial_closure, but fully recursive.

    member name=left_list type=list::
        files in [var dir1], except for the ones in [var hide]
        and [var ignore] lists.
    member name=right_list type=list::
        files in [var dir2], except for the ones in [var hide]
        and [var ignore] lists.
    member name=common type=list::
        names in both [var dir1] and [var dir2]
    member name=left_only type=list::
        names in [var dir1] but not in [var dir2]
    member name=right_only type=list::
        names in [var dir2] but not in [var dir1]
    member name=common_dirs type=list::
        subdirectories in both [var dir1] and [var dir2].
    member name=common_files type=list::
        files in both [var dir1] and [var dir2].
    member name=common_funny type=list::
        names in both [var dir1] and [var dir2] where the type differs
        between [var dir1] and [var dir2], or the name is not
        [function stat]-able.
    member name=same_files type=list::
        list of identical files.
    member name=diff_files type=list::
        list of filenames which differ.
    member name=funny_files type=list::
        list of files which could not be compared.
    member name=subdirs type=dictionary::
        values are [class dircmp] objects, keyed by names
        in [member common_dirs].
    """

    def __init__(self, a, b, ignore=None, hide=None): # Initialize
        '''\
        initialize a directory comparison

        arg name=a type=string::
            first directory
        arg name=b type=string::
            second directory
        arg name=ignore type=list or None::
            list of directories to ignore when comparing.
            None means to ignore [code ['RCS', 'CVS', 'tags']].
        arg name=hide type=list or None::
            list of directories to hide when comparing.
            None means to hide [code [os.curdir, os.pardir]].
        '''
        self.left = a
        self.right = b
        if hide is None:
            self.hide = [os.curdir, os.pardir] # Names never to be shown
        else:
            self.hide = hide
        if ignore is None:
            self.ignore = ['RCS', 'CVS', 'tags'] # Names ignored in comparison
        else:
            self.ignore = ignore

    def phase0(self): # Compare everything except common subdirectories
        self.left_list = _filter(os.listdir(self.left),
                                 self.hide+self.ignore)
        self.right_list = _filter(os.listdir(self.right),
                                  self.hide+self.ignore)
        self.left_list.sort()
        self.right_list.sort()

    __p4_attrs = ('subdirs',)
    __p3_attrs = ('same_files', 'diff_files', 'funny_files')
    __p2_attrs = ('common_dirs', 'common_files', 'common_funny')
    __p1_attrs = ('common', 'left_only', 'right_only')
    __p0_attrs = ('left_list', 'right_list')

    def __getattr__(self, attr):
        if attr in self.__p4_attrs:
            self.phase4()
        elif attr in self.__p3_attrs:
            self.phase3()
        elif attr in self.__p2_attrs:
            self.phase2()
        elif attr in self.__p1_attrs:
            self.phase1()
        elif attr in self.__p0_attrs:
            self.phase0()
        else:
            raise AttributeError, attr
        return getattr(self, attr)

    def phase1(self): # Compute common names
        a_only, b_only = [], []
        common = {}
        b = {}
        for fnm in self.right_list:
            b[fnm] = 1
        for x in self.left_list:
            if b.get(x, 0):
                common[x] = 1
            else:
                a_only.append(x)
        for x in self.right_list:
            if common.get(x, 0):
                pass
            else:
                b_only.append(x)
        self.common = common.keys()
        self.left_only = a_only
        self.right_only = b_only

    def phase2(self): # Distinguish files, directories, funnies
        self.common_dirs = []
        self.common_files = []
        self.common_funny = []

        for x in self.common:
            a_path = os.path.join(self.left, x)
            b_path = os.path.join(self.right, x)

            ok = 1
            try:
                a_stat = statcache.stat(a_path)
            except os.error, why:
                # print 'Can\'t stat', a_path, ':', why[1]
                ok = 0
            try:
                b_stat = statcache.stat(b_path)
            except os.error, why:
                # print 'Can\'t stat', b_path, ':', why[1]
                ok = 0

            if ok:
                a_type = stat.S_IFMT(a_stat[stat.ST_MODE])
                b_type = stat.S_IFMT(b_stat[stat.ST_MODE])
                if a_type <> b_type:
                    self.common_funny.append(x)
                elif stat.S_ISDIR(a_type):
                    self.common_dirs.append(x)
                elif stat.S_ISREG(a_type):
                    self.common_files.append(x)
                else:
                    self.common_funny.append(x)
            else:
                self.common_funny.append(x)

    def phase3(self): # Find out differences between common files
        xx = cmpfiles(self.left, self.right, self.common_files)
        self.same_files, self.diff_files, self.funny_files = xx

    def phase4(self): # Find out differences between common subdirectories
        # A new dircmp object is created for each common subdirectory,
        # these are stored in a dictionary indexed by filename.
        # The hide and ignore properties are inherited from the parent
        self.subdirs = {}
        for x in self.common_dirs:
            a_x = os.path.join(self.left, x)
            b_x = os.path.join(self.right, x)
            self.subdirs[x] = dircmp(a_x, b_x, self.ignore, self.hide)

    def phase4_closure(self): # Recursively call phase4() on subdirectories
        self.phase4()
        for x in self.subdirs.keys():
            self.subdirs[x].phase4_closure()

    def report(self):
        '''\
        Print a report on the differences between [member a] and [member b].

        Output format is purposely lousy
        '''
        print 'diff', self.left, self.right
        if self.left_only:
            self.left_only.sort()
            print 'Only in', self.left, ':', self.left_only
        if self.right_only:
            self.right_only.sort()
            print 'Only in', self.right, ':', self.right_only
        if self.same_files:
            self.same_files.sort()
            print 'Identical files :', self.same_files
        if self.diff_files:
            self.diff_files.sort()
            print 'Differing files :', self.diff_files
        if self.funny_files:
            self.funny_files.sort()
            print 'Trouble with common files :', self.funny_files
        if self.common_dirs:
            self.common_dirs.sort()
            print 'Common subdirectories :', self.common_dirs
        if self.common_funny:
            self.common_funny.sort()
            print 'Common funny cases :', self.common_funny

    def report_partial_closure(self):
        '''Print reports on [arg self] and on [member subdirs]'''
        self.report()
        for x in self.subdirs.keys():
            print
            self.subdirs[x].report()

    def report_full_closure(self):
        '''Report on [var self] and [member subdirs] recursively.'''
        self.report()
        for x in self.subdirs.keys():
            print
            self.subdirs[x].report_full_closure()


# Compare common files in two directories.
# Return:
#       - files that compare equal
#       - files that compare different
#       - funny cases (can't stat etc.)
#
def cmpfiles(a, b, common):
    """Compare common files in two directories.

    arg name=a type=string::
        name of first directory.
    arg name=b type=string::
        name of second directory.
    arg name=common type=list::
        names of common files to be compared
    return type=tuple::
        list::
            item::
                files that compare equal
            item::
                files that are different
            item::
                filenames that aren't regular files.
    """
    res = ([], [], [])
    for x in common:
        res[_cmp(os.path.join(a, x), os.path.join(b, x))].append(x)
    return res


# Compare two files.
# Return:
#       0 for equal
#       1 for different
#       2 for funny cases (can't stat, etc.)
#
def _cmp(a, b):
    try:
        return not abs(cmp(a, b))
    except os.error:
        return 2


# Return a copy with items that occur in skip removed.
#
def _filter(list, skip):
    result = []
    for item in list:
        if item not in skip: result.append(item)
    return result


# Demonstration and testing.
#
def demo():
    import sys
    import getopt
    options, args = getopt.getopt(sys.argv[1:], 'r')
    if len(args) <> 2: raise getopt.error, 'need exactly two args'
    dd = dircmp(args[0], args[1])
    if ('-r', '') in options:
        dd.report_full_closure()
    else:
        dd.report()

if __name__ == '__main__':
    demo()

=======

The intermediary format is XML. I have not yet written a DTD, but this is a general sketch --

The root element is "module"

Inside "module", you can have "description", "class", "function", "exception" and "data".
Inside "class", you can have "description", "class", "function", "exception", "data" and "member".
Inside "function", there are "description", "arg", "kw-arg", "rest-arg" and "return" elements.
Inside "exception", there is a "description".
Inside "data", there is a "description".
Inside "member", there is a "description".
Inside "arg", "kw-arg", "rest-arg" and "return" there is a "description".

The following elements have a "type" attribute: "member", "data", "arg", "return".
The following elements have a "name" attribute: "class", "function", "exception", "data", "member", "arg".

Inside a "description" there are "p" elements.
Inside a "p" element, there is PCDATA, "code" element, "example" element, "module-ref", "class-ref", "exception-ref", "member-ref", "data-ref", "function-ref", "arg-ref", "var", "keyword", "pytype", "file", "url", "emph" and "list".
Inside the *-ref elements, "code", "example", "var", "keyword", "pytype", "file" and "url" there is PCDATA.
Inside "list" there are "item" elements.
The content model of "item" and "emph" is the same as that for "p".

=============

The OOL format will be similar to the intermediary format, with two important changes:

1. It will have "link" elements, which allow it to pull data out of a module's doc-strings. These link elements should be powerful enough so that for most modules, a canonical OOL file will suffice.
2. It will have some SGML minimization to make writing it less painful.

Please comment!

--
Moshe Zadka <mzadka@geocities.com>.
INTERNET: Learn what you know.
Share what you don't.
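[Editorial illustration] The short-tag rules and the XML sketch in the message above can be tied together with a small example. What follows is only a rough sketch under stated assumptions, written in present-day Python 3 rather than the Python 1.5 of the example module. The function names (parse_inline, fill, paragraph_to_xml) are hypothetical, and the mapping from short tag names to "*-ref" elements is a guess, since the message does not spell it out. Long tags are not handled.

```python
import xml.etree.ElementTree as ET

# Assumed mapping: these short tags become "<name>-ref" elements,
# the others keep their name. The original message does not state this.
REF_TAGS = {"module", "class", "exception", "member", "function", "arg"}
PLAIN_TAGS = {"code", "var", "keyword", "pytype", "file", "url", "emph"}


def parse_inline(text):
    """Parse '@' escapes and '[tag contents]' short tags.

    Returns a list of plain strings and (tagname, children) tuples.
    """
    nodes, _ = _parse(text, 0, top_level=True)
    return nodes


def _parse(text, pos, top_level):
    nodes, buf = [], []

    def flush():
        # Move any pending literal characters into the node list.
        if buf:
            nodes.append("".join(buf))
            buf.clear()

    while pos < len(text):
        ch = text[pos]
        if ch == "@":                  # escape: next character is literal
            if pos + 1 >= len(text):
                raise ValueError("dangling '@' at end of text")
            buf.append(text[pos + 1])
            pos += 2
        elif ch == "[":                # '[' tagname ' ' contents ']'
            flush()
            space = text.find(" ", pos + 1)
            tagname = text[pos + 1:space] if space != -1 else ""
            if not tagname or any(c in "[]@" for c in tagname):
                raise ValueError("malformed short tag at offset %d" % pos)
            children, pos = _parse(text, space + 1, top_level=False)
            nodes.append((tagname, children))
        elif ch == "]":                # closes the innermost open tag
            if top_level:
                raise ValueError("unmatched ']' at offset %d" % pos)
            flush()
            return nodes, pos + 1
        else:
            buf.append(ch)
            pos += 1
    if not top_level:
        raise ValueError("unclosed short tag")
    flush()
    return nodes, pos


def fill(element, nodes):
    """Append parsed nodes to element as XML mixed content."""
    last = None                        # most recently added child element
    for node in nodes:
        if isinstance(node, str):
            if last is None:
                element.text = (element.text or "") + node
            else:
                last.tail = (last.tail or "") + node
        else:
            tagname, children = node
            if tagname in REF_TAGS:
                xml_name = tagname + "-ref"
            elif tagname in PLAIN_TAGS:
                xml_name = tagname
            else:
                raise ValueError("unknown short tag %r" % tagname)
            last = ET.SubElement(element, xml_name)
            fill(last, children)


def paragraph_to_xml(text):
    """Turn one docstring paragraph into a <p> element."""
    p = ET.Element("p")
    fill(p, parse_inline(text))
    return p


p = paragraph_to_xml("Of course, if [arg use_statcache] is true, "
                     "this mechanism is defeated.")
print(ET.tostring(p, encoding="unicode"))
# prints: <p>Of course, if <arg-ref>use_statcache</arg-ref> is true, this mechanism is defeated.</p>
```

The sketch rejects anything it cannot place: the "method" short tag, for instance, has no counterpart in the XML element list, so it raises ValueError here instead of guessing at one.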
(Apologies to Moshe, to whom I inadvertently sent a separate copy of this - the perils of hitting "Reply" instead of "Reply All" in Outlook <fx:spit>)

Either I'm terminally confused (I hope that's it), or else several people must have been murdered at IPC8...

If I'm understanding things, Moshe is proposing (officially) a text-intensive form of markup for use within doc strings. This is designed to be easily mapped to (for instance) XML, and is nearly as verbose.

If I have understood that correctly, then I have two immediate comments:

1. Why don't we just use XML? - the proposed format is almost as verbose, and has the disadvantage that it is a new format "for the sake of it" [1].

2. What happened to all the people who were going to refuse to use such a format if it were more verbose than the proposal being circulated pre-IPC8 (which I thought was generally agreed on)? There's been no explanation of why they've suddenly reversed position, if they indeed have.

As a smaller note, Moshe seems to be leaving out "cross references" as unwanted in the inline documentation, whereas I would have thought it's as necessary there as out of line...

[1] If we're going to have our own format, it has to give *significant* benefits to readability/typability/etc, because having to translate into something else to determine if it is legitimate (in the "valid XML" sense) already makes it that much harder to correlate errors back to the original text. I don't see those benefits here.

If this proposal is serious, can we please have an explanation of why the various points raised in the pre-IPC8 discussion are being junked (things like wanting list elements not to be separated by blank lines, wanting to keep the typing/markup mode limited, Tim Peters' points about not needing to markup code blocks if they had >>>/... markers, etc.).

Desperately hoping I've missed LOTS of points, Tibs

--
Tony J Ibbs (Tibs) http://www.tibsnjoan.demon.co.uk/
My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)
[I've read it twice. I've thought it over. I'm sending it anyway.]
On Fri, 4 Feb 2000, Tony J Ibbs (Tibs) wrote:
(Apologies to Moshe, to whom I inadvertently sent a separate copy of this - the perils of hitting "Reply" instead of "Reply All" in Outlook <fx:spit>)
Either I'm terminally confused (I hope that's it), or else several people must have been murdered at IPC8...
Since the beginning, I knew that the "official" line would succeed and that any other "lines" (some of them with a lot of common sense) would be easily erased:

- eval docstrings
- hyperreferences
- indexes

I proposed, or rather contributed, an approach: http://www.ctv.es/USERS/irmina/mandoc/mandoc.py but I didn't finish it, partly because I got bored, partly because I knew the official line had XML in mind.
If I'm understanding things, Moshe is proposing (officially) a text-intensive form of markup for use within doc strings. This is designed to be easily mapped to (for instance) XML, and is nearly as verbose. If I have understood that correctly, then I have two immediate comments:
1. Why don't we just use XML? - the proposed format is almost as verbose, and has the disadvantage that it is a new format "for the sake of it" [1].
The question is: what are the benefits of an XML-ish doc string system? Certainly, the casual programmer is not going to write docs in a grammar that is far more complex than the language (Python) itself. So I think such complex doc strings aim at other objectives: they are intended to be robust, and to be used by "professional" people. By "professional" I mean what people think professional is. Professional usually means not very complex, but clearly defined and massively used. Most people would consider Java more professional than Python, because it's more widely used and less complex (or its complexity is biased towards "professional things"). Visual Basic and Windows are very "professional" too. "Professional" always involves two words: "fashionable" and "stupid-enough". XML is both.

Even admitting that XML is very useful, I think it's completely stupid to use XML whenever there's an exchange of data between programs or whenever we have to declare a structure. I think XML is the glue that unites "black boxes": each program should be a black box, but inside that black box I can live without XML. Now, documenting can be considered a black box: we can always support an XML layer in the outermost step of documentation, and use an internal mini-language or tags or whatever suits our needs better.

But all this goes against "professional" stuff because:

- It gives unnecessary flexibility (professionals want stiff systems)
- It gives more than one way to do it (professionals want only one solution)
- It uses academic rubbish: black boxes, modularity. Any professional wants to hard-wire and write as little code as possible. Any professional will tell you that the fewer interfaces the better. The TRUTH is that the more interfaces you use, the more degrees of freedom are available to you, and the better the maintenance. But "professionals" like the hard way: rewriting code and such.

Of course, being "professional" is the logical step for Python; it will lose most of its charm but it'll make more money. Linux is suffering the same evolution, btw.
2. What happened to all the people who were going to refuse to use such a format if it were more verbose than the proposal being circulated pre-IPC8 (which I thought was generally agreed on)? There's been no explanation of why they've suddenly reversed position, if they indeed have.
I can't fight against the world. I must add that few people are interested in the problem itself. The problem is as simple as this:

- A rich, flexible way of documenting that is used by many people

This implies more than one solution, all generating XML in the last step. Being used by many people implies being extremely simple and encouraging. I think my "mandoc" is simple, and I must add that there should be more than one way of simple documentation.
As a smaller note, Moshe seems to be leaving out "cross references" as
Cross references are extremely useful, but they're never as impressive as a set of magic words. Many people suggest their own recipe of magic words for the problem; of course, professionals like magic words. But they're not magic at all, just a tool for describing things; the interesting thing is expressive capability. Cross references express the inner structure and as many relationships as we want; unfortunately, they're not magic.
[1] If we're going to have our own format, it has to give *significant* benefits to readability/typability/etc, because having to translate into something else to determine if it is legitimate (in the "valid XML" sense) already makes it that much harder to correlate errors back to the original text. I don't see those benefits here.
I can't agree with this; pure XML would always be more buggy.

Regards/Saludos
Manolo
www.ctv.es/USERS/irmina /TeEncontreX.html /texpython.htm /pyttex.htm

I have an existential map. It has "You are here" written all over it. -- Steven Wright
2. What happened to all the people who were going to refuse to use such a format if it were more verbose than the proposal being circulated pre-IPC8 (which I thought was generally agreed on)? There's been no explanation of why they've suddenly reversed position, if they indeed have.
I certainly haven't. There was (alas) no serious discussion on this topic during developer's day, which in my opinion was a very frustrating and pointless event. Too many folks with no involvement in the SIG just felt it was an opportunity to express their 'feelings' without committing resources or proposals. We had yet another discussion about XML, SGML, absence of tools, whether DocBook, PDF or Word, having user needs drive the process or not, etc. etc., but I don't think that Fred learned anything that he didn't know before. FWIW, I am tentatively opposed to Moshe's proposal, the tentatively because I haven't had a chance to give his proposal a fair look. The 'look and feel' of the markup, however, doesn't sit well with me. I will do a more serious critique in the next few days. Moshe, could you step back a little and explain why we should undo the relative agreement we'd established (IMO) before IPC8? --david
David, sorry to pick on you, but this will be a rather collective answer. On Fri, 4 Feb 2000, David Ascher wrote:
not, etc. etc., but I don't think that Fred learned anything that he didn't know before.
I agree. It was a bit pointless.
FWIW, I am tentatively opposed to Moshe's proposal, the tentatively because I haven't had a chance to give his proposal a fair look. The 'look and feel' of the markup, however, doesn't sit well with me. I will do a more serious critique in the next few days.
Great!
Moshe, could you step back a little and explain why we should undo the relative agreement we'd established (IMO) before IPC8?
Well, I certainly seemed to stir up enough negative feelings! For one thing, "I didn't know we had one". For another thing, my proposal had one thing no other proposal I know of had: a clear definition of the driving forces behind it. Now, one thing I would like to know is whether it was bad design (my goals were wrong) or bad implementation (my proposal didn't meet those goals). I must say that no-one seemed to give it so much as a chance -- that is, to re-write one docstring in it.

As to Greg's objection, well, I can certainly live with something which "guesses" what the words mean based on some delimiters. I'm doubtful that I could write it, and I'm even skeptical that anyone will do a good enough job that it will replace the semantic markup completely, but we can give it a try.

In short, what I want (and I think deserve) is more than people saying "Yuck!" -- I want to see if version 0.4 can be better. If people think my proposal is so horrible it can never be fixed, then if someone could show an alternative proposal, it would do us all some good.

crushed-by-the-multitudes-ly y'rs, Z.

--
Moshe Zadka <mzadka@geocities.com>.
INTERNET: Learn what you know.
Share what you don't.
David Ascher writes:
not, etc. etc., but I don't think that Fred learned anything that he didn't know before.
Yes, I learned that there are still very few people interested in out-of-line documentation, and I can do whatever I darn well please. ;) -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> Corporation for National Research Initiatives
On Fri, 4 Feb 2000, Tony J Ibbs (Tibs) wrote:
1. Why don't we just use XML? - the proposed format is almost as verbose, and has the disadvantage that it is a new format "for the sake of it" [1].
Because XML is totally unacceptable for typing. Moreover, it is quite easy to type syntactically incorrect XML [1], and the concept of documentation so complex that it requires debugging on a syntactic level just horrifies me. -- ?!ng [1] ...as evidenced by my very own mistake! Can you find the error in the silly XML example in my first message entitled "Ease of use is #1"?
Hi! Moshe Zadka: [...]
def cmp(f1, f2, shallow=1,use_statcache=0): """Compare two files.
arg name=f1 type=string:: First file name
arg name=f2 type=string:: Second file name
arg name=shallow type=bool default=1:: Just check stat signature (do not read the files).
arg name=use_statcache type=bool default=0:: Do not stat() each file directly: go through the statcache module for more efficiency.
return type=bool:: 1 if the files are the same, 0 otherwise.
This function uses a cache for past comparisons and the results, with a cache invalidation mechanism relying on stale signatures. Of course, if [arg use_statcache] is true, this mechanism is defeated, and the cache will never grow stale.
""" [...] def report_partial_closure(self): '''Print reports on [arg self] and on [member subdirs]''' [...] Please comment!
I don't like it. I don't believe that there will ever be much Python source code containing such doc strings. I'm watching the CVS-checkin mailing list, and Guido has just committed three large batches of std lib modules containing much nicer and smaller doc strings. These will definitely set standards for other people writing inline doc strings as soon as Python 1.6 comes out.

I can't imagine that the average Joe Random Python programmer will ever bother to write inline doc in such a complex format as you proposed. I am sad to see that you have spent a great deal of energy and time on this spec.

Maybe I've missed something important? Maybe the goal of the whole effort should have been made clearer? The 'future direction' chapter of the current Python documentation and the status report recently posted by Fred Drake Jr. in this SIG do not describe the goal very clearly (at least not for me). Maybe I don't know enough about the planned workflows for dealing with Python documentation.

The OOL documentation of Python's std library is remarkably good and can only be improved in minor details. Therefore I can't see what a 1:1 relation between OOL doc and inline doc strings would buy us. Sure, there is some content overlap between OOL and doc strings, which introduces some duplicated work for the maintainers. But often the OOL is more elaborate (as it should be) and contains tables and other material which would only bloat doc strings and distract from the code.

What about Tkinter? There is Fredrik Lundh's effort, which is still incomplete in some minor areas, but was nevertheless a very valuable reference for me. Unfortunately it can't be included in the Python distribution because of copyright issues, AFAIK. Inline doc strings are practically nonexistent in Tkinter. Who will ever volunteer to write some, if they must be written in an ugly and hard-to-learn format?

What about free third-party packages? Many have no doc at all. Others have rather good documentation (e.g. Pmw, which is only incomplete in some areas). Other packages have doc strings, but the format varies a lot depending on the taste of the author. That is the situation now, with an exponentially growing set of new open source Python packages written all over the world. I don't believe that many people are eager to rewrite their doc strings according to a new spec that looks radically different from normal text. In the meantime Java has Javadoc, and its capabilities are remarkable compared to what Python programmers can use today.

So IMHO it doesn't make much sense to develop a very complex new grammar for doc strings. This effort would be better spent on a set of simple grammars and modular StructuredText-like tools which are able to extract a reasonable doctree structure from existing doc strings.

Danilo's 'python-doc' package has taken some steps in this direction. However, I am not really happy with the current implementation, which still relies heavily on Jim Fulton's original StructuredText.py and does further recognition of tags on the paragraph structure spit out by StructuredText. But it is better than gendoc and much better than having nothing at all. So far Daniel L. has certainly provided a good starting point.

Maybe John Aycock's easy-to-use generic parser could be used to build flexible and customizable doc string eaters? Then anybody interested in his own complicated and sophisticated doc string grammar could easily hack up his own parser, while we can stay with simple and easy-to-write doc string formats like those written by Guido and others.

Okay: that was surely a demoralizing rant. It's just my personal view.

Regards, Peter

--
Peter Funk, Oldenburger Str.86, 27777 Ganderkesee, Tel: 04222 9502 70, Fax: -60
To the moderator: please cancel my earlier message, which had an attached example that was too big. Instead of including it in the message, i'll just put it on a website. ------------------ Before i begin, let me make an attempt to propose the two most important goals of any documentation project. I. To encourage people to write lots of documentation. II. To make that documentation as accessible as possible. There is an example at http://www.lfw.org/python/SocketServer.html ; it is discussed in more detail below. On Fri, 4 Feb 2000, Moshe Zadka wrote:
Special tokens: @ -- escapes any character following it (i.e., @c is always translated to c)
[ -- a short tag opener. Closed with a matching ]
::(newline) -- beginning a long tag. everything until the indent level returns to that of the line which started the long tag is part of the contents.
(newline)(newline) -- new paragraph.
The syntax of short tags is '[' tagname ' ' contents ']' where contents is any valid snippet of docstring, with the exception of a long tag.
The syntax of long tags is
tagname (attr '=' value)*'::' contents [there follow some 450 lines of description]
My first reaction to this is -- Holy crap! This has gotten totally out of control! It will do us no good for geeks like us -- the 1% of uber-geeks *within* that 1% of geeks that Randy described -- to sit in ivory towers describing elaborate syntaxes for marking up documentation when most other readers won't even understand the syntax, let alone use it in their own code. Why, i wouldn't even bother to do all of this marking up in my own documentation. I sincerely wish not for Moshe to take this personally -- i just think that this is one example of far too many steps down a seductive path in the wrong direction. Why does one never see e-mail like the following? <fragment> <adj/Holy/ <expl severity=mild>crap</expl>! </fragment> <sentence> <subj><pron/This/</subj> <pred><verb>has gotten</verb> <adjp><adv>totally</adv> <adv/out/ <prep/of/ <noun/control/</adjp>! </sentence> Because: 1. No one in their right mind would waste all that time typing. The markup has two potential audiences: humans and machines. 2. The markup isn't going to get used by any sort of mechanical parsing tool anyways. 3. Any meaning that a human could obtain from the markup can be fairly well derived from context. Now, let's translate these three points over into the world of source code documentation. [1.] Point 1 stands as is. Every obstacle that we introduce -- whether a cognitive obstacle (more syntax to understand) or just extra effort (more typing) will make it less likely for people to write documentation. Can you imagine how quickly people would simply stop writing any documentation at *all* if we were to have doc-sig police jumping around pointing out "Oh, your docstring has incorrect syntax because you didn't escape this bracket with an @-sign here." or "You used that semantic tag incorrectly; you should have used [var] instead of [code]."? Remember the old adage that says that every math equation in a book will cut its readership in half? 
Well, imagine something similar -- every additional syntactic construct or tag will cut the *writership* in half (or some fraction). Even though more tags may mean more expressive power, it also means a more difficult choice to make every time one uses a tag -- beyond a certain point it becomes debatable which tag is the correct one to use, and then all is lost. Also, the more complex the system becomes, the less predictable the failure modes will be -- till we get to the point you have to debug docstrings (eek!). The absolute priority here is to make docstrings *dead* easy to write. That entry barrier has to go way down -- in the limit, to zero, where even existing docstrings written without knowledge of our discussion can be considered rich and "correct" docstrings. (The example is generated from SocketServer.py as distributed with Python 1.5 -- nothing has been edited.) [2.] To the proliferation of long and short tags such as arg, code, data, etc. -- first i ask, what is the purpose? Is this a solution in search of a problem? What will an automatic documentation generator use these tags for? For example, you can devise a structure where you list function arguments and mark up each one, but since the descriptions next to them are just in plain English, what good would a nicely-organized table do for a machine anyway? When would you ever need to collect descriptions of random individual arguments without looking at the function docstring as a whole? (Can we expect it to do much better than the example at http://www.lfw.org/python/SocketServer.html ? At what cost to the writers and readers of docstrings do we achieve such additional power?) [3.] The people who are going to be writing and reading these docstrings have already invested considerable mental effort in the syntax of one language, Python. Let's capitalize on the context we can gather from that, rather than introducing an entirely independent syntax for people to learn as well. 
We can get a lot of mileage out of this. For example, identifiers that appear after "self." are clearly instance attributes; identifiers that are immediately followed by "(" are function or method calls; we can determine class and function names by introspection; and so on. These conventions are already used by many people; all we have to do is introduce a little payoff, and make sure we keep the conventions very straightforward and the payoff predictable, to encourage and strengthen the use of these conventions. (The example uses this technique to mark up class and method names with hyperlinks, and to make attribute names stand out.)

Finally, about the example: this is the example i showed at the end of the Doc-SIG meeting on Developer Day at IPC8. It was generated automatically from the stock SocketServer.py by a program that imports the module and introspects into its classes and functions. We use this script (except for the hyperlinking -- that was a recent addition) at ILM and it works for us fairly well. See:

    http://www.lfw.org/python/SocketServer.html

Further improvements:

- Many modules contain documentation in # comments at the beginning of the module, or immediately before functions or classes. The script could look for documentation in these places as well, if docstrings are not found. (This is now done in my local copy, though it wasn't in the script when i demonstrated it at IPC8.) (Side note: i've updated all the modules in the standard library that did this, by moving the # comments into proper docstrings, but this is still likely a good feature to have, as i bet lots of other modules out there use # comments instead of docstrings.)

- The script oughta scan the module for constants as well. Constants could be documented with a # comment on the same line as the constant assignment.

- Given knowledge of other modules in the system, the script could produce hyperlinks where documentation in one module references functions or classes in another.
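The conventions and the introspection pass described above can be sketched in a few lines of present-day Python. This is only an illustration under stated assumptions, not the manpy script: the names ATTR_RE, CALL_RE, find_conventions and describe_module are made up for the example, and it assumes Python 3, where SocketServer.py is named socketserver. It also falls back to # comments, via inspect.getcomments(), when a class or function has no docstring.

```python
import inspect
import re

# Hypothetical patterns for the two textual conventions:
# an identifier after "self." is treated as an instance attribute,
# an identifier immediately followed by "(" is treated as a call.
ATTR_RE = re.compile(r"\bself\.([A-Za-z_]\w*)")
CALL_RE = re.compile(r"\b([A-Za-z_]\w*)\(")


def find_conventions(text):
    """Return (attributes, calls) mentioned in a plain-text docstring."""
    attrs = sorted(set(ATTR_RE.findall(text)))
    calls = sorted(set(CALL_RE.findall(text)))
    return attrs, calls


def describe_module(module):
    """Map each class or function defined in `module` to its documentation.

    The docstring is used when present; otherwise the # comments that
    immediately precede the definition are used, if there are any.
    """
    entries = {}
    for name, member in inspect.getmembers(module):
        if not (inspect.isclass(member) or inspect.isfunction(member)):
            continue
        # Skip names that were merely imported into the module.
        if getattr(member, "__module__", None) != module.__name__:
            continue
        doc = member.__doc__ or inspect.getcomments(member) or ""
        entries[name] = doc
    return entries
```

For instance, describe_module(socketserver) returns a dictionary keyed by class and function name, and find_conventions() can then be run over each docstring to decide which names to hyperlink or highlight. No markup is required of the docstring author at any point.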
In short -- let's make it so easy to write rich docstrings that people do it correctly without even knowing that they are doing it.

-- ?!ng

"If I have not seen as far as others, it is because giants were standing on my shoulders." -- Hal Abelson
Hi! [ping]:
There is an example at http://www.lfw.org/python/SocketServer.html ; it is discussed in more detail below.
I've just had a look at it: This looks really great! That goes far beyond what I got out of pythondoc or gendoc. Where can I get the tool/script which produced this nice HTML? Please go ahead and get the permission to publish manpy! I'm holding my breath... ;-) [...] Regards, Peter -- Peter Funk, Oldenburger Str.86, 27777 Ganderkesee, Tel: 04222 9502 70, Fax: -60
----------Peter wrote:------------ [ping]:
There is an example at http://www.lfw.org/python/SocketServer.html ; it is discussed in more detail below.
I've just had a look at it: This looks really great! That goes far beyond what I got out of pythondoc or gendoc. Where can I get the tool/script which produced this nice HTML?
Please go ahead and get the permission to publish manpy! I'm holding my breath... ;-) [...]
Regards, Peter
I also looked at it and that's along the lines of what I'm looking for in terms of generated output (though I think I would choose less pinky colors). More importantly though, I would like to know more about what the doc-strings look like in order to produce that output. /will
On Sun, 6 Feb 2000, will wrote:
I also looked at it and that's along the lines of what I'm looking for in terms of generated output (though I think I would choose less pinky colors). More importantly though, I would like to know more about what the doc-strings look like in order to produce that output.
The module is unaltered from SocketServer.py in the standard library distributed with Python 1.5.2. I put a copy of it up at http://www.lfw.org/python/SocketServer.py in case you don't have 1.5.2. In short, nothing special. The docstrings are exactly what you see on the HTML page without the formatting. -- ?!ng "If I have not seen as far as others, it is because giants were standing on my shoulders." -- Hal Abelson
-----Original Message----- The module is unaltered from SocketServer.py in the standard library distributed with Python 1.5.2. I put a copy of it up at http://www.lfw.org/python/SocketServer.py in case you don't have 1.5.2. In short, nothing special. The docstrings are exactly what you see on the HTML page without the formatting.
-- ?!ng
Heh--I feel pretty silly now.... Looking at the doc-strings involved and the output generated, that's exactly what I'm looking for. At the bottom of the SocketServer.html page it says: "manpy by Tommy Burnette. Web interface by Ping...." Does this mean that manpy is the script that generates the documentation in some format-intermediary form and then there is a cgi script/program that takes that form and formats it for html? btw - If this is redundant information from what was at IPC8, then my apologies and such. /will