Proposed format for docstrings:

The whitespace at the beginning of a docstring is ignored. Paragraphs are
separated by one or more blank lines.

For compatibility with Guido, IDLE and Pythonwin (and increasing the
likelihood that the proposal will be accepted by GvR), the docstrings of
callables must follow the convention established in Python's builtins:

    >>> print len.__doc__
    len(object) -> integer

    Return the number of items of a sequence or mapping.

In other words, the first paragraph must fit on a line, repeat the name of
the callable with a 'wordy' signature, the ' -> ' string, and the type of
the return value. The second paragraph must be a one-sentence description
of the callable. It is also allowed to have those two bits separated by a
" -- " string:

    >>> print [].pop.__doc__
    L.pop([index]) -> item -- remove and return item at index (default last)

and functions which don't return anything can omit the " -> foo" bit:

    L.append(object) -- append object to end

Each paragraph is either 'text' or a 'keyword-tagged block'. A keyword is
a case-sensitive element of [a-zA-Z_]+ followed by two colons (with
optional whitespace between the keyword and the colons, but no whitespace
allowed between the two colons). A paragraph which doesn't start with a
keyword is 'text'.

Characters between # signs and the end of the line are stripped by the
docstring parser.

A 'keyword-tagged block' is nested much like Python code. Just like in
Python, the block can either be on the same line as the keyword if it is
one line long (I'll refer to such blocks as 'text' blocks even though they
aren't in visual paragraphs), or needs to be indented relative to the
keyword. Examples:

    Author:: Guido van Rossum  # comments are stripped

    Date_of_release :: 1/1/1999  # The key is "Date_of_release" and the
                                 # whitespace before the : is stripped

    Contributors::  # The value is a block of lines.
        John Doe
        Ronald Reagan
        Francois Mitterand

Some keywords can have special parsing rules, as the block of text which
the keyword designates is well-specified by the rules above. The first
example of such a keyword-specific parsing rule is for Arguments:

    Arguments::
        self -- instance
        input (sequence) -- the sequence which is being processed

(the specific syntax of Arguments:: is left for a later discussion). Other
candidates which can impose specific parsing rules are: ReturnType, Date,
Version, etc.

Text blocks can be followed by indented blocks as well -- those are
'children' blocks of the outdented block.

'text' blocks which start with * or - are tagged as 'bullet items' for
rendering. The bullet marker has to be consistent within a given level of
indentation. Example:

    * this is one bullet
      - this is a sub-bullet
      - this is another sub-bullet
    * this is another bullet

In text blocks, some strings are recognized as links:

.foo in the docstring of a class will refer to the foo attribute of the
class. In the docstring of a method, it will refer to the foo attribute of
the method's class. In the docstring of a module it will refer to a
function or class defined in that module.

foo.bar will refer to the bar attribute of foo, which will be looked up in
the following namespaces in order: (to be determined)

URL notation is automatically recognized.

[foo] refers to the keyword 'foo' in the section 'References' of the
current docstring. [..] links cannot span multiple lines or contain
whitespace (as keywords can't). (In other words, if a [ is not matched by
a ] on the same line or before a whitespace character is hit, then it is a
syntax error.)
    References::
        foo:: My Dissertation, University Press, 1902

The set of keywords which are 'officially sanctioned' is:

    For module docstrings: [see Trove discussion for a good starting set
        -- this discussion has been had!]
    For class docstrings: [To be determined]
    For method docstrings: [To be determined]
    For function docstrings: [To be determined]

Miscellaneous Thoughts:

I chose double-colon notation for keywords so that one can have text
paragraphs which match the 'word:' notation without having them be
interpreted as keywords.

Does this proposal make docstrings whitespace-heavy? The requirement to
break each paragraph with a line of whitespace means that a lot of lines
are blank, especially when doing 'bulleted lists'.

The above was (quickly) written with parsing in mind. Is it really easily
parseable? If not, what needs to be changed so that it is parseable? I
also wanted to make sure that syntax errors could be flagged early and
'localized' for aid in debugging. I'm not sure that I did that carefully
enough.

Are there normal uses in docstrings where one wants to turn off the
automatic link detection?

Is there value in having string interpolation? David Arnold mentioned

    __version__ = "$Revision$"[11:-2]
    __date__ = "$Date$"

which raises some issues. I don't think that having [11:-2] evaluated by
the docstring parser is a wise idea. However, I can imagine that the
module author could do:

    __version__ = "$Revision$"[11:-2]

in the Python code, and then

    Version:: %(__version__)s

in the docstring, and that such a simple string interpolation mechanism
could have value. I'm not sure it's worth the complication though. What
dictionary would be used to do the interpolation?

Hopefully constructively,

--david

PS: It goes without saying that while I railed against design by
committee, I am of course hopeful for feedback, both for technical reasons
(dummy, you forgot special cases X, Y and Z!) and because I realize that a
standards proposal needs at least broad agreement, if not consensus, to be
effective in the long run. The sharper-eyed will note that I stacked the
deck in my favor in the above proposal by including what Guido does
naturally as valid in the proposed grammar.
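A rough sketch (in today's Python, not part of the proposal itself) of how
a tool might recognise the first-line convention described at the top of
the proposal; the pattern and the helper name are illustrative assumptions:

    import re

    # first line: "name(args) -> rtype -- summary"; both the '-> rtype'
    # and the '-- summary' parts are optional, per the proposal above
    _FIRST_LINE = re.compile(
        r"^(?P<name>[A-Za-z_][\w.]*)"       # callable name, possibly dotted
        r"\((?P<args>.*?)\)"                # 'wordy' signature (no nested parens)
        r"(?:\s*->\s*(?P<rtype>[^-]+?))?"   # optional return type
        r"(?:\s*--\s*(?P<summary>.*))?\s*$"
    )

    def parse_first_line(line):
        m = _FIRST_LINE.match(line.strip())
        if m:
            return m.groupdict()
        return None

    # parse_first_line("L.pop([index]) -> item -- remove and return item at index (default last)")
    # -> {'name': 'L.pop', 'args': '[index]', 'rtype': 'item',
    #     'summary': 'remove and return item at index (default last)'}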
I would *love* to see a standard for doc strings, and although I've often objected to specific proposals in the past, by now I'd take almost anything. Well, no, that's NEVER true, but David's proposal doesn't cause *too* many knee-jerk reactions... David Ascher wrote:
Paragraphs are separated by one or more blank lines.
As you say later on, I think this does cause some over-use of whitespace...
Characters between # signs and the end of the line are stripped by the docstring parser.
This is a Bad Thing - I have quite often needed to discuss things in doc
strings which include use of the "#" character - not least if I'm parsing
a little language that uses "#" as its comment character! So losing stuff
thus would be difficult. Either (a) why do we need comments in doc
strings, or (b) provide a way to escape the "#" character.

(Also, if one were using Tim Peters' "test using the doc string as
template" thingy, one needs to be able to put generic Python code in the
doc strings, and that means that stopping comment characters from going
through to the ultimate documentation may be a bad thing.)
A 'keyword-tagged block' is nested much like Python code. Just like in Python, the block can either be on the same line as the keyword if it is one-line long
I *like* this.
Contributors:: # The value is a block of lines.
John Doe
Ronald Reagan
Francois Mitterand
but the above gets overly verbose. I suppose one could instead use a list
syntax:

    Contributors::
        - John Doe
        - Ronald Reagan
        - Francois Mitterand

since I don't see the ambiguity in allowing the omission of the vertical
whitespace here, *if* one allows that some care would be needed with
hyphenation! (i.e., one can't allow one's hyphens to start a line, which
is awkward but probably not too bad). Another possibility might be to
allow "Python list" syntax - I started off disliking this, but over the
last few minutes it has grown on me:

    Contributors:: [ John Doe, Ronald Reagan, Francois Mitterand ]

(again, hijacking Python's syntax).
Text blocks can be followed by indented blocks as well -- those are 'children' blocks of the outdented block.
And this solves my "I want a list item to have multiple paragraphs" problem, which has been a bugbear of mine in the past with other proposals... The exact indentation of a second paragraph in a list item (whether aligned with the bullet or the text) would need addressing later, but I don't much care (provided it is with the text, of course).
'text' blocks which start with * or - are tagged as 'bullet items' for rendering. The bullet marker has to be consistent within a given level of indentation.
Example:
* this is one bullet
- this is a sub-bullet
- this is another sub-bullet
* this is another bullet
Again, sometimes I'd like to allow the blank lines to be missing. Another
way to do this is to have a "special" character to introduce the bullet
items - so maybe instead:

    Example:
        @* this is one bullet
        @- this is a sub-bullet

but that's horrible in its own way - maybe the white space is just what
we have to live with (I certainly WOULD live with it if it was the only
thing standing in the way of adopting the proposal!).

No, on thinking about it, I would vote for either:

1) use of white space as David proposes
   (pro: utter simplicity, con: doesn't quite look as nice as I'd like)
2) allow Python list syntax
   (pro: emphasises this is for short lists, con: a bit odd)
3) detect bullet characters at the "start of line"
   (pro: still fairly simple, con: one has to take care about, e.g.,
   dashes in text)

Ah - I just realised that negative numbers at the start of a line
probably kill that one...

Could we do numbered/lettered/named lists by, for instance:

    *1 This list item is numbered, and one expects all items at this
       indentation in this list to be numbered
    -a Ditto for "lettered" items in this list
    @fred And this sub-list has item names
    -2 This may well get flagged as a mistake
    *B Unless we're allowing the author to do odd things if they like...

(is that simple enough?)
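A small sketch of option (3) from the list above -- treating '*' or '-' as
a bullet only when it is the first non-blank character and is followed by
whitespace, which sidesteps the negative-number worry; the names here are
made up for illustration:

    import re

    # '*' or '-' only counts as a bullet when it is the first non-blank
    # character on the line AND is followed by whitespace, so "-1.5" or a
    # hyphenated word wrapped to the start of a line is left alone
    _BULLET = re.compile(r"^(?P<indent>\s*)(?P<marker>[*-])\s+(?P<text>\S.*)$")

    def bullet_split(line):
        m = _BULLET.match(line)
        if m:
            return len(m.group("indent")), m.group("marker"), m.group("text")
        return None

    # bullet_split("  - this is a sub-bullet")  ->  (2, '-', 'this is a sub-bullet')
    # bullet_split("-1 is not a bullet")        ->  None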
Is there value in having string interpolation? David Arnold mentioned
__version__ = "$Revision$[11:-2] __date__ = "$Date$
There's also a semi-convention I've seen where a module's doc string is also used as its documentation for Unix commands, and one substitutes in sys.argv[0] - i.e., the command used to invoke the script - as a string into the "Usage:" line. It's a rather hacky trick, and perhaps not to worry about too much.
The sharper-eyed will note that I stacked the deck in my favor in the above proposal by including what Guido does naturally as valid in the proposed grammar.
Yea, go for it!

desperately hoping this will get off the ground, but with no time to do
anything more than comment on it,

Tibs

--
Tony J Ibbs (Tibs)      http://www.tibsnjoan.demon.co.uk/
2 wheels good + 2 wheels good = 4 wheels good?
3 wheels good + 2 wheels good = 5 wheels better?
My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)
On Mon, 29 Nov 1999, Tony J Ibbs (Tibs) wrote:
Characters between # signs and the end of the line are stripped by the docstring parser.
This is a Bad Thing - I have quite often needed to discuss things in doc
As I mentioned in another email, yes, you're right.
(Also, if one were using Tim Peter's "test using the doc string as template" thingy, one needs to be able to put generic Python code in the doc strings, and that means that stopping comment characters from going through to the ultimate documentation may be a bad thing.)
This raises a deeper issue: introducing Python code in a docstring. Such
text cannot be parsed like text because linebreaks, indentation, etc. are
important. Here's one idea which I like -- introduce a new keyword which
is the equivalent of HTML's <PRE> tag:

    Code:
        def foo():
            ...
            return ...

In other words, Python code is just another kind of text, but the
processing rules applied to that block are different. The only restriction
is that the text in a Code: block *cannot* be outdented more than the
first line in the block. The rendering in HTML would omit the label
"Code:" and instead change font to the monospace font or whatnot.

One related comment: multiple instances of a given keyword can occur
within a docstring.
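A sketch of the rendering side of such a Code: block, assuming the
renderer receives the indented lines that followed the keyword; the helper
name is an assumption, not an agreed API:

    def dedent_code_block(lines):
        """Strip the common left margin from a Code: block, keeping relative
        indentation; nothing may be outdented past the first line."""
        first = lines[0]
        margin = len(first) - len(first.lstrip())
        out = []
        for line in lines:
            if line.strip() and len(line) - len(line.lstrip()) < margin:
                raise ValueError("Code: block outdented past its first line")
            out.append(line[margin:])
        return "\n".join(out)

    # dedent_code_block(["    def foo():", "        return 42"])
    # -> "def foo():\n    return 42"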
[... on the issue of how to 'shorten' lists... ]
No, on thinking about it, I would vote for either:
1) use of white space as David proposes
   (pro: utter simplicity, con: doesn't quite look as nice as I'd like)
2) allow Python list syntax
   (pro: emphasises this is for short lists, con: a bit odd)
3) detect bullet characters at the "start of line"
   (pro: still fairly simple, con: one has to take care about, e.g.,
   dashes in text)

Ah - I just realised that negative numbers at the start of a line
probably kill that one...
How about another keyword?

    List:
        * foo
        * bar
        * spam

Again, such keywords would not be rendered in 'output formats' (HTML, PS,
etc.).
There's also a semi-convention I've seen where a module's doc string is also used as its documentation for Unix commands, and one substitutes in sys.argv[0] - i.e., the command used to invoke the script - as a string into the "Usage:" line. It's a rather hacky trick, and perhaps not to worry about too much.
I'd rather leave that to the coder who does the if __name__ == '__main__' code. sys.argv is a runtime-built construct, and I think docstrings should be dependent on compile-time information only. --david
David Ascher wrote:
Paragraphs are separated by one or more blank lines.
As you say later on, I think this does cause some over-use of whitespace...
Agreed. Let's kill them.
Characters between # signs and the end of the line are stripped by the docstring parser.
This is a Bad Thing - I have quite often needed to discuss things in doc strings which include use of the "#" character - not least if I'm parsing a little language that uses "#" as its comment character! So losing stuff thus would be difficult. Either (a) why do we need comments in doc strings, or (b) provide a way to escape the "#" character.
I forgot to mention this in my original reply. I also think that this is a bad idea. I don't think we need meta-comments for the doc-strings. I don't like the idea even if we find a way to escape '#'.
but the above gets overly verbose. I suppose one could instead use a list syntax:
Contributors::
    - John Doe
    - Ronald Reagan
    - Francois Mitterand
Yes, and this goes with what David had in his proposal about bullets.
since I don't see the ambiguity in allowing the omission of the vertical whitespace here, *if* one allows that some care would be needed with hyphenation! (i.e., one can't allow one's hyphens to start a line, which is awkward but probably not too bad). Another possibility might be to allow "Python list" syntax - I started off disliking this, but over the last few minutes it has grown on me:
Contributors:: [ John Doe, Ronald Reagan, Francois Mitterand ]
(again, hijacking Python's syntax).
Again as long as we don't go having meta-compilation in the first version of the system.
No, on thinking about it, I would vote for either:
1) use of white space as David proposes
   (pro: utter simplicity, con: doesn't quite look as nice as I'd like)
2) allow Python list syntax
   (pro: emphasises this is for short lists, con: a bit odd)
3) detect bullet characters at the "start of line"
   (pro: still fairly simple, con: one has to take care about, e.g.,
   dashes in text)

Ah - I just realised that negative numbers at the start of a line
probably kill that one...
This one is also a bit ugly, but how about a hybrid:

    List [
        * item 1
        * item 2
        [
            * sub-item 1
            * sub-item 2
        ]
        * item 3
    ]

--
Uche Ogbuji
FourThought LLC, IT Consultants
uche.ogbuji@fourthought.com    (970)481-0805
Software engineering, project management, Intranets and Extranets
http://FourThought.com    http://OpenTechnology.org
David Ascher wrote:
Proposed format for docstrings: ... Is there value in having string interpolation? David Arnold mentioned
__version__ = "$Revision$[11:-2] __date__ = "$Date$
which raises some issues. I don't think that having [11:-2] evaluated by the docstring parser is a wise idea. However, I can imagine that the module author could do:
__version__ = "$Revision$"[11:-2]
in the Python code, and then
Version:: %(__version__)s
in the docstring and that such a simple string interpolation mechanism could have value. I'm not sure it's worth the complication though. What dictionary would be used to do the interpolation?
This raises the question of whether to parse or evaluate the loaded
module. Evaluation has the benefit of providing "automatic" context, i.e.
the symbols defined in the global namespace are exactly the ones relevant
for class definitions, etc. It probably makes construction of
interdependence graphs a lot easier to write. On the downside you have
unwanted side effects due to loading different modules.

Some notes on the proposal:

· Mentioning the function/method signature is ok, but sometimes not
  needed, since e.g. the byte code has enough information to deduce the
  signature from it. This is not true for builtin functions, which is
  probably the reason for all builtin doc strings to include the
  signature.

· I would extend the reference scheme to a lookup in the module globals
  in case the local one (in the Reference section) fails. You could then
  write e.g. "For details see the [string] module." and the doc tool
  would then generate some hyperlink to the string module, provided the
  string module is loaded into the global namespace.

· Standard symbols like __version__ could be included and used by the doc
  tool per default, without the user specifying any special
  "Version:: %(__version__)s" % globals() tags.

BTW, for some code which does online formatting of the doc strings, have
a look at my hack.py script. It includes a function called docs() which
prints out all the information it can find on the given target object.
Here's an example:
docs(string.upper)

upper : upper(s) -> string

    Return a copy of the string s converted to uppercase.

docs(string.zfill)

zfill(x, width) : zfill(x, width) -> string

    Pad a numeric string x with zeros on the left, to fill a field of the
    specified width. The string x is never truncated.

--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 32 days left
Business:       http://www.lemburg.com/
Python Pages:   http://www.lemburg.com/python/
MAL said:

    I would extend the reference scheme to a lookup in the module globals
    in case the local one (in the Reference section) fails. You could
    then write e.g. "For details see the [string] module." and the doc
    tool would then generate some hyperlink to the string module provided
    the string module is loaded into the global namespace.

We have, it occurs to me, another important namespace: unimported
modules. Thus the string module doesn't import re, I assume, but may wish
to refer to it (e.g. to say `this function is a cheap variant of the
eponymous one in re') in its doc-strings. Fortunately, we also have a
handy name to hang this namespace off (which can't coincide with a name
in either of our namespaces): import. Thus:

    `this function is a cheap variant of import.re.search'

could be sensible in doc strings. Note, however, that some bypassing of
this may be achieved using the [blah] notation (which is good).

I have a problem with too much vertical white space, but I believe the
perturbations Tibs suggested (and which match what's in gendoc /
pythondoc - if my memory isn't disserving me again - so must be feasible)
suffice to deal with that. I can make my editor window more than a
hundred columns wide if I want, and know that code lines jutting past
that are too long; but I still only get 55 lines in sight at the same
time, and real code often involves wanting to see more than that. This
situation gets badly exacerbated by being obliged to throw in gratuitous
blank lines (though not as much as by my tendency to verbosity). But,
like Tibs, I can live with the vspace if I must.

What happened to gendoc / pythondoc?

Eddy.
Edward Welbourne writes:
We have, it occurs to me, another important namespace: unimported modules. Thus the string module doesn't import re, I assume, but may wish to refer to it (e.g. to say `this function is a cheap variant of the eponymous one in re') in its doc-strings. Fortunately, we also have ... Note, however, that some bypassing of this may be achieved using the [blah] notation (which is good).
Excellent point! I think this can be handled very nicely by stating that
the [name] syntax uses (in this order):

1. the standard Python search sequence
2. fully-qualified names into the standard library

The latter can be implemented in a number of different ways depending on
the desired level of efficiency and willingness to pre-process
documentation.

-Fred

--
Fred L. Drake, Jr.     <fdrake@acm.org>
Corporation for National Research Initiatives
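A sketch of that two-step lookup in today's Python; importlib.util.find_spec
only locates a module, it does not import it, so the library step stays
safe. The function and argument names are illustrative, not an agreed API:

    import importlib.util

    def resolve_link(name, local_refs, module_names):
        """Resolve a [name] reference: the References:: section first, then
        names visible in the documented module, then an importable (but not
        imported) module on the standard search path."""
        if name in local_refs:
            return ("reference", local_refs[name])
        if name in module_names:
            return ("module-global", name)
        # top-level names only; dotted names would need their own handling
        spec = importlib.util.find_spec(name)   # locates the module, never runs it
        if spec is not None:
            return ("library-module", spec.origin)
        return None

    # resolve_link("re", {}, set()) might yield
    # ("library-module", "/usr/lib/python3.x/re/__init__.py")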
On Mon, 29 Nov 1999, M.-A. Lemburg wrote:
This raises the question of whether to parse or evaluate the loaded
module. Evaluation has the benefit of providing "automatic" context, i.e.
the symbols defined in the global namespace are exactly the ones relevant
for class definitions, etc. It probably makes construction of
interdependence graphs a lot easier to write. On the downside you have
unwanted side effects due to loading different modules.
Good point. Too many modules "do things" on import, some exceedingly expensive. I have written modules where the import never ends, by design =). I'm afraid that parsing is all we can do safely with the Python code. That does make interpolation much more delicate. Maybe we can do everything but string interpolation w/ parsing, and then defer string interpolation until and if the module can be evaluated safely. Somehow we'd need to indicate to the docstring processor whether that evaluation is safe or not.
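A sketch of the parse-only route using today's ast module (which postdates
this thread): the module source is parsed, never executed, and the
docstrings fall out of the tree. The function name is illustrative:

    import ast

    def extract_docstrings(filename):
        """Collect docstrings by parsing the source -- the module is never
        imported, so nothing in it gets a chance to 'do things'."""
        with open(filename) as f:
            tree = ast.parse(f.read(), filename)
        docs = {"<module>": ast.get_docstring(tree)}
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                # note: a flat dict; a real tool would keep the nesting
                docs[node.name] = ast.get_docstring(node)
        return docs

    # extract_docstrings("string.py")
    # -> {'<module>': 'Common string manipulations. ...', 'upper': '...', ...}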
Some notes on the proposal:
· Mentioning the function/method signature is ok, but sometimes not needed since e.g. the byte code has enough information to deduce the signature from it. This is not true for builtin function which is probably the reason for all builtin doc strings to include the signature.
Right. It's not true for builtins, extension module functions, and I'm not sure how easy it is for JPython code. I have no problem with somehow making it easy to omit those in cases where the information can be obtained through the bytecode.
· I would extend the reference scheme to a lookup in the module globals in case the local one (in the Reference section) fails. You could then write e.g. "For details see the [string] module." and the doc tool would then generate some hyperlink to the string module provided the string module is loaded into the global namespace.
Sounds good to me!
· Standard symbols like __version__ could be included and used by the doc tool per default without the user specifying any special "Version:: %(__version__)s" % globals() tags.
Fine. I think that falls somewhat outside of the 'docstring' proposal,
but I agree with it.

--david

PS: Marc-Andre, how do you get these nice bullet characters in your
emails? What character is that? =)
Some people on this list should remember the development days of gendoc
and its cleaner successor pythondoc, written by Dan Larsson (gosh, I hope
I'm not the only one)! This thread rehashes much of what has already been
discussed. We pleaded back then for ideas/opinions/hacked code to help
improve the working code Dan wrote, but got little response.

I'm glad to see folks thinking along these lines again. Please take a
look at pythondoc and use it as a starting point for a full-featured
documentation generator. It uses the structured text approach for doc
string parsing, and has options for either parsing the source or
importing the module to gather metadata (the latter is necessary to
document C modules).

-Robin Friedrich

See: http://starship.python.net/crew/danilo/
On Mon, 29 Nov 1999, Robin Friedrich wrote:
Some people on this list should remember the development days of gendoc and its cleaner successor pythondoc written by Dan Larsson (gosh, I hope I'm not the only one)!
Yes, I remember it. Thanks for the reminder and pointer, Robin!
We pleaded back then for ideas/opinions/hacked code to help improve the working code Dan wrote but got little response.
FWIW, I think that one problem gendoc/pythondoc had in terms of strategy
was that it was billed as a 'tool'. I think that if we establish a
'blessed standard' then any standard-compliant tool has a guaranteed user
base, and has a far greater likelihood of long-term success. Also, once
the format is documented, then folks who don't like gendoc or for
whatever reason want to do it 'their own way' can still do it in a
compatible way.

I'll start digging in gendoc to see the differences between its format
and what I've been discussing. I'd love to leverage it to build a
reference implementation.

Dan Larsson, are you reading this discussion? We could use your
experience here!

--david
http://www.python.org/sigs/doc-sig/status.html

Contains an old summary of the formatting rules for Structured Text use
in doc strings.

Oddly, Dan's subscription to this list is disabled, probably from an old
address. The latest address I have for him is Daniel.Larsson@telia.com

----- Original Message -----
From: David Ascher <da@ski.org>
To: Robin Friedrich <friedrich@pythonpros.com>
Cc: <doc-sig@python.org>; Daniel Larsson <Daniel.Larsson@vasteras.mail.telia.com>
Sent: Monday, November 29, 1999 1:22 PM
Subject: Re: [Doc-SIG] docstring grammar
Hmm, I think I had an old email address on the list, and since the latest
employment hasn't enabled me to do much Python programming :-(, I sort of
forgot to fix the problem. I'll fix that. There is an archive for the
list, right? So I can catch up on what you all are talking about.

Daniel Larsson

----- Original Message -----
From: Robin Friedrich <friedrich@pythonpros.com>
To: David Ascher <da@ski.org>
Cc: <doc-sig@python.org>; <Daniel.Larsson@telia.com>
Sent: Monday, November 29, 1999 8:44 PM
Subject: Re: [Doc-SIG] docstring grammar
David Ascher wrote:
On Mon, 29 Nov 1999, M.-A. Lemburg wrote:
This raises the question of whether to parse or evaluate the loaded
module. Evaluation has the benefit of providing "automatic" context, i.e.
the symbols defined in the global namespace are exactly the ones relevant
for class definitions, etc. It probably makes construction of
interdependence graphs a lot easier to write. On the downside you have
unwanted side effects due to loading different modules.
Good point. Too many modules "do things" on import, some exceedingly expensive. I have written modules where the import never ends, by design =). I'm afraid that parsing is all we can do safely with the Python code. That does make interpolation much more delicate. Maybe we can do everything but string interpolation w/ parsing, and then defer string interpolation until and if the module can be evaluated safely. Somehow we'd need to indicate to the docstring processor whether that evaluation is safe or not.
I think gendoc did this with a command line switch... well the early versions did (I think under a different name though, or perhaps the name is different now ?).
Some notes on the proposal:
· Mentioning the function/method signature is ok, but sometimes not needed since e.g. the byte code has enough information to deduce the signature from it. This is not true for builtin function which is probably the reason for all builtin doc strings to include the signature.
Right. It's not true for builtins, extension module functions, and I'm not sure how easy it is for JPython code. I have no problem with somehow making it easy to omit those in cases where the information can be obtained through the bytecode.
There's code in hack.py for the extraction and also a more generic module by Fredrik Lundh for building signature strings.
· I would extend the reference scheme to a lookup in the module globals in case the local one (in the Reference section) fails. You could then write e.g. "For details see the [string] module." and the doc tool would then generate some hyperlink to the string module provided the string module is loaded into the global namespace.
Sounds good to me!
Without too much parsing overhead this only works for the evaluation technique though. Would be nice to have... even if it doesn't work for some reason (the doc tool could then just produce some different markup for the reference string, e.g. put it in italics).
· Standard symbols like __version__ could be included and used by the doc tool per default without the user specifying any special "Version:: %(__version__)s" % globals() tags.
Fine. I think that falls somewhat outside of the 'docstring' proposal, but I agree with it.
True. It's something I've added to my hack.py formatting functions and I thought it would be nice to have... (it also encourages people to use __version__).
--david
PS: Marc-Andre, how do you get these nice bullet characters in your emails? What character is that? =)
It's chr(183) in Latin-1: the famous center dot ;-) I've tweaked my keyboard setup to have it handy... -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 32 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/
David Ascher wrote:
Some notes on the proposal:
· Mentioning the function/method signature is ok, but sometimes not needed since e.g. the byte code has enough information to deduce the signature from it. This is not true for builtin function which is probably the reason for all builtin doc strings to include the signature.
Right. It's not true for builtins, extension module functions, and I'm not sure how easy it is for JPython code. I have no problem with somehow making it easy to omit those in cases where the information can be obtained through the bytecode.
Perhaps we could use a convention: if the first line starts with a Python
identifier followed by '(' and the identifier matches the name of the
object owning the doc string (function or method), then no byte code
lookup is done. Otherwise such a lookup causes a new first line to be
prepended to the processed doc string (with '-> ?' return value). This
should cover most cases.

--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 32 days left
Business:       http://www.lemburg.com/
Python Pages:   http://www.lemburg.com/python/
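A tiny sketch of that convention check; the helper name is made up, and
the real signature would come from the function object or the parse tree
when the check fails:

    import re

    def has_signature_line(obj_name, docstring):
        """True if the first docstring line already starts with 'obj_name(',
        in which case no signature needs to be prepended."""
        if not docstring:
            return False
        first = docstring.lstrip().split("\n", 1)[0]
        return re.match(re.escape(obj_name) + r"\s*\(", first) is not None

    # has_signature_line("pop", "pop([index]) -> item -- remove and ...")  -> True
    # has_signature_line("frobnicate", "Do the frob thing.")               -> False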
David Ascher writes:
Right. It's not true for builtins, extension module functions, and I'm not sure how easy it is for JPython code. I have no problem with somehow making it easy to omit those in cases where the information can be obtained through the bytecode.
The same information can be obtained from the parse tree; there's no need to generate or examine bytecode, unless you want to extend this to work on .pyc files for which you don't have sources!
Fine. I think that falls somewhat outside of the 'docstring' proposal, but I agree with it.
We should think of in-source documentation; docstrings are simply an important location for data-entry. ;-) -Fred -- Fred L. Drake, Jr. <fdrake@acm.org> Corporation for National Research Initiatives
M.-A. Lemburg writes:
This raises the question of whether to parse or evaluate the loaded module. Evaluation has the benefit of providing "automatic"
Guido agrees with me on this one, sorry: it must be extractable from the parse tree. Evaluation of module code is a *huge* no-no; we should be able to run documentation tools on unknown (== untrusted) code.
context, i.e. the symbols defined in the global namespace are exactly the ones relevant for class definitions, etc. It probably makes construction of interdependence graphs a lot easier to write. On the downside you have unwanted side effects due to loading different modules.
No, these will always be hard in Python. Order of imports can be significant, and the set of imports can change over the life of a module (imports can be delayed to reduce startup time, or a number of alternatives may be supported).
· Mentioning the function/method signature is ok, but sometimes not needed since e.g. the byte code has enough information to deduce the signature from it. This is not true for builtin function which is probably the reason for all builtin doc strings to include the signature.
That's right. There's little need for signature repetition for Python code. There will be times it is appropriate, but that's the oddball case.
· I would extend the reference scheme to a lookup in the module globals in case the local one (in the Reference section) fails. You could then write e.g. "For details see the [string] module." and the doc tool would then generate some hyperlink to the string module provided the string module is loaded into the global namespace.
Definitely; the search sequence should mirror that of the runtime, and the details need not be repeated. That's what we have the language reference for.
· Standard symbols like __version__ could be included and used by the doc tool per default without the user specifying any special "Version:: %(__version__)s" % globals() tags.
Definitely.

-Fred

--
Fred L. Drake, Jr.     <fdrake@acm.org>
Corporation for National Research Initiatives
"Fred L. Drake, Jr." wrote:
M.-A. Lemburg writes:
This raises the question of whether to parse or evaluate the loaded module. Evaluation has the benefit of providing "automatic"
Guido agrees with me on this one, sorry: it must be extractable from the parse tree. Evaluation of module code is a *huge* no-no; we should be able to run documentation tools on unknown (== untrusted) code.
That's why gendoc has a switch to be able to either parse the module or import it. Note that imports are the only way to extract information from C extensions.
context, i.e. the symbols defined in the global namespace are exactly the ones relevant for class definitions, etc. It probably makes construction of interdependence graphs a lot easier to write. On the downside you have unwanted side effects due to loading different modules.
No, these will always be hard in Python. Order of imports can be significant, and the set of imports can change over the life of a module (imports can be delayed to reduce startup time, or a number of alternatives may be supported).
I guess that's the price you have to pay for *automatic* documentation
extraction.

Perhaps there is a way to only extract class/function/method __doc__
strings from pyc-modules without actually running them, since those are
really our only targets.

[looks at some module code objects...]

It wasn't obvious from the code objects I just looked at, but there could
be a way... after all, the information must be hidden somewhere between
those byte codes ;-)

--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 28 days left
Business:       http://www.lemburg.com/
Python Pages:   http://www.lemburg.com/python/
That's why gendoc has a switch to be able to either parse the module or import it. Note that imports are the only way to extract information from C extensions.
Hm... C extensions are also the most dangerous (in some cases) to import,
and furthermore this restricts you to generating documentation for
modules that actually work on your current platform. Not a good idea.
Perhaps there is a way to only extract class/function/method __doc__ strings from pyc-modules without actually running them, since those are really our only targets.
[looks at some module code objects...]
It wasn't obvious from the code objects I just looked at, but there could be way... after all the information must hidden somewhere between those bytes codes ;-)
Quite easily:

& python
Python 1.5.2+ (#929, Aug 4 1999, 13:59:33)  [GCC 2.8.1] on sunos5
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
[startup.py ...]
[startup.py done]
>>> import string
>>> fn = string.__file__
>>> fn
'/usr/local/lib/python1.5/string.pyc'
>>> import marshal
>>> f = open(fn, "rb")
>>> f.seek(8)
>>> c = marshal.load(f)
>>> f.close()
>>> c
<code object ? at 104a60, file "/usr/local/lib/python1.5/string.py", line 0>
>>> dir(c)
['co_argcount', 'co_code', 'co_consts', 'co_filename', 'co_firstlineno',
 'co_flags', 'co_lnotab', 'co_name', 'co_names', 'co_nlocals',
 'co_stacksize', 'co_varnames']
>>> print c.co_consts[0]
Common string manipulations.

Public module variables:

whitespace -- a string containing all characters considered whitespace
lowercase -- a string containing all characters considered lowercase letters
uppercase -- a string containing all characters considered uppercase letters
letters -- a string containing all characters considered letters
digits -- a string containing all characters considered decimal digits
hexdigits -- a string containing all characters considered hexadecimal digits
octdigits -- a string containing all characters considered octal digits

>>> codes = filter(lambda x: type(x).__name__ == "code", c.co_consts)
>>> for x in codes: print x.co_consts[0]; print "-"*20
lower(s) -> string

Return a copy of the string s converted to lowercase.
--------------------
(etc.)

--Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum wrote:
Hm... C extensions are also the most dangerous (in some cases) to import, and furthermore this restricts you to generating documentation for modules that actually work on your current platform. Not a good idea.
What I'm hearing is that C extensions should just NOT be documented
inline. Perhaps the interpreter should look for their docstrings in .pdc
files... to be defined another day!!!

--
Paul Prescod  - ISOGEN Consulting Engineer speaking for himself
"I always wanted to be somebody, but I should have been more specific."
    --Lily Tomlin
Guido van Rossum wrote:
Hm... C extensions are also the most dangerous (in some cases) to import, and furthermore this restricts you to generating documentation for modules that actually work on your current platform. Not a good idea.
Paul Prescod replied:
What I'm hearing is that C extensions should just NOT be documented inline. Perhaps the interpreter should look for their docstrings in .pdc files... to be defined another day!!!
C extensions should still have doc strings, but these only serve a function for interactive use, not to generate the separate documentation. The separate documentation of C modules could be extracted in a different way from the source -- but that's a separate problem. --Guido van Rossum (home page: http://www.python.org/~guido/)
Paul Prescod writes:
What I'm hearing is that C extensions should just NOT be documented inline. Perhaps the interpreter should look for their docstrings in .pdc files... to be defined another day!!!
Perhaps a reasonable approach would be to write the documentation as a
Python source file that offered the right interface and some sort of flag
that the classes are really extension types? This would make it
reasonably easy to work with and explain, and no new markup language has
to be introduced simply to document an extension module. It also wouldn't
hurt that no additional tools would be needed! ;-)

-Fred

--
Fred L. Drake, Jr.     <fdrake@acm.org>
Corporation for National Research Initiatives
Paul Prescod writes:
What I'm hearing is that C extensions should just NOT be documented inline. Perhaps the interpreter should look for their docstrings in .pdc files... to be defined another day!!!
Fred Drake:
Perhaps a reasonable approach would be to write the documentation as a Python source file that offered the right interface and some sort of flag that the classes are really extension types? This would make it reasonably easy to work with and explain, and no new markup language has to be introduced simply to document an extension module.
I experimented with this for the threading module. Java also does this for native methods (which always have a stub declaring their types in a Java class file). On the downside, it decouples the doc from the source, which was the primary motivation for docstring extraction, and perhaps writing it directly in latex/SGML/whatever is easier than writing a dummy Python module -- it certainly gives more control. Plus, it's more likely that specialized editors exist.
It also wouldn't hurt that no additional tools would be needed! ;-)
But probably existing tools would have to be extended to know about this arrangement, since it's not completely transparent. --Guido van Rossum (home page: http://www.python.org/~guido/)
On Fri, 3 Dec 1999, Guido van Rossum wrote:
Paul Prescod writes:
What I'm hearing is that C extensions should just NOT be documented inline. Perhaps the interpreter should look for their docstrings in .pdc files... to be defined another day!!!
Fred Drake:
Perhaps a reasonable approach would be to write the documentation as a Python source file that offered right interface and some sort of flag that the classes are really extension types? This would make it reasonably easy to work with and explain, and no new markup language has to be introduced simply to document an extension module.
I experimented with this for the threading module. Java also does this for native methods (which always have a stub declaring their types in a Java class file).
But I think it's crucial that IDEs for example can find out what the
signature for functions defined in extension modules is. So, I'm going to
propose:

    look for module_doc.py
    if found:
        use it
    else:
        if user_allows_imports:
            import module and use introspection (e.g. __doc__)
        else:
            tough.

Some users trust the code, some don't. This way what the tool provides is
up to the user.

    [ ] Enable JavaScript
    [ ] Enable Cookies
    [ ] Enable Java
    [ ] Enable Python Import

Which brings up the question of whether we want to associate RSA
certificates with modules. JUST KIDDING!

--da
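The decision procedure above written out as a sketch; the '_doc.py' suffix
and the allow_imports flag are just names used here for illustration:

    import importlib
    import os

    def load_doc_source(module_name, search_path, allow_imports=False):
        """Prefer a companion '<module>_doc.py' stub; fall back to importing
        the real (possibly extension) module only if the user permits it."""
        for directory in search_path:
            stub = os.path.join(directory, module_name + "_doc.py")
            if os.path.exists(stub):
                return ("stub", stub)                 # document the pure-Python stub
        if allow_imports:
            mod = importlib.import_module(module_name)
            return ("imported", mod.__doc__)          # use introspection (__doc__ etc.)
        return ("undocumented", None)                 # tough, as the message says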
M.-A. Lemburg writes:
Perhaps there is a way to only extract class/function/method __doc__ strings from pyc-modules without actually running them, since those are really our only targets.
I look forward to your software. ;-) But see also my response to Paul's message about documenting extension modules. -Fred -- Fred L. Drake, Jr. <fdrake@acm.org> Corporation for National Research Initiatives
"Fred L. Drake, Jr." wrote:
M.-A. Lemburg writes:
Perhaps there is a way to only extract class/function/method __doc__ strings from pyc-modules without actually running them, since those are really our only targets.
I look forward to your software. ;-)
Ahh, yes... well... perhaps someday ;-) Right now I'm busy with other things which go one level deeper, namely the Unicode integration.
But see also my response to Paul's message about documenting extension modules.
About the add-on Python module with included doc strings... well, I don't
think that's easily possible, since function objects are immutable AFAIK
(and these contain the doc strings).

Also, separating the docs and the implementation too far is not very
convenient, e.g. I use C macros to help me with this:

    Py_C_Function( mxDateTime_now,
                   "now()\n\n"
                   "Returns a DateTime-object reflecting the current local time." )
    {
        ...
    }

I have to add that I maintain my package docs as completely separate
entities: experience has it that leaving things undocumented for a while
is better if you work on new things in already pretty widespread tools
such as mxDateTime. Also, the details you want to include in the "real"
docs sometimes don't make sense within the code or simply don't fit
anywhere.

--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 28 days left
Business:       http://www.lemburg.com/
Python Pages:   http://www.lemburg.com/python/
Manuel: if David includes `Keyword::' in his bits and pieces, would

    Keyword::
        indexing
        keyword data retrieval
        searching (within a doc-string)

contain the information you've been wanting to take out of your

    \indexaboutindexing
    \indexaboutkeyword
    \indexretrieval
    \indexsearching

etc. (with apologies for not having followed your system well enough to
mimic the names you'd actually use)? What I've understood of your scheme
appears to tell me the answer is Yes.

If so, I guess you could just slurp the Keyword slice out of a
namespace-tree generated from doc-strings and, I suspect, happiness would
abound and confusion abate. I know you have bits that define an indexing
command that expands to several indexing commands, which this lacks: but
could the same effect be arrived at by turning your set of indexing
command definitions into an `expert system' that expands some keywords?

And ... to folk who know about the state of the craft of indexing: is
there a better way to go with this? After all, I'm pretty much just
borrowing from one of HTML's META tags here ...

Now, back to the spec itself:
For compatibility with Guido, IDLE ... len(object) -> integer
i.e.

    docstring-startline: archetypical-call [ '->' return ] [ '--' summary ]

Quite apart from compatibility - this is a *good* approach. I guess that
could be why Guido does it ...
Each paragraph is either 'text' or a 'keyword-tagged block'.

Sounds good. Flesh and skeleton.
A 'keyword-tagged block' is nested much like Python code.

Yes, thank you very much, beautiful - this will give us scope for nested
sub-structures in the keyword-tagged data: in particular, get rid of that
Date_of_release ... use

    Author:: David Ascher
    Release::
        Date:: 1999/11/28
        Name:: post-gendoc-0.1
        Stability:: draft

etc.

I'm with Tibs on the #-comment stuff - particularly the liberty to simply
embed a piece of python code in a doc string.

I was initially confused about : or :: because your examples began with
the first keyword I'd thought of, namely Example, and only used one :
with that one, going on to :: for the rest - then I noticed that you
weren't offering it as an example keyword but using it to introduce your
list of examples. While I would far sooner have only one :, those of us
advocating this need to watch for the danger that the parser will get
similarly confused between the author's use of `Example:' in the manner
of English idiom and in its keyword sense (and, of course, it isn't the
only word to worry about). (The flip-side is: I can see myself getting
irritated by the need to say Example:: as a keyword immediately after
I've ended a paragraph with the word example ...)

Note: this keyword representation is isomorphic to XML via `the usual'
equivalences between (pythonic) indentation-structuring and the begin-end
style of structuring that C and XML use.

    keyword: single-liner
      ->  <keyword>single-liner</keyword>

    keyword: indent block dedent
      ->  <keyword> block (possibly transformed down a bit itself) </keyword>
Some keywords can have special parsing rules,
coo, context-sensitive parsing ;^) Good idea. Lets some things only be keywords where they need to be ...
The above was (quickly) written with parsing in mind. Is it really easily
parseable? If not, what needs to be changed so that it is parseable?

Well, the bulleting (and descriptive list stuff) has been explored
already in pythondoc / gendoc, so clearly it's all `within scope'. Heh.
And between David and Tibs, surely we have the parsing technology ...
On the subject of vertical space ... I'd guess the parser won't need a
blank line between

    * the end of a paragraph and
    * the start of its first indented subordinate?

Though, indeed, I do want to take out the other blank line here, and I
thought gendoc managed that ...
Is there value in having string interpolation?

Yes. Definitely. I hadn't realised it was possible until you mentioned
it; now I'm sure it's Needed.
Hopefully constructively,

having had some time to think on it, I'd say Thoroughly so.
Hierarchical namespaces,
Context-sensitive parsing,
Mappable to XML but written like python,
Scope for indexing, and for arbitrary extension within sub-namespaces,
Conformance to the only important standard (Guido's de facto habits ;^)
Proposed by someone who knows how to write parsers ...
No need for the run-time system to bother with any of it (all hidden
inside the doc string)

Thank you David,

Eddy.

--
PS - David: you do realise, though, that the committee won't keep up the
momentum on this unless you ruthlessly play Gdo until he joins in ...
On Mon, 29 Nov 1999, Edward Welbourne wrote:
I'm with Tibs on the #-comment stuff - particularly the liberty to simply embed a piece of python code in a doc string.
Agreed. I am removing that bit about ignoring #'ed text from my proposal.
I was initially confused about : or :: because your examples began with the first keyword I'd thought of, namely Example, and only used one : with that one, going on to :: for the rest - then I noticed that you weren't offering it as an example keyword but using it to introduce your list of examples. While I would far sooner have only one :, those of us advocating this need to watch for the danger that the parser will get similarly confused between the author's use of `Example:' in the manner of English idiom and in its keyword sense.
After a little thought, I'm tempted to remove the :: requirement as well.
In my proposal, I think that using the : after Example was a mistake in
style. If it was a heading then it should just be text w/o a colon. If it
was supposed to be more of a sentence then it should have been spelled
out, as in:

    For example, we can have:

The *intent* was, however, to avoid the 'danger' you note above. I'm
still open to go either way, "safe" or "comfortable".

I forgot two markups: *this* is bold and _this_ is italic. Bold and
italic markups must begin and end within a paragraph (I'd say 'within a
sentence' but I don't want to complicate the parser with a sentence
type). No space allowed between *'s and _'s and their contents.
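One possible reading of that inline-markup rule as regular expressions
(marker immediately against its contents, both ends in the same
paragraph); the patterns and HTML rendering are an interpretation for
illustration only:

    import re

    BOLD   = re.compile(r"\*(\S(?:[^*]*\S)?)\*")   # *bold*: no space just inside the *'s
    ITALIC = re.compile(r"_(\S(?:[^_]*\S)?)_")     # _italic_: same rule for the _'s

    def render_inline(paragraph):
        """Tiny HTML-ish rendering of the inline markup within one paragraph.
        Note: ITALIC also matches identifiers such as __doc__ -- the very
        conflict raised later in this thread."""
        paragraph = BOLD.sub(r"<b>\1</b>", paragraph)
        paragraph = ITALIC.sub(r"<i>\1</i>", paragraph)
        return paragraph

    # render_inline("*this* is bold and _this_ is italic")
    # -> "<b>this</b> is bold and <i>this</i> is italic"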
On the subject of vertical space ... I'd guess the parser won't need a blank line between * the end of a paragraph and
* the start of its first indented subordinate ?
Though, indeed, I do want to take out the other blank line here, and I thought gendoc managed that ...
By all means, we should borrow from gendoc if it's already solved those issues. I admit not to having looked deeply into gendoc. I'll look into this some more a bit later.
Proposed by someone who knows how to write parsers ...
Uh? Me? No way. You must be confusing me with someone else! --david
David Ascher wrote:
I was initially confused about : or :: because your examples began with the first keyword I'd thought of, namely Example, and only used one : with that one, going on to :: for the rest - then I noticed that you weren't offering it as an example keyword but using it to introduce your list of examples. While I would far sooner have only one :, those of us advocating this need to watch for the danger that the parser will get similarly confused between the author's use of `Example:' in the manner of English idiom and in its keyword sense.
After a little thought, I'm tempted to remove the :: requirement as well. In my proposal, I think that using the : after Example was a mistake in style. If it was a heading then it should just be text w/o a colon. If it was supposed to be more of a sentence then it should have been spelled out, as in:
For example, we can have:
The *intent* was, however, to avoid the 'danger' you note above. I'm still open to go either way, "safe" or "comfortable".
I'd suggest using '^ *[a-zA-Z_]+[a-zA-Z_0-9]*: *' as the RE for keywords,
i.e. keywords are Python identifiers immediately followed by a colon
starting a line of a doc string. That should avoid most complications, I
guess. For example:

    blablablba and ...long sentence..., for example :

would not be parsed as a keyword, while

    Example: a=1;b=2

does fit the above definition (I don't see a problem with including
examples in the parsed sections, BTW... examples are often much more
intuitive to understand than complex definitions).

Something else: how would the following be handled:

    Arguments: file -- a file like object
               mode -- file mode indicator as defined in [__builtin__.open]

    Arguments: buffersize -- optional buffer size in bytes

that is, what happens if a keyword appears twice? In the above case an
error should be raised, but sometimes this may be useful:

    Example: first multi-line
             example

    Example: second multi-line
             example

Hmm, perhaps these two examples should be wrapped using bullets:

    Examples:
        - first example
          spanning multiple lines
        - second example

--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 32 days left
Business:       http://www.lemburg.com/
Python Pages:   http://www.lemburg.com/python/
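A sketch of that RE in use, splitting a docstring into plain text and
keyword-tagged blocks and simply letting duplicate keywords accumulate
(whether that is an error is discussed just below); the helper name is
illustrative:

    import re

    KEYWORD = re.compile(r"^ *([a-zA-Z_]+[a-zA-Z_0-9]*): *(.*)$")

    def split_keywords(docstring):
        """Return (plain_text_lines, {keyword: [block lines...]})."""
        text, blocks, current = [], {}, None
        for line in docstring.splitlines():
            m = KEYWORD.match(line)
            if m:
                current = m.group(1)
                blocks.setdefault(current, []).append(m.group(2))
            elif current and (line.startswith(" ") or not line.strip()):
                blocks[current].append(line)     # indented continuation of the block
            else:
                current = None
                text.append(line)
        return text, blocks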
On Mon, 29 Nov 1999, M.-A. Lemburg wrote:
How would the following be handled:
Arguments: file -- a file like object
           mode -- file mode indicator as defined in [__builtin__.open]
That, btw, is illegal -- the block must either be a single-line block or an indented block.
Arguments: buffersize -- optional buffer size in bytes
that is, what happens if a keyword appears twice ? In the above case an error should be raised, but sometimes this may be useful:
Agreed -- I made a similar point in another email which waved 'hi!' to yours as they crossed somewhere over the atlantic. =)
Example: first multi-line example
Example: second multi-line example
Hmm, perhaps these two examples should be wrapped using bullets:
Examples:
    - first example
      spanning multiple lines
    - second example
Depends on the case. In a long docstring, one might want to have several sections, each with Examples: subsections. I propose that part of the definition of a keyword is (along with any special parsing rules) whether it can be duplicated in a docstring. --david
David Ascher wrote:
I propose that part of the definition of a keyword is (along with any special parsing rules) whether it can be duplicated in a docstring.
FAPP we can approach both this and the `context-sensitive' stuff from the
same point of view as SGML: precisely because

    Blah:
        something legitimate in a Blah block

maps directly to

    <BLAH>
        something legitimate within a BLAH
    </BLAH>

so all the kinds of rule that a DTD could have imposed on BLAH are
sensible things to impose on Blah. In particular, rather than `whether it
can be duplicated in a docstring' we have a nested tree structure in our
hands, so we can ask whether it can be duplicated as a child of its
parent. Suppose Date to be unique:

    Author: David Ascher
    Release:
        Date: 1999/Nov/28
        Name: proto-post-gendoc:0.2
        Media: e-mail
    Bugs:
        Report:
            Date: 1999/Nov/29
            As initially specified, the denotation for italic conflicts
            with python identifiers where these genuinely start and end
            in underscore.
            Status: resolved, adopting gendoc's approach
        Report:
            Date: ...

in which Date is `unique' but shows up many times in one doc-string. I
would suggest that a tag is either unique in all contexts that allow for
it, or in none (so we don't have ickiness in which *some* tags allow
several Date subordinates - that kind of stuff makes it harder for folk
to remember what's unique and what isn't).

The right layer

A note on tags: we seem to be headed for `python identifier followed by a
colon'. I'd like to argue for RFC 822 headers - that is, specifically, to
allow hyphens, so as to allow

    Bugs:
        Reports-to: doc-sig@python.org
        Report:
            ... as above ...

and, indeed, to change Bugs: to Known-bugs:

Of course we could use _, but hyphen comes more naturally to text and the
parser for our keywords (unlike that for python identifiers) doesn't have
to worry about subtraction as `something we might be doing here' to
confuse with recognising the keyword.

For the sake of a coarse reprise of where I think we are:

Within docstrings, paragraphs, `text fields', descriptive and bulleted
lists are marked up using pretty much what gendoc used, though we seem to
be making some tweaks. The main addition of David's proposal is a
structured data format entirely analogous to a *ML's begin-end structure,
but transformed to indent/dedent format - in exactly the same way that
one transforms the begin-end structure of C or Pascal into python code.
This gets us all the desiderata that XML would provide, but it does it in
a pythonic format.

The typical block allows (depending on the keyword which introduced it)
an assortment of keywords to be used to introduce sub-blocks; it may also
allow paragraphs and/or lists within it. The docstring is a block which
is willing to hold all `outer' structural groups (i.e. top-level
keywords, with their blocks, and paragraphs). A paragraph is a block
which (possibly along with the blocks started by some keywords) may have
sub-blocks which are list items.

We can effectively write the rules for all this as a DTD and parse it
into a form which can be manipulated *as if* it had been obtained by
parsing a lump of XML - in particular, it should be trivial to perform
XSL-ish tree transformations to convert it to whatever DTD The Manual
wants as its input; while leaving ample scope for the inventive
toolwright to perform sophisticated information massaging on docstrings,
and not obliging us to use all that ugly XML taggery in the source.

We need a moderately short list (of order a dozen) of `top-level' tags:
subordinate to each we may introduce a few others (context sensitivity)
but simplicity demands vocabulary restraint and re-use.
The top level seems to run to:

In all docstrings:
    Author(s), Release, Contributors
    Example(s), Test-script, Code
    Warning

In docstrings of callables:
    Argument(s), Return, Raises

In docstrings of classes:
    Supports/Implements/Mimics... (one synonym)
    Subclassing (for folk using this class as a base - what to override)
    Attributes, Methods (each supporting Private and Public as subordinates)

In docstrings of modules:
    Contents

so 7 universally-applicable keywords, (up to) the rest of a dozen in each of the specific contexts for docstrings. I would reckon we can keep to about another dozen keywords spread around as subordinates of the above (Date, Private, Public, Expect (for Test-script), Required & Optional (for arguments), ...).

On test-scripts (in the manner of Tim Peters) we may not need a Test-script keyword at all: simply using >>> is how the tool recognises it, and there's nothing to stop the docstring parser recognising this as a special indent mark that transforms to target XML *as if* it had come from a block introduced by Test-script:.

Eddy.
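A minimal sketch of the mapping Eddy describes above, assuming the indented keyword blocks have already been read into a nested (keyword, value) tree; the tree literal and the emit() function are illustrative only, not part of any proposal:

    def emit(node, indent=0):
        # node is (keyword, value); value is either a one-line string or a
        # list of child nodes, mirroring the indent/dedent nesting.
        keyword, value = node
        pad = " " * indent
        if isinstance(value, str):
            print(pad + "<%s>%s</%s>" % (keyword.upper(), value, keyword.upper()))
        else:
            print(pad + "<%s>" % keyword.upper())
            for child in value:
                emit(child, indent + 2)
            print(pad + "</%s>" % keyword.upper())

    emit(("Release", [("Date", "1999/Nov/28"),
                      ("Name", "proto-post-gendoc:0.2"),
                      ("Media", "e-mail")]))

which prints the <RELEASE>...</RELEASE> structure that a DTD-driven tool could then validate or transform.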
After a little thought, I'm tempted to remove the :: requirement as well.
I agree this would be a good thing. I originally intended to reply in context to all the good suggestions - however, I don't look like finding time until after Christmas :-( So here is my 2c worth, mainly echoing comments from others:

Drop the absolute requirement for the whitespace, especially with bulleted lists. People will generally not be editing these strings in a word-processor, so will have control over the line breaks. Thus:

* Any line starting with a word followed by a colon can be considered a keyword. If you don't want this, just make sure it's not the first word on the line.

* A star or dash starting a line can be considered a new list item. Again, if it is truly a hyphen or whatever else, just adjust your line wrap slightly so it is no longer the first word.

Other random thoughts:

* The [blah] notation is good, but needs to be well defined. eg, "[module.function]" when used in the context of a package should use the same "module scoping" that Python itself uses. However, the use of brackets may conflict with people who use inline code (rather than an example "block") - maybe something like "@" could be used? @module.function@ would be reasonable.

* IMO, importing the module to extract this information is fine. For the 1% of cases where it is not and the author of the module needs to use the tool, we could offer a hack - eg "sys.doc_building" will be defined when the tool is running, so they could fine-tune their code appropriately. For the vast majority of cases, I guess that importing would be just fine and make the tool simpler, thereby giving more chance of it one day existing :-) Indeed, do it the simple way, and the first person who needs the parse-only option can help code it :-)

* Example/test code should be clearly identifiable. Tim Peters' docstring tester could also be hacked to work with this format. Further, it should be possible to have lots of discrete sample code, each with their own discussion - eg:

    """
    The following code shows how to do this:

    Example:
        def foo():
            etc

    /Example:

    The following code shows how to do that:

    Example:
        def bar():
            etc

As a final note: The tool should be written with distinct "generate" and "collate" phases, simply to resolve the cross-references. It is unreasonable to expect that all cross-references will be capable of being resolved in a single pass. Not sure exactly what this means from an implementation POV, but it is important.

That's about it. I really like this, and feel it is both powerful and extensible enough to grow with us. All we need now is the tool :-)

Mark.
On Tue, 30 Nov 1999, Mark Hammond wrote:
* The [blah] notation is good, but needs to be well defined. eg, "[module.function]" when used in the context of a package should use the same "module scoping" that Python itself uses. However, the use of brackets may conflict with people who use inline code (rather than an example "block" - maybe something like "@" could be used? @module.function@ would be reasonable.
I personally would prefer to keep [] for references and introduce @..@ (or some other delimiter) for inline code, mostly because [] is so common in journals as a way of indicating bibliographic references. I do *not* like StructuredText's use of quotes to do inline code markup.
* IMO, importing the module to extract this information is fine. For the 1% of cases where it is not and the author of the module needs to use the tool, we could offer a hack - eg "sys.doc_building" will be defined when the tool is running, so could fine tune their code appropriately. For the vast majority of cases, I guess that importing would be just fine and make the tool simpler, thereby giving more chance of it one day existing :-) Indeed, do it the simple way, and the first person who needs the parse-only option can help code it :-)
I see. So the workaround for those scripts which can't be imported is to start them with:

    import sys
    if sys.doc_building: sys.exit()

Not too bad.
* Example/test code should be clearly identifiable. Tim Peters' docstring tester could also be hacked to work with this format.
I need to go back and look at Tim's code again.
Further, it should be possible to have lots of discrete sample code, each with their own discussion - eg:

    """
    The following code shows how to do this:

    Example:
        def foo():
            etc

    /Example:

    The following code shows how to do that:

    Example:
        def bar():
            etc
That would be written (with the current proposal):

    The following code shows how to do this:

    Example:
        def foo():
            etc

    The following code shows how to do that:

    Example:
        def bar():
            etc

Is that ok w/ you?

--david
[MarkH]
* Example/test code should be clearly identifiable. Tim Peters' docstring tester could also be hacked to work with this format.
[DavidA]
I need to go back and look at Tim's code again.
I already did <wink>. Tim's code looks for: ^\s*>>> and then sucks up everything following until the next all-whitespace line or end of docstring (whichever comes first). That is, I figured the contents of an interactive shell window didn't need any markup beyond the leading PS1 Python already sticks there. Given that doctest.py is meant to be usable with near-zero effort, it wouldn't do to require more markup than that. Luckily, it almost fits your definition of a paragraph already. It shouldn't be any real effort to declare that ">>>" introduces a structureless code paragraph extending until the next all-whitespace etc -- given that it's a format for Python docstrings, Python's own output deserves some special treatment <wink>. As to whether doctest should be fiddled to try to interpret some other form of markup too, I don't think so. The markup it inherits from the Python shell is both sufficient and pleasant for its users. Any other kind of embedded sample code almost certainly isn't intended to be auto-verified, so doctest *should* ignore it. Nothing you're likely to do with docstrings is going to create problems for doctest, so the only question is whether doctest's conventions create problems for docstring markup. I think they do now, but "shouldn't": anyone pasting in an interactive session, whether for use with doctest or for some other purpose, is going to want it treated as a code block. full-speed-ahead-ly y'rs - tim
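A rough sketch of the recognition rule Tim describes (not doctest.py itself): find a line whose leading non-blank text is ">>>" and suck up everything until the next all-whitespace line or the end of the docstring.

    import re

    def find_examples(docstring):
        # Returns the ">>>" blocks, each as one chunk of text.
        blocks, current = [], None
        for line in docstring.split("\n"):
            if current is None:
                if re.match(r"^\s*>>>", line):
                    current = [line]
            elif line.strip():
                current.append(line)
            else:
                blocks.append("\n".join(current))
                current = None
        if current:
            blocks.append("\n".join(current))
        return blocks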
On Tue, 30 Nov 1999, Tim Peters wrote:
Luckily, it almost fits your definition of a paragraph already. It shouldn't be any real effort to declare that ">>>" introduces a structureless code paragraph extending until the next all-whitespace etc -- given that it's a format for Python docstrings, Python's own output deserves some special treatment <wink>.
The only question I suppose is whether one should require a keyword (Test: or other) to keep the top-level syntax trivial, or special-case the recognition of >>>-beginning paragraphs. I'm leaning for the former, as it can evolve to the latter if there is sufficient call for it from the user base, and I think it does keep the code simpler. But I'm willing to be swayed. --david
On Tue, 30 Nov 1999, Tim Peters wrote:
Luckily, it almost fits your definition of a paragraph already. It shouldn't be any real effort to declare that ">>>" introduces a structureless code paragraph extending until the next all-whitespace etc -- given that it's a format for Python docstrings, Python's own output deserves some special treatment <wink>.
The only question I suppose is whether one should require a keyword (Test: or other) to keep the top-level syntax trivial, or special-case the recognition of >>>-beginning paragraphs.
I'm leaning for the former, as it can evolve to the latter if there is sufficient call for it from the user base, and I think it does keep the code simpler. But I'm willing to be swayed.
--david

<sway> I would rather minimize the invention (and consequential memorization) of special keywords. Parsing them is not made quite as trivial as it seems (especially when alternate languages are involved). Structured text had the favorable trait of being very easy to remember. Parsers are built using formal definitions of special-case rules anyway. Where special casing based on context becomes non-obvious to remember is where I would draw the line and resort to literal keywords. </sway>

-Robin
[David Ascher]
The only question I suppose is whether one should require a keyword (Test: or other) to keep the top-level syntax trivial, or special-case the recognition of >>>-beginning paragraphs.
I'm leaning for the former, as it can evolve to the latter if there is sufficient call for it from the user base, and I think it does keep the code simpler. But I'm willing to be swayed.
Parsing is never trivial, although it can be made *easy*, and you're already e.g. special-casing the snot out of the first line of the docstring (unless, as someone else recently and regrettably <0.3 wink> suggested, you split that into keyword-introduced "Signature:" and "Summary:" paragraphs). You're going to have *some* way to spell "what follows is an unstructured code block up until the next empty line or end-of-docstring". You'll end up recognizing that with a regexp, like

    r"^\s*Example:\s*"

The code support will consist almost entirely of changing the regexp to

    r"\s*(Example:|>>>)\s*"

Not trivial, but as easy as it could be and stay this side of trivial <wink>. Here's the start of a very long module docstring:

    """
    Rat objects support exact, unbounded rational arithmetic. Skip to
    MODULE SUMMARY at the end for the short story.

    Rat objects also support rounding and formatting methods sufficient to
    emulate floating-point arithmetic in your choice of base, number of
    significant digits, and rounding discipline.

    >>> from Rational.Rat import Rat   # Rat constructor
    >>> from Rational import Format    # used later
    >>> print Rat()                    # no args gets 0
    0

    Construct a Rat from an int, long, float, or string representing a
    rational or float:

    >>> print Rat(5), Rat(5L), Rat(5.0), Rat("5"), Rat("50e-1")
    5 5 5 5 5

    Or you can pass two ints or longs, to construct their ratio. The
    denominator must be >= 1:

    >>> print Rat(5, 1), Rat(1, 5), Rat("1/5"), Rat("10/1010_2")
    5 1/5 1/5 1/5

    The "_2" at the end of the last one there is a "base tag", and says
    "10/1010" is to be interpreted as a ratio in binary notation. Any int
    base >= 2 can be used.

    etc etc etc
    """

There are dozens of example code blocks in this docstring. Indenting them all and tagging them with redundant "Example:" labels would be both ugly and wasteful. I *love* the thrust of the new proposals here because, frankly, I'm likely never to run a docstring thru any sort of docstring processor (except perhaps to extract the first line), and the current proposals have a nice WYSIWYG flavor. When a *Python* programmer sees ">>>", they know exactly what they're about to get.

if-swaying-isn't-effective-next-it's-pouting-and-then-on-to-threats<wink>-ly y'rs - tim
OK - I wrote the following whilst reading through today's batch of messages (that's what happens when you take your elder son swimming on Tuesday afternoons, I guess). So apologies if it's a bit stream-of-consciousness... David Ascher wrote:
The only question I suppose is whether one should require a keyword (Test: or other) to keep the top-level syntax trivial, or special-case the recognition of >>>-beginning paragraphs.
I'm leaning for the former, as it can evolve to the latter if there is sufficient call for it from the user base, and I think it does keep the code simpler. But I'm willing to be swayed.
No - keep the keyword. My reasoning is (a) I like it [emotional reaction, which is the real reason (parse that as "it feels more elegant")], and (b) I still have the feeling that on occasion I might want non-test Python script in there, and (c) it *is* a 'logical' subdivision of the text in exactly the same way as the other major divisions, and so deserves its own place. I don't think it leaves us with too many such subdivisions (to reply to Robin Friedrich later on - although I also don't understand his point about more tags making it harder to parse things (unless he means "for humans to parse")). David Ascher later says:
I'd like to finalize the top-level structure, get it in front of GvR's eyeballs, and then we can tackle each subtopic (so far: list processing, reference handling, signature, mandatory keywords, keyword registration process, multilingual keyword support, etc.) at a later date.
Yeah, go for it (I don't actually think that we have *much* fiddly disagreement to resolve about the sub-items, but I agree they ARE less important than the grand sweep. Get the grand sweep approved and documented and we can start playing with appearance - and dammit I might give in and write something - I already know that my translator for the mxTextTools metalanguage has useful chunks of stuff to be lifted out to get SOMETHING going quick, and I hope no-one realises how close I am to giving in and writing something instead of doing work I get paid for...) Later still:
A keyword is a case-sensitive string which: ... As (I think it was) Tibs mentioned, it's syntactic sugar for XML notation
Nah - it was Eddy (he works somewhere not too far away in this building, and keeps mentioning my name in passing (all very flattering) - partly because I stop to interrupt his work every time I go past to see if he's writing something about this...). In the same message, he dislikes:
[Python Language Web Site] is the main source for Python itself. [Starship Python] houses a number of Python user resources.
[Python Language Web Site] -> http://www.python.org [Starship Python] -> http://starship.python.net
in preference for:
[PythonLanguageWebSite] is the main source for Python itself. [StarshipPython] houses a number of Python user resources.
References: PythonLanguageWebSite: http://www.python.org StarshipPython: http://starship.python.net
I like the "References:" tag on aesthetic grounds, whether it makes parsing easier or not (it makes parsing more *regular*, but personally I think we're working for humans here, to a great extent, and should punt the parsing issues a *little* bit). David then worries:
Which leaves open the question of how we can have 'space-enabled' labels for references which can't have spaces in them.
One idea is to tag the [] markup with a ="stringlabel":
[PythonLanguageWebSite="The Python.org website"] is the main source for Python itself.
Another possibility hinted at previously is to enrich the References section:
References: PythonLanguageWebSite: Label: The Python.org website Link: http://www.python.org
either of which, when rendered, would 'do the right thing'. I only expect this to be an issue when referring to URLs. Python modules, classes and functions already have perfectly good names.
Hmm. I don't like the "enriched" form - it's just too verbose for the job it's doing (which I *think* will translate into "people won't want to use it"). I don't actually see the problem with allowing spaces in references here, by the way. Granted they need removing (translating in some manner - do they?) when generating XML (but perhaps NOT when generating some other output format). This problem won't go away anyway - if one is translating to J. Random Format that doesn't allow hyphens in names used as references then we would still have the same problem. It doesn't, in this instance, matter to me that the text in [..] need be the same sort of thing as that before a ":", either, if that were the objection. I would favour:

    [Python Language WebSite] is the main source for Python itself.
    [Starship Python] houses a number of Python user resources.
    See [ascher29] for the source of the algorithm.

    References:
        [Python Language WebSite]: http://www.python.org
        [Starship Python]: http://starship.python.net
        [ascher29]: My famous Ph.D. Dissertation, Foo University, 2029.

as being (a) easy to read and (b) easy to parse. The colon after the [..] in the References section is syntactic sugar I would like to keep. The use of the [..] in the References section makes it plainer (to me) that we are talking about the same "label" as used earlier (heh, it looks *exactly* the same!) - the colon reminds me we are "defining" it. I would then not worry too much about what goes between the [..] - I'd be happy for it to be alphanumerics plus underscore, hyphen and whitespace (nb: I'd treat all whitespace as self-identical for this purpose!), or for it to be "anything except [ and ]".

On a slight divergence, I would favour allowing non-Referenced references (e.g., the famous "[None]") to dangle happily, with the translation/checking tool emitting a simple (short) warning about them. It also wouldn't disturb me to have "[ text ]" regarded as Not-A-Reference (even if we allow whitespace in references).
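A small sketch of how the References: layout Tibs favours might be digested, assuming one "[label]: target" entry per line and treating whitespace inside the label as self-identical; none of this is settled syntax:

    import re

    REF_LINE = re.compile(r"^\s*\[([^\]]+)\]\s*:\s*(.+?)\s*$")

    def parse_references(block):
        refs = {}
        for line in block.split("\n"):
            m = REF_LINE.match(line)
            if m:
                label = " ".join(m.group(1).split())   # collapse whitespace
                refs[label] = m.group(2)
        return refs

    refs = parse_references("""
    [Python Language WebSite]: http://www.python.org
    [Starship Python]: http://starship.python.net
    [ascher29]: My famous Ph.D. Dissertation, Foo University, 2029.
    """)
    print(refs["Starship Python"])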
PS: I'm working on updating the proposal, but I have other pressing deadlines (such as getting the JPython tutorial ready for IPC8!), so it may not be ready for a couple of days.
Good - without an updated proposal I can probably hold off the urge to program... Am I allowed to disagree with Tim Peters (may he have nice things happen to him)?:
You'll end up recognizing that with a regexp, like
r"^\s*Example:\s*"
No! No! Whilst I realise that any General Purpose, Released With Python tool will probably have to use re 'cos that's all it has, *I* (for one) would never end up recognising anything much with a regexp. Follow the One True Way - convert to mxTextTools (gosh, I feel better now). Otherwise, I think he's causing me to rethink my opinions on tagging code examples (damn, I hate it when that happens). OK - granted we don't need to tag most ">>>" code, because it's 'obvious', is it still valid to tag *test* code? I had assumed it was, but now I'm not sure it is, because I have a sneaky feeling Tim's doc-code-tester *wants* to test all code given as examples to make sure they all work (or "fail in the right way"). Hmm. Ok, that's enough. Really must do the job they're paying me for. Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.demon.co.uk/ 2 wheels good + 2 wheels good = 4 wheels good? 3 wheels good + 2 wheels good = 5 wheels better? My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)
[David Ascher]
The only question I suppose is whether one should require a keyword (Test: or other) to keep the top-level syntax trivial, or special-case the recognition of >>>-beginning paragraphs.
I'm leaning for the former, as it can evolve to the latter if there is sufficient call for it from the user base, and I think it does keep the code simpler. But I'm willing to be swayed.
[Tony J Ibbs (Tibs)]
No - keep the keyword. My reasoning is (a) I like it [emotional reaction, which is the real reason (parse that as "it feels more elegant")],
Note that I'm not asking to get rid of the keyword ("Test: or other" -- btw, the very fact that David can't think of a compelling name is the very reason ">>>" is so highly desirable: the latter is the only choice that isn't fabricated out of thin air -- ">>>" is *natural*). Use a keyword if you like -- doctest doesn't care, so long as it finds ">>>" sooner or later.
and (b) I still have the feeling that on occasion I might want non- test Python script in there,
It's certainly odd that people who don't use doctest are suddenly worried about how to stop it from testing their code <wink/frown> -- doctest doesn't run tests unless you run doctest. If you do run doctest and have Python script you don't want tested, simply refrain from starting it with ">>>"! For example, doctest won't touch

    Example:
        m = MyClass(4, "red")
        assert len(m.color()) != m.int()

I'm not advocating that *all* code examples start with ">>>", just that ">>>" be accepted as one of the ways of introducing an example. I have (at least) hundreds of code examples already in that format, and they already look nice and work great (people recognize them instantly for what they are, and create their own with ease).
and (c) it *is* a 'logical' subdivision of the text in exactly the same way as the other major divisions, and so deserves its own place.
Ah -- I don't view it as a major division at all. So far as a doc parser is concerned, at the "major" level a code block should be a single token (it has no internal structure of interest). I'd say it's much less complication than the baroque proposed rules for recognizing bulleted lists, but is of the same nature: "if a line begins with such-and-such a sequence of characters, interpret it as meaning so-and-so". Looking at it from that view, the requirement that I write my doctest examples as:

    Test:
        >>> x + 1
        3

instead of as

    >>> x + 1
    3

is like requiring that everyone write:

    Unordered-List:
        List-Item: First point.
        List-Item: Second point.

instead of as e.g.

    + First point.
    + Second point.

Since I have 100x more doctest examples in my modules than bulleted lists of any flavor, the idea that the latter should be made especially easy but the former made artificially clumsy does tend to grate <wink>.
(to reply to Robin Friedrich later on - although I also don't understand his point about more tags making it harder to parse things (unless he means "for humans to parse")).
As well as for humans to write and to remember.
Am I allowed to disagree with Tim Peters
Certainly!
You'll end up recognizing that with a regexp, like
r"^\s*Example:\s*"
No! No! Whilst I realise that any General Purpose, Released With Python tool will probably have to use re 'cos that's all it has, *I* (for one) would never end up recognising anything much with a regexp. Follow the One True Way - convert to mxTextTools (gosh, I feel better now).
I didn't mean to proselytize on that issue one way or the other. Recognizing ">>>" is near-trivial with mxTextTools too, or even with string.find -- I'm trying to introduce <wink> some sanity against the notion that ">>>" is some kind of *burden* for a programmed parser to recognize. It's not: it's a fixed string that's extremely unlikely to appear by accident, and by that measure is less a headache than list-item prefixes.
... because I have a sneaky feeling Tim's doc-code-tester *wants* to test all code given as examples to make sure they all work (or "fail in the right way"). Hmm.
doctest tests all and only stuff it finds in ">>>" blocks, and I've never seen a ">>>" block in a docstring *unless* it was put there specifically for doctest to find. People writing "plain old" (not-to-be tested) examples simply don't paste interactive sessions into their docstrings, so there's no ">>>", so doctest leaves their examples alone. Instead they mix prose with inline code fragments that fail to work as advertised 3 hours after the docs are written <0.8 wink>. Changing what doctest does isn't an option here: in practice, it's proved to be an essentially perfect solution to the problems it tried to address, and part of "perfection" was making it dirt simple enough that even sub-average programmers can and do use it successfully within minutes of downloading the pkg. I'm not mucking with the hard-won qualities that made this possible! doctest will continue to work fine no matter what we do about doc markup; the only question I have here is whether Doc-SIG markup will play nice with existing and future doctest-using modules. the-difference-is-about-one-line-of-code<0.5-wink>-ly y'rs - tim
On Wed, 1 Dec 1999, Tim Peters wrote: On the issue of >>> vs. 'Example:\n\t>>>' -- I am willing to allow doctests' current style as well as the 'verbose' keyword-tagged & indented one. I don't think the redundancy is a problem as far as users are concerned, and if it complicates the code at all, I can just email Tim saying it's not possible, and he'll provide the patch in 5 minutes. =) --david
David> I don't think the redundancy is a problem as far as users are
David> concerned, and if it complicates the code at all, I can just
David> email Tim saying it's not possible, and he'll provide the patch
David> in 5 minutes. =)

David,

You forgot about the time machine. You should have written:

    I don't think the redundancy is a problem as far as users are
    concerned, and if it complicates the code at all, I can just email
    Tim saying it's not possible, and he provided the patch 5 minutes
    ago.

Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/ 847-971-7098 | Python: Programming the way Guido indented...
On Wed, 1 Dec 1999, Skip Montanaro wrote:
David> I don't think the redundancy is a problem as far as users are
David> concerned, and if it complicates the code at all, I can just
David> email Tim saying it's not possible, and he'll provide the patch
David> in 5 minutes. =)
David,
You forgot about the time machine. You should have written:
I don't think the redundancy is a problem as far as users are concerned, and if it complicates the code at all, I can just email Tim saying it's not possible, and he provided the patch 5 minutes ago.
Well, it's all a matter of timezones. Tim 'happens' three hours later than I do, so his time machine sometimes misses 0 by a few minutes. =) 'Tim happens. Get used to it' --da
Tim Peters wrote:
It's certainly odd that people who don't use doctest are suddenly worried about how to stop it from testing their code <wink/frown>.
OK, I admit it, I've been entirely convinced that ">>>" is indeed enough of a signifier, and that I should stop worrying now. Tim's explication of why it wouldn't be a problem does explain to me why it won't be (damn, it's really irksome when someone makes having your mind changed so enjoyable - maybe I just need a new mind...). So do we need a "Test" keyword at all now? Only perhaps if one wants to have a section which *explains* what is going on that has a separate heading, and maybe *that* just means we want a "Title" keyword or somesuch, and if we're getting to that level of detail right now I think I should stop. Eddy's point that we'll probably have something that translates to <pre>..</pre> is also relevant, although I think that's actually now for different reasons. Are we there yet? David Ascher wrote:
'Tim happens. Get used to it'
Thank you! Thank you! I was wanting a Python signature to use as well as the HPV ones! Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.demon.co.uk/ 'Tim happens. Get used to it'. (David Ascher, on the Doc-SIG) My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)
David Ascher writes:
The only question I suppose is whether one should require a keyword (Test: or other) to keep the top-level syntax trivial, or special-case the recognition of >>>-beginning paragraphs.
I'm leaning for the former, as it can evolve to the latter if there is sufficient call for it from the user base, and I think it does keep the code simpler. But I'm willing to be swayed.
Sway. ;-) I think anything that starts ">>>" or "..." should automagically be a verbatim-thingy. This is easy enough to implement and avoids excess cruft in docstrings. -Fred -- Fred L. Drake, Jr. <fdrake@acm.org> Corporation for National Research Initiatives
[Fred L. Drake, Jr.]
I think anything that starts ">>>" or "..." should automagically be a verbatim-thingy. This is easy enough to implement and avoids excess cruft in docstrings.
Don't jinx it, Fred! David already caved in on this one -- and seemed very happy to get his child back <wink>. Note that I was pushing for less and more than that: did not ask for "..." to be special (for all I know, it's someone's ellipsis attempt), but did ask for leading ">>>" to mean that the entire *paragraph* starting there be treated verbatim(*). Where "a paragraph" means "all lines up to but not including the next all-whitespace line, or end of docstring, whichever comes first". That's meant to cover pasted-in interactive shell sessions (which my doctest.py makes heavy use of). I don't think it's enough to supply <PRE>...</PRE> functionality, if for no other reason than that an empty line ends it. But then I'm not sure I've seen anything else proposed that can span embedded whitespace lines either. computers-suck-ly y'rs - tim (*) "Verbatim" meaning, of course, not verbatim, but with leading whitespace equal to the leading whitespace of the initial >>> line stripped.
On Fri, 3 Dec 1999, Tim Peters wrote:
I don't think it's enough to supply <PRE>...</PRE> functionality, if for no other reason than that an empty line ends it. But then I'm not sure I've seen anything else proposed that can span embedded whitespace lines either.
I'm not sure what the first sentence means, but re: the second, I just want to point out that any block which starts w/ a keyword and doesn't have a block-on-same-line-as-keyword can span lots of embedded whitespace.

    Verbatim:

        this is paragraph one

        this is paragraph two

        this is paragraph three

and the body of Verbatim is:

    """

    this is paragraph one

    this is paragraph two

    this is paragraph three
    """

i.e., six lines (counting the first empty line)
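A sketch of why such a block can span blank lines: under these rules the block is ended by a dedent, not by whitespace. The indentation handling here is simplified and purely illustrative:

    def keyword_block_body(lines, start):
        # lines[start] is the "Verbatim:" line; the body is every following
        # line that is blank or indented more deeply than the keyword line.
        indent = len(lines[start]) - len(lines[start].lstrip())
        body = []
        for line in lines[start + 1:]:
            if not line.strip():
                body.append(line)
            elif len(line) - len(line.lstrip()) <= indent:
                break
            else:
                body.append(line)
        return "\n".join(body)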
Fred> I think anything that starts ">>>" or "..." should automagically
Fred> be a verbatim-thingy. This is easy enough to implement and avoids
Fred> excess cruft in docstrings.

I'd amend that to

    Anything that starts ">>>" or "..." should automagically begin a
    verbatim-thingy that extends to the first whitespace-only or blank
    line.

Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/ 847-971-7098 | Python: Programming the way Guido indented...
Skip Montanaro writes:
I'd amend that to
Anything that starts ">>>" or "..." should automagically begin a verbatim-thingy that extends to the first whitespace-only or blank line.
Definitely; I was thinking about working with the docstrings pre-chunked into paragraph-like sections based on blank lines. I'm *not* proposing new syntax to combine these things! (At least not today. ;) -Fred -- Fred L. Drake, Jr. <fdrake@acm.org> Corporation for National Research Initiatives
I personally would prefer to keep [] for references and introduce @..@ (or some other delimiter) for inline code, mostly because [] is so common in journals as a way of indicating bibliographic references. I
Fair enough.
I see. So the workaround for those scripts which can't be imported is to start them with:
import sys
if sys.doc_building: sys.exit()
Not too bad.
I more had in mind:

    if sys.doc_building:
        # Normally critical we do this.
        dont_do_something_really_expensive()

We don't need to execute the bulk of the code, just import the module and get a few of the symbols.
That would be written (with the current proposal):
The following code shows how to do this:

    Example:
        def foo():
            etc

The following code shows how to do that:

    Example:
        def bar():
            etc
Is that ok w/ you?
Perfect. Mark.
On Tue, 30 Nov 1999, Mark Hammond wrote:
I more had in mind:
if sys.doc_building:
    # Normally critical we do this.
    dont_do_something_really_expensive()

We don't need to execute the bulk of the code, just import the module and get a few of the symbols.
But lots of modules currently do everything in the leftmost column (they're called "scripts" =). Some of them never end (they're called "daemons" =). I don't want to force someone to take their 'global' code and put it in a function just to get around the docstring tool. Anyway, the point is moot, as one or the other solution will work, depending on the script. --david
Thus: * Any line starting with a word followed by a colon can be considered a keyword. If you don't want this, just make sure it's not the first word on the line.
Not happy. A paragraph of text which precedes an example may be relied upon to end in `for example:', in which the last contiguous block of non-space characters is of length 8; if I modify an earlier part of the paragraph, I'm going to ask my authoring tool (python-mode.el) to reformat the paragraph, without necessarily being aware of a gotcha waiting for me at the paragraph's end; my margins will be within 72 characters of one another, giving a roughly 1 in 9 chance that `example:' ends up being alone on the last line ... gotcha. A cure for this would just be to do keyword-recognition case sensitively, and Capitalise keywords; otherwise, we have to insist on either a dedent or a blank line preceding any keyword. Which offends folk worse: case sensitivity or needing a dedent/vspace ?
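A sketch of the `cure' Eddy suggests, recognising a keyword only when it is Capitalised (case-sensitively) and opens the line, so a reflowed paragraph ending in "example:" is left alone; the exact character set for keywords is still up for debate, so the pattern below is only illustrative:

    import re

    # Initial capital required; hyphens allowed per the RFC 822 suggestion.
    KEYWORD = re.compile(r"^([A-Z][A-Za-z_-]*)\s*:")

    print(KEYWORD.match("Example: def foo(): pass") is not None)  # keyword
    print(KEYWORD.match("example:") is not None)                  # plain text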
* A star or dash starting a line can be considered a new list item. Again, if it is truly a hyphen or whatever else, just adjust your line wrap slightly so it is no longer the first word.
Alternatively, all lists use the same `item-introducer' character and follow it with an optional character indicating what bullet to use. Thus one might have (taking ~ as the introducer for the illustration)

~ outermost list, first item
~ outer second which may contain a subordinate
  ~ which is dedented so it can use the same introducer without confusion
  ~ and output formatters can chose different symbols in place of the star for successive nesting layers
  ~ by the way, should further lines line up with the text or the bullet ? my reckoning is with the text ...
~ outer third, whose subordinate might want Roman numerals
  ~i so it indicates them thus
  ~i and can chose to leave the engine to sort out numbering
  ~iii but can effectively assert that one item (referred to elsewhere) has a particular number
  ~i without having to mention numbers for the rest
  ~i and of course
  ~1 we can use the other numbering styles
  ~2 including alphabetic, upper or lower, using ~A or ~a.
  ~1 with use of first in series taken as `work out right number'
  ~7 but I think the tool should complain if you get later positions wrong: it's an assertion, and it indicates that this item is going to be referred to from other text as item 7 - I need to be told I got it wrong ! Obviously I've deleted a few items before this one without realising what's happening below ...
~ outer fourth
  ~o must the bullets in a given list all match ?
  ~. should stand for mid-dot, and star is likewise easy using *
  ~o I think so, anyway
  ~- dash is obvious and now unambiguous, as are + and =
  ~o mind you, o requires care: if it's the first item in a list, that list is going to use o as its bullet; but if it appears in a list which began with a ~a then we have to read it as item fifteen.
~ and if we're insisting on all items in a list having the same bullet, does it make sense to allow items after the first to just use an unadorned star meaning re-use of first item's symbol, thus saving us lots of editing when we want to change the symbol in use by a list, or shuffle an item from a sub-list out into its parent list (or etc.)
~ of course, ~ needn't be the bullet-introducer, we could use pretty much any punctuator as long as it doesn't obviously clash; candidate egs: #, @, $, %, &, * and even |
~ outer fifth
~ as for descriptive lists, I'd go with the old gendoc form, which uses double dash -- which just feels so natural, but needs vspace -- to separate items, given that -- might be used within an item on a later-than-first line. I can live with this.
Other random thoughts: * The [blah] notation is good, but needs to be well defined. eg, "[module.function]" when used in the context of a package should use the same "module scoping" that Python itself uses.
The thing that saves [this] from being problematic is that the format in which it was introduced presumed that one was going to use a brief mnemonic as [this] word and end the docstring with a chunk which explains the cross-references (new keyword: Xrefs ?) and, in particular, tells the doc-string-reader which [tokens] actually have a translation, the rest being left as typed; thus, if this paragraph appeared in a docstring which says how to translate [this] (giving an xref and - optionally - a text to use (default `this') in place of [this]), the digested form would duly replace [this] but leave [tokens] as it is.

To further simplify life, I'd understood the [this] keys that are translatable to insist on [nowhitespace] to save the parser most of its `this might be an xref' pending decisions - which is why the Xrefs section needs to at least have the option of specifying the text to be used in place of [this] as well as the Xref to point it at. What we're doing is citation, which is widely done with []. No need for [this] to be a [module.function] or anything like - the Xrefs section provides the translation.

    Xrefs:
        [gendoc] http://www.python.org/contrib/gendoc/
        [this] http://www.python.org/lists/doc-sig/hideous?with=data&as=you+will
               The present message
        [copy] string.copy
               the standard string copy function
        [etc] location
              sub sti tute

[sorry, all exhibited xrefs are bogus - illustrative only] I'm sure that's only a minor paraphrase of a spec I saw a while ago on this list ...

Of course, Xrefs might better be called Bibliography. We can use as `location' some pythonic reference that can be resolved in the ways that the suggested module.function semantics point to: indeed, I would take this as what to try first, falling back on recognising other stuff as URLs and similar.
... However, the use of brackets may conflict with people who use inline code (rather than an example "block" - maybe something like "@" could be used? @module.function@ would be reasonable.
With the above, can we evade this ? The fact that [citations] are so widely used argues for the [form]; and the fact that [anything with space in it] isn't a citation should make all the `ordinary text' and `python denotations' [usages] unproblematic, while leaving untranslated ones as [literal] uses of [ and ]. If nothing else, I find my eye latches onto [cite] better than @cite@ ... and bear in mind that @ has some other magic uses,

    parser error - unclosed citation at line 137:
    Sender: eddyw@lsl.co.uk

All told, we seem to have a fairly good spec ... save for some nitpickery ;^>

Tibs said:
David (Ascher) - is it time to re-release your initial "docstring grammar" and I confess that's something I'd like to see too. After all, we have to have someone to play Gdo ...
Eddy.
I said:

    ~ and output formatters can chose different symbols in place of the star for successive nesting layers

and

    ~ and if we're insisting on all items in a list having the same bullet, does it make sense to allow items after the first to just use an unadorned star meaning re-use of first item's symbol, thus saving us lots of editing when we want to change the symbol in use by a list, or shuffle an item from a sub-list out into its parent list (or etc.)

but `unadorned star' should be `unadorned twiddle' - I missed a conversion after being persuaded that *'s font role prohibits its use as, for instance, *o or *1, which would match `begin italic': hence the use of ~ and remarks about other candidates. Likewise, in the first, the presumption was that * is the default symbol, but I don't imagine we'd be using ~ as a bullet much (well, we could), so that snippet should have vanished. The output formatters choose symbols as appropriate: the parser just identifies the list structure and which bits are subordinate to which others.

Eddy.
Edward Welbourne wrote:
Thus: * Any line starting with a word followed by a colon can be considered a keyword. If you don't want this, just make sure it's not the first word on the line.
Not happy. A paragraph of text which precedes an example may be relied upon to end in `for example:', in which the last contiguous block of non-space characters is of length 8; if I modify an earlier part of the paragraph, I'm going to ask my authoring tool (python-mode.el) to reformat the paragraph, without necessarily being aware of a gotcha waiting for me at the paragraph's end; my margins will be within 72 characters of one another, giving a roughly 1 in 9 chance that `example:' ends up being alone on the last line ... gotcha.
A cure for this would just be to do keyword-recognition case sensitively, and Capitalise keywords; otherwise, we have to insist on either a dedent or a blank line preceding any keyword. Which offends folk worse: case sensitivity or needing a dedent/vspace ?
Why not just raise an exception ? I don't think that the usage of "some text:" is common in doc strings except for maybe examples, which should then be adapted to use the new "Example:" keyword.

Here's an example docstring... the format looks pretty nice, IMHO.

    """
    foo(bar,rab,oof) -> integer -- single line description

    Longer description spanning multiple lines

    Arguments:
        bar -- some string
        rab -- another string
        oof -- an integer

    Returns:
        42 in most cases

    History:
        19991130 MAL -- Added oof argument
        19991101 MAL -- Created
    """

Not sure if this is already somewhere in the proposal, but I would like to see '--' as an indicator of a single-line text block. This would be useful in vertically compressing the docstrings somewhat (and it is already being used in the signature line for such a purpose).
* A star or dash starting a line can be considered a new list item. Again, if it is truly a hyphen or whatever else, just adjust your line wrap slightly so it is no longer the first word.
Alternatively, all lists use the same `item-introducer' character and follow it with an optional character indicating what bullet to use. Thus one might have (taking ~ as the introducer for the illustration)
...
Let's leave this to some list parser (are we starting to head for NP-completeness again ;-).
Other random thoughts: * The [blah] notation is good, but needs to be well defined. eg, "[module.function]" when used in the context of a package should use the same "module scoping" that Python itself uses.
Right. It should ideally perform the same lookup as Python would in the global namespace. The resulting object could then either be handled recursively by the doc tool or simply stored by reference for later use (e.g. via the file name of a module or the id of an object).
The thing that saves [this] from being problematic is that the format in which it was introduced presumed that one was going to use a brief mnemonic as [this] word and end the docstring with a chunk which explains the cross-references (new keyword: Xrefs ?) and, in particular, tells the doc-string-reader which [tokens] actually have a translation, the rest being left as typed; thus, if this paragraph appeared in a docstring which says how to translate [this] (giving an xref and - optionally - a text to use (default `this') in place of [this]), the digested form would duly replace [this] but leave [tokens] as it is.
To further simplify life, I'd understood the [this] keys that are translatable to insist on [nowhitespace] to save the parser most of its `this might be an xref' pending decisions - which is why the Xrefs section needs to at least have the option of specifying the text to be used in place of [this] as well as the Xref to point it at. What we're doing is citation, which is widely done with [].
No need for [this] to be a [module.function] or anything like - the Xrefs section provides the translation.
Xrefs:
    [gendoc] http://www.python.org/contrib/gendoc/
    [this] http://www.python.org/lists/doc-sig/hideous?with=data&as=you+will
           The present message
    [copy] string.copy
           the standard string copy function
    [etc] location
          sub sti tute
[sorry, all exhibited xrefs are bogus - illustrative only] I'm sure that's only a minor paraphrase of a spec I saw a while ago on this list ...
Of course, Xrefs might better be called Bibliography.
Or perhaps "References:" as in David's proposal ?!
We can use as `location' some pythonic reference that can be resolved in the ways that the suggested module.function semantics point to: indeed, I would take this as what to try first, falling back on recognising other stuff as URLs and similar.
... However, the use of brackets may conflict with people who use inline code (rather than an example "block" - maybe something like "@" could be used? @module.function@ would be reasonable.
With the above, can we evade this ? The fact that [citations] are so widely used argues for the [form]; and the fact that [anything with space in it] isn't a citation should make all the `ordinary text' and `python denotations' [usages] unproblematic, while leaving untranslated ones as [literal] uses of [ and ]. If nothing else, I find my eye latches onto [cite] better than @cite@ ... and bear in mind that @ has some other magic uses,
parser error - unclosed citation at line 137: Sender: eddyw@lsl.co.uk
All told, we seem to have a fairly good spec ... save for some nitpickery ;^>
Since [] is only used for lists in Python, we could define the RE '\[[a-zA-Z0-9_.]+\]' for our purposes and raise an exception in case the enclosed reference cannot be mapped to a symbol in the global namespace (note: no whitespace, no commas) which either evaluates to a function, method, module or reference object. Doc strings like "...use [None]*10 as argument..." will fail, but are easily avoided by inserting some extra whitespace, e.g. "...use [ None ] * 10 as argument...". -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 31 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/
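A sketch of Marc-Andre's rule: the RE picks out candidate references, and only names that fail to resolve would draw a complaint. The namespace argument here is just a dict standing in for whatever lookup is finally agreed:

    import re

    CANDIDATE = re.compile(r"\[([a-zA-Z0-9_.]+)\]")

    def unresolved_references(docstring, namespace):
        bad = []
        for name in CANDIDATE.findall(docstring):
            if name.split(".")[0] not in namespace:
                bad.append(name)
        return bad

    ns = {"string": None}
    print(unresolved_references("see [string.split] for details", ns))  # []
    print(unresolved_references("use [None]*10 as argument", ns))       # ['None']
    print(unresolved_references("use [ None ] * 10 as argument", ns))   # [] -- no match at all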
Mark Hammond:
Thus: * Any line starting with a word followed by a colon can be considered a keyword. If you don't want this, just make sure it's not the first word on the line.
I agree with Edward on this one -- this is too fragile. I consider the whitespace issue to be real only in the context of lists, and I think that gendoc has shown that it's solvable within the context of lists. I stand by the keyword notation I presented: either

    Keyword:
        text block spanning
        one or more lines

or

    Keyword: one-line block

as long as they are both in separate paragraphs.
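A sketch of how those two forms might be told apart, assuming the docstring has already been split into paragraphs on blank lines; note that a paragraph opening with something like "http://..." would also be caught, which is exactly the sort of accident the :: notation (or Capitalised keywords) is meant to avoid. The names here are illustrative only:

    import re

    KEYWORD_PARA = re.compile(r"^([A-Za-z_-]+)\s*:\s*(.*)$", re.S)

    def classify(paragraph):
        m = KEYWORD_PARA.match(paragraph)
        if m:
            return ("keyword", m.group(1), m.group(2))  # block may be one line or many
        return ("text", paragraph)

    print(classify("Returns: 42 in most cases"))
    print(classify("Just an ordinary paragraph."))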
Not sure if this is already somewhere in the proposal, but I would like to see '--' as indicator of a single line text block. This would be useful in vertically compressing the docstrings somewhat (and it already being used in the signature line for such a purpose).
Isn't that just redundant with the : notation? Note that I don't mind a little redundancy, but it's unpythonic.
* A star or dash starting a line can be considered a new list item. Again, if it is truly a hyphen or whatever else, just adjust your line wrap slightly so it is no longer the first word.
Alternatively, all lists use the same `item-introducer' character and follow it with an optional character indicating what bullet to use. Thus one might have (taking ~ as the introducer for the illustration)
...
Let's leave this to some list parser (are we starting to head for NP-completeness again ;-).
Absolutely! Mark:
Other random thoughts: * The [blah] notation is good, but needs to be well defined. eg,
MAL:
Right. It should ideally perform the same lookup as Python would in the global namespace. The resulting object could then either be handled recursively by the doc tool or simply stored by reference for later use (e.g. via the file name of a module or the id of an object).
Edward:
The thing that saves [this] from being problematic is that the format in which it was introduced presumed that one was going to use a brief mnemonic as [this] word and end the docstring with a chunk which explains the cross-references (new keyword: Xrefs ?)
I think that both are needed. I believe that the namespaces looked up should be:

1) the local namespace of the docstring -- i.e., the set of keywords defined in the "References" keyword block in the current docstring.

2) the global namespace of the docstrings -- i.e. the set of keywords defined in the "References" keyword block in the MODULE docstring.

3) The global Python namespace for that module

4) Some namespace corresponding to builtins & unimported modules, yet ill-defined.

The point of 2) is that I often want to introduce references that I use in a given module at the level of a docstring, but then want to refer to those documents in specific function docstrings. (Good thing we don't have to worry about garbage collection with these circular references =)
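A hedged sketch of that lookup order; the four namespace arguments are placeholders for however those namespaces end up being materialised:

    def resolve_reference(token, local_refs, module_refs, module_globals, builtins_etc):
        # Try the docstring's own References block first, then the module
        # docstring's, then the module's Python namespace, then the rest.
        for namespace in (local_refs, module_refs, module_globals, builtins_etc):
            if token in namespace:
                return namespace[token]
        return None   # dangling reference: warn rather than fail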
Since [] is only used for lists in Python, we could define the RE '\[[a-zA-Z0-9_.]+\]' for our purposes and raise an exception in case the enclosed reference cannot be mapped to a symbol in the global namespace (note: no whitespace, no commas) which either evaluates to a function, method, module or reference object.
Doc strings like "...use [None]*10 as argument..." will fail, but are easily avoided by inserting some extra whitespace, e.g. "...use [ None ] * 10 as argument...".
I like that bit, especially since the 'complete' tagging of that example would wrap [None]*10 in whatever inline code markup is chosen. --david
Since [] is only used for lists in Python, we could define the RE '\[[a-zA-Z0-9_.]+\]' for our purposes and raise an exception in case the enclosed reference cannot be mapped to a symbol in the global namespace (note: no whitespace, no commas) which either evaluates to a function, method, module or reference object.
umm ... hang on, two things seem stirred up here. The proposal I remember from ages ago and tried to echo has [token] and the token doesn't have to be intelligible to the python engine: elsewhere in the doc string, we'll have

    References:
        [token] reference text

which the parsed docstring uses to decode each use of [token] that appeared in the docstring. Here, reference would normally be something recognised by the python engine (and would be the thing I understand you to be putting in [brackets]), but the Reference-handler might also cope with it being, e.g., an URL. The text that ends the reference becomes the text of the `anchor' generated:

    -> ... and tried to echo has <a href="reference">text</a> and the token ...

note non-appearance of [token] in the digested form: but if `text' had been omitted from the Reference spec, [token] is the default text (e.g. when what you're doing really is a citation and that's just how you want it to appear). Then any uses of [None] that appear in your doc string, meaning `the list with one entry, None', it suffices that your References section doesn't have an entry for [None] - the parsed docstring will then just say [None] (and not even attempt to wrap an anchor round it).

The only real relevance to forbidding [spaces within] the citation token is to ensure that where authors use [square brackets] for parenthetical remarks or as list denotations, the parser hasn't got to do the piece of jiggery-pokery that marks it as `maybe a xref' and obliges it to come back later to settle the maybe once it knows. This cost will remain for [None], but it'll be well-defined that the parser marks it as a maybe, discovers that it isn't and settles on it being just text, not a reference.

Now, it seems to me that what you were describing was slightly different ... am I merely confused ?

Eddy.
Ed is correct. Gendoc solved the HREF problem with:

    "...An addition was made to support hypertext references. Hypertext references are marked with double quotes in the body of the doc string. At the end of the doc string will be a matching line starting with two dots '.. ' and a space followed by the same quoted text and then followed by the mapping (URL). This is patterned after the footnote notion in setext but is easier on the eyes. For example, "Pythonland" will be marked as a hyper-reference to Python.org. If no matching trailing reference is found then nothing is done."

Which might be modified with current thinking to yield:

    """
    Marking refs with [brackets], and at the end of the doc string place the annotations ala bibliography one per line. Key "brackets" is placed in the local namespace and used by other (lower) doc strings. In the gendoc implementation if the key doesn't match anything stored in the ref mapping no markup is done, so that things like [None]*5 are safe and no exception need be raised.

    [brackets] -> http://www.howto.python.org/rtfm.html
    """

-Robin
On Tue, 30 Nov 1999, Robin Friedrich wrote:
""" Marking refs with [brackets], and at the end of the doc string place the annotations ala bibliography one per line. Key "brackets" is placed in the local namespace and used by other (lower) doc strings. In the gendoc implementation if the key doesn't match anything stored in the ref mapping no markup in done, so that things like [None]*5 are safe and no exception need be raised.
[brackets] -> http://www.howto.python.org/rtfm.html """
Nicely said. I'd like to point out that the transformation I had in mind is in fact, given the above and an HTML output:

    [brackets] -> <a href="http://www.howto.python.org/rtfm.html">brackets</a>

In other words the keyword is kept until the rendering stage. I suppose that it might be necessary to allow the reference to define a different bit of text to render instead of the keyword. So given:

    """
    ...
    References:
        PythonDotOrg:
            Text: "Python's Main Website"
            Link: http://www.python.org
    """

we could have:

    [PythonDotOrg] -> <a href="http://www.python.org">Python's main website</a>

Or not. Luckily I think that issue can be left to the 'bibliography engine', just like the bullet processing can be left to the 'list engine'.

--david

PS: I would suggest that the 'if no key exists, no markup is done' behavior be modifiable at runtime to 'a warning is emitted', as I think that this sort of silent behavior is problematic given the presence of typos in the world.
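A sketch of that rendering step, with the enriched Text:/Link: form represented as a plain dict; the names and the HTML target are illustrative, and a real 'bibliography engine' would pick the output format at render time:

    import re

    def render_html(text, references):
        def substitute(match):
            key = match.group(1)
            ref = references.get(key)
            if ref is None:
                return match.group(0)   # no entry: leave [key] alone (or warn)
            label = ref.get("Text", key)
            return '<a href="%s">%s</a>' % (ref["Link"], label)
        return re.sub(r"\[([A-Za-z0-9_.-]+)\]", substitute, text)

    refs = {"PythonDotOrg": {"Text": "Python's Main Website",
                             "Link": "http://www.python.org"}}
    print(render_html("See [PythonDotOrg]; [None]*5 is untouched.", refs))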
From: David Ascher <da@ski.org>

On Tue, 30 Nov 1999, Robin Friedrich wrote:

""" Marking refs with [brackets], and at the end of the doc string place the annotations ala bibliography one per line. Key "brackets" is placed in the local namespace and used by other (lower) doc strings. In the gendoc implementation if the key doesn't match anything stored in the ref mapping no markup is done, so that things like [None]*5 are safe and no exception need be raised.
[brackets] -> http://www.howto.python.org/rtfm.html """
Nicely said. I'd like to point out that the transformation I had in mind is in fact, given the above and an HTML output:
[brackets] -> <a href="http://www.howto.python.org/rtfm.html">brackets</a>
grumble grumble...see below.
In other words the keyword is kept until the rendering stage. I suppose that it might be necessary to allow the reference to define a different bit of text to render instead of the keyword.
Why? keywords are arbitrary strings. (may include spaces, etc.)
So given:
""" ... References:
PythonDotOrg: Text: "Python's Main Website" Link: http://www.python.org """
we could have:
[PythonDotOrg] -> <a href="http://www.python.org">Python's main website</a>
Or not. Luckily I think that issue can be left to the 'bibliography engine', just like the bullet processing can be left to the 'list engine'.
Yes. However I really don't like the idea of HTML finding its way into the doc string. The BiblioEngine would be told the information of the reference and, along with what rendering mode she is in, emit the appropriate output format, be it HTML, XML, PDF, etc.
--david
PS: I would suggest that the 'if no key exists, no markup is done' behavior be modifiable at runtime to 'a warning is emitted', as I think that this sort of silent behavior is problematic given the presence of typos in the world.
Agreed.
On Tue, 30 Nov 1999, Robin Friedrich wrote:
Nicely said. I'd like to point out that the transformation I had in mind is in fact, given the above and an HTML output:
[brackets] -> <a href="http://www.howto.python.org/rtfm.html">brackets</a>
grumble grumble...see below.
In other words the keyword is kept until the rendering stage. I suppose that it might be necessary to allow the reference to define a different bit of text to render instead of the keyword.
Why? keywords are arbitrary strings. (may include spaces, etc.)
We should watch our language =). Keywords in my proposal are things before :'s which lead a paragraph and cannot contain whitespaces. Maybe we don't need that restriction on things in []'s.
References:
PythonDotOrg:
  Text: "Python's Main Website"
  Link: http://www.python.org
Yes. However I really don't like the idea of HTML finding its way into the doc string. The BiblioEngine would be told the information of the reference and, along with what rendering mode she is in, emit the appropriate output format, be it HTML, XML, PDF, etc.
I don't recall putting HTML in the docstring. Just a URL.
----- Original Message ----- From: David Ascher <da@ski.org>

[brackets] -> <a href="http://www.howto.python.org/rtfm.html">brackets</a>

My bad. I was interpreting the above as a doc string rewrite of my [brackets] -> http://www.howto.python.org/rtfm.html *in* the doc string. Sorry.
Why? keywords are arbitrary strings. (may include spaces, etc.)
We should watch our language =). Keywords in my proposal are things before :'s which lead a paragraph and cannot contain whitespaces. Maybe we don't need that restriction on things in []'s.
References:
PythonDotOrg:
  Text: "Python's Main Website"
  Link: http://www.python.org
Hmmm. Gosh we need a glossary quick! Yup, we had different notions of "keyword". Do you really want arbitrary DAkeywords (stuff before colons) usable for internal/external references? Since this confused me, I might conclude that it would confuse others as well.

I would have placed the following in my doc string and been satisfied...

""".....
For further information visit:
[Python Language Web Site] is the main source for Python itself.
[Starship Python] houses a number of Python user resources.

[Python Language Web Site] -> http://www.python.org
[Starship Python] -> http://starship.python.net
"""

Intuitively I don't think of the word "visit" as a keyword that can be referenced, while anything in brackets seems fair game. What other features did you have in mind?

Dejavu'ly yours,
Robin
On Tue, 30 Nov 1999, Robin Friedrich wrote:
Hmmm. Gosh we need a glossary quick! Yup, we had different notions of "keyword".
A keyword is a case-sensitive string which:

- starts a paragraph
- matches '^ *[a-zA-Z_]+[\-a-zA-Z_0-9]*: +' (Python identifiers with the addition of hyphens and which end with a : and one or more spaces)

As (I think it was) Tibs mentioned, it's syntactic sugar for XML notation, with the same aim of making a 'labeled' hierarchy. Maybe the word 'Label' is better.

Foo: this is the body of foo
     which spans multiple lines

is isomorphic to

<Foo>
this is the body of foo
which spans multiple lines
</Foo>
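(A small illustrative sketch of that keyword rule in action; the regular expression is the one quoted above, while the sample lines are assumed for demonstration only.)

    import re

    # The RE quoted above, applied to candidate paragraph-opening lines.
    KEYWORD_RE = re.compile(r'^ *([a-zA-Z_]+[\-a-zA-Z_0-9]*): +')

    for line in ("Author: Guido van Rossum",
                 "Date-of-release: 1/1/1999",
                 "this is plain text: not a keyword paragraph"):
        m = KEYWORD_RE.match(line)
        print(repr(line), "->", m.group(1) if m else None)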
Do you really want arbitrary DAkeywords (stuff before colons) usable for internal/external references? Since this confused me, I might conclude that it would confuse others as well.
No. I intend only the DAKeywords listed in a special "References:" section to be available as the targets of references (see below).
I would have placed the following in my doc string and been satisfied...

""".....
For further information visit:
[Python Language Web Site] is the main source for Python itself.
[Starship Python] houses a number of Python user resources.

[Python Language Web Site] -> http://www.python.org
[Starship Python] -> http://starship.python.net
"""
This is, I would assume, harder to parse -- you must have some implicit rules in there regarding which [Starship Python] is a 'mention of something else' and which is a 'this is the thing I mentioned'. Is it the sequential order, the 0-indent? My vision for the same semantics as above was:

""".....
For further information visit:
[PythonLanguageWebSite] is the main source for Python itself.
[StarshipPython] houses a number of Python user resources.

References:
  PythonLanguageWebSite: http://www.python.org
  StarshipPython: http://starship.python.net
"""

Which leaves open the question of how we can have 'space-enabled' labels for references which can't have spaces in them. One idea is to tag the [] markup with a ="stringlabel":

[PythonLanguageWebSite="The Python.org website"] is the main source for Python itself.

Another possibility hinted at previously is to enrich the References section:

References:
  PythonLanguageWebSite:
    Label: The Python.org website
    Link: http://www.python.org

either of which, when rendered, would 'do the right thing'. I only expect this to be an issue when referring to URLs. Python modules, classes and functions already have perfectly good names. For things which are more like *real* bibliographic references, I'd be just as happy with the conventional [keyword] notation seen in many CS papers.

See [ascher29] for the source of the algorithm.

References:
  ascher29: My famous Ph.D. Dissertation, Foo University, 2029.

which would get rendered just the way it looks on your screen even in a printed format.
Intuitively I don't think of the word "visit" as a keyword that can be referenced, while anything in brackets seems fair game. What other features did you have in mind?
I don't understand the above paragraph. The word 'visit' isn't a DAKeyword because it wasn't starting a paragraph. --david PS: I'm working on updating the proposal, but I have other pressing deadlines (such as getting the JPython tutorial ready for IPC8!), so it may not be ready for a couple of days.
In David's original proposal he wrote:

For compatibility with Guido, IDLE and Pythonwin (and increasing the likelihood that the proposal will be accepted by GvR), the docstrings of callables must follow the following convention established in Python's builtins:

>>> print len.__doc__
len(object) -> integer

Return the number of items of a sequence or mapping.

In other words, the first paragraph must fit on a line, repeat the name of the callable, with a 'wordy' signature, the ' -> ' string, and the type of the return value.

Chiming in rather late. Perhaps this was already discussed, but I didn't see it in the immediate followups to David's original proposal...

The one complaint I have with the wordy signature is that it partially types the function. It specifies a return type, but not the input parameter types. Why go only halfway? I suggest you either use type names for parameters and return value or annotate the parameter names with types:

len(o:sequence) -> IntType

There should be a couple shorthands, for instance, using "sequence", "mapping" or "number" to represent objects that exhibit the given behavior, or "object" to represent an arbitrary (untyped) parameter or return value. Otherwise, I'd suggest the types be the names defined by the types module. Of course, I'm ignoring the types of the elements of aggregate types. I'll let someone smarter make a more concrete proposal in this regard.

Why worry about this? Well, people have been asking over and over for type information. This looks parseable to me, doesn't change the language, yet could be used by a type inferencer, "safer" compiler or other type-oriented tools.

Skip Montanaro | http://www.mojam.com/
skip@mojam.com | http://www.musi-cal.com/
847-971-7098 | Python: Programming the way Guido indented...
On Tue, 30 Nov 1999, Skip Montanaro wrote:
The one complaint I have with the wordy signature is that it partially types the function. It specifies a return type, but not the input parameter types. Why go only halfway? I suggest you either use type names for parameters and return value or annotate the parameter names with types:
len(o:sequence) -> IntType
I propose to defer this discussion. I think it's a fine idea in general, but raises a whole bunch of issues, and mixes with other threads like typing etc. Furthermore, the current uses of this first line (popups in IDLE and Pythonwin) might suffer from a significant lengthening of said line. Getting the type information in the docstring is however a worthy goal, but perhaps best left for a subsection:

Arguments:
  o (sequence) -- an arbitrary sequence object

I'd like to finalize the top-level structure, get it in front of GvR's eyeballs, and then we can tackle each subtopic (so far: list processing, reference handling, signature, mandatory keywords, keyword registration process, multilingual keyword support, etc.) at a later date.

--david
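(For illustration only: if type information does end up in an Arguments: block, lines of the 'name (type) -- description' shape are easy to pick apart. The helper name and regular expression below are hypothetical, not part of the proposal.)

    import re

    # name, optional "(type)", then " -- ", then the description
    ARG_RE = re.compile(r'^\s*(\w+)\s*(?:\(([^)]*)\))?\s*--\s*(.*)$')

    def parse_argument_line(line):
        """Hypothetical helper: split 'o (sequence) -- description' into
        (name, type_or_None, description)."""
        m = ARG_RE.match(line)
        return m.groups() if m else None

    print(parse_argument_line("o (sequence) -- an arbitrary sequence object"))
    # ('o', 'sequence', 'an arbitrary sequence object')
    print(parse_argument_line("self -- instance"))
    # ('self', None, 'instance')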
David said:
Or not. Luckily I think that issue can be left to the 'bibliography engine', just like the bullet processing can be left to the 'list engine'.
Yup. We've explored more than enough of the territory towards each of these: working out what to do with the loose ends is now down to the level where I'll trust whoever *does the work* to implement a tool that `does something sensible' and then we can take that sensible, abstract it away from that reference implementation and call it a docstring spec ;^)

Skip said:
len(o:sequence) -> IntType
no, yuck, don't do it. Pack that information into the argument sections by all means; but the way for that one-liner to `name' the arguments should be about getting across the `what does this argument mean' information. Being told

transcribe(s1:stream, s2:stream)

doesn't tell me the thing I really want to know, where

transcribe(source, target)

tells me the only thing I really care about (given that the arguments section will say that source and target are streams - aka file descriptors - and I probably found the function in a module which defines tools for manipulating streams so this part is obvious). (I'd have called those arguments (from, to) but for the keyword ...)

Crucially, transcribe(source, target) looks just like a real call of the function and is archetypical among calls of the function.

David said:
I'd like to finalize the top-level structure, get it in front of GvR's eyeballs, and then we can tackle each subtopic (so far: list processing, reference handling, signature, mandatory keywords, keyword registration process, multilingual keyword support, etc.) at a later date.

Yes please.
Tibs said, of the Example:/>>> debate:
No - keep the keyword. ... (a) I like it ... (b)... non-test Python script ... (c)... 'logical' subdivision ... unless he means "for humans to parse"
OK, start with the last: Tibs, you observed a while back that the human brain holds up to 7 (or is it 12 ?) things at the same time. That's the `for humans to parse' constraint. We want to keep to a minimum the set of keywords a programmer needs to be familiar with to be able to get pay-back from using the document format.

(a) I'll merrily vote contrary to you and hope that we cancel out. Then we can come to the observation that Tim Peters seems to want to be able to do >>> without the keyword. I know you'll stop arguing.

(b) so ? we're bound to have *some* form of construct equivalent to <PRE>, so the non-test pieces can be indented with that, leaving the more common `this is what would really happen, try it and see' flavour of embedded code (which Tim's tool will duly verify ;^) to be written the way the pythoneer wanted to write it.

(c) A bunch of lines sharing initial indent and mostly starting >>> form a logical sub-division just as long as the audience know to recognise it as such - and the audience here consists of pythoneers, so we will recognise it.

Various folk discussed language. My ha'p'n'rth on that would go for a variable in the module namespace, nominally

__language__ = 'en:UK'   # expect English spellings, like colour, sulphur

and I'd vote for the *default* to be Dutch (to encourage US anglophones to get used to admitting that they speak 'en:US' or whatever it's called) though I realise I might have to live with 'en:US'.

Why, you might ask, do I want it in the module namespace ? So that the contents of the doc string are *all* in the same language: it'd just be perverse to have an anglophone keyword (Language) as the one keyword which we don't translate, in doc-strings; and the magic names in a module's namespace, like the reserved words of the language (*outside* the doc string) are already condemned to monolinguism, so we might as well leverage their sacrifice to enable the purity of the doc strings.

Either that or do something entertaining which involves looking for a match to:

<keyword meaning `language'>: <language in which that keyword means `language'>

and I'll be immensely impressed if you can make that work.

Of course, *within* the selected language, I'd be more than happy to watch (if anyone can be bothered to implement) something along the lines of (with __language__ set to an English variant)

"""blah(burble) -> wibble -- rumbles

... in English ...

Translation:
    Language: French
    ... en Francais ...

Traduction:   # (perverse but legitimate ...)
    Langue: Allemande
    ... im Deutsch ...   # (no, really, I'm just guessing)

Translation:
    Language: Norse
    ... på Norsk ...

etc.
"""

in which Translation, Language and the language selected are given in the host __language__, but the rest of each translation block is in the guest language (if you see what I mean). But, as the Norse case illustrates, how will docstrings cope with encoding languages which need more than ASCII provides ? I've defaulted to borrowing HTML's character entities for this, but I'll bet a Norse author would get swiftly fed up with doing that ...

However, this is yet more gratuitous over-complete specification ... We have, collectively, said enough that I'd trust any of the assembled folk (and, for that matter, any lurkers we may have) to take David's revised grammar (due some time soon ?) and, whatever they implement, I'm sure I'll be much happier with it than with what we have now.

Eddy.
--
Celui qui parle trois langues s'appelle un trilangue. Celui qui parle deux langues s'appelle un bilangue.
Mais celui qui parle seulement un langue s'appelle un anglophone. -- Quebecois joke.
Edward> Skip said:
>> len(o:sequence) -> IntType

Edward> no, yuck, don't do it.

Edward> Pack that information into the argument sections by all means;
Edward> but the way for that one-liner to `name' the arguments should be
Edward> about getting across the `what does this argument mean'
Edward> information. Being told
Edward> transcribe(s1:stream, s2:stream)
Edward> doesn't tell me the thing I really want to know, where
Edward> transcribe(source, target)

Whatever... ;-) The point I guess I tried to make but didn't was that the sig -> type line types the return value but not the parameters. If the arguments are to be typed someplace else, that's cool. Where will the return type be "declared"? Just in the -> sig?

Skip Montanaro | http://www.mojam.com/
skip@mojam.com | http://www.musi-cal.com/
847-971-7098 | Python: Programming the way Guido indented...
OK, start with the last: Tibs, you observed a while back that the human brain holds up to 7 (or is it 12 ?) things at the same time. That's the `for humans to parse' constraint. We want to keep to a minimum the set of keywords a programmer needs to be familiar with to be able to get pay-back from using the document format.
It sounds as if you are going for the Miller 7 +/- 2 rule, which holds that most people can hold between 5 and 9 units of related information at a time, particularly in short-term memory. It is always a good idea in API and protocol design to keep main elements within the 7 +/- 2, but there is no reason not to have a few less used, optional and advisedly more arcane elements for the more experienced users. So far it seems we're close to that point in "canonical doc-string", so I agree that as soon as the scapegoat (too bad it appears to be you, David, but that's what you get for introducing such a sterling proposal) comes up with a second proposal that takes all the recent discussion into account, we should call it alpha and start hacking.
Various folk discussed language. My ha'p'n'rth on that would go for a variable in the module namespace, nominally
__language__ = 'en:UK' # expect English spellings, like colour, sulphur
and I'd vote for the *default* to be Dutch (to encourage US anglophones to get used to admitting that they speak 'en:US' or whatever it's called) though I realise I might have to live with 'en:US'.
Why, you might ask, do I want it in the module namespace ?
So that the contents of the doc string are *all* in the same language: it'd just be perverse to have an anglophone keyword (Language) as the one keyword which we don't translate, in doc-strings; and the magic names in a module's namespace, like the reserved words of the language (*outside* the doc string) are already condemned to monolinguism, so we might as well leverage their sacrifice to enable the purity of the doc strings.
I rather like this idea. We are building a similar mechanism into 4Suite to support i18n messages, and it would be nice to unify the doc-string and code l10n mechanism. I would rename it

__locale__ = 'en:UK'

to go in line with more common usage. It would actually then be nice for Python (1.6 feature?) to read the LC_ALL environment variable in UNIX, and the equivalent on other platforms, and set a default for the __locale__. I also like the idea that all keywords would be in the local language.
Either that or do something entertaining which involves looking for a match to:
<keyword meaning `language'>: <language in which that keyword means `language'>
Umm. Or much rather not.
Celui qui parle trois langues s'appelle un trilangue. Celui qui parle deux langues s'appelle un bilangue. Mais celui qui parle seulement un langue s'appelle un anglophone. -- Quebecois joke.
This is an absolute gem (or should I say 'bijou'?) Strange how often the heartiest humor comes from the most cynical attitudes.

--
Uche Ogbuji
FourThought LLC, IT Consultants
uche.ogbuji@fourthought.com
(970)481-0805
Software engineering, project management, Intranets and Extranets
http://FourThought.com  http://OpenTechnology.org
... you are going for the Miller 7 +/- 2 rule ... sounds familiar. Anyone unfamiliar with this might benefit from the URL Tibs just sent me in connection with your message:
http://www.well.com/user/smalin/miller.html
We are building a similar mechanism into 4Suite to support i18n messages,

<awed>ooo</awed>

__locale__ = 'en:UK'

Sounds good.
... LC_ALL environment variable ... default for the __locale__.
No. This is a user-specified value; many users may be using the same source file, at many sites; the source file isn't changing as you change user; __locale__ is telling pythonic tools which set of keywords is being used in the source file's doc strings; this also isn't changing from one user to another. However, using LC_ALL (or equivalent) to configure which Language section the pythonic tools extract to show to the user, now that'd be cool. And if the right Language is absent ... try talking to babelfish (although I have to confess this is better for humour than sense).
<keyword meaning `language'>: <language in which that keyword means `language'>
Umm. Or much rather not.
Sorry, it may not have been clear that that suggestion was of form `of course, if anyone is volunteering to do something absurdly hard, far be it from me to miss out on an opportunity to laugh myself breathless'. Even if fate allows that there really is a way to do this (such that no two languages will get confused by one another's keywords), it's a fairly safe bet that about half of the keywords meaning `language' would be words in other languages, with respectably many of them offensive/lewd/absurd/funny in at least several other languages. I must confess, I am looking forward to the growth of document-transfer via babelfish and its kin. Laughter is said to be good for one ... Eddy.
Edward Welbourne wrote:
Since [] is only used for lists in Python, we could define the RE '\[[a-zA-Z0-9_.]+\]' for our purposes and raise an exception in case the enclosed reference cannot be mapped to a symbol in the global namespace (note: no whitespace, no commas) which either evaluates to a function, method, module or reference object.
umm ... hang on, two things seem stirred up here. The proposal I remember from ages ago and tried to echo has [token] and the token doesn't have to be intelligible to the python engine: elsewhere in the doc string, we'll have
References: [token] reference text
which the parsed docstring uses to decode each use of [token] that appeared in the docstring.
Right, but we extended the lookup notion to what David summarized in a recent post:

I believe that the namespaces looked up should be:
1) the local namespace of the docstring -- i.e., the set of keywords defined in the "References" keyword block in the current docstring.
2) the global namespace of the docstrings -- i.e. the set of keywords defined in the "References" keyword block in the MODULE docstring.
3) The global Python namespace for that module
4) Some namespace corresponding to builtins & unimported modules, yet ill-defined.

+ I would like to add: The looked up object will only be converted to a reference if it is either an object having a doc string, or a reference object (these are created through the Reference: section). In case this condition is not met, either a warning is issued or the [token] text is taken as is.

+ modify the RE to include hyphens: '\[[a-zA-Z0-9_.-]+\]'

Given the above, [None] would then either cause a warning or be left in the doc string with no further magic applied. Other uses of square brackets would have to include at least one of the characters not allowed by the above RE, e.g. spaces. This makes mixing [references] and [ code, examples ] very simple and straightforward. As always, the details of how to convert the reference to markup should be left to a reference engine. We should focus on tokenizing first and only then start thinking about what to do with those tokens... e.g. automagically convert them to HTML anchors or whatever.

AFAICT, we have these tokens and symbols:

Keyword:
  A Keyword is a case-sensitive string which:
  - starts a paragraph
  - matches '^ *[a-zA-Z_]+[\-a-zA-Z_0-9]*: +' (Python identifiers with the addition of hyphens and which end with a : and one or more spaces)

Keyword Block:
  A Keyword Block is a paragraph of text starting with a Keyword and followed by Single Line Text or a Text Block.

Reference:
  A Reference is a case-sensitive string which:
  - matches '\[[a-zA-Z0-9_.-]+\]' (lookup as indicated above is left to the reference engine to implement)

Single Line Text:
  Single Line Text is all remaining text on the current line.

Text Block:
  A Text Block is a paragraph of indented text.

Bullet Block:
  A Bullet Block is a paragraph of indented text using a bullet character as first non-whitespace character at the indention index.

First Line:
  A line of text matching <RE for "name(args,kws) -> returns -- does">

Blank Lines:
  One or more lines of whitespace text.

All Blocks may be nested (is this true?). Nesting is indicated by indention level.

Anything missing?

--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 30 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/
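(As a quick sanity check that the list above really is easy to tokenize, here is a rough Python sketch built directly on the two REs given; the function name, the token labels and the sample docstring are my own assumptions, not part of Marc-Andre's list.)

    import re

    KEYWORD_RE   = re.compile(r'^ *([a-zA-Z_]+[\-a-zA-Z_0-9]*): +')
    REFERENCE_RE = re.compile(r'\[([a-zA-Z0-9_.-]+)\]')

    def tokenize(docstring):
        """Yield (kind, value) pairs for a few of the token types listed
        above; anything unrecognised is reported as plain text."""
        for line in docstring.splitlines():
            if not line.strip():
                yield ('blank-line', line)
                continue
            m = KEYWORD_RE.match(line)
            if m:
                yield ('keyword', m.group(1))
                line = line[m.end():]        # remainder is Single Line Text
            for ref in REFERENCE_RE.findall(line):
                yield ('reference', ref)
            yield ('text', line)

    doc = "Author: Guido van Rossum\n\nSee [StarshipPython] for user resources.\n"
    for token in tokenize(doc):
        print(token)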
Arg, splurgle. Sorry, earlier I was saying that I didn't see why [..] references shouldn't contain whitespace. Since then I've thought of all sorts of reasons (*especially* why I, personally, don't want them to contain line breaks, which is kind-of inevitably allowed if one allows whitespace and is writing automatically wrapped explanatory text...), so I take it back. I hereby change my vote to sticking with the alphanumerics plus hyphen plus underline that everyone else was agreeing with anyway (although I still don't want to use an RE to explain that to people!). Which is, of course:

ref_label is:
    'text' = AllIn alphanumeric + "_-"

reference = Table is:
    Is "[" ref_label Is "]"

although I wouldn't propose explaining it in those terms to "most people".

Tibs
--
Tony J Ibbs (Tibs) http://www.tibsnjoan.demon.co.uk/
My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)
[I've read it twice. I've thought it over. I'm sending it anyway.]
On Tue, 30 Nov 1999, Edward Welbourne wrote:
Tibs said:
David (Ascher) - is it time to re-release your initial "docstring grammar" and I confess that's something I'd like to see too. After all, we have to have someone to play Gdo ...
I must have missed Tibs' posting. I agree, and I'll try to do that ASAP. --david
Mark Hammond writes:
* IMO, importing the module to extract this information is fine. For the 1% of cases where it is not and the author of the module needs to
No, it's not. Never trust someone else's code until you've read the documentation, and don't trust the documentation if they wrote it. Well, maybe *that's* going a little too far... but import is not acceptable. Using the parse tree also allows the order to be well-defined, while introspection doesn't allow that at all.
chance of it one day existing :-) Indeed, do it the simple way, and the first person who needs the parse-only option can help code it :-)
I maintain that the parse tree is the simple route to getting a reliable tool, and I am working on coding one. Neat, huh?
As a final note: The tool should be written with distinct "generate" and "collate" phases, simply to resolve the cross-references. It is unreasonable to expect that all cross-references will be capable of being resolved in a single pass. Note sure exactly what this means from an implementation POV, but it is important.
Easy from an implementation POV, and that's exactly my approach. (My intention is to be able to document entire packages at a time as well, rather than individual modules.) -Fred -- Fred L. Drake, Jr. <fdrake@acm.org> Corporation for National Research Initiatives
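(For what it's worth, the 'use the parse tree, don't import' approach can be sketched in present-day Python with the standard ast module; this is not Fred's tool, just an illustration of the idea, and the file name is a placeholder.)

    import ast

    def docstrings_from_source(path):
        """Collect module, class and function docstrings from a source
        file without importing it (hypothetical helper)."""
        with open(path) as f:
            tree = ast.parse(f.read(), filename=path)
        found = [("<module>", ast.get_docstring(tree))]
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
                found.append((node.name, ast.get_docstring(node)))
        return found

    # 'mymodule.py' is a placeholder file name.
    for name, doc in docstrings_from_source("mymodule.py"):
        print(name, "->", (doc or "").split("\n", 1)[0])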
From: Fred L. Drake, Jr. <fdrake@acm.org>
Mark Hammond writes:
* IMO, importing the module to extract this information is fine. For the 1% of cases where it is not and the author of the module needs to
No, it's not. Never trust someone else's code until you've read the documentation, and don't trust the documentation if they wrote it. Well, maybe *that's* going a little too far... but import is not acceptable. Using the parse tree also allows the order to be well-defined, while introspection doesn't allow that at all.
We could argue this forever. Gendoc solved this by collecting info either way, based on a runtime switch. As an author running this tool obviously I trust the module/package to be imported and generate the docs that way. And of course C module doc strings will need this feature. Again, this is an application issue not a doc string grammar issue.
chance of it one day existing :-) Indeed, do it the simple way, and the first person who needs the parse-only option can help code it :-)
I maintain that the parse tree is the simple route to getting a reliable tool, and I am working on coding one. Neat, huh?
I hope everyone slinging code looks at pythondoc first. -Robin
Robin Friedrich writes:
We could argue this forever. Gendoc solved this by collecting info either way, based on a runtime switch. As an author running this tool obviously I trust the module/package to be imported and generate the docs that way. And of course C module doc strings will need this feature.
And I maintain that you can't get enough information from a C extension that implements new types. Method information is not available without instantiating the new objects, and doing that is difficult to determine how to do (or even if it's needed if the module doesn't export the newly-defined type objects).
Again, this is an application issue not a doc string grammar issue.
Aside from the information discoverability issue, I'm perfectly happy for someone to write a tool (or make pythondoc easily usable off the shelf) that uses import. I have no intention of stopping anyone from producing tools here! -Fred -- Fred L. Drake, Jr. <fdrake@acm.org> Corporation for National Research Initiatives
[Robin Friedrich, on runtime vs. compile-time docstring extraction]
We could argue this forever. Gendoc solved this by collecting info either way, based on a runtime switch. As an author running this tool obviously I trust the module/package to be imported and generate the docs that way. And of course C module doc strings will need this feature. Again, this is an application issue not a doc string grammar issue.
One issue: if I'm sloppy in my writing, I could easily have escape sequences like \n in the doc string that are expanded by the importing. E.g. the comments for asynchat contain sentences like

# The handle_read() method looks at the input stream for the current
# 'terminator' (usually '\r\n' for single-line responses, '\r\n.\r\n'
# for multi-line output), calling self.found_terminator() on its
# receipt.

If we translate this into a doc string, either the doc string has to be a triple-quoted *raw* string, or we'll have to double all the backslashes, lest they be indistinguishable from real newlines:

"""
The handle_read() method looks at the input stream for the current
'terminator' (usually '\\r\\n' for single-line responses, '\\r\\n.\\r\\n'
for multi-line output), calling self.found_terminator() on its receipt.
"""

Without doubling, this would become

"""
The handle_read() method looks at the input stream for the current
'terminator' (usually '\r
' for single-line responses, '\r
.\r
' for multi-line output), calling self.found_terminator() on its receipt.
"""

Just an annoyance, but something that the tool needs to consider.

(Now going back to trust-Fred-and-David mode :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)
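(A tiny illustration of the annoyance described above, using nothing beyond ordinary string literals; the variable names are arbitrary.)

    cooked  = "terminator is '\r\n'"    # \r\n collapses to real control characters
    raw     = r"terminator is '\r\n'"   # raw string keeps the backslashes
    escaped = "terminator is '\\r\\n'"  # doubled backslashes also keep them

    print(repr(cooked))    # shows actual CR/LF characters inside the string
    print(raw == escaped)  # True: both hold the four characters \ r \ n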
Guido van Rossum writes:
One issue: if I'm sloppy in my writing, I could easily have escape sequences like \n in the doc string that are expanded by the importing. E.g. the comments for asynchat contain sentences like ... Just an annoyance, but something that the tool needs to consider.
I don't think the concern is likely to go away by using a parse-based approach. Get your docstrings right or you've got a real newline! Anyway it gets done, I don't think there's any call for tons of magic interpretation for the escapes you put in your docstring. We don't want to introduce the magic-python-docstring-extraction encoding.

The only reasonable way I know to get the actual string from the parse tree is to yank the string representation of the token.STRING node and eval() it. Otherwise I have to re-write the escaped-string to Python-string conversion in Python for both raw and cooked strings. If I have to consider screwed up strings on top of that, we're back to the HTML-as-practiced problem. I've done that once, and don't plan to do it again!
(Now going back to trust-Fred-and-David mode :-)
That's right. Go back to sleep, go back to sleeep..... -Fred -- Fred L. Drake, Jr. <fdrake@acm.org> Corporation for National Research Initiatives
David Ascher writes:
Indeed. We've slyly conned someone who *does* know about parsing to do the tool. =)
What? You mean I have to go back to school for all those classes I skipped? Well, maybe if you cover tuition+salary, and ask nicely. Can I take a few classes on type theory as well? Actually, I started the code well before your proposal, but I've been too swamped with various non- and semi-related things to do much the last few days. But that's starting to look better, and I got a little more done last night. -Fred -- Fred L. Drake, Jr. <fdrake@acm.org> Corporation for National Research Initiatives
All of the following are minor nit-pickings, because it all looks VERY GOOD. (Personally, I'm not too worried about the tool as-such, I just want the grammar defined so I can use it!). David Ascher wrote:
I forgot two markups: *this* is bold and _this_ is italic. Bold and italic markups must begin and end within a paragraph (I'd say 'within a sentence' but I don't want to complicate the parser with a sentence type). No space allowed between *'s and _'s and their contents.
And I hope it's also possible to nest them arbitrarily, with some "sensible" effect (yes, this *is* useful in english text, and I would not want to lose it in documentation!). [Technically, that's a viewer problem, but I want the grammar to *say* this can be done, so the software writers have an onus on them to cope with it.] Marc-Andre Lemburg wrote:
I'd suggest using '^ *[a-zA-Z_]+[a-zA-Z_0-9]*: *' as RE for keywords, i.e. keywords are Python identifiers immediately followed by a colon starting a line of a doc string. That should avoid most complications, I guess.
Sounds sensible to me - the advantages outweigh the disadvantages. On Tim Peters' test texts - I think this is actually an important enough idea that it might warrant its own keyword - perhaps "TestScript" (no, I know that's clumsy) - thus giving subliminal encouragement to the concept (hmm - must use it someday, he said guiltily). This would also allow us to distinguish odd chunks of code which are NOT test scripts (a new ability, since at the moment the tester will try to use all >>> text?), which I think could sometimes be useful... David Ascher wrote:
How about another keyword?
List: * foo * bar * spam
I would vote against that, firstly on the grounds that it doesn't read well, and secondly that it is probably the sort of thing that people wouldn't do (!). As with what others think, I believe we can hack lists without the keyword (is this now the consensus?). In another message, David continued:
I propose that part of the definition of a keyword is (along with any special parsing rules) whether it can be duplicated in a docstring.
Hmm - then I think we're going to need some serious support in "The Standard Editors" to give a hint about whether something can be included more than once, since I have a sneaky feeling we're getting quite a lot of keywords (is it about 7 things that humans remember easily?). On the other hand, modulo the clever peoples' time, I rate that as "not a problem".

NB: how picky is the tool going to be about getting the indentation exactly right? I'm not fussed by it being very picky, but I know I'm odd that way.

David Hammond votes for doing lists by detecting the bullets (good), but I'd like to reserve more than two characters (hyphen and asterisk are OK, but I do sometimes use 3 level lists, and would like another one - on the other hand, I'm not sure what other than @, and he wants that for something else... hmm - if we're not worried by hyphen confusing us with negative numbers, maybe plus would be sensible).

I also tend to agree with Davids Hammond and Ascher that [ and ] are very valuable AS TEXT. The use of @..@ is visually very obvious to me, which is presumably a good thing in context, so I also vote for that (gosh, I've just voted positively for something delimited by the same character at start and end - obviously the start of the slippery road to hell).

Whilst I don't know owt about parsing (well, more precisely, parse trees scare me), I don't see any of the proposals so far as giving any great problems with extracting information from the text.

David (Ascher) - is it time to re-release your initial "docstring grammar" email with the comments you're happy with edited in? I *really* don't have time to do it, or I already would...

Tibs
--
Tony J Ibbs (Tibs) http://www.tibsnjoan.demon.co.uk/
My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)
[I've read it twice. I've thought it over. I'm sending it anyway.]
On Mon, 29 Nov 1999, Edward Welbourne wrote:
Manuel: if David includes `Keyword::' in his bits and pieces, would
Keyword:: indexing keyword data retrieval searching
(within a doc-string) contain the information you've been wanting to take out of your
\indexaboutindexing \indexaboutkeyword \indexretrieval \indexsearching
No. They're completely different things. In fact there's consensus, David insists on "bullets" for args and javadoc-ish things, and I insist on Encyclopedia-Higher-Level-python-stuff. My system , currently, is for marking/sorting "general" info.
I know you have bits that define an indexing command that expands to several indexing commands, which this lacks: but could the same effect be arrived at by turning your set of indexing command definitions into an `expert system' that expands some keywords ?
Yes, expert systems are fine, but the greatest difficulty with my proposal is that people *must* input hundreds of attributed info items if we want to do anything useful. The expert system is phase 2, when we have to group indexes and extract info.

Regards/Saludos
Manolo
www.ctv.es/USERS/irmina /TeEncontreX.html /texpython.htm /SantisimaInquisicion/index.html

Everything in this book may be wrong. -- Messiah's Handbook: Reminders for the Advanced Soul
Proposed format for docstrings:
The whitespace at the beginning of a docstring is ignored.
Paragraphs are separated by one or more blank lines.
For compatibility with Guido, IDLE and Pythonwin (and increasing the likelihood that the proposal will be accepted by GvR), the docstrings of callables must follow the following convention established in Python's builtins:
>>> print len.__doc__
len(object) -> integer
Return the number of items of a sequence or mapping.
The only thing I'd _maybe_ suggest in order to allow some structure is to eliminate the non-keyword sections:

>>> print len.__doc__
sig:: len(object) -> integer

desc:: Return the number of items of a sequence or mapping.

I know this loses a bit from the point of view of the user's readability, but it would provide some structure which increases the author's flexibility, and makes conversion to "library format" easier. Otherwise, your proposal seems a good start.
Miscellaneous Thoughts:
I chose double-colon notation for keywords so that one can have text paragraphs which match the 'word:' notation without having them be interpreted as keywords.
There are other conventions that would work, but '::' is as good as any.
Does this proposal make docstrings whitespace-heavy -- the requirement to break each paragraph with a line of whitespace means that a lot of lines are blank, especially when doing 'bulleted lists'
I would suggest dropping the requirement, which can be done if everything is keyword-modified.
The above was (quickly) written with parsing in mind. Is it really easily parseable? If not, what needs to be changed so that it is parseable?
I see no major parsing problems. Bullets might be a bit of a bore, but nothing to kill progress.
Are there normal uses in docstrings where one wants to turn off the automatic link detection?
I think we can come up with a basic escaping mechanism for this. Maybe by preceding not-to-be-processed URLs and link keywords with '!'.
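(One purely hypothetical reading of that '!' escape, sketched as a regular expression; neither the marker nor the pattern is settled anywhere in this thread.)

    import re

    # Match [keyword] references, but skip any immediately preceded by '!'.
    REF_RE = re.compile(r'(?<!!)\[([a-zA-Z0-9_.-]+)\]')

    text = "See [RealRef], but leave ![NotARef] alone."
    print(REF_RE.findall(text))    # ['RealRef']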
Is there value in having string interpolation? David Arnold mentioned
__version__ = "$Revision$[11:-2] __date__ = "$Date$
I'd say leave this to a later version.
PS: It goes without saying that while I railed against design by committee, I am of course hopeful for feedback, for technical reasons (dummy, you forgot special cases X, Y and Z!) and because I realize that a standards proposal needs at least broad agreement if not consensus to be effective in the long run. The sharper-eyed will note that I stacked the deck in my favor in the above proposal by including what Guido does naturally as valid in the proposed grammar.
Damn the politics. Full speed ahead.

--
Uche Ogbuji
FourThought LLC, IT Consultants
uche.ogbuji@fourthought.com
(970)481-0805
Software engineering, project management, Intranets and Extranets
http://FourThought.com  http://OpenTechnology.org
participants (14):
- Daniel Larsson
- David Ascher
- Edward Welbourne
- Fred L. Drake, Jr.
- Guido van Rossum
- M.-A. Lemburg
- Manuel Gutierrez Algaba
- Mark Hammond
- Paul Prescod
- Robin Friedrich
- Skip Montanaro
- Tim Peters
- Tony J Ibbs (Tibs)
- uche.ogbuji@fourthought.com