Pydoc Improvements / Rewrite
I started to discuss this on the new python-ideas list because I thought it might need a bit more work before bringing it here. But it was suggested that it really does belong here, as it is something that would be nice to have in Python 2.6. So I'm reposting the introduction here.

The still very rough source files can be downloaded from: http://ronadam.com/dl/_pydoc.zip

There is still much to do, but I think having some experienced feedback on where it should go is important.

Cheers, Ron Adam

P.S. Please disregard the website for now; its purpose was to share with family and friends, although I will probably share the Python source to the ledger program there, and other Python toys and experiments (for fun), in the near future.

[Introduction from python-ideas list]

Improving pydoc has been suggested before by me and others. I've been working on a version that is probably 80% done and would like to get feedback at this point to determine if I'm approaching this in the best way.

Basically, pydoc.py consists of over 2,000 lines of Python code and is not well organized inside, which means (my thoughts) it will pretty much stay the way it is and not be as useful as it could be.

Currently pydoc.py has the following uses:

* It is imported and used as Python's console help function.
* It can be used to generate help text files.
* It can open a tkinter search index and from that launch a web server and a browser to view an html help page.
* It can be used to generate static html help files.

It looks (to me) like splitting it into two modules would be good: one module for just the text help and introspection functions, and the other for the html server and html output stuff. [It was suggested on python-ideas that making it a package may be good.]

1. pyhelp.py - Python's help function from inside the console; running it directly would open an interactive text help session.

2. _pydoc.py - Python html help browser.
This starts an html server and opens a web page with a modules/package index. The html page headers would contain the current Python version info and the following input fields:

* Get field - directly bring up help on an object, module, or package.
* Search field - returns a search results index.
* Modules button - modules index page.
* Keywords button - keywords index page.
* Help button - pydoc help instructions, version, and credits info.

Note: The leading underscore in "_pydoc.py" is to keep it from clashing with the current pydoc version. It probably should be something else.

An additional feature is that clicking on a filename opens up a numbered source listing, which is nice for source code browsing and for referencing specific lines of code in Python list discussions. ;-)

The colors, fonts and general appearance can be changed by editing the style sheet. The output is readable as plain (outline form) text if the style sheet is ignored by the browser.

_pydoc.py imports pyhelp and uses it to do the introspection work, and extends classes in pyhelp to produce html output. I've tried to make pyhelp.py useful in a general way so that it can more easily be used as a base that other output formats can be built from. That's something that can't be done presently.

These improvements to pydoc mean you can browse Python's library dynamically without ever leaving the web browser. Currently you switch back and forth between the browser and a tkinter index window, something I found annoying enough to discourage me from using pydoc.

The version I'm working on is split up into eight Python files, each addressing a particular function of pydoc. That was to help me organize things better. These will be recombined into fewer files. Some parts of it could be moved to other modules if they seem more generally useful. For example, the console text pager could be used in many other console applications.
Things that still need to be done are adding the object resolution order output back in, adding inter-page html links back in, and a few other things I just haven't gotten to yet.

I got a bit burned out on this project a while back, and then moved to a new house, etc. But I'm starting to have more time, and with the current discussions turning to refining Python's library, this seems like it would be a useful tool in that effort.

Any comments on this general approach? Any suggestions, questions, or comments?

I'll post a link to the zipped files in the next day or two and announce it here. I need to look into a glitch on my computer first that's probably a Windows path/name problem. I don't think it's anything that needs to be fixed in the files, but I want to be sure.

Cheers, Ron Adam
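
[Illustration] The split described in this message, a text/introspection module whose classes are extended by an html module, might look roughly like the sketch below. It is written in modern Python 3, and the class and method names (TextHelp, HTMLHelp, summary, heading, body, document) are hypothetical; they are not taken from the _pydoc.zip sources.

```python
import html
import inspect


class TextHelp:
    """Hypothetical stand-in for a pyhelp class: introspection plus plain text."""

    def summary(self, obj):
        # Introspection lives in the base class so every output format shares it.
        name = getattr(obj, "__name__", repr(obj))
        doc = inspect.getdoc(obj) or "(no documentation)"
        return name, doc

    def heading(self, text):
        return text + "\n" + "=" * len(text) + "\n"

    def body(self, text):
        return text + "\n"

    def document(self, obj):
        name, doc = self.summary(obj)
        return self.heading(name) + self.body(doc)


class HTMLHelp(TextHelp):
    """Hypothetical html formatter: only the rendering methods are overridden."""

    def heading(self, text):
        return "<h1>%s</h1>\n" % html.escape(text)

    def body(self, text):
        return "<pre>%s</pre>\n" % html.escape(text)


def sample():
    """Return nothing; exists only to be documented."""


print(TextHelp().document(sample))
print(HTMLHelp().document(sample))
```

With this shape, a further output format would only need another subclass that overrides heading() and body(); the introspection code is not touched.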
Ron Adam wrote:
Improving pydoc has been suggested before by me and others. I've been working on a version that is probably 80% done and would like to get feedback at this point to determine if I'm approaching this in the best way.
Just asking--are you going in a PEP-287-ly way as you work? If not, would your work make PEP 287 easier to implement?
For those of us without eidetic memories, PEP 287 is "use reStructuredText for docstrings": http://www.python.org/dev/peps/pep-0287/ Cheers, /larry/
Larry Hastings wrote:
Just asking--are you going in a PEP-287-ly way as you work? If not, would your work make PEP 287 easier to implement?
Pydoc does no reformatting of, or changes to, doc strings. They are displayed "as is" in plain text. About the only formatting that is done is to wrap long lines a bit better, such as 100 character lines on an 80 character (or narrower) console window. In those cases, it tries to maintain the indent and break lines on white space. The html pages produced also make html, rfc, and pep references into links. One of the goals is to make it easier to use it as a base for generating other types of formats. So it should also make it easier for someone (else) to implement a PEP-287 extended version for their own needs.
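
[Illustration] The two behaviours mentioned above, wrapping long lines at white space while keeping the indent, and turning PEP/RFC references into links, can be approximated with the standard textwrap and re modules. This is a minimal sketch, not the code in pydoc: the function names, the regular expression, and the RFC URL layout are assumptions (the PEP URL follows the pattern of the link quoted in this thread).

```python
import re
import textwrap


def wrap_keep_indent(line, width=80):
    """Wrap one long line at white space, repeating its leading indent."""
    stripped = line.lstrip()
    indent = line[: len(line) - len(stripped)]
    return textwrap.fill(
        stripped,
        width=width,
        initial_indent=indent,
        subsequent_indent=indent,
        break_long_words=False,  # break only on white space
    )


# Assumed pattern: matches references written like "PEP 287" or "RFC 2822".
REF = re.compile(r"\b(PEP|RFC)[- ]?(\d+)\b")


def link_refs(text):
    """Turn PEP/RFC references into html links (URL layouts are assumptions)."""

    def repl(match):
        kind, number = match.group(1), int(match.group(2))
        if kind == "PEP":
            url = "http://www.python.org/dev/peps/pep-%04d/" % number
        else:
            url = "http://www.rfc-editor.org/rfc/rfc%d.txt" % number
        return '<a href="%s">%s</a>' % (url, match.group(0))

    return REF.sub(repl, text)


print(wrap_keep_indent("    " + "word " * 30, width=40))
print(link_refs("see PEP 287"))
```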
For those of us without eidetic memories, PEP 287 is "use reStructuredText for docstrings": http://www.python.org/dev/peps/pep-0287/
Thanks for the link. PEP 287 looks to be fairly general in that it expresses a general desire rather than a specification. Ron
Ron Adam wrote:
Larry Hastings wrote:
For those of us without eidetic memories, PEP 287 is "use reStructuredText for docstrings": http://www.python.org/dev/peps/pep-0287/
Thanks for the link. PEP 287 looks to be fairly general in that it expresses a general desire rather than a specification.
Apologies for the digression, but I have a comment on this.

Rather than fixing on a standard markup, I would like to see support for a __markup__ module variable which specifies the specific markup language that is used in that module. Doc processors could inspect that variable and then load the appropriate markup translator.

Why? Because it's hard to get everyone to agree on which markup language is best for documentation. I personally think that reStructuredText is not a good choice, because I want to add markup that adds semantic information, whereas reStructuredText deals solely with presentation and visual appearance. (In other words, I'd like to be able to define machine-readable metadata that identifies parameters, return values, and exceptions -- not just hyperlinks and text styles.)

Having used a lot of different documentation markup languages, and written a few of them, I prefer "non-invasive" semantic markup as seen in markup processors such as Doc-o-matic and NaturalDocs. (By non-invasive, I mean that the markup doesn't detract in any way from the readability of the marked-up text. Doc-o-matic's markup language is very powerful, and yet unless you know what you are looking for you'd think it's just regular prose.)

I have a prototype (called "DocLobster") which does similar types of processing on Python docstrings, but I haven't publicized it because I didn't feel like competing against ReST.

However, I realize that I'm in the minority with this opinion; I don't want to force anyone to conform to my idea of markup, but at the same time I'd prefer not to have other people dictate my choice either.

Instead, what I'd like to see is a way for multiple markup languages to coexist and compete with each other on a level playing field, instead of one being chosen as the winner.

-- Talin
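
[Illustration] A doc processor that inspects a __markup__ variable and loads a matching translator could be as small as the sketch below. Everything apart from the __markup__ name itself is hypothetical: the registry, the decorator, the "plain" default, and the toy "upper-demo" translator exist only to show the dispatch.

```python
import types

# Hypothetical registry mapping markup names to translator callables.
TRANSLATORS = {}


def translator(name):
    """Register a translator function under a markup name."""
    def register(func):
        TRANSLATORS[name] = func
        return func
    return register


@translator("plain")
def plain(doc):
    return doc


@translator("upper-demo")
def upper_demo(doc):
    # Toy translator, present only to make the dispatch visible.
    return doc.upper()


def render_doc(module):
    """Render a module docstring with the translator named by __markup__."""
    name = getattr(module, "__markup__", "plain")  # assumed default
    func = TRANSLATORS.get(name, plain)  # unknown markup falls back to plain
    return func(module.__doc__ or "")


demo = types.ModuleType("demo", "Some documentation.")
demo.__markup__ = "upper-demo"
print(render_doc(demo))
```

Falling back to plain text for an unknown name is a design choice of this sketch; a stricter processor might raise an error instead.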
Ron Adam wrote:
Thanks for the link. PEP 287 looks to be fairly general in that it expresses a general desire rather than a specification.
I thought it was pretty specific. I'd summarize PEP 287 by quoting entry #1 from its "goals of this PEP" section:
* To establish reStructuredText as a standard structured plaintext format for docstrings (inline documentation of Python modules and packages), PEPs, README-type files and other standalone documents.
Talin wrote:
Rather than fixing on a standard markup, I would like to see support for a __markup__ module variable which specifies the specific markup language that is used in that module. Doc processors could inspect that variable and then load the appropriate markup translator.
I guess I'll go for the whole-hog +1.0 here. I was going to say +0.8, citing "There should be one---and preferably only one---obvious way to do it.". But I can see organizations desiring something besides ReST, like if they had already invested in their own internal standardized markup language and wanted to use that.
This makes the future clear; the default __markup__ in 2.6 would be "plain", so that all the existing docstrings work unmodified. At which point PEP 287 becomes "write a ReST driver for the new pydoc". Continuing my dreaming here, Python 3000 flips the switch so that the default __markup__ is "ReST", and the docstrings that ship with Python are touched up to match---or set explicitly to "plain" if some strange necessity required it.
(And when do you unveil DocLobster?)
Cheers, /larry/
Larry Hastings wrote:
Ron Adam wrote:
Thanks for the link. PEP 287 looks to be fairly general in that it expresses a general desire rather than a specification.
I thought it was pretty specific. I'd summarize PEP 287 by quoting entry #1 from its "goals of this PEP" section:
* To establish reStructuredText as a standard structured plaintext format for docstrings (inline documentation of Python modules and packages), PEPs, README-type files and other standalone documents.
Talin wrote:
Rather than fixing on a standard markup, I would like to see support for a __markup__ module variable which specifies the specific markup language that is used in that module. Doc processors could inspect that variable and then load the appropriate markup translator.
I guess I'll go for the whole-hog +1.0 here. I was going to say +0.8, citing "There should be one---and preferably only one---obvious way to do it.". But I can see organizations desiring something besides ReST, like if they had already invested in their own internal standardized markup language and wanted to use that.
This makes the future clear; the default __markup__ in 2.6 would be "plain", so that all the existing docstrings work unmodified. At which point PEP 287 becomes "write a ReST driver for the new pydoc". Continuing my dreaming here, Python 3000 flips the switch so that the default __markup__ is "ReST", and the docstrings that ship with Python are touched up to match---or set explicitly to "plain" if some strange necessity required it.
(And when do you unveil DocLobster?)
Well, I'd be more interested in working on it once there's something to plug it into - I didn't really want to write a whole pydoc replacement, just a markup transformer.

One issue that needs to be worked out, however, is the division of responsibility between markup processor and output formatter. Does a __markup__ plugin do both jobs, or does it just do parsing, and leave the formatting of output to the appropriate HTML / text output module? How does the HTML output module know how to handle non-standard metadata?

Let me give an example: Suppose you have a simple markup language that has various section tags, such as "Author", "See Also", etc.:

    """
    Description:
        A long description of this thing whatever it is.

    Parameters:
        fparam - the first parameter
        sparam - the second parameter

    Raises:
        ArgumentError - when invalid arguments are passed.

    Author:
        Someone

    See Also:
        PyDoc
        ReST
    """

So the parser understands these various section headings - how does it tell the HTML output module that 'Author' is a section heading? Moreover, in the case of "Parameters" and "Exceptions", the content of the section is parsed as a table (parameter, description) which is stored as a list of tuples, whereas the content of the "Description" section is just a long string.

I guess the markup processor has to deliver some kind of DOM tree, which can be rendered either into text or into HTML. CSS can take over from that point on.

-- Talin
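
[Illustration] One way to read the question above is that the markup plugin only parses, and hands a small tree to whichever output module is in use. The sketch below assumes a docstring layout with each heading on its own line and indented content; the set of "table" sections, the tuple-based tree, and the function names are all hypothetical and are not DocLobster's design.

```python
import html

SAMPLE = '''\
Description:
    A long description of this thing.

Parameters:
    fparam - the first parameter
    sparam - the second parameter

Raises:
    ArgumentError - when invalid arguments are passed.

Author:
    Someone
'''

# Assumption: these sections hold "name - description" rows.
TABLE_SECTIONS = {"Parameters", "Raises"}


def parse_sections(doc):
    """Parse a sectioned docstring into a list of (heading, content) pairs."""
    raw = []  # each entry is [heading, [content lines]]
    for line in doc.splitlines():
        text = line.strip()
        if not text:
            continue
        if not line[0].isspace() and text.endswith(":"):
            raw.append([text[:-1], []])  # unindented "Name:" starts a section
        elif raw:
            raw[-1][1].append(text)
    tree = []
    for heading, lines in raw:
        if heading in TABLE_SECTIONS:
            content = []
            for row in lines:
                name, _, desc = row.partition(" - ")
                content.append((name.strip(), desc.strip()))
        else:
            content = " ".join(lines)
        tree.append((heading, content))
    return tree


def to_html(tree):
    """Render the tree as html; tables and paragraphs are chosen by node type."""
    out = []
    for heading, content in tree:
        out.append("<h2>%s</h2>" % html.escape(heading))
        if isinstance(content, list):
            rows = "".join(
                "<tr><td>%s</td><td>%s</td></tr>"
                % (html.escape(name), html.escape(desc))
                for name, desc in content
            )
            out.append("<table>%s</table>" % rows)
        else:
            out.append("<p>%s</p>" % html.escape(content))
    return "\n".join(out)


print(to_html(parse_sections(SAMPLE)))
```

Here the html module never needs to know the name 'Author'; it only sees "a heading with a string" or "a heading with a list of rows", which is one possible answer to how non-standard metadata could be handled.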
On Friday 05 January 2007 02:49, Talin wrote:
One issue that needs to be worked out, however, is the division of responsibility between markup processor and output formatter. Does a __markup__ plugin do both jobs, or does it just do parsing, and leave the formatting of output to the appropriate HTML / text output module? How does the HTML output module know how to handle non-standard metadata?
There's already __docformat__; see: http://www.python.org/dev/peps/pep-0258/#choice-of-docstring-format -Fred -- Fred L. Drake, Jr. <fdrake at acm.org>
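
[Illustration] As I read the PEP 258 section linked above, __docformat__ holds a format name, optionally followed by a space and a language code, and "plaintext" is the default when the variable is absent. A tool could read it roughly like this; the helper name docformat is hypothetical.

```python
import types


def docformat(module, default="plaintext"):
    """Return (format, language) from a module's __docformat__ string."""
    value = getattr(module, "__docformat__", default)
    name, _, language = value.partition(" ")
    # Names are compared case-insensitively here, so normalise to lower case.
    return name.lower(), (language.lower() or None)


mod = types.ModuleType("mod")
mod.__docformat__ = "reStructuredText en"
print(docformat(mod))
```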
On Thu, 4 Jan 2007, Talin wrote:
One issue that needs to be worked out, however, is the division of responsibility between markup processor and output formatter. Does a __markup__ plugin do both jobs, or does it just do parsing, and leave the formatting of output to the appropriate HTML / text output module? How does the HTML output module know how to handle non-standard metadata? [...] I guess the markup processor has to deliver some kind of DOM tree, which can be rendered either into text or into HTML. CSS can take over from that point on.
If the markup processor is going to deliver a tree, let me just point out that it would be a pretty major project to define the format of that tree -- about as large as inventing ReST or any other markup language, except that the design of such an intermediate format has to foresee future changes to the input and be flexible enough to target multiple output formats. The design would also have to tackle the question of whether the intermediate format should contain semantic information (what about cross-references?) and what types of such information should be allowed (e.g. names of modules, arguments, exceptions, Python expressions, etc.) -- ?!ng
2007/1/5, Ka-Ping Yee <python-dev@zesty.ca>:
On Thu, 4 Jan 2007, Talin wrote:
One issue that needs to be worked out, however, is the division of responsibility between markup processor and output formatter. Does a __markup__ plugin do both jobs, or does it just do parsing, and leave the formatting of output to the appropriate HTML / text output module? How does the HTML output module know how to handle non-standard metadata? [...] I guess the markup processor has to deliver some kind of DOM tree, which can be rendered either into text or into HTML. CSS can take over from that point on.
If the markup processor is going to deliver a tree, let me just point out that it would be a pretty major project to define the format of that tree -- about as large as inventing ReST or any other markup language, except that the design of such an intermediate format has to foresee future changes to the input and be flexible enough to target multiple output formats. The design would also have to tackle the question of whether the intermediate format should contain semantic information (what about cross-references?) and what types of such information should be allowed (e.g. names of modules, arguments, exceptions, Python expressions, etc.)
Wouldn't it be conceivable to have the processing of the markup performed by a separate function, one that could possibly be overridden or passed as a parameter when there are specific needs regarding the markup? L.
Talin wrote:
Rather than fixing on a standard markup, I would like to see support for a __markup__ module variable which specifies the specific markup language that is used in that module. Doc processors could inspect that variable and then load the appropriate markup translator.
Ideally, a module should be able to specify what *documentation provider* to use. Not everyone wants to stuff everything into docstrings, and, especially if you're building larger components, automatic introspection simply doesn't work very well.

fwiw, I have hacks for PythonDoc that monkey-patch "inspect" to provide "virtual docstrings", but it would be nice to have an official API for this. It doesn't have to be much more complicated than:

    def __inspect__(path, format_hint=None):
        ...
        return format, data, subpaths

where path is a dotted path to the target object, and format_hint is a preferred format.
Why? Because it's hard to get everyone to agree on which markup language is best for documentation. I personally think that reStructuredText is not a good choice, because I want to add markup that adds semantic information, whereas reStructuredText deals solely with presentation and visual appearance.
And does a rather bad job at that too (the "squint if you don't want to see the markup" approach is fundamentally flawed), but that's another story for another forum. </F>
On Friday, January 05, 2007, at 02:30PM, "Fredrik Lundh" <fredrik@pythonware.com> wrote:
Talin wrote:
Rather than fixing on a standard markup, I would like to see support for a __markup__ module variable which specifies the specific markup language that is used in that module. Doc processors could inspect that variable and then load the appropriate markup translator.
Ideally, a module should be able to specify what *documentation provider* to use. Not everyone wants to stuff everything into docstrings, and, especially if you're building larger components, automatic introspection simply doesn't work very well.
+lots on that. This is true not only for larger components: projects like wxpython, pyqt and pyobjc could also use these hooks to add links to the C version of those libraries (I don't know about pyqt, but wxpython and pyobjc have both chosen to document the differences with the C library instead of trying to duplicate their work). Ronald
Talin wrote:
Ron Adam wrote:
Larry Hastings wrote:
For those of us without eidetic memories, PEP 287 is "use reStructuredText for docstrings": http://www.python.org/dev/peps/pep-0287/
Thanks for the link. PEP 287 looks to be fairly general in that it expresses a general desire rather than a specification.
Apologies for the digression, but I have a comment on this.
Rather than fixing on a standard markup, I would like to see support for a __markup__ module variable which specifies the specific markup language that is used in that module. Doc processors could inspect that variable and then load the appropriate markup translator.
Why? Because it's hard to get everyone to agree on which markup language is best for documentation. I personally think that reStructuredText is not a good choice, because I want to add markup that adds semantic information, whereas reStructuredText deals solely with presentation and visual appearance. (In other words, I'd like to be able to define machine-readable metadata that identifies parameters, return values, and exceptions -- not just hyperlinks and text styles.) Having used a lot of different documentation markup languages, and written a few of them, I prefer "non-invasive" semantic markup as seen in markup processors such as Doc-o-matic and NaturalDocs. (By non-invasive, I mean that the markup doesn't detract in any way from the readability of the marked-up text. Doc-o-matic's markup language is very powerful, and yet unless you know what you are looking for you'd think it's just regular prose.) I have a prototype (called "DocLobster") which does similar types of processing on Python docstrings, but I haven't publicized it because I didn't feel like competing against ReST.
However, I realize that I'm in the minority with this opinion; I don't want to force anyone to conform to my idea of markup, but at the same time I'd prefer not to have other people dictate my choice either.
Instead, what I'd like to see is a way for multiple markup languages to coexist and compete with each other on a level playing field, instead of one being chosen as the winner.
How about if plain text is the default, with the ability to override it to generate another type of output?

This is pretty much the design I'm following. It doesn't choose any markup style or have a preference other than plain text. Basically you import the gettext module, then add methods to the classes for each section to produce the marked up output you want. Then add a few functions to assemble it into a page. I'm sure there's room for improvement, but this seems like the most direct way to do it.

Pydoc doesn't just process doc strings. If that was all there was to it, then it would be much easier to automate in a generalized way. Each section may get its data from sources other than doc strings, and those may need to be handled in special ways, especially in the cases where it is nested within other sections. This is what makes it difficult to generalize into a sequential producer-to-consumer pattern, not whether or not the doc string has additional markup in it.

I think the main goal for 2.6 should be a cleaned up package with some modest user interface and API improvements, but also to keep things open ended for later enhancements.

[Maybe for Python 3.0] While reading some of the other discussions on multi-methods, generic functions, and ABCs with mixins, it occurred to me that pydoc may be a good place to test some of those ideas. It's a complex enough problem that it may benefit from those more advanced Python features. But I think that we shouldn't wait for those to do a basic face lift now. Think of it as a preliminary clean up, if you will, in this case.

Ron
No time to review this now, but I'd just like to say that the 1 thing I'd like to see is support for decent mathematical markup. I think at this point that support for latex markup is the way to achieve this.
Neal Becker wrote:
No time to review this now, but I'd just like to say that the 1 thing I'd like to see is support for decent mathematical markup. I think at this point that support for latex markup is the way to achieve this.
There are two separate issues related to this I'd like to point out, because some of the other suggestions have indicated both of these without spelling out which they are addressing.

(1.) Processing existing text markup from additional text hints that are inserted into doc strings. I think this can easily be handled with a single text output point. General *post* formatters of this type are very doable I think. We just need to document the function or method to get the *raw* plain text for that purpose.

(2.) Parsing and inserting additional markup where there is none, based on what and where the information came from. This is the more difficult problem. I've tried to handle this case by creating an object for each "thing", that can be extended by adding a formatting method to it. This type of markup can be very specific and may depend on context as well as what or where the source data came from.

I don't know latex markup, but it seems like mathematical latex markup might be done either way.

Cheers, Ron
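
[Illustration] The two cases above can be contrasted in a few lines. This is only a sketch under stated assumptions: raw_help_text() stands in for the single documented text output point, the $...$ hint syntax for inline math is assumed, and the Section/HTMLSection classes are hypothetical examples of "an object for each thing" with a formatting method added. Html escaping is left out to keep it short.

```python
import re


def raw_help_text(obj):
    """Stand-in for the single text output point: the raw plain text."""
    return (obj.__doc__ or "").strip()


def math_post_formatter(text):
    """Case 1: post-process hints that are already present in the text.

    Assumed hint syntax: $...$ marks inline math.
    """
    return re.sub(r"\$(.+?)\$", r'<span class="math">\1</span>', text)


class Section:
    """Case 2: one object per 'thing'; formatting methods are added to it."""

    def __init__(self, kind, data):
        self.kind = kind
        self.data = data

    def as_text(self):
        return "%s: %s" % (self.kind, self.data)


class HTMLSection(Section):
    def as_html(self):
        # Markup is chosen from what the data is, not from hints in the text.
        return '<div class="%s">%s</div>' % (self.kind, self.data)


def circle_area(r):
    """Area is $r^2$ times pi."""


print(math_post_formatter(raw_help_text(circle_area)))
print(HTMLSection("author", "Someone").as_html())
```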
participants (9)
- Fred L. Drake, Jr.
- Fredrik Lundh
- Ka-Ping Yee
- Larry Hastings
- Laurent Gautier
- Neal Becker
- Ron Adam
- Ronald Oussoren
- Talin