Hi all,

This last Friday I had a chance to talk to Tom Abel and Oliver Hahn (both CC'd on this message) about their experiences with using yt, and they brought up some points which I've now had a chance to think about, and which I find very interesting, certainly as something to discuss. Here are my notes on it, along with a proposal for moving forward.

As a quick note, what really hit home that we need better documentation was trying to make a thin projection. The definition of what a 'source' could be wasn't there, there were no examples, and I had to go look at the source to figure out what the parameters were even called. I think that's not ... good.

Python Inline Documentation
===========================

One of the coolest things about Python is the help() function, which prints out the function signature and the contents of the doc string. In the source code, the docstring is inline in the function, like so:

    def some_function(a, b, c):
        """
        This function does something.
        """
        return a+b+c

The output of help(some_function) would look like this:

    >>> help(some_function)
    Help on function some_function in module __main__:

    some_function(a, b, c)
        This function does something.

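To make the link between the docstring and help() a bit more concrete, here is a small sketch (written for a current Python 3, using only the standard library). It reuses the some_function example from above and shows that the text help() prints is just the function's __doc__ attribute, cleaned up, plus its signature. The variable names are only for illustration.

```python
import inspect
import pydoc


def some_function(a, b, c):
    """
    This function does something.
    """
    return a + b + c


# The raw docstring, exactly as written in the source (indentation included).
raw_doc = some_function.__doc__

# inspect.getdoc() strips the common indentation and surrounding blank lines;
# this is the cleaned-up text that help() displays.
clean_doc = inspect.getdoc(some_function)

# The signature is recovered from the function object itself.
signature = "some_function" + str(inspect.signature(some_function))

# pydoc.render_doc() builds the same page help() shows, as a plain string.
help_text = pydoc.render_doc(some_function, renderer=pydoc.plaintext)

print(signature)
print(clean_doc)
```

The point is that anything we put into the docstring shows up in both places: at the interactive prompt via help(), and in whatever tool reads __doc__ at doc build time.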
Generated Documentation
=======================

The yt docs are generated using an extension to Sphinx called autodoc. What this does, as you can see by going to the API docs and clicking "view source" (which, counterintuitively, displays the doc source and not the source code of the functions), is pull all the docstrings from the source at documentation build time and render them in the document. Ideally, we would want something that renders nicely as well as looks good in the inline help -- and that maximizes the detail without becoming cumbersome.

Most of the functions in yt that have docstrings have been written in a narrative style, with parameters inside asterisks, so that they would render nicely in the API docs:

http://yt.enzotools.org/doc/modules/amrcode.html#yt-lagos-outputtypes-output...

But it's becoming clear that perhaps this is not the best approach. I think a combination of narrative and explicit parameter declaration would be better. The NumPy/SciPy projects have a CodingStandards description:

http://projects.scipy.org/numpy/wiki/CodingStyleGuidelines

that covers docstrings, with a very detailed example of a completely filled out docstring here:

http://svn.scipy.org/svn/numpy/trunk/doc/example.py

As an example, the 'tensorsolve' function is defined here:

http://svn.scipy.org/svn/numpy/trunk/numpy/linalg/linalg.py

and the API docs are here:

http://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.tensorsolve...

This looks great, I think. yt is a bit more class-oriented than NumPy, but I believe that we should strive for a similar level of detail as well as a similar style: presenting parameters, what those parameters can be, and a brief word on the return type.

Ideal Type Of Documentation
===========================

A few weeks ago, Tom and I were chatting and he mentioned to me a Pascal manual.
In this manual, there was a single function on every page: a description, parameters (often repeated between functions, but explicitly listed for each), and an example. My first Unix manual was exactly like this, and I remember it being one of the best sets of documentation I've ever used. I believe this is the model NumPy and SciPy are striving for, as well.

I think this is what yt should strive for, too. One page per class or function, with a description, parameters, and examples -- just as mentioned above. In doing so, I think that the online help -- which right now is sort of helpful, but not amazingly helpful -- would become much more useful. The fact that on the mailing lists we get questions asking us about fundamental operations in yt is, I think, an indictment of the way it's presented.

As the Enzo Workshop revs up, a couple of us will be writing talks about using Enzo, using yt, etc, and I think this is a time to harness that momentum to reorganize and rewrite some of the doc strings. Of course, I would take the lead on the initial rewrite, as I'm the one who wrote all the bad docstrings.

What does everyone think about this?

Action Items
============

(It wouldn't be a long email about procedures if we didn't use a buzzword like 'action items' :)

Firstly: a vote and a request for comments. Do we want to agree on the NumPy standard for docstrings? What does everyone think about this idea of a set of docstring guidelines, and of trying to focus on a better set of API documentation, to be used both in generated form and inline via help()?

If we can agree on the NumPy standard, I believe that I should be able to convert most of the docstrings with relative ease; it's mostly going to be a matter of typing, copy/pasting, etc.
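
For concreteness, here is a rough sketch of what a docstring in the NumPy style might look like. The function slab_bounds is hypothetical and not part of yt; it is only there to carry the docstring. The section layout (Parameters, Returns, Examples) is what is meant to follow the NumPy guidelines, and the exact wording would of course be up for discussion.

```python
def slab_bounds(center, thickness, axis=0):
    """
    Compute the edges of a thin slab centered on a point.

    Hypothetical helper used only to illustrate the docstring layout;
    it is not a yt function.

    Parameters
    ----------
    center : sequence of float
        Coordinates (x, y, z) of the slab center.
    thickness : float
        Full extent of the slab along `axis`.
    axis : int, optional
        Axis (0, 1 or 2) along which the slab is thin. Default is 0.

    Returns
    -------
    lower, upper : float
        Lower and upper edges of the slab along `axis`.

    Examples
    --------
    >>> slab_bounds([0.5, 0.5, 0.5], 0.25, axis=2)
    (0.375, 0.625)
    """
    half = thickness / 2.0
    return center[axis] - half, center[axis] + half


if __name__ == "__main__":
    # Running the module checks that the example in the docstring still works.
    import doctest
    doctest.testmod()
```

With the docstring laid out like this, help(slab_bounds) at the prompt lists every parameter explicitly, and the Examples section doubles as a small test via doctest. For the generated docs, NumPy uses a Sphinx extension (numpydoc) to render these sections; whether we would use the same one is something to decide.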
I will copy a style guide into doc/, which will be largely taken from the NumPy style guide, but I will additionally add a document with examples for common strings: I would prefer we have a single, consistent manner for referring to things like AMR3DData as a source, for instance. I will then go through and convert all the doc strings that I am familiar with. This would leave us with three files:

 * Example docstring, which can be read in verbatim and edited.
 * List of yt idioms for cross-referencing and describing things.
 * File describing this standard, largely pulling from the NumPy standard.

The next question is, going forward, how do we ensure that doc strings are correctly included with new code? I am more guilty of omitting them than I would care to admit (I sometimes fall into the camp of thinking that functions with well-named parameters are self-documenting, which is probably a mistake!) but I think we should have someone agree to review incoming changesets for documentation updates, and then email the committer if a changeset does not have a sufficient docstring. My inclination is to suggest that someone who already reviews incoming changesets do this, which I think means either me, Sam or Stephen. Sam, would you be willing to take this on? It should be relatively straightforward.

Additionally, would anyone volunteer to help me out with rewriting some of the existing docstrings? In particular, for code you have contributed?

The End
=======

I think that if we really take the docstrings seriously, then the documentation on the whole will vastly improve. I am in the process of rewriting some sections, removing the old-style tutorial and trying to better walk the user through the process of getting up and running. The current documentation has a lot of information, but it's not very good at getting people up and running in anything other than the simplest manner.
I think that getting started on improving the docstrings will also help refocus efforts toward better documentation on the whole.

And, I'd like to end by admitting culpability for the sorry state of the docstrings we currently have. But I think this might be good, in the long run, because it'll help get us on track for better code that's much easier to use!

And finally, thanks to Tom and Oliver for taking the time to chat with me about this -- I really appreciate their thoughtful feedback.

Best,

Matt