[Twisted-Python] doc bloat
Jean-Paul raised a salient point about documentation on a ticket <https://twistedmatrix.com/trac/ticket/4804#comment:21> recently. To quote that: We seem to be going the direction of "document every possible thing" these days. This stems from good intentions but the result is ever more bloated developer documentation of which any individual contributor has an ever shrinking knowledge. Rather than continuing to block the docs even further ... I think we need to get serious about pursuing a different strategy - for example, making twistedchecker a piece of software we could actually maintain and the output of which we could actually rely on (this really is just an example - the notion of a tool that tells you every single thing that's wrong with a piece of software is, of course, quite enticing - but perhaps unachievable). I think we've been moving in the direction of making twistedchecker maintainable, although it does still present some challenges. For example, I've been wanting to eliminate this <https://github.com/twisted/twistedchecker/issues/75 <https://github.com/twisted/twistedchecker/issues/75>> for a while, but I just haven't been able to figure it out. I am also thinking that we may want to create a category of private implementation details that don't require docstring coverage. In a public API, every parameter, every attribute, every detail should have accompanying prose and type annotations. In the innards of an implementation, these details can crowd out the code they're supposed to be elucidating. As a first approximation, I think we could ask twistedchecker to stop enforcing docstring requirements for objects directly matching a "private" naming pattern. Thoughts? -glyph
On 30 Nov 2014, at 23:09, Glyph <glyph@twistedmatrix.com> wrote:
Jean-Paul raised a salient point about documentation on a ticket recently. To quote that: We seem to be going the direction of "document every possible thing" these days. This stems from good intentions but the result is ever more bloated developer documentation of which any individual contributor has an ever shrinking knowledge. Rather than continuing to block the docs even further ... I think we need to get serious about pursuing a different strategy - for example, making twistedchecker a piece of software we could actually maintain and the output of which we could actually rely on (this really is just an example - the notion of a tool that tells you every single thing that's wrong with a piece of software is, of course, quite enticing - but perhaps unachievable).
I think we've been moving in the direction of making twistedchecker maintainable, although it does still present some challenges. For example, I've been wanting to eliminate this <https://github.com/twisted/twistedchecker/issues/75> for a while, but I just haven't been able to figure it out. I am also thinking that we may want to create a category of private implementation details that don't require docstring coverage. In a public API, every parameter, every attribute, every detail should have accompanying prose and type annotations. In the innards of an implementation, these details can crowd out the code they're supposed to be elucidating. As a first approximation, I think we could ask twistedchecker to stop enforcing docstring requirements for objects directly matching a "private" naming pattern. Thoughts? -glyph
This sounds like a good idea. How do we tackle things that should be documented in some form (ie. important implementation details), that do need docstring-like things, but should not be exposed to users? For example, http://twistedmatrix.com/documents/current/api/twisted.web.http.Request.html... , to an application developer, has no use, and is just cluttering up the docs, but may want to have a docstring for code documentation purposes, rather than user documentation purposes. It’s like, 11pm here, so apologies if that made zero sense :) -hawkie
On Nov 30, 2014, at 16:09, Glyph <glyph@twistedmatrix.com> wrote:
I think we've been moving in the direction of making twistedchecker maintainable, although it does still present some challenges. For example, I've been wanting to eliminate this <https://github.com/twisted/twistedchecker/issues/75 <https://github.com/twisted/twistedchecker/issues/75>> for a while, but I just haven't been able to figure it out.
Well, I guess we have been moving in that direction after all. I figured it out, there's a PR here: <https://github.com/twisted/twistedchecker/pull/86 <https://github.com/twisted/twistedchecker/pull/86>> -glyph
As a Twisted API users, I don't find a big problem the fact that __repr__ docstring is presented by pydoctor together with the other method, As a Twisted contributor, I don't like when I have to copy/paste docstring from an interface or from an inherited class ... or too explicity state that the type of None is NoneType... and then to add a link to it :)
As a first approximation, I think we could ask twistedchecker to stop enforcing docstring requirements for objects directly matching a "private" naming pattern.
Is there a ticket/issue for that? What does `stop enforcing docstring requirements` mean? I think that docstring are important, and testing code or private methods is still code which should be maintained and it should have docstrings, but it would be nice not to ask for strict @param @type @return @rtype. ------- +1 for getting serious about twistedcker ... I would be happy to help with twistedchecker, as well as with the other automated tools. What do you say if we put all problems/issues/comments here https://github.com/twisted/twistedchecker/issues and have 2 or 3 people who can decide if an issues is accepted, or help define the right behaviour or reject it and then also review the pull requests. Right now there are many "undecided" or not tagged tickets, and I don't know which one is a real problem or not. One important requirement to move things forward is to provide quick feedback, both for the initial issue report and the review. Waiting months to get feedback from a patch is not fun and is a sign that core developers don't care about the newly proposed changes. ---------- +1 for getting serious about the other code checkers: pyflakes and pydoctor. Right now, is a PITA to run all these tools on your local computer ... Cheers -- Adi Roiban
On 10:02 am, adi@roiban.ro wrote:
As a Twisted API users, I don't find a big problem the fact that __repr__ docstring is presented by pydoctor together with the other method,
As a Twisted contributor, I don't like when I have to copy/paste docstring from an interface or from an inherited class ...
I hate it when people do this, too. I never want anyone to do this. When I review changes in which this appears, I ask the author to change the submission to include useful, non-redundant documentation instead. The coding standard doesn't explicitly say that you should not copy the docstring from an implementation to an interface - but this is a good example of the kind of thing our documentation can't be expected to cover. Copying an interface docstring to an implementation is just one of the many possible bad practices a developer could engage in; a list of all possible bad practices would make the developer documentation infinitely long. To the larger issue, I think the most productive thing to do would be to actually look at our code base and see if there are glaring examples of useless documentation. If we can generalize from these examples to a rule, we can add that rule to the project policy (preferably encoded in a tool, not as prose in the developer documentation). I'm a little bit tempted to agree with the `__repr__` case. People who are reading the API documentation might not be put off by documentation for these methods but are they helped in any way? I think that most of the time they probably aren't. I'm basing this on times I can recall having written a `__repr__` docstring and how I had to struggle to put useful words into that docstring instead of pointless boilerplate. I still think it would be good to collect a bunch of actual `__repr__` docstrings from Twisted and see if such evidence supports this conclusion.
or too explicity state that the type of None is NoneType... and then to add a link to it :)
It sounds like there's agreement on this point, at least. For what it's worth, this seems uncontroversial to me: the type of `None` is `NoneType` and the only instance of `NoneType` is `None`. Documenting one or the other conveys all possible information there is to convey so documenting one should be enough. Even if we wanted the rendered version of the documentation for this case to be redundant, pydoctor can synthesize that redundancy - it doesn't need the source material to be redundant.
As a first approximation, I think we could ask twistedchecker to stop enforcing docstring requirements for objects directly matching a "private" naming pattern.
Is there a ticket/issue for that? What does `stop enforcing docstring requirements` mean?
I think that docstring are important, and testing code or private methods is still code which should be maintained and it should have docstrings, but it would be nice not to ask for strict @param @type @return @rtype.
Eh. Test code has to be maintained just like anything else. As the de facto specification for most features, it's perhaps (but only perhaps) more important to have everything clearly documented in a test suite. I think that requiring parameter and return type documentation in the test suite is a very good thing. However, requiring the same degree of documentation for *all* code may not be necessary. If a test method defines a simple nested function, perhaps that doesn't need a docstring. This is another thing we could inspect the current Twisted codebase for. Are there lots of useless docstrings on nested function definitions purely for the sake of twistedchecker? Or are there undocumented nested functions that are actually a little bit difficult to understand on their own? Jean-Paul
-------
+1 for getting serious about twistedcker ...
I would be happy to help with twistedchecker, as well as with the other automated tools.
What do you say if we put all problems/issues/comments here https://github.com/twisted/twistedchecker/issues and have 2 or 3 people who can decide if an issues is accepted, or help define the right behaviour or reject it and then also review the pull requests.
Right now there are many "undecided" or not tagged tickets, and I don't know which one is a real problem or not.
One important requirement to move things forward is to provide quick feedback, both for the initial issue report and the review. Waiting months to get feedback from a patch is not fun and is a sign that core developers don't care about the newly proposed changes.
----------
+1 for getting serious about the other code checkers: pyflakes and pydoctor. Right now, is a PITA to run all these tools on your local computer ...
Cheers -- Adi Roiban
On Dec 2, 2014, at 20:05, exarkun@twistedmatrix.com wrote:
Are there lots of useless docstrings on nested function definitions purely for the sake of twistedchecker? Or are there undocumented nested functions that are actually a little bit difficult to understand on their own?
twistedchecker does not presently require nested function definitions to have docstrings. I recently merged a fix to an incongruity where it was requiring this of classes defined within functions: <https://github.com/twisted/twistedchecker/commit/4af4e97f99d6e5f683b65272a8dbe7bce2087aa7>. So this one, at least, we can cross off for the future :). -glyph
On 12:55 am, glyph@twistedmatrix.com wrote:
On Dec 2, 2014, at 20:05, exarkun@twistedmatrix.com wrote:
Are there lots of useless docstrings on nested function definitions purely for the sake of twistedchecker? Or are there undocumented nested functions that are actually a little bit difficult to understand on their own?
twistedchecker does not presently require nested function definitions to have docstrings. I recently merged a fix to an incongruity where it was requiring this of classes defined within functions: <https://github.com/twisted/twistedchecker/commit/4af4e97f99d6e5f683b65272a8dbe7bce2087aa7>. So this one, at least, we can cross off for the future :).
The broader context of this suggestion was that we should inspect the codebase to see what policy changes would improve the quality of the code/documentation while reducing the effort required to develop and maintain it. It sounds like you have some ideas about such changes already. Does that mean you'd like to suggest them (presumably in the form of issues filed against twistedhecker) instead of doing this investigation? Jean-Paul
On Dec 3, 2014, at 18:46, exarkun@twistedmatrix.com wrote:
On 12:55 am, glyph@twistedmatrix.com wrote:
On Dec 2, 2014, at 20:05, exarkun@twistedmatrix.com wrote:
Are there lots of useless docstrings on nested function definitions purely for the sake of twistedchecker? Or are there undocumented nested functions that are actually a little bit difficult to understand on their own?
twistedchecker does not presently require nested function definitions to have docstrings. I recently merged a fix to an incongruity where it was requiring this of classes defined within functions: <https://github.com/twisted/twistedchecker/commit/4af4e97f99d6e5f683b65272a8dbe7bce2087aa7>. So this one, at least, we can cross off for the future :).
The broader context of this suggestion was that we should inspect the codebase to see what policy changes would improve the quality of the code/documentation while reducing the effort required to develop and maintain it.
It sounds like you have some ideas about such changes already. Does that mean you'd like to suggest them (presumably in the form of issues filed against twistedhecker) instead of doing this investigation?
In this one case, I believed I was simply bringing our tooling into line with existing policy. twistedchecker already doesn't require docstrings on callbacks, it seemed like a clear bug that it required docstrings on those callbacks if they existed inside a class statement. The letter of the policy is a little ambiguous, and perhaps it should be updated. <https://twistedmatrix.com/documents/current/core/development/policy/coding-s... <https://twistedmatrix.com/documents/current/core/development/policy/coding-standard.html#docstrings>> says docstrings "should" always be used, but it doesn't say "must". And, as written, it has no exception for callbacks, which clearly contravenes our existing tooling and practice. I do have a couple of other more specific ideas and I will almost certainly file twistedchecker issues for them :). More broadly speaking I think it would be great to have some investigation into our code-quality issues. I think that there are definitely lots of useless docstrings for the sake of twistedchecker, and I'm not quite sure how to classify those. -glyph
On 3 December 2014 at 00:55, Glyph <glyph@twistedmatrix.com> wrote:
On Dec 2, 2014, at 20:05, exarkun@twistedmatrix.com wrote:
Are there lots of useless docstrings on nested function definitions purely for the sake of twistedchecker? Or are there undocumented nested functions that are actually a little bit difficult to understand on their own?
twistedchecker does not presently require nested function
From my experience, even nested functions need a sentence to describe
them....there are many nested functions used as deferred callbacks and I prefer to have a sentence describing when they are called. For callback methods I still don't know whether I should name based on what they do or after the condition in which they are called. I prefer to name them after what they do, but also to document in the docstring the condition But I don't think that nested functions required extensive apidoc/pydoctor markup. ---------- In order to survey the current code, maybe we can create a wiki page, and while reading/writing/reviewing code we can extract examples and put them in the wiki page. -- Adi Roiban
participants (4)
-
Adi Roiban
-
exarkun@twistedmatrix.com
-
Glyph
-
HawkOwl