This is a VERY VERY rough draft of a PEP. The idea is that there should be some formal way that reST parsers can differentiate (in docstrings) between variable/function names and identical English words, within comments.
<br>
<div class="gmail_quote"><div class="gmail_quote"><br>PEP: XXX<br>Title: Catching unmarked identifiers in docstrings<br>Version: 0.0.0.0.1<br>Last-Modified: 23-Aug-2007<br>Author: Jameson Quinn <firstname dot lastname at gmail>
<br>Status: Draft
<br>Type: Informational
<br>Content-Type: text/x-rst<br>Created: 23-Aug-2007<br>Post-History: 30-Aug-2002<br><br><br>Abstract<br>========<br><br>This PEP makes explicit some additional ways to parse docstrings and comments<br>for Python identifiers. These are intended to be implementable on their own or
<br>as extensions to reST, and to make as many existing docstrings<br>as possible usable by tools that change the visible<br>representation of identifiers, such as translating (non-English) code editors<br>or visual programming environments. Docstrings in widely-used modules are
<br>encouraged to use \`explicit backquotes\` to mark identifiers which are not<br>caught by these cases.<br><br>THIS IS AN EARLY DRAFT OF THIS PEP FOR DISCUSSION PURPOSES ONLY. ALL LOGIC IS<br>INTENTIONALLY DEFINED ONLY BY EXAMPLES AND THERE IS NO REFERENCE IMPLEMENTATION
<br>UNTIL THERE ARE AT LEAST GLIMMERINGS OF CONSENSUS ON THE RULE SET.<br><br><br>Rationale<br>=========<br><br>Python, like most computer languages, is based on English. This can<br>represent a hurdle to those who do not speak English. Work is underway
<br>on Bityi_, a code viewer/editor which translates code to another language<br>on load and save. Among the many design issues in Bityi is that of<br>identifiers in docstrings. A view which translates the identifiers in
<br>
code, but leaves the untranslated identifier in the docstrings, makes<br>the docstrings worse than useless, even if the programmer has a <br>rudimentary grasp of English. Yet if all identifiers in docstrings are<br>translated, there is the problem of overtranslation in either direction.
<br>It is necessary to distinguish between the variable named "variable",<br>which should be translated, and the comment that something is "highly<br>variable", which should not. <br><br>.. _Bityi: <a href="http://wiki.laptop.org/go/Bityi" target="_blank">
http://wiki.laptop.org/go/Bityi</a><br><br>Note that this is just one use case; syntax coloring and docstring hyperlinks are others. This PEP is not the place for a discussion of all the pros<br>and cons of a translating viewer.
<br><br>PEP 287 standardizes reST as an optional way to mark up docstrings. <br>This includes the possibility of using \`backquotes\` to flag Python<br>identifiers. However, as that markup is purely optional, there are many<br>
cases of identifiers in docstrings which are not flagged as such. <br>Moreover, many of these unflagged cases could be caught programmatically. <br>This would reduce the task of making a module internationally-viewable,<br>
or hyperlinkable, considerably.<br><br>This syntax is kept relatively open to allow for reuse with <br>other programming languages.<br><br><br>Common cases of identifiers in docstrings<br>=========================================
<br><br>The most common case is that of lists of argument or<br>method names. We call these "identifier lists"::<br><br> def register(func, *targs, **kargs):<br> """register a function to be executed someday
<br><br> func - function to be called<br> targs - optional arguments to pass<br> kargs - optional keyword arguments to pass<br> """<br><br> #func, targs, and kargs would be recognized as identifiers in the above.
<br> <br> class MyClass(object):<br>
"""Just a silly demonstration, with some methods:<br> <br> thisword : is a class method and you can call<br> it - it may even return a value.<br> <br> As with reST, the associated text can have
<br> several paragraphs.<br> <br> BUT - you can't nest this construct, so BUT isn't counted.<br> anothermethod: is another method.<br> eventhis -- is counted as a method.<br>
<br> anynumber --- of dashes are allowed in this syntax<br> <br> But consider: two words are NOT counted as an identifier.<br> <br> things(that,look,like,functions): are functions (see below)<br>
<br> Also, the docstring may have explanatory text, below or by<br> itself: so we have to deal with that.<br> Thus, any paragraph which is NOT preceded by an empty line<br> or another identifier list - like "itself" above - does not count
<br> as an identifier.<br> """<br> #thisword, anothermethod, eventhis, anynumber, and things would be <br> #recognized as identifiers in the above.<br><br>Another case is things which look like functions, lists, indexes, or
<br>dicts::<br><br> """<br> afunction(is,a,word,with,parentheses)<br>
[a,list,is,a,bunch,of,words,in,brackets]<br> anindex[is, like, a, cross, between, the, above]<br> {adict:is,just:words,in:curly, brackets: likethis}<br> """<br> #all of the above would be recognized as identifiers.
<br> <br>The "syntax" of what goes inside these is very loose. <br>identifier_list ::= [<initial_word>]<opening_symbol> <content_word> {<separator_symbol> <content_word>} <closing_symbol>
<br>, with no whitespace after initial_word, and where separator_symbol is the set of symbols ".,<>{}[]+-*^%=|/()[]{}" MINUS closing_symbol. content_word could maybe be a quoted string, too.
<br>In the "function name", no whitespace<br>is allowed, but the symbols ".,*^=><-" are. Thus::<br><br> """<br> this.long=>function.*name(counts, and: so |do| these {so-called] arguments)
<br> {but,you - cant|use[nested]brackets{so,these,are.identifiers}but,these,arent}<br> {heres.an.example.of."a string, no identifiers in here",but.out.here.yes}<br> {<a href="http://even.one.pair.of.words.with.no" target="_blank">
even.one.pair.of.words.with.no</a> symbols.means.nothing.here.is.an.identifier}<br> Any of these structures that open on one line {but.close.on.<br> the.next} are NOT counted as identifiers. <br> """
<br> #in the above: lines 1, 2, and the parts of 3 outside the quotes <br> #would be recognized as identifiers<br><br>The above flexibility is intended to cover the various possibilities for<br>argument lists in a fair subset of other languages. Languages which use only
<br>whitespace for argument separation are not covered by these rules.
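<br><br>As a rough illustration only (this draft deliberately has no reference implementation), the bracket rules above might be sketched in Python roughly as follows. The names find_bracketed_identifiers, STRUCTURE, SEPARATORS and WORD are hypothetical. The sketch makes several simplifying assumptions that are not part of the rules: brackets are not treated as separators, ":" is added as a separator because the dict example uses it, quoted strings are simply dropped, and the looser examples above (mismatched or nested brackets) are not handled.

```python
import re

# Opening brackets and their matching closers.
PAIRS = {"(": ")", "[": "]", "{": "}"}

# Separators between content words. Assumption: the draft's set minus
# the bracket characters (this sketch refuses nesting), plus ":".
SEPARATORS = r"[.,<>+\-*^%=|/:]"

# Optional initial word (no whitespace, may contain ".,*^=><-") directly
# followed by a bracketed body with no brackets and no line breaks.
STRUCTURE = re.compile(
    r"(?P<name>[A-Za-z_][\w.,*^=><-]*)?"
    r"(?P<open>[(\[{])(?P<body>[^()\[\]{}\n]*)(?P<close>[)\]}])"
)
WORD = re.compile(r"[A-Za-z_]\w*")


def find_bracketed_identifiers(line):
    """Return the words in `line` that sit in a function-, list-,
    index- or dict-like structure, in order of appearance."""
    found = []
    for m in STRUCTURE.finditer(line):
        if PAIRS[m.group("open")] != m.group("close"):
            continue  # e.g. "{so-called]"
        # Quoted strings contain no identifiers, so drop them.
        body = re.sub(r'"[^"\n]*"', "", m.group("body"))
        pieces = [p.strip() for p in re.split(SEPARATORS, body)]
        pieces = [p for p in pieces if p]
        # Two words separated only by whitespace ("see below")
        # disqualify the whole structure.
        if not all(WORD.fullmatch(p) for p in pieces):
            continue
        found.extend(WORD.findall(m.group("name") or ""))
        found.extend(pieces)
    return found
```

Called on the line "anindex[is, like, a]", this returns every word in it, while a parenthetical such as "(see below)" yields nothing because its two words are separated only by whitespace.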
<br> <br>The final case is words that are in some_kind of mixedCase. These are<br>optionally counted as identifiers, but only if they are also present as an identifier OUTSIDE<br>the comments somewhere in the same file.<br>
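<br>One possible way to check this, offered only as a sketch: collect the names that the standard tokenize module reports outside comments and strings, then keep the mixedCase-looking words from the docstring that appear in that set. The function names and the LOOKS_MIXED pattern are hypothetical, and reading "some_kind of mixedCase" as "has an inner underscore or an inner capital letter" is an assumption.

```python
import io
import keyword
import re
import tokenize

# Assumed reading of "some_kind of mixedCase": a word with an inner
# underscore, or a lowercase letter directly followed by a capital.
LOOKS_MIXED = re.compile(r"\b(?:[A-Za-z]+_\w+|[A-Za-z]*[a-z][A-Z]\w*)\b")


def code_identifiers(source):
    """Return the names used in code, ignoring comments and strings."""
    names = set()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            names.add(tok.string)
    return names


def mixed_case_identifiers(text, source):
    """Return mixedCase-looking words in `text` that are also real
    identifiers in `source`, in order of appearance."""
    known = code_identifiers(source)
    return [w for w in LOOKS_MIXED.findall(text) if w in known]
```

Because docstrings and comments reach the tokenizer as STRING and COMMENT tokens, a word that occurs only inside them never enters the known set, which is the "OUTSIDE the comments" condition.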
<br>Doctest and preformatted reST sections should be considered as 100% Python
<br>code and treated as identifiers (or keywords).<br><br>Recommended use<br>===============<br><br>The rules above are designed to catch the large majority of identifiers<br>already present in docstrings, while applying only extremely rarely to words
<br>that should properly be considered as natural language. However, they are <br>inevitably imperfect. Docstrings in modules intended for wide use should<br>be fixed manually wherever these rules fail. If the rules underapply,
<br>you can use either \`back quotes\` or parentheses() to mark words as<br>identifiers; if they overapply and reformatting tricks don't fix the<br>problem, <SOME DIRECTIVE TO TURN OFF ALL THIS LOGIC FOR A STRING>
<br><br>Optional use inside comments or non-docstring strings<br>=====================================================<br><br>Comments<br>--------<br><br>Comments or blocks of comments alone on consecutive lines should be able,
<br>optionally, to use these same tricks to spotlight identifiers.<br><br>Other strings<br>-------------<br><br>I'm not sure yet what the rules should be here. One option I'm considering<br>is to be able to turn on all the above logic with some evil hack such
<br>as '' 'a string like this, concatenated with an empty string'.<br><br><br>Copyright<br>=========<br><br>This document has been placed in the public domain.<br><br><br> <br>..<br> Local Variables:<br>
mode: indented-text<br> indent-tabs-mode: nil<br> sentence-end-double-space: t<br> fill-column: 70<br> coding: utf-8<br> End:<br><br>
</div><br>
</div><br>