[Doc-SIG] which characters to use for docstring markup
Guido van Rossum
guido@digicool.com
Fri, 06 Apr 2001 16:06:19 -0500
> I've been a bit busy lately, but I'm still working on coming up with
> a good markup language for docstrings...
>
> I was trying to figure out which characters should be used for
> markup (e.g., to delimit colored regions, etc.). So I wrote
> a script to see how often different characters are used in
> docstrings, using all the docstrings in the standard library (well,
> actually, in /usr/local/lib/python2.0/*.py) as a "representative"
> sample.
You should also look into /usr/local/lib/python2.0/*/*.py -- that's a
vast collection of code, e.g. Tkinter.py.
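Below is a minimal sketch of the kind of counting script described above. It targets modern Python 3 rather than 2.0 and is not the original script, which isn't shown. It assumes `ast.get_docstring` is an acceptable way to find module, class, and function docstrings, and the set of candidate characters is a hypothetical choice. Passing a `**` pattern (with `recursive=True`) also covers subdirectories, such as the one holding Tkinter.py.

```python
import ast
import glob
from collections import Counter

# Hypothetical set of candidate markup characters to report on;
# the original script's character set is not shown.
CANDIDATES = "<>{}`'*_[]@#"

def docstrings(source):
    """Yield every module, class, and function docstring in `source`."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.Module, ast.ClassDef,
                             ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node, clean=False)
            if doc:
                yield doc

def count_chars(source):
    """Count how often each candidate character occurs in docstrings."""
    counts = Counter()
    for doc in docstrings(source):
        counts.update(ch for ch in doc if ch in CANDIDATES)
    return counts

def count_chars_in_files(pattern):
    """Aggregate counts over all files matching a (recursive) glob pattern."""
    total = Counter()
    for path in glob.glob(pattern, recursive=True):
        with open(path, encoding="utf-8", errors="replace") as f:
            try:
                total += count_chars(f.read())
            except (SyntaxError, ValueError):
                pass  # skip files this Python version cannot parse
    return total

if __name__ == "__main__":
    import sys
    # e.g. python count_markup.py '/usr/local/lib/python3.12/**/*.py'
    pattern = sys.argv[1] if len(sys.argv) > 1 else "*.py"
    for ch, n in count_chars_in_files(pattern).most_common():
        print(repr(ch), n)
```

Counting per docstring rather than per file keeps characters in ordinary code and comments out of the tally.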
[Table omitted]
> 1. Any character(s) that are used for markup will have to be either
> backslashed/quoted whenever they are used, or will have to be
> only allowed in literal blocks. Clearly, we want to keep either
> of these to a minimum.
> 2. These results suggest that using perldoc-style coloring, like
> B<this>, may not be the best idea, given that '<' and '>' are
> used so often. This is because people often talk about orderings
> between elements, like x>y. We might be better off using B{this}
> instead. '<' and '>' are used 53 times more frequently than
> '{' and '}'.
But you counted single characters. I grepped for '[A-Z]<' and found
no occurrences in docstrings. (The actual re should be
r'\b[A-Z]<'; I believe the POD rules ask for a single uppercase
letter before the <.)
Now, there's one significant use of [A-Z]< that might trip us up: the
regular expression syntax (?P<...>...). I certainly could see this
being useful in docstrings for methods that take regular expression
arguments. There's also one use of [A-Z]{: \N{...} names a character
in Unicode literal syntax.
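A quick way to see the false positives discussed above is to run a POD-style pattern and a brace-style pattern over a few docstring lines. The sample strings are invented for illustration. The sketch uses `\b` (word boundary) before the capital letter, which is what lets `B<true>` match when it starts a word. The `\B` form instead requires a word character *before* the capital, so it misses ordinary markup.

```python
import re

# POD-style opener: a capital letter starting a word, followed by '<'.
POD_OPEN = re.compile(r"\b[A-Z]<")
# Brace-style opener, e.g. B{this}.
BRACE_OPEN = re.compile(r"\b[A-Z]\{")

samples = [
    "Return B<true> if x>y.",           # genuine markup plus a comparison
    r"Pattern like (?P<name>\w+).",     # regex named group: false positive
    r"Use u'\N{BULLET}' for bullets.",  # Unicode escape: false positive
]

for text in samples:
    print(POD_OPEN.findall(text), BRACE_OPEN.findall(text))

# The \B variant misses ordinary POD markup entirely:
print(re.findall(r"\B[A-Z]<", "Return B<true>"))
```

This prints ['B<'] [], then ['P<'] [], then [] ['N{'], and finally []. In these samples each style picks up exactly one collision: (?P< for angle brackets and \N{ for braces.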
> 3. It makes much more sense to use "`" rather than "'" for
> literals, since "'" occurs 18 times more often. Of course, we
> would probably want to use *either* "`" for literals *or*
> something like L{literal} or C{code} or whatever.
I don't like `...`, because (a) it means something very specific in
Python (and in the Unix shell), (b) it's hard to distinguish from
'...' in some fonts, and (c) except for the `...` Python and shell
notation, I expect ` to be closed with '.
> 4. You should keep in mind that any of these characters will be used
> in the docstring for *something* (well, actually, I was surprised
> to see a backspace in a docstring..).
Where?
> So, for the most part, it's
> a matter of inconveniencing the fewest people for the least
> amount of time.
>
> I'm leaning towards using either::
>
> C{code}, E{emph} etc.
>
> or::
>
> `literal` and *one* *word* *emph* (and that's it)
>
> to color code in my markup. Any comments?
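Here is a minimal, hypothetical colorizer that makes the C{code}, E{emph} proposal concrete. Several details are assumptions of this sketch: the tag names `code` and `em`, the HTML-like output, and leaving unknown letters (such as the N in \N{...}) untouched. It also does not handle nested braces.

```python
import re

# Hypothetical tag meanings for the proposed C{...} and E{...} markup.
TAGS = {"C": "code", "E": "em"}
# A capital letter starting a word, then a brace-delimited, non-nested body.
FIELD = re.compile(r"\b([A-Z])\{([^{}]*)\}")

def colorize(text):
    """Replace C{...} and E{...} with HTML-like tags; leave other text alone."""
    def repl(m):
        tag = TAGS.get(m.group(1))
        if tag is None:
            return m.group(0)  # unknown letter, e.g. \N{...}: keep as is
        return f"<{tag}>{m.group(2)}</{tag}>"
    return FIELD.sub(repl, text)

print(colorize("Call C{f(x)} only if E{really} needed; x>y stays as is."))
```

Comparisons like x>y pass through unchanged, since only a capital letter immediately followed by '{' is treated as markup.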
I still like C<code> and *multi word emph* better. :-)
--Guido van Rossum (home page: http://www.python.org/~guido/)