[Doc-SIG] which characters to use for docstring markup

Guido van Rossum guido@digicool.com
Fri, 06 Apr 2001 16:06:19 -0500


> I've been a bit busy lately, but I'm still working on coming up with
> a good markup language for docstrings...
> 
> I was trying to figure out which characters should be used for
> markup (e.g., to delimit colored regions, etc).  And so I wrote
> a script to see how often different characters are used in
> docstrings, using all the docstrings in the standard library (well,
> actually, in /usr/local/lib/python2.0/*.py) as a "representative"
> sample.

You should also look into /usr/local/lib/python2.0/*/*.py -- that's a
vast collection of code, e.g. Tkinter.py.
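
A minimal sketch of a counting script that covers both directory levels.
It assumes a modern Python with the `ast` module; the function names
`count_docstring_chars` and `scan` are hypothetical, and this is not the
script that produced the table below.

```python
import ast
import collections
import glob

# Node types that can carry a docstring.
DOC_NODES = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)

def count_docstring_chars(source):
    """Return a Counter of every character found in the docstrings of source."""
    counts = collections.Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, DOC_NODES):
            doc = ast.get_docstring(node, clean=False)
            if doc:
                counts.update(doc)  # a Counter counts a string per character
    return counts

def scan(patterns):
    """Sum docstring character counts over all files matching the glob patterns."""
    total = collections.Counter()
    for pattern in patterns:
        for path in glob.glob(pattern):
            try:
                with open(path, encoding="utf-8", errors="replace") as f:
                    total += count_docstring_chars(f.read())
            except (SyntaxError, ValueError, OSError):
                continue  # skip files the running interpreter cannot read or parse
    return total
```

Calling `scan([lib + "/*.py", lib + "/*/*.py"])`, where `lib` is the
library directory of the interpreter running the script, covers the
top-level modules and one level of packages.  Files written for a
different Python version may fail to parse and are then skipped.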

[Table omitted]

> 1. Any character(s) that are used for markup will have to be either
>    backslashed/quoted whenever they are used, or will have to be
>    only allowed in literal blocks.  Clearly, we want to keep either
>    of these to a minimum.  
> 2. These results suggest that using perldoc style coloring, like
>    B<this>, may not be the best idea, given that '<' and '>' are
>    used so often.  This is because people often talk about orderings
>    between elements, like x>y.  We might be better off using B{this} 
>    instead.  '<' and '>' are used 53 times more frequently than
>    '{' and '}'.

But you counted single characters.  I grepped for '[A-Z]<' and found
none that occurred in docstrings.  (The actual re should be
r'\B[A-Z]<'; I believe the POD rules ask for a single upper case
letter before the <.)
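
A sketch of the same check with Python's `re` module, searching for the
two-character opener instead of counting single characters.  The names
`POD_OPEN` and `pod_like_openers` are hypothetical.  The pattern here
uses `\b` (a word boundary) so that only a capital letter standing on its
own before the `<` matches; that choice is an assumption of this sketch.

```python
import re

# A single upper-case letter, not preceded by another word character,
# immediately followed by '<'.
POD_OPEN = re.compile(r"\b[A-Z]<")

def pod_like_openers(text):
    """Return every substring of text that looks like a POD-style opener."""
    return [m.group(0) for m in POD_OPEN.finditer(text)]

# An ordering such as x<y is not reported, because x is lower case.
print(pod_like_openers("sorted so that x<y; use C<code> here"))
```

Run over docstring text, this reports only sequences that would actually
be read as markup, which is a much smaller set than every '<'.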

Now, there's one significant use of [A-Z]< that might trip us up: the
regular expression syntax (?P<...>...).  I certainly could see this
being useful in docstrings for methods that take a regular expression
argument.  There's also one use of [A-Z]{: \N{...} means something in
Unicode literal syntax.
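
Both collisions can be reproduced with a short sketch.  The name
`collisions` and the two sample docstring texts are made up for
illustration, and the patterns again assume a word boundary before the
capital letter.

```python
import re

# Candidate openers for the two proposed delimiter styles.
OPENERS = {
    "angle": re.compile(r"\b[A-Z]<"),   # B<this> style
    "brace": re.compile(r"\b[A-Z]\{"),  # B{this} style
}

def collisions(text):
    """Map each delimiter style to the opener-like substrings found in text."""
    return {name: [m.group(0) for m in rx.finditer(text)]
            for name, rx in OPENERS.items()}

# Hypothetical docstring fragments, written as raw strings.
regex_doc = r"Match a date with (?P<year>\d{4})-(?P<month>\d{2})."
unicode_doc = r"Returns u'\N{DEGREE SIGN}' followed by the unit."

print(collisions(regex_doc))    # named groups look like P<...> markup
print(collisions(unicode_doc))  # \N{...} looks like N{...} markup
```

Either delimiter style therefore needs an escape or a literal block for
these two constructs.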

> 3. It makes much more sense to use "`" rather than "'" for 
>    literals, since "'" occurs 18 times more often.  Of course, we
>    would probably want to use *either* "`" for literals *or*
>    something like L{literal} or C{code} or whatever.

I don't like `...`, because (a) it means something very specific in
Python (and in the Unix shell), (b) it's hard to distinguish from
'...' in some fonts, and (c) except for the `...` Python and shell
notation, I expect ` to be closed with '.

> 4. You should keep in mind that any of these characters will be used
>    in the docstring for *something* (well, actually, I was surprised
>    to see a backspace in a docstring..).

Where?

>    So, for the most part, it's
>    a matter of inconveniencing the least number of people the least
>    amount of time..
> 
> I'm leaning towards using either::
> 
>     C{code}, E{emph} etc.
> 
> or::
> 
>     `literal` and *one* *word* *emph* (and that's it)
> 
> to color code in my markup.  Any comments?

I still like C<code> and *multi word emph* better. :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)