[Doc-SIG] which characters to use for docstring markup
Edward D. Loper
edloper@gradient.cis.upenn.edu
Fri, 06 Apr 2001 14:52:21 EDT
I've been a bit busy lately, but I'm still working on coming up with
a good markup language for docstrings...
I was trying to figure out which characters should be used for
markup (e.g., to delimit colored regions, etc.). So I wrote
a script to see how often different characters are used in
docstrings, using all the docstrings in the standard library (well,
actually, in /usr/local/lib/python2.0/*.py) as a "representative"
sample. Here are the results:
Character Count  Module Count  Character
----------------------------------------
              1             1  ^H
             10             3  ^M
             11             4  ^
             12             5  ~
             13            10  {
             13            10  }
             16             6  %
             28             7  $
             48            12  ?
             50            20  !
             70            16  `
             75             8  &
             87            12  \
            108            18  +
            130            12  |
            197             7  @
            222            22  *
            229            20  #
            269            35  ]
            277            36  [
            313            44  =
            331            53  ;
            421            48  /
            441            46  "
            514            23  <
            663            67  :
            779            54  _
            875            28  >
           1302            75  '
           1858            94  (
           1874            94  )
           2145            97  ,
           2277            92  -
           3413           110  .
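For anyone who wants to repeat this kind of count, here is a minimal sketch of such a script. It is not the script that produced the table above: it assumes modern Python 3 and the `ast` module, and the names `iter_docstrings`, `count_markup_chars`, and `load_sources` are made up for illustration. Files that do not parse as Python 3 (which would include much of a Python 2.0 library) are simply skipped.

```python
import ast
from collections import Counter
from pathlib import Path


def iter_docstrings(source):
    """Yield every module, class, and function docstring in `source`."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.Module, ast.ClassDef,
                             ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node, clean=False)
            if doc:
                yield doc


def count_markup_chars(sources):
    """Return (char_counts, module_counts) for candidate markup characters.

    `sources` maps a module name to its source text.  Letters, digits,
    spaces, tabs, and newlines are ignored (an assumption about what the
    original table left out); everything else is counted.
    """
    char_counts = Counter()    # total occurrences of each character
    module_counts = Counter()  # number of modules using each character
    for name, source in sources.items():
        try:
            docs = list(iter_docstrings(source))
        except SyntaxError:
            continue  # not valid Python 3 source; skip this module
        seen = Counter()
        for doc in docs:
            seen.update(c for c in doc
                        if not c.isalnum() and c not in " \t\n")
        char_counts.update(seen)
        module_counts.update(seen.keys())
    return char_counts, module_counts


def load_sources(directory):
    """Read every *.py file directly inside `directory`."""
    return {path.name: path.read_text(encoding="utf-8", errors="replace")
            for path in Path(directory).glob("*.py")}
```

Calling `count_markup_chars(load_sources(some_directory))` and sorting the first counter by value would give a table in the same shape as the one above, though the numbers will differ for any other library version.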
1. Any character(s) that are used for markup will have to be either
backslashed/quoted whenever they appear as ordinary text, or else
be allowed only in literal blocks. Clearly, we want to keep both
of these to a minimum.
2. These results suggest that using perldoc-style coloring, like
B<this>, may not be the best idea, given that '<' and '>' are
used so often. This is because people often talk about orderings
between elements, like x>y. We might be better off using B{this}
instead. '<' and '>' are used about 53 times as often as
'{' and '}' (1389 occurrences vs. 26).
3. It makes much more sense to use "`" rather than "'" for
literals, since "'" occurs about 18 times as often (1302 vs. 70).
Of course, we would probably want to use *either* "`" for
literals *or* something like L{literal} or C{code} or whatever.
4. You should keep in mind that any of these characters will be used
in the docstring for *something* (well, actually, I was surprised
to see a backspace in a docstring). So, for the most part, it's
a matter of inconveniencing the fewest people the least
amount of time.
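The comparisons in points 2 and 3 can be checked directly from the table. The sketch below copies a few counts from it and adds up how many existing occurrences would need quoting if a given set of characters were reserved for markup. The helper name `escape_cost` is hypothetical, and the model is deliberately crude: it treats every occurrence as needing an escape, which overstates the cost for a scheme like C{code}, where a brace might only be special after a capital letter.

```python
# Character counts copied from the table above.
CHAR_COUNTS = {
    "<": 514, ">": 875,
    "{": 13, "}": 13,
    "'": 1302, "`": 70,
    "*": 222,
}


def escape_cost(delimiters, counts=CHAR_COUNTS):
    """Total existing occurrences of the characters in `delimiters`."""
    return sum(counts[c] for c in delimiters)


angle = escape_cost("<>")   # 514 + 875 = 1389
brace = escape_cost("{}")   # 13 + 13 = 26
print(angle // brace)                            # prints 53
print(CHAR_COUNTS["'"] // CHAR_COUNTS["`"])      # prints 18
print(escape_cost("`*"))                         # prints 292
```

On this rough measure, reserving '{' and '}' touches far fewer existing docstrings than either '<' and '>' or the "`" and "*" pair.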
I'm leaning towards using either::
C{code}, E{emph} etc.
or::
`literal` and *one* *word* *emph* (and that's it)
to color code in my markup. Any comments?
-Edward
p.s., I'll probably have a preliminary description of my proposed
markup language in about 2 weeks.. I hope. :)