From fdrake@acm.org Sun Apr 1 16:55:48 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Sun, 1 Apr 2001 11:55:48 -0400 (EDT) Subject: [Doc-SIG] Parro-DL -- new documentation project Message-ID: <15047.20356.586507.158341@beowolf.pythonlabs.org> Announcing a new joint documentation effort... Parro-DL Parrot Documentation Language With the public announcement of the development of Parrot (see http://use.perl.org/article.pl?sid=01/03/31/206248 and http://www.python.org/parrot.htm), a new documentation effort is being initiated to provide developer information on the new language and its libraries. Guido van Rossum and Larry Wall, joint creators of the new language, are both aware of the significance of quality documentation in the adoption of Parrot. Shortly after the decision to create Parrot, they enlisted Fred Drake and Tom Christiansen to begin work on the documentation system for Parrot. The two advocates of language and library documentation have collaborated privately for the past six months to design a new markup language that can be embedded into the language or used independently, similar to POD, but which allows richer semantic markup similar to the LaTeX-based markup used by the Python documentation project. Drake and Christiansen expect to release the reference manual for the new markup language, called Parro-DL (for "Parrot Documentation Language"), within two weeks. The specification, which weighs in at about 150 typeset pages, was written in Parro-DL and is processed by new tools written using an early prototype interpreter for the Parrot language. The specification includes information on syntax, linguistic integration, and processing expectations. ISO standardization is expected to be complete in the 3rd quarter of 2006. Drake and Christiansen are joining their efforts to organize a documentation project dedicated to producing free documentation for Parrot, to avoid a monopoly on the reference documentation by the technical publisher O'Reilly. The effort will be subsidized by their new joint venture, Interpolated Documentation Systems. Offices for the new firm will be located in Chicago. Drake's separation from PythonLabs came as a surprise to his colleagues there. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From vbruand@infonie.fr Mon Apr 2 15:17:36 2001 From: vbruand@infonie.fr (vbruand) Date: Mon, 2 Apr 2001 16:17:36 +0200 Subject: [Doc-SIG] LaTeX question References: Message-ID: <001f01c0bb7f$ab138dc0$f375f2c3@letitbeii> Thanks a lot... I have already tried \~{} and a symbol called \sim (or something like that; I found it later in the help). I really do think the problem is with the \url thing. Bye. ----- Original Message ----- From: "Edward Welbourne" To: Cc: Sent: Saturday, March 31, 2001 5:05 PM Subject: Re: [Doc-SIG] LaTeX question > > I don't understand what I should exactly write instead of tilde > > (because %7e counts as a comment, and \symbol{"7e} doesn't work either). > Have you tried \~{} > > > the tilde character ('~') is mis-handled; > hrm. By the \url{} directive ? > > Anyhow, being confused by what you're saying, here's what's special > about tilde in TeX: > > The ~ character is TeX's non-breaking space. You can obtain a ~ accent > on a letter, e.g. n, by writing \~n or \~{n} and, in the second form, > you can use \~{} to give \~ nothing to put its accent on, so it gives > you a ~ character, of sorts. > > There may also be something like \tilde somewhere in TeX's huge > vocabulary of defined names, but I don't know it. 
> > However, the problem with \url{url} may be that the \url command does > some weird things to its arguments which make a mess of the results. > The answer in such a case would be to fix the definition of \url ... > anyone tell me where the relevant definition is in a .sty file or > similar and I'll see what I can do to it. LaTeX is infinitely flexible, > albeits internals nearly unmaintainably ugly. > > Eddy. > From guido@digicool.com Mon Apr 2 22:20:57 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 02 Apr 2001 16:20:57 -0500 Subject: [Doc-SIG] syntax vs semantics: implicit --> explicit In-Reply-To: Your message of "Fri, 30 Mar 2001 13:57:03 EST." References: Message-ID: <200104022120.QAA04326@cj20424-a.reston1.va.home.com> > (Last year I wrote a chapter on Python for Wrox Press' "Professional > Linux Programming". I would have been much happier using a complete > ST-like markup than futzing around in MSWord.) I've a feeling that one reason the doc-sig is going around in circles is the tension between the needs of formatting docstrings and the needs of formatting larger documents. For docstrings, there's an explicit requirement (although not everybody gives it the same weight) that the source is pleasantly readable as plain text. For authoring larger documents like your Wrox chapter, that argument doesn't have the same importance: wat you want is easy authoring, which is subtly but importantly different. I propose that this time around, we should focus on docstrings only, and not on authoring other documents, lest we never reach an agreement. (Aside: I'd like to know more about why you think Word didn't work for you; I wonder if it could be unfamiliarity with advanced Word features? When using styles properly, Word is quite a capable authoring tool -- depending, of course, on the processing done by the publisher.) --Guido van Rossum (home page: http://www.python.org/~guido/) From dgoodger@atsautomation.com Mon Apr 2 21:46:43 2001 From: dgoodger@atsautomation.com (Goodger, David) Date: Mon, 2 Apr 2001 16:46:43 -0400 Subject: [Doc-SIG] syntax vs semantics: implicit --> explicit Message-ID: GvR wrote: > I propose that this time around, we should focus on docstrings only, > and not on authoring other documents, lest we never reach an > agreement. Agreed. But how long is a docstring? :> I like to put mini-to-full man-pages in my programs (accessible through --help), with section headers; perhaps I'm the exception. > (Aside: I'd like to know more about why you think Word didn't work for > you; I wonder if it could be unfamiliarity with advanced Word > features? No, I'm quite comfortable with Word and its features (having actually *read* the fine manual ... some versions ago, I admit). > When using styles properly, Word is quite a capable > authoring tool -- depending, of course, on the processing done by the > publisher.) It was a matter of ill-defined styles in the publisher's stylesheet, Word quirks, cross-platform mangling, inter-version incompatibilities (I refuse to make M$ richer whenever they change a digit), and editorial bungling (I made sure all my curly quotes were right, and somebody converted them all to straight-quotes). Plus the stress of an all-nighter. Add a crash or two. I've never lost work due to an emacs crash! 
/DG From guido@digicool.com Tue Apr 3 03:05:24 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 02 Apr 2001 21:05:24 -0500 Subject: [Doc-SIG] syntax vs semantics: implicit --> explicit In-Reply-To: Your message of "Mon, 02 Apr 2001 16:46:43 -0400." References: Message-ID: <200104030205.VAA05100@cj20424-a.reston1.va.home.com> > > I propose that this time around, we should focus on docstrings only, > > and not on authoring other documents, lest we never reach an > > agreement. > > Agreed. But how long is a docstring? :> I like to put mini-to-full man-pages > in my programs (accessible through --help), with section headers; perhaps > I'm the exception. How important is it that those mini-man pages are readable as part of the source? I've sometimes put arbitrary data in Python programs, for bundling reasons, but not necessarily cared too much about how it looks in the Python source (or even how easily editable it is). I'm just trying to figure out if your requirements really fit in the requirements for docstrings. It still sounds like you're stretching things a bit. I'm trying to argue for a smaller set of requirements, so we can make progress. > It was a matter of ill-defined styles in the publisher's stylesheet, Word > quirks, cross-platform mangling, inter-version incompatibilities (I refuse > to make M$ richer whenever they change a digit), and editorial bungling (I > made sure all my curly quotes were right, and somebody converted them all to > straight-quotes). Plus the stress of an all-nighter. Add a crash or two. Sounds like a stretch to blame it on Word. --Guido van Rossum (home page: http://www.python.org/~guido/) From dgoodger@atsautomation.com Tue Apr 3 14:46:00 2001 From: dgoodger@atsautomation.com (Goodger, David) Date: Tue, 3 Apr 2001 09:46:00 -0400 Subject: [Doc-SIG] syntax vs semantics: implicit --> explicit Message-ID: > How important is it that those mini-man pages are readable as part of > the source? In the case of the man-pages, not terribly. They could be in a separate file. I guess I'm a proponent of the "keep it all together" school. One of the advantages of docstrings: they're *there*, easily accessible, no dependency on possibly nonexistant external files and all those headaches. > I'm trying to argue for a smaller set of requirements, so we can make > progress. You've convinced me. I hope to spend a few hours over the next week or so revising my spec, maybe even turning it into a PEP (to complement/compete with Tibs' & Edward L's work). Watch this space... > Sounds like a stretch to blame it on Word. Such an easy target! /DG From support@internetdiscovery.com Fri Apr 6 08:42:41 2001 From: support@internetdiscovery.com (Mike Clarkson) Date: Fri, 06 Apr 2001 00:42:41 -0700 Subject: [Doc-SIG] Python documentation in info? Message-ID: <3.0.6.32.20010406004241.007de100@popd.ix.netcom.com> Is there a version of the Python documentation in info format? I looked in the canonical place but couldn't find it. I tried regenerating the info from the html in the Doc tree, but the perl script to do it is missing some HTML:: packages. Does anyone know the right packages and versions for these? Will it work with these packages under Perl 5? Many thanks, Mike From edloper@gradient.cis.upenn.edu Fri Apr 6 19:52:21 2001 From: edloper@gradient.cis.upenn.edu (Edward D. 
Loper) Date: Fri, 06 Apr 2001 14:52:21 EDT Subject: [Doc-SIG] which characters to use for docstring markup Message-ID: <200104061852.f36IqMp07364@gradient.cis.upenn.edu> I've been a bit busy lately, but I'm still working on coming up with a good markup language for docstrings... I was trying to figure out which characters should be used for markup.. (e.g., to delimit colored regions, etc.). And so I wrote a script to see how often different characters are used in docstrings, using all the docstrings in the standard library (well, actually, in /usr/local/lib/python2.0/*.py) as a "representative" sample. Here are the results:

    Character Count    Module Count    Character
    --------------------------------------------------
          1                1            ^H
         10                3            ^M
         11                4            ^
         12                5            ~
         13               10            {
         13               10            }
         16                6            %
         28                7            $
         48               12            ?
         50               20            !
         70               16            `
         75                8            &
         87               12            \
        108               18            +
        130               12            |
        197                7            @
        222               22            *
        229               20            #
        269               35            ]
        277               36            [
        313               44            =
        331               53            ;
        421               48            /
        441               46            "
        514               23            <
        663               67            :
        779               54            _
        875               28            >
       1302               75            '
       1858               94            (
       1874               94            )
       2145               97            ,
       2277               92            -
       3413              110            .

1. Any character(s) that are used for markup will have to be either backslashed/quoted whenever they are used, or will have to be only allowed in literal blocks. Clearly, we want to keep either of these to a minimum.
2. These results suggest that using perldoc style coloring, like B<this>, may not be the best idea, given that '<' and '>' are used so often. This is because people often talk about orderings between elements, like x>y. We might be better off using B{this} instead. '<' and '>' are used 53 times more frequently than '{' and '}'.
3. It makes much more sense to use "`" rather than "'" for literals, since "'" occurs 18 times more often. Of course, we would probably want to use *either* "`" for literals *or* something like L{literal} or C{code} or whatever.
4. You should keep in mind that any of these characters will be used in the docstring for *something* (well, actually, I was surprised to see a backspace in a docstring..). So, for the most part, it's a matter of inconveniencing the least number of people the least amount of time..
I'm leaning towards using either:: C{code}, E{emph} etc. or:: `literal` and *one* *word* *emph* (and that's it) to color code in my markup. Any comments? -Edward p.s., I'll probably have a preliminary description of my proposed markup language in about 2 weeks.. I hope. :) From tim.one@home.com Fri Apr 6 20:10:21 2001 From: tim.one@home.com (Tim Peters) Date: Fri, 6 Apr 2001 15:10:21 -0400 Subject: [Doc-SIG] which characters to use for docstring markup In-Reply-To: <200104061852.f36IqMp07364@gradient.cis.upenn.edu> Message-ID: [Edward D. Loper] > ... > 3. It makes much more sense to use "`" rather than "'" for ... In the font I'm using to view this email, I can't see the difference between those suggestions <wink>. > ... > I'm leaning towards using either:: > > C{code}, E{emph} etc. > > or:: > > `literal` and *one* *word* *emph* (and that's it) > > to color code in my markup. Any comments? I happen to like the former better, because it's extensible and unambiguous. It wasn't suitable for Perl because, e.g., $C{$i} is legit Perl code. And they invented the C<$i> notation before "->" (as in $A->[$i]) thingies were added to the language. X{...} is never legit Python syntax today, and seems very unlikely it ever will be.
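A rough sketch of the kind of character-census script Edward describes above -- hypothetical code, not his actual script; the ast-based docstring extraction and the modern-Python idiom are assumptions, and only the glob path is taken from his message:

    import ast, collections, glob

    def docstring_census(pattern="/usr/local/lib/python2.0/*.py"):
        """Count how often each character appears in docstrings, and in
        how many modules it appears at all."""
        totals = collections.Counter()
        module_counts = collections.Counter()
        for path in glob.glob(pattern):
            try:
                tree = ast.parse(open(path, errors="replace").read())
            except SyntaxError:
                continue
            seen = set()
            for node in ast.walk(tree):
                if isinstance(node, (ast.Module, ast.ClassDef, ast.FunctionDef)):
                    doc = ast.get_docstring(node, clean=False)
                    if doc:
                        totals.update(doc)   # per-character totals
                        seen.update(doc)     # characters seen in this module
            module_counts.update(seen)
        for char, count in sorted(totals.items(), key=lambda kv: kv[1]):
            if not char.isalnum() and not char.isspace():
                print("%6d %6d   %r" % (count, module_counts[char], char))

    if __name__ == "__main__":
        docstring_census()

The output is sorted by total count, mirroring the table above.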
From guido@digicool.com Fri Apr 6 22:06:19 2001 From: guido@digicool.com (Guido van Rossum) Date: Fri, 06 Apr 2001 16:06:19 -0500 Subject: [Doc-SIG] which characters to use for docstring markup In-Reply-To: Your message of "Fri, 06 Apr 2001 14:52:21 EDT." <200104061852.f36IqMp07364@gradient.cis.upenn.edu> References: <200104061852.f36IqMp07364@gradient.cis.upenn.edu> Message-ID: <200104062106.QAA15927@cj20424-a.reston1.va.home.com> > I've been a bit busy lately, but I'm still working on coming up with > a good markup language for docstrings... > > I was trying to figure out which characters should be used for > markup.. (e.g., to delimit colored regions, etc.). And so I wrote > a script to see how often different characters are used in > docstrings, using all the docstrings in the standard library (well, > actually, in /usr/local/lib/python2.0/*.py) as a "representative" > sample. You should also look into /usr/local/lib/python2.0/*/*.py -- that's a vast collection of code, e.g. Tkinter.py. [Table omitted] > 1. Any character(s) that are used for markup will have to be either > backslashed/quoted whenever they are used, or will have to be > only allowed in literal blocks. Clearly, we want to keep either > of these to a minimum. > 2. These results suggest that using perldoc style coloring, like > B<this>, may not be the best idea, given that '<' and '>' are > used so often. This is because people often talk about orderings > between elements, like x>y. We might be better off using B{this} > instead. '<' and '>' are used 53 times more frequently than > '{' and '}'. But you counted single characters. I grepped for '[A-Z]<' and found none that occurred in docstrings. (The actual re should be r'\B[A-Z]<'; I believe the POD rules ask for a single upper case letter before the <.) Now, there's one significant use of [A-Z]< that might trip us up: the regular expression syntax (?P<...>...). I certainly could see this being useful in docstrings for methods that take a regular expression argument. There's also one use of [A-Z]{: \N{...} means something in Unicode literal syntax. > 3. It makes much more sense to use "`" rather than "'" for > literals, since "'" occurs 18 times more often. Of course, we > would probably want to use *either* "`" for literals *or* > something like L{literal} or C{code} or whatever. I don't like `...`, because (a) it means something very specific in Python (and in the Unix shell), (b) it's hard to distinguish from '...' in some fonts, and (c) except for the `...` Python and shell notation, I expect ` to be closed with '. > 4. You should keep in mind that any of these characters will be used > in the docstring for *something* (well, actually, I was surprised > to see a backspace in a docstring..). Where? > So, for the most part, it's > a matter of inconveniencing the least number of people the least > amount of time.. > > I'm leaning towards using either:: > > C{code}, E{emph} etc. > > or:: > > `literal` and *one* *word* *emph* (and that's it) > > to color code in my markup. Any comments? I still like C<code> and *multi word emph* better. :-) --Guido van Rossum (home page: http://www.python.org/~guido/)
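A hedged sketch of the check Guido describes, applied to a single docstring rather than grepping files; the sample text and the added curly-brace pattern are illustrative assumptions:

    import re

    # The pattern Guido grepped for, plus the curly-brace counterpart;
    # he suggests r'\B[A-Z]<' as a refinement of the first one.
    ANGLE_OPENER = re.compile(r"[A-Z]<")
    BRACE_OPENER = re.compile(r"[A-Z]\{")

    def markup_collisions(docstring):
        """Return snippets of a docstring that already look like markup
        openers and would therefore need escaping."""
        hits = []
        for rx in (ANGLE_OPENER, BRACE_OPENER):
            hits.extend(docstring[m.start():m.start() + 12]
                        for m in rx.finditer(docstring))
        return hits

    if __name__ == "__main__":
        doc = r"Takes a pattern such as (?P<name>...) and a Unicode escape like \N{DIGIT ONE}."
        print(markup_collisions(doc))   # ['P<name>...) ', 'N{DIGIT ONE}']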
From gward@python.net Fri Apr 6 22:22:17 2001 From: gward@python.net (Greg Ward) Date: Fri, 6 Apr 2001 17:22:17 -0400 Subject: [Doc-SIG] which characters to use for docstring markup In-Reply-To: <200104061852.f36IqMp07364@gradient.cis.upenn.edu>; from edloper@gradient.cis.upenn.edu on Fri, Apr 06, 2001 at 02:52:21PM -0400 References: <200104061852.f36IqMp07364@gradient.cis.upenn.edu> Message-ID: <20010406172217.B2749@gerg.ca> On 06 April 2001, Edward D. Loper said: > I'm leaning towards using either:: > > C{code}, E{emph} etc. > > or:: > > `literal` and *one* *word* *emph* (and that's it) > > to color code in my markup. Any comments? I definitely prefer C{code} or E<emph> or E{emph} or whatever. (I'm not terribly concerned about the shape of the brackets used.) Guido's point about C<> being OK because the (opening) regex would be r'[A-Z]<' missed one nit: code samples like C<x > 5> are ambiguous; the ">" would have to be escaped. But then, with curly braces, C{d = {'a': 37}} is ambiguous. So whatever delimiter you pick -- and I can live with <> or {} -- there must be a simple escaping mechanism. I definitely prefer backslash to POD's E<gt> hack: C<x \> 5> vs. C<x E<gt> 5>. The former is yucky, the latter is super-yucky. Backslash also means you can escape anything; with POD's E<> escaping mechanism, there has to be an alternate spelling for any character you want to escape, e.g. "gt" for ">". Yuck. Greg -- Greg Ward - programmer-at-big gward@python.net http://starship.python.net/~gward/ I used to be a FUNDAMENTALIST, but then I heard about the HIGH RADIATION LEVELS and bought an ENCYCLOPEDIA!! From guido@digicool.com Fri Apr 6 23:31:38 2001 From: guido@digicool.com (Guido van Rossum) Date: Fri, 06 Apr 2001 17:31:38 -0500 Subject: [Doc-SIG] which characters to use for docstring markup In-Reply-To: Your message of "Fri, 06 Apr 2001 17:22:17 -0400." <20010406172217.B2749@gerg.ca> References: <200104061852.f36IqMp07364@gradient.cis.upenn.edu> <20010406172217.B2749@gerg.ca> Message-ID: <200104062231.RAA21298@cj20424-a.reston1.va.home.com> > I definitely prefer C{code} or E<emph> or E{emph} or whatever. (I'm not > terribly concerned about the shape of the brackets used.) > > Guido's point about C<> being OK because the (opening) regex would be > r'[A-Z]<' missed one nit: code samples like C<x > 5> are ambiguous; the > ">" would have to be escaped. But then, with curly braces, > C{d = {'a': 37}} is ambiguous. Good point! > So whatever delimiter you pick -- and I can live with <> or {} -- there > must be a simple escaping mechanism. I definitely prefer backslash to > POD's E<gt> hack: C<x \> 5> vs. C<x E<gt> 5>. The former is yucky, the > latter is super-yucky. Backslash also means you can escape anything; > with POD's E<> escaping mechanism, there has to be an alternate spelling > for any character you want to escape, e.g. "gt" for ">". Yuck. I think I'd prefer to have to write \> for > than \} for }, so I *still* prefer C<> over C{}. --Guido van Rossum (home page: http://www.python.org/~guido/) From hernan@orgmf.com.ar Fri Apr 6 23:21:38 2001 From: hernan@orgmf.com.ar (Hernan Martinez Foffani) Date: Sat, 7 Apr 2001 00:21:38 +0200 Subject: [Doc-SIG] which characters to use for docstring markup In-Reply-To: <200104062231.RAA21298@cj20424-a.reston1.va.home.com> Message-ID: Edward, on a mini-markup language for docstrings, proposes: "... either C{code}, E{emph} etc. or `literal` and *one* *word* *emph* (and that's it)..." Guido: "... I still like C<code> and *multi word emph* better. :-)" Tim: "... C{code}, E{emph} ..." Greg: "... C{code} or E<emph> or E{emph} or whatever ..." And it's just a two-tag markup language! :-) Sorry guys, it's almost 1:00AM and I couldn't resist! :-) Seriously now. Here are my 2 cents, just to fill the email with something other than a stupid joke. I don't care about the tag syntax if: a) there are either not many more than the "two-tags ML", or I don't have to be aware of the rest of the tags while commenting code. b) escape = backslash Again :-) Have a nice weekend! 
-Hernan -- Hernán Martínez Foffani hernan@orgmf.com.ar http://www.orgmf.com.ar/condor/ From fdrake@cj42289-a.reston1.va.home.com Sat Apr 7 06:45:43 2001 From: fdrake@cj42289-a.reston1.va.home.com (Fred Drake) Date: Sat, 7 Apr 2001 01:45:43 -0400 (EDT) Subject: [Doc-SIG] [development doc updates] Message-ID: <20010407054543.226DD2879A@cj42289-a.reston1.va.home.com> The development version of the documentation has been updated: http://python.sourceforge.net/devel-docs/ Lots of small fixes, but also the first installment of the unittest documentation. From support@internetdiscovery.com Sat Apr 7 22:21:48 2001 From: support@internetdiscovery.com (Mike Clarkson) Date: Sat, 07 Apr 2001 14:21:48 -0700 Subject: [Doc-SIG] which characters to use for docstring markup In-Reply-To: <200104062106.QAA15927@cj20424-a.reston1.va.home.com> References: <200104061852.f36IqMp07364@gradient.cis.upenn.edu> Message-ID: <3.0.6.32.20010407142148.00847660@popd.ix.netcom.com> At 04:06 PM 4/6/01 -0500, Guido van Rossum wrote: >> I've been a bit busy lately, but I'm still working on coming up with >> a good markup language for docstrings... FYI, I have the HappyDoc formatter/docstring extractor (happydoc.sourceforge.net), generating standard Python documentation LaTeX from docstrings. It's kind of nice, because it means that I immediately have all of the python.sty features available to me for crosseferencing etc., plus it immediately gives me my docstring derived documents in PDF, PS, HTML, and info (if I can get info working again). Because the delimiters are the standard \backslash and curlys, it means you can convieniently take advantage of r"""strings"" to protect the backslash markup in docstrings: def foo(): r""" My \code{foo} function \emph{breaks} the \module{bar} module.\index{Foos and Bars} """ pass Seems that this is legal Python, and allows HappyDoc to use Python to get the class structure with the docstrings. Also, the TeX convention of a blank line to mark a paragraph is really useful. Would this work for what you are wanting to do with docstring markup? The advantages I see is that: 1) It's quite complete for all of the entended uses (\emph(...}) Because it's more or less TexInfo compatible, most people know it or can learn it easily, even if you don't know LaTeX. 2) It means you can cut and paste between docstrings and the formal module documentation for Doc/. 3) The macros/commands are already completely documented, and the documentation for them ships with the core distribution. 4) It would reinforce the use of the Doc/ tools. 5) It would reinforce the use of HappyDoc (semi-literate programmming). 6) It would avoid yet another documentation markup language. 7) It's likely to be mainly backward compatible - I doubt many docstrings use \ a lot. On the other hand, I bet a lot of them use blank line as a paragraph seperator. 8) No need for syntax discussions, as it's already done :-) Mike. From guido@digicool.com Sat Apr 7 23:54:48 2001 From: guido@digicool.com (Guido van Rossum) Date: Sat, 07 Apr 2001 17:54:48 -0500 Subject: [Doc-SIG] which characters to use for docstring markup In-Reply-To: Your message of "Sat, 07 Apr 2001 14:21:48 MST." 
<3.0.6.32.20010407142148.00847660@popd.ix.netcom.com> References: <200104061852.f36IqMp07364@gradient.cis.upenn.edu> <3.0.6.32.20010407142148.00847660@popd.ix.netcom.com> Message-ID: <200104072254.RAA24765@cj20424-a.reston1.va.home.com> > At 04:06 PM 4/6/01 -0500, Guido van Rossum wrote: > >> I've been a bit busy lately, but I'm still working on coming up with > >> a good markup language for docstrings... That's a bogus attribution. I said no such thing. Ed Loper did. --Guido van Rossum (home page: http://www.python.org/~guido/) From edloper@gradient.cis.upenn.edu Sun Apr 8 20:20:33 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Sun, 08 Apr 2001 15:20:33 EDT Subject: [Doc-SIG] which characters to use for docstring markup In-Reply-To: Your message of "Fri, 06 Apr 2001 16:06:19 CDT." <200104062106.QAA15927@cj20424-a.reston1.va.home.com> Message-ID: <200104081920.f38JKXp25694@gradient.cis.upenn.edu> > But you counted single characters. I grepped for '[A-Z]<' and found > none that occurred in docstrings. (The actual re should be > r'\B[A-Z]<'; I believe the POD rules ask for a single upper case > letter before the <.) Well, presumably the occurrence of '[A-Z]{' will be comparably small. However, it's not just the open delimiters that we have to worry about. You can't include a close delimiter in a colored region. For example, if you want to put "x > y" in a "code" region, then you can't:: C<x > y> There's no way for it to know that the first ">" isn't a close delimiter. Similarly for bold, etc. Also, there's a question of how context-sensitive we want our delimiters to be. It may confuse people that they can say "x<y" in some contexts but not in others. > Now, there's one significant use of [A-Z]< that might trip us up: the > regular expression syntax (?P<...>...). I certainly could see this > being useful in docstrings for methods that take a regular expression > argument. This may be important, but I see it as less important than the issues of using < and > to mean less-than and greater-than. > There's also one use of [A-Z]{: \N{...} means something in > Unicode literal syntax. I agree that there will be cases where any character gets used. But I would argue that, in these cases, we should either use literal blocks (do you really need to say "\N{...}" in a paragraph? Maybe..) or use some sort of backslashing. (But again, let's come back to the discussions of backslashing.) > I don't like `...`, because (a) it means something very specific in > Python (and in the Unix shell), (b) it's hard to distinguish from > '...' in some fonts, and (c) except for the `...` Python and shell > notation, I expect ` to be closed with '. I'm leaning more towards the L{...} syntax anyway. Although I would argue against (b) on the grounds that, if you're viewing it in a non-parsed form, then you're viewing it in your source-code editor, and presumably you chose a font for your source-code editor in which you can distinguish "'" and "`", since they mean very different things in Python. Even if you did choose a different font for your docstring comments, you'd still like to be able to distinguish "'x'" and "`x`" when you read a doctest block.. so presumably you'd pick a font in which you can..? > > 4. You should keep in mind that any of these characters will be used > > in the docstring for *something* (well, actually, I was surprised > > to see a backspace in a docstring..). > > Where? In the sre module docstring, if I remember correctly. > I still like C<code> and *multi word emph* better. :-) I was thinking of these as mutually exclusive. If we're going to use C<code> or C{code} or whatever, we might as well use E<emph>. No need to go removing even more characters from docstring writers' repertoires. (Incidentally, multiword emph can be somewhat dangerous.. Especially if you let people have docstrings like "*hi" and "bye*", where the '*'s get parsed as normal asterisks.. People will get confused.. The problem is made worse if *multi word emphs* can span lines.. But then, if they can't, then word-rewrapping won't work as expected. etc...) -Edward
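A small, hypothetical illustration of the ambiguity Edward describes -- a naive scanner that ends a region at the first '>' it sees:

    import re

    # Deliberately naive: a capital letter, '<', then everything up to
    # the *first* '>'.  This is exactly the failure mode described above.
    NAIVE_REGION = re.compile(r"([A-Z])<([^>]*)>")

    def naive_regions(text):
        """Return (tag, contents) pairs found by the naive scanner."""
        return NAIVE_REGION.findall(text)

    if __name__ == "__main__":
        # The C<...> region is cut off at the first '>', leaving a stray
        # " y>" behind in the surrounding text.
        print(naive_regions("the test C<x > y> holds"))   # [('C', 'x ')]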
From edloper@gradient.cis.upenn.edu Sun Apr 8 20:40:39 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Sun, 08 Apr 2001 15:40:39 EDT Subject: [Doc-SIG] which characters to use for docstring markup In-Reply-To: Your message of "Sat, 07 Apr 2001 14:21:48 PDT." <3.0.6.32.20010407142148.00847660@popd.ix.netcom.com> Message-ID: <200104081940.f38Jedp27592@gradient.cis.upenn.edu> > FYI, I have the HappyDoc formatter/docstring extractor > (happydoc.sourceforge.net), > generating standard Python documentation LaTeX from docstrings. It's kind > of nice, because it means that I immediately have all of the python.sty > features available > to me for cross-referencing etc., plus it immediately gives me my docstring > derived documents in PDF, PS, HTML, and info (if I can get info working > again). The only formatters I could find for HappyDoc use StructuredTextClassic, or some variant. And many people (incl. Guido) are not terribly happy with ST. Does the formatter you're talking about use something else? What does it do about lists, etc.? > def foo(): > r""" > My \code{foo} function \emph{breaks} the > \module{bar} module.\index{Foos and Bars} > """ > pass Most people have objected to "heavyweight" markup for docstrings.. i.e., they don't want to have to write docstrings in LaTeX or XML or whatever.. It *looks* like you're basically just writing docstrings using some subset of LaTeX? If so, we'd have to carefully define *which* subset, and what everything means, etc., before I would accept it. We don't want people assuming that, just because they can use \emph{...}, they can use all their other favorite LaTeX commands (we do, after all, want it to be possible to convert this to HTML, info pages, etc.) > 1) It's quite complete for all of the intended uses (\emph{...}). > Because it's more or less TeXinfo compatible, most people know it > or can learn it easily, even if you don't know LaTeX. But you can't make it *too* "complete," or it won't be a standard that people can write tools to process anymore.. We don't want to just reimplement LaTeX here... > 2) It means you can cut and paste between docstrings and the formal > module documentation for Doc/. > 3) The macros/commands are already completely documented, and the > documentation for them ships with the core distribution. > 4) It would reinforce the use of the Doc/ tools. I actually am not too familiar with the Doc/ tools.. Can you give me a pointer to them? But copy/paste does seem useful. (Although it should at least be possible to write conversion tools, in any case, given a good standard.) > 5) It would reinforce the use of HappyDoc (semi-literate programming). HappyDoc seems like a nice tool. Whatever markup language we settle on (if we ever do), a HappyDoc formatter will probably be implemented.. > 7) It's likely to be mainly backward compatible - I doubt many > docstrings use \ a lot. On the other hand, I bet a lot of > them use a blank line as a paragraph separator. 
I believe that trying to be "backward compatible" with a markup language is an extremely dangerous thing to do, esp. if your markup language is relatively "forgiving," because you probably won't *notice* the places where it gets confused. I would rather be explicitly non-backward-compatible. -Edward From edloper@gradient.cis.upenn.edu Sun Apr 8 20:59:12 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Sun, 08 Apr 2001 15:59:12 EDT Subject: [Doc-SIG] backslashing Message-ID: <200104081959.f38JxCp29343@gradient.cis.upenn.edu> Backslashing has been coming up lately. So let's go over it again. :) I used to be strongly in favor of backslashing, using '\'. After all, it's *the* standard backslashing character, and clearly we need a backslashing character, etc. etc... Now I'm not so sure anymore. The one big problem with backslashing, as I see it, arises from the following principle, which has been guiding a lot of docstring ML design: * Intuitiveness: The meaning of a well-formed formatted documentation string should be obvious to a reader, even if that reader is not familiar with the formatting conventions. Now consider what happens if a newbie user prints out a formatted docstring:: >>> print somefunc.__doc__ Somefunc will start an interactive session. When you want to exit the session, simply type "\\exit" >>> Now, the user gets confused, and types "\\exit" instead of "\exit". Another reasonable example might be:: >>> print otherfunc.__doc__ [...] C<regexp> should use "\\d" for digits and "\\s" for whitespace. C<regexp> should not include "\\w". Example use: >>> otherfunc("\\s\\d") hi there! >>> Note that this is entirely separate from issues of whether to use r"..." strings for docstrings, etc.. The problem is that, when a docstring is *printed*, it should be easy to interpret it. Of course, this problem might very well go away if people stopped printing docstrings, and started using tools (pydoc, etc.), and those tools decided to take care of these things for them.. But that's probably a bit distant in the future right now. So what's the alternative? Don't allow markup characters in paragraphs, and force docstring writers to put them in literal blocks if they want to use them. Instead of:: def f(s): """ Return the number of occurrences of the string "{}" in s. you would have to write:: def f(s): """ Return the number of occurrences of the string:: {} in s. Is this better? I'm not sure. As someone else mentioned, it's possible (like perldoc does?) to say {lb} and {rb} or <lb> and <rb> or some such. But that would probably be just as non-intuitive as using backslashes.. Of course, if we decide that "intuitiveness" is not as important to us, then maybe we can go ahead and use backslashing anyway... -Edward From edloper@gradient.cis.upenn.edu Sun Apr 8 21:06:32 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Sun, 08 Apr 2001 16:06:32 EDT Subject: [Doc-SIG] which characters to use for docstring markup Message-ID: <200104082006.f38K6Wp00128@gradient.cis.upenn.edu> Greg: >> code samples like C<x > 5> are ambiguous Hm, I guess I should make sure I've read *all* my email before I start writing. :) Guido: >> I think I'd prefer to have to write \> for > than \} for }, so I >> *still* prefer C<> over C{}. Is this just because you think it looks nicer? Or for some semblance of compatibility with perldoc? Or is there some other reason? I guess I'm not strongly committed to C{}, but I think that it would inconvenience fewer people, since talking about order relationships (gt and lt) is fairly common.. 
Also, it makes for a simpler system that's easier to understand/use if you just say that delimiter characters are always delimiter characters. -Edward From edloper@gradient.cis.upenn.edu Sun Apr 8 22:46:49 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Sun, 08 Apr 2001 17:46:49 EDT Subject: [Doc-SIG] Structuring rules Message-ID: <200104082146.f38Lknp10218@gradient.cis.upenn.edu> I've been working on designing a docstring markup language, based losely on ST and a few other sources.. And I wanted to see what people thought of the structuring rules so far. Note that these rules only talk about how to indicate the *structure* of a docstring, not coloring (things like emph and inline code). There are 9 structural blocks: - basic block: these blocks do not contain other blocks - paragraph: a paragraph of text. paragraphs are the only place where coloring (emph, etc) can occur. - literal block: a block of unprocessed text, which will be displayed as-is. - doctest block: a block containing python code, which can be used by doctest. - heading: a single line of text, providing the heading for a section. - hierarchical blocks: these blocks contain other blocks - list item: a single item of a list. List items can contain paragraphs, literal blocks, doctest blocks, and lists. - list: a list. Lists contain one or more list items. - section: a section or subsection of text. Contains a heading followed by paragraphs, literal blocks, doctest blocks, lists, and sections). - field: a semantically tagged section of text. It is used to describe specific aspects of an object, like the return value, a parameter to a function, or the authors of a module. Contains paragraphs, literal blocks, doctest blocks, and lists. - top: the top-level. contains paragraphs, literal blocks, doctest blocks, lists, sections, and fields. In case you're not familiar with these blocks, here's an example:: This is a one-line paragraph. This is a multi-line paragraph. Paragraphs are usually separated by blank lines. - This is a list. - Lists consist of list items. List items may span multiple lines. List items may contain multiple paragraphs. Blocks ====== That was a top-level heading. Here's a subheading: Literal Blocks -------------- Literal blocks are introduced with double-colons, like this:: Literal / / Block And end on the first line whose indentation is equal to or less than the indentation of the paragraph that introduced them. Doctest Blocks -------------- Doctest blocks start with '>>> '. Here's a doctest block: >>> print 1+2 3 author: This is a field. This particular field should be used to describe the author of the object documented. param x: Fields can take arguments. So.. the markup language I'm defining makes a fair amount of use of the concept of "indentation." Instead of defining it right now, I'll just show it by example. I'll worry about formalizing it later:: This paragraph has an indentation of 3, since it is preceeded by 3 spaces. >>> # This doctest block has an indentation of 6 >>> print(" even if some of its lines are indented more") even if some of its lines are indented more >>> # Indentation of a doctest block is the indentation of >>> # its first line. The following literal block has an indentation of 4:: Literal Block! That's one plus the indentation of the paragraph that introduced it. - This list has an indentation of six - That's because each list item is preceeded by 6 spaces. This list item has an indentation of 8. That's because each of its paragraphs has an indentation of 8. 
Heading ======= That heading had an indentation of 3, since it was preceeded by 3 spaces. This section has an indentation of 6, since each of its paragraphs has an indentation of 6. Heading2 ======== This section has an indentation of 3. author: Field indentation works just like list item indentation. Now we can discuss what rules to put on indentation. These rules can be used when parsing to figure out where blocks start/end etc. I propose: - all paragraphs must be left-justified. i.e., the indentation of each line in a paragraph must be the same. - the indentation of a paragraph must be equal to the indentation of the block that contains it. - the indentation of a list must be greater than or equal to the indentation of the block that contains it. Although I might consider changing this to strictly greater than. - the indentation of a list item must be strictly greater than the indentation of the list. In other words, the following type of list item is not allowed:: - a list item where the indentation of the paragraph is equal to the indentation of the list. - the indentation of a field must be strictly greater than the indentation of the block that contains it. Thus, the following is not allowed: field: a field where the indentation of the field is equal to the indentation of the block that contains it. - the indentation of a section must be greater than or equal to the indentation of the block that contains it. But this leaves open the question of how to figure out the indentation of certain entities, such as: - a paragraph starting on the first line of a docstring - list-items with one-line paragraphs - list-items with one-line paragraphs followed only by sublists - fields with one-line paragraphs - fields with one-line paragraphs followed only by sublists For now, I'll set aside the issue of dealing with the first line of a docstring. I see two basic options for dealing with the rest of the issues. 1. The indentation of a list item is the number of characters before the first non-space character following the bullet. Thus, the following would be an invalid list item:: - This is a list item, where the number of characters before the first non-space non-bullet character on the first line doesn't match the indentation of the subsequent lines. But you could say, for example:: - List item - sublist item 1 - sublist item 2 Another paragraph in the top-level list-item. Note that its identation matches "List item"'s indentation. You could also say something like:: 1. A list item that spans multiple lines [...] 10. Another list item. Note that the use of an extra space in (1) makes this line up prettily. 2. The indentation of a list item is indeterminant unless there is a paragraph that constrains it. Thus, for example, you could say:: - This is a multiline list item. - List item - sublist item 1 - sublist item 2 Another paragraph in the top list-item. I see 2 main problems with approach (1): - it doesn't work well if you try to use a non-monospaced font for docstrings, since it's hard to tell if it's "lined up." - it may not be convenient for labels:: param x: you have to line up with the first line, like this. You can't go like this:: param x: a multiline description of parameter x return: a multiline description of the return value I see 1 main problem with approach (2): - if a list item contains a one-line paragraph, then the list item's indentation is indeterminant, so you can't figure out the indentation of a child literal block. 
E.g.:: - list item:: What's the indentation of this literal block?? Is this another paragraph in the list item, or part of the literal block? Thoughts/comments? -Edward p.s., requiring paragraphs to be justified, and requiring lists to be indented, gets rid of the problem of accidentally word-wrapping a sentence ending in 1. There's still a minor problem if we go with approach (2), since you can't tell if the second line is a list item or a continuation of the first line in:: - a list item with a sentence that ends in 1. That's not easy for humans to parse, either, though. :) From support@internetdiscovery.com Mon Apr 9 00:08:18 2001 From: support@internetdiscovery.com (Mike Clarkson) Date: Sun, 08 Apr 2001 16:08:18 -0700 Subject: [Doc-SIG] which characters to use for docstring markup In-Reply-To: <200104081940.f38Jedp27592@gradient.cis.upenn.edu> References: Message-ID: <3.0.6.32.20010408160818.007c3db0@popd.ix.netcom.com> >At 03:40 PM 4/8/01 EDT, Edward D. Loper wrote: >> At 04:06 PM 4/6/01 -0500, Guido van Rossum didn't write :-) >> At 02:21 PM 4/7/01 -0700, I wrote: >> FYI, I have the HappyDoc formatter/docstring extractor >> (happydoc.sourceforge.net), >> generating standard Python documentation LaTeX from docstrings. It's kind >> of nice, because it means that I immediately have all of the python.sty >> features available >> to me for crosseferencing etc., plus it immediately gives me my docstring >> derived documents in PDF, PS, HTML, and info (if I can get info working >> again). > >The only formatters I could find for HappyDoc use StructuredTextClassic, >or some variant. And many people (incl. Guido) are not terribly happy >with ST. Does the formatter you're talking about use something else? >What does it do about lists, etc? The formatter is an extension that I've added to HappyDoc. I'm working with the author to get the changes back into the distribution; with luck they may be done RSN (days). I hope they will be adopted into the next version (the changes are small, and it really just introduces a new hdformatter). >> def foo(): >> r""" >> My \code{foo} function \emph{breaks} the >> \module{bar} module.\index{Foos and Bars} >> """ >> pass > >Most people have objected to "heavyweight" markup for docstrings.. >i.e., they don't want to have to write docstrings in LaTeX or XML >or whatever.. It *looks* like you're basically just writing >docstrings using some subset of LaTeX? Yes it's a subset - I should have made that clear. There is a subset of LaTeX implicitly defined by the Python \file{Doc/} tools, by virtue of the constraint that the output be generatable in HTML and info as well. It's really the subset of LaTeX that is equivalent to TeXinfo, (give or take some minor naming differences). For the sake of discussion, let me call this LaTeXinfo. The subset contains all of what you need for docstring highlighting etc., plus, and in my eyes a big plus, everything you need for cross-referencing, TOC and indexing of a group of modules. For the sake of discussion, we'll say it contains nothing else. Heavyweight is a relative term of course, and I think most users of TeXinfo feel it's not too heavy. It's a fair balance between light and complete. > If so, we'd have to carefully >define *which* subset, and what everything means, etc., before I >would accept it. We don't want people assuming that, just because >they can use \emph{...}, they can use all their other favorite LaTeX >commands (we do, after all, want it to be possible to convert this >to HTML, info pages, etc.) Agreed. 
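A hedged sketch of how such a subset, once defined, could be checked mechanically; the ALLOWED set below is a made-up placeholder, not the actual markup from "Documenting Python":

    import re

    # Hypothetical allowed subset -- the real list would come from the
    # agreed-upon "Documenting Python" markup.
    ALLOWED = {"code", "emph", "module", "index", "samp", "file", "citetitle"}

    COMMAND = re.compile(r"\\([A-Za-z]+)")

    def unknown_commands(docstring):
        """Return LaTeX-style commands used in a docstring that are not
        in the agreed-upon subset."""
        return sorted(set(COMMAND.findall(docstring)) - ALLOWED)

    if __name__ == "__main__":
        doc = r"""My \code{foo} function \emph{breaks} the \module{bar}
        module.\footnote{so there}"""
        print(unknown_commands(doc))   # ['footnote']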
The subset is well defined and documented already, and in widespread use as the current documentation standard for Python. The current installed base equals the installed base of Python. >> 1) It's quite complete for all of the entended uses (\emph(...}) >> Because it's more or less TexInfo compatible, most people know it >> or can learn it easily, even if you don't know LaTeX. > >But you can't make it *too* "complete," or it won't be a standard that >people can write tools to process anymore.. We don't want to just >reimplement LaTeX here... You're right. I find the subset to be complete enough, especially for docstrings, and the tools are already written. It has to be small for info. >> 2) It means you can cut and paste between docstrings and the formal >> module documentation for Doc/. >> 3) The macros/commands are already completely documented, and the >> documentation for them ships with the core distribution. >> 4) It would reinforce the use of the Doc/ tools. > >I actually am not too familiar with the Doc/ tools.. Can you give >me a pointer to them? They are with every Python distribution, or take a look at \citetitle[http://www.python.org/doc/current/doc/doc.html]{Documenting Python} In my view, the documentation is one of Python's strengths, and the benefits of having standardized the documentation early are huge. But documentation is always a painful task, and I think there are real benefits to a documentation approach that is scalable, from docstrings all the way up to the reference documentation. >But copy/paste does seem useful. (Although it >should at least be possible to write conversion tools, in any case, >given a good standard). It's \emph{really} nice to have the \key{PASTE} key as a conversion tool. I find myself documenting modules a lot, classes a little, etc. and by then, a first draft of the reference documentation is already done. >> 5) It would reinforce the use of HappyDoc (semi-literate programmming). > >Happydoc seems like a nice tool. Whatever markup language we settle >on (if we ever do), a HappyDoc formatter will probably be implemented.. I only wrote the LaTeXinfo extention to HappyDoc last week, and already I'm very Happy \grin. But the LaTeXinfo version is by far the most advanced: having my entire module and class structure documented with indexing, Table of Contents and cross-references, in HTML, info and PDF is huge.* >> 7) It's likely to be mainly backward compatible - I doubt many >> docstrings use \ a lot. On the other hand, I bet a lot of >> them use blank line as a paragraph seperator. > >I believe that trying to be "backward compatible" with a markup language >is an extremely dangerous thing to do, esp. if your markup language is >relatively "forgiving," because you probably won't *notice* the places >where it gets confused. I would rather be explicitly non-backward- >compatible. Sorry, what I meant was backward compatible with all the current docstrings in the existing Python library. It is backward compatible in the sense: \begin{enumerate} \item There are very few occurences of \textbackslash. \item A blank line implies a paragraph. \end{enumerate} This would not be true if we went to an HTML markup system for example: all current docstrings in existing code would require the insersion of \code{
<P>
} for the blank lines, plus worrying about the more frequently used \samp{<} and \samp{>} characters. Small details, but nice. The whole docstring implementation in this way could be done simply by: \begin{enumerate} \item Define a subset of LaTeXinfo commands that would be admissible in docstrings. You could start very small, and add to them in time. \item Change the page where these command are documented in the Python documentation to seperate out the docstring subset on their own page, and tell people about the \code{r"""markup"""} trick (see below). \item Implement the tty parser for docstrings so that they look pretty at the terminal. For this, it it important to note that there are existing reference implementations of a tty representation (info in C, info in Emacs), so presumably you could blindly copy the info representations. That way you would be compatible with the primordial Python IDE: Emacs. \end{enumerate} Note also, that a lot of the mileage of this approach is gained from the fortutious coincidence that r""" docstring with \LaTeX\ markup""" works for docstrings too, which makes it easy to use backslashes: def foo(): r""" My \code{foo} function \emph{breaks} the \module{bar} module.\index{Foos and Bars} """ pass Mike. * \footnote{If people want, I can put a copy of a HappyDoc LaTeXinfo generated PDF file up on starship for people to browse.} PS: My apologies if anyone was mislead by the apparent misattribution of my previous post; it was Edward D. Loper I was quoting. From guido@digicool.com Mon Apr 9 01:24:53 2001 From: guido@digicool.com (Guido van Rossum) Date: Sun, 08 Apr 2001 19:24:53 -0500 Subject: [Doc-SIG] which characters to use for docstring markup In-Reply-To: Your message of "Sun, 08 Apr 2001 16:06:32 EDT." <200104082006.f38K6Wp00128@gradient.cis.upenn.edu> References: <200104082006.f38K6Wp00128@gradient.cis.upenn.edu> Message-ID: <200104090024.TAA31945@cj20424-a.reston1.va.home.com> > Guido: > >> I think I'd prefer to have to write \> for > than \} for }, so I > >> *still* prefer C<> over C{}. > > Is this just because you think it looks nicer? Or for some semblance > of compatability with perldoc? Or is there some other reason? I guess because I feel that dict displays are probably more common than comparisons -- but I haven't made a study. The argument could be made, though, that inside C{}, one could allow unescaped {} to *nest*, and this would make C{{'1': 2, '2': 1}} unambiguous. We can't do this for < and >, because they occur unpaired much more often than } and {. So I withdraw that objection. (My other reason was that I thought that C{} would be harder to type than C<>, but I can't find an objective reason for that.) --Guido van Rossum (home page: http://www.python.org/~guido/) From edloper@gradient.cis.upenn.edu Mon Apr 9 00:36:16 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Sun, 08 Apr 2001 19:36:16 EDT Subject: [Doc-SIG] which characters to use for docstring markup In-Reply-To: Your message of "Sun, 08 Apr 2001 16:08:18 PDT." <3.0.6.32.20010408160818.007c3db0@popd.ix.netcom.com> Message-ID: <200104082336.f38NaGp20864@gradient.cis.upenn.edu> > The formatter is an extension that I've added to HappyDoc. I'm working with > the author to get the changes back into the distribution; with luck they > may be done RSN (days). I hope they will be adopted into the next version > (the changes are small, and it really just introduces a new hdformatter). Could you put it on the web someplace? 
Incidentally, I have a question about HappyDoc terminology -- they seem to use the word formatter to refer to both what I would call a "parser" (convert representation to an interlingua) and an "outputter" (convert interlingua to output representations). Do they do all the translations in one step? If so, doesn't that make it a pain to write outputters for each output format? Or do they just use terminology differently than I expect them to (I would think that a "formatter" would be what I would call an "outputter"??) > Heavyweight is a relative term of course, and I think most users of TeXinfo > feel it's not too heavy. It's a fair balance between light and complete. I think that many people on this sig would say that its syntax for lists is too heavy-weight. I myself would be ok using XML, so I'm noot really one of the ones strongly lobbying for lightweight.. but I want somethign that people will accept. And I think it's much more likely that people will type: - lists like this than: \begin{itemize} \item lists like this \end{itemize} (which is not to say that I'd support any sort of hybrid.. if you're using the subset of LaTeX supported by Doc, then you should use just that) > I only wrote the LaTeXinfo extention to HappyDoc last week, and already > I'm very Happy \grin. But the LaTeXinfo version is by far the most advanced: > having my entire module and class structure documented with indexing, Table > of Contents and cross-references, in HTML, info and PDF is huge.* I assume you use somethign like \label{foo} and \ref{foo} for cross-referencing? > Sorry, what I meant was backward compatible with all the current docstrings > in the existing Python library. It is backward compatible in the sense: > > \begin{enumerate} > \item There are very few occurences of \textbackslash. > \item A blank line implies a paragraph. > \end{enumerate} Presumably you also have to worry about '{' and '}' because LaTeX will treat '{hi}' as equivalant to 'hi', etc. > \item Implement the tty parser for docstrings so that they look pretty > at the terminal. For me, this would be a pretty essential precondition to accepting a markup language like the one you propose. > * \footnote{If people want, I can put a copy of a HappyDoc LaTeXinfo generated > PDF file up on starship for people to browse.} I'd like to see what the HTML output looks like too. -Edward From dgoodger@atsautomation.com Mon Apr 9 16:12:32 2001 From: dgoodger@atsautomation.com (Goodger, David) Date: Mon, 9 Apr 2001 11:12:32 -0400 Subject: [Doc-SIG] backslashing Message-ID: Edward D. Loper wrote: > Now consider what happens if a newbie user prints out a formatted > docstring:: > > >>> print somefunc.__doc__ > Somefunc will start an interactive session. When you want > to exit the session, simply type "\\exit" > >>> > > Now, the user gets confused, and types "\\exit" instead of "\exit". I assume you meant that the user should type "\exit", and you doubled-up the backslashes in order to avoid escaping the "e" in "exit". Correct? In a nutshell: - No matter what characters are chosen for markup, some day someone will want to write documentation *about* that markup (hopefully sooner than later :). Therefore, any complete markup language must have an escaping or encoding mechanism. - If we want a lightweight markup system, encoding mechanisms like SGML/XML's '*' are out. So an escaping mechanism is in. The backslash is the only viable candidate IMO. 
- However, with carefully chosen markup, it should be necessary to use the escaping mechanism only infrequently. - As in many systems with escaping, we can define the escape character to have the "escaping" meaning only for specific characters (the markup characters themselves). (Example: in Python, len('\t') == 1, len('\T') == 2.) So '\*' would escape the asterisk (evaluates to '*', but not processed as markup), but '\e' would be a backslash and an 'e', two characters. No '\\e' required. - In extreme cases, or when we want to be absolutely clear, we can use a literal block instead:: When you want to exit the session, simply type:: \exit > So what's the alternative? Don't allow markup characters in > paragraphs, and force docstring writers to put them in literal > blocks if they want to use them. Have fun enforcing that! :> I would change that to: allow markup characters in paragraphs via the escaping mechanism, but encourage authors to put them in literal blocks instead. /DG From edloper@gradient.cis.upenn.edu Wed Apr 11 18:06:41 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Wed, 11 Apr 2001 13:06:41 EDT Subject: [Doc-SIG] lightweight markup: bullets Message-ID: <200104111706.f3BH6fp26132@gradient.cis.upenn.edu> So I'm still playing around with developing a lightweight markup language for docstrings, and wanted to bounce an idea off the list.. Background ========== Traditionally, there has been some difficulty in deciding how to do lightweight lists. The simplest idea is to do something like::

    - this is a list item
    - this is another list item
    - This is a multiline
      list item.

Where list items are lines that start with a bullet character. But then there's an issue of whether that makes it safe to include dashes (surrounded by spaces) in paragraphs.. because they could get word-wrapped such that the dash appears at the beginning of the line, which would then make it into a list item. So a paragraph containing the sentence::

    This is a paragraph - it contains a dash.

Might get word-wrapped at some point to::

    This is a paragraph
    - it contains a dash.

The problem also applies to ordered lists: the paragraph::

    Some people like the number 1. Some don't.

Might get word-wrapped to::

    Some people like the number
    1. Some don't.

There are some ways to get around this *most* of the time with indentation (by requiring that lists be indented), but they don't work all of the time. For example, with a paragraph like::

    return: Some real number that's greater than
      1. The number should also be less than 2.

... it doesn't help, because there's no way to tell whether that's a list item or part of the first paragraph.. My Question =========== So the approach that I wanted to get people's opinion on is using bullets that look like::

    <-> This is an unordered list item.
    <1> This is an ordered list item.
    <term> This is a description list item.

Here, I'm assuming that we're already using C<...> etc. to delimit colored regions, so <...> without a letter before it should never appear in a paragraph. (Alternatively, replace '<' with '{' and '>' with '}'. I've been going back and forth on which one I like better.) The advantages of this approach are: - it's consistent between list types - it's very easy to detect (for coloring, etc.) - it's safe with respect to word-wrapping paragraphs - it easily allows for a wide variety of bullets (e.g., for description list items) The main disadvantage that I see is: - It's uglier than just using '-' or '1.'. Is it too ugly? Do you see any other problems with it? 
Do you have any better ideas? -Edward From edloper@gradient.cis.upenn.edu Wed Apr 11 18:16:36 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Wed, 11 Apr 2001 13:16:36 EDT Subject: [Doc-SIG] which characters to use for docstring markup In-Reply-To: Your message of "Sun, 08 Apr 2001 16:08:18 PDT." <3.0.6.32.20010408160818.007c3db0@popd.ix.netcom.com> Message-ID: <200104111716.f3BHGap27199@gradient.cis.upenn.edu> Mike said: > The formatter is an extension that I've added to HappyDoc. I'm > working with the author to get the changes back into the > distribution; with luck they may be done RSN (days). I hope they > will be adopted into the next version (the changes are small, and it > really just introduces a new hdformatter). Does it have any provisions for specifying descriptions of parameters, etc? Like "@param" and "@return" etc. in javadoc? > Heavyweight is a relative term of course, and I think most users of > TeXinfo feel it's not too heavy. It's a fair balance between light > and complete. It's more heavyweight than what I was aiming for, but that's not to say that it's too heavyweight. I was just reacting to what I percieved as the desire of most people.. > It's \emph{really} nice to have the \key{PASTE} key as a conversion > tool. I find myself documenting modules a lot, classes a little, > etc. and by then, a first draft of the reference documentation is > already done. It still seems to me like this won't *quite* work with your system, since you'll allow comments that include unbackslashed &'s and \\s and whatever other characters LaTeX treats funnily.. But it's certainly much closer than the markup languages we've been talking about on docsig. > I only wrote the LaTeXinfo extention to HappyDoc last week, and > already I'm very Happy \grin. But the LaTeXinfo version is by far > the most advanced: having my entire module and class structure > documented with indexing, Table of Contents and cross-references, in > HTML, info and PDF is huge.* It seems like a lot of the indexing and table-of-contents stuff is really a tool issue, not a markup language issue.. How much explicit info do you put in? The markup language I've been thinking about is mainly intended for API-level documentation, which is arguably different from reference docs.. (I'm not one of the people who says that the reference docs should be included inline). I wouldn't imagine most docstrings using sections, etc. -Edward From edloper@gradient.cis.upenn.edu Wed Apr 11 18:22:21 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Wed, 11 Apr 2001 13:22:21 EDT Subject: [Doc-SIG] which characters to use for docstring markup In-Reply-To: Your message of "Sun, 08 Apr 2001 19:24:53 CDT." <200104090024.TAA31945@cj20424-a.reston1.va.home.com> Message-ID: <200104111722.f3BHMLp27608@gradient.cis.upenn.edu> > I guess because I feel that dict displays are probably more common > than comparisons -- but I haven't made a study. The fact that only 13 '{'s appear in the reference docs I searched suggests otherwise.. ;) But really, the question isn't how often dicts are included, but how often they're included inline. I would *think* that dictionary displays tend not to be inline.. But I don't trust my intuitions on such things, though.. > The argument could be made, though, that inside C{}, one could allow > unescaped {} to *nest*, and this would make C{{'1': 2, '2': 1}} > unambiguous. We can't do this for < and >, because they occur > unpaired much more often than } and {. 
We'd want to be careful with this, but that could work. > (My other reason was that I thought that C{} would be harder to type > than C<>, but I can't find an objective reason for that.) Which is something I hadn't considered, but is actually a fairly good reason.. It *is* easier to type C<> than to type C{} (for me anyway), since '<' and '>' are closer to the home row than '{' and '}'.. :) I think I'm still leaning towards '{}', but I'm not strongly set on it.. For now, I'll just remain uncommitted, until I've finished implementing something, and we can play with them and see which one looks/feels better. -Edward From edloper@gradient.cis.upenn.edu Wed Apr 11 18:49:43 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Wed, 11 Apr 2001 13:49:43 EDT Subject: [Doc-SIG] backslashing In-Reply-To: Your message of "Mon, 09 Apr 2001 11:12:32 EDT." Message-ID: <200104111749.f3BHnhp29835@gradient.cis.upenn.edu> > - As in many systems with escaping, we can define the escape > character to have the "escaping" meaning only for specific > characters (the markup characters themselves). (Example: in Python, > len('\t') == 1, len('\T') == 2.) So '\*' would escape the asterisk > (evaluates to '*', but not processed as markup), but '\e' would be a > backslash and an 'e', two characters. No '\\e' required. I can see this getting ridiculously complicated if we're talking about any regexps inline.. And regexps are hard enough to read anyway.. :)

    >>> print my_confusing_docstring
    ...
    ...defaults to the regexp "\s\*", which will match zero or more spaces... :)

Of course, I'm currently planning to use E{emph} instead of *emph*, but you get the idea.. :) Hm.. So in my current markup language, there are two coloring characters ('{' and '}' or '<' and '>') and the following structuring characters:

    - '-': a bullet, when it occurs at the start of a line
    - '([0-9]+.)+': a bullet, when at the start of a line
    - '::': introduces a literal block, when at the end of a para
    - '=': used for underlining headings
    - '-': used for underlining headings
    - '~': used for underlining headings

If we're doing escaping, then clearly we need to be able to escape '{' and '}'. We might be able to get away with not escaping any of the structuring characters by saying that when they appear within a colored region, they don't count. So, for example, in::

    Find the value of C{x
    - y}.

The second line wouldn't be a list item because it's in a colored region.. We might also need a new "null" coloring that could be used in examples like::

    This is a sentence that ends in the number
    N{1.}

Is this better or worse than::

    This is a sentence that ends in the number
    \1.

Of course, if we require bullets to be in a special colored region, then we don't have to worry about them.. And we don't have to worry about '::', since it's only interpreted when it comes at the end of a paragraph (not at the end of a line)... And presumably people will never write::

    x = y

as::

    x
    =
    y

(which would be read as a heading "x" followed by a paragraph containing "y"). In that case, we could say that the only characters that you can backslash are '\{' and '\}'. So then I might feel better about saying that '\' is interpreted as a literal backslash except before '\', '{', or '}'.. Although I would still be worried that people would get confused with regexps::

    >>> print another_confusing_docstring
    ...
    The regexp r"\\." matches a literal period.
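(For concreteness, here is a throwaway sketch of that rule -- the function name is invented and this is an illustration only, not code from any actual tool::

    def unescape_inline(text):
        # Resolve escapes under the proposed rule: a backslash followed
        # by '\', '{' or '}' yields that character; any other backslash
        # is kept as an ordinary literal character.
        out = []
        i = 0
        while i < len(text):
            if text[i] == '\\' and i + 1 < len(text) and text[i+1] in '\\{}':
                out.append(text[i+1])    # escaped markup character
                i += 2
            else:
                out.append(text[i])      # ordinary character, including a lone '\'
                i += 1
        return ''.join(out)

so "C\{x\}" comes out as "C{x}" and a regexp like "\s*" passes through untouched, but "\\." still collapses to "\." -- exactly the confusion illustrated just above.)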
-Edward From tim.one@home.com Wed Apr 11 19:45:59 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 11 Apr 2001 14:45:59 -0400 Subject: [Doc-SIG] lightweight markup: bullets In-Reply-To: <200104111706.f3BH6fp26132@gradient.cis.upenn.edu> Message-ID: [Edward D. Loper] > ... > So the approach that I wanted to get peoples' opinion on is using > bullets that look like:: > > <-> This is an unordered list item. > <1> This is an ordered list item. > This is a description list item. > > Here, I'm assuming that we're already using C<...> etc. to delimit > colored regions, ... > ... > Is it too ugly? Do you see any other problems with it? Do you have > any better ideas? If we're reserving X<...> notation, let's use it uniformly: L<-> This is an unordered list item. L<1> This is an ordered list item. L This is a description list item. L> This is a descriptive item with embedded code in the description. > ... > The main disadvantage that I see is: > - It's uglier than just using '-' or '1.'. Ya, and I'm uglier than my sisters, but that's no argument for letting them write your docstrings . From edloper@gradient.cis.upenn.edu Wed Apr 11 20:33:59 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Wed, 11 Apr 2001 15:33:59 EDT Subject: [Doc-SIG] lightweight markup: bullets In-Reply-To: Your message of "Wed, 11 Apr 2001 14:45:59 EDT." Message-ID: <200104111933.f3BJXxp10083@gradient.cis.upenn.edu> > If we're reserving X<...> notation, let's use it uniformly: > > L<-> This is an unordered list item. > L<1> This is an ordered list item. > L This is a description list item. > L> This is a descriptive item with embedded code in > the description. Perhaps. The reason that I didn't do that is that the use of <...> or L<...> for bullets is really very different from the use of X<...> for coloring. X<...> coloring is something that happens within a paragraph.. L<...> is a structuring primitive.. For example, you can't say:: This makes L sense. But you can say:: This I make sense. Of course, if we decided to use '{' and '}' instead of '<' and '>', and used 'L{...}' instead of '{...}', then we could say that '{...}' when not preceeded by a capitalized letter will have the '{' and '}' rendered as braces (c.f., Guido's suggestion to allow things like 'C{x={1:2, 3:4}}'.. > > ... > > The main disadvantage that I see is: > > - It's uglier than just using '-' or '1.'. > > Ya, and I'm uglier than my sisters, but that's no argument for > letting them write your docstrings . The main reason for not just using something established like XML or LaTeX is that they're too complex/ugly. There's no point in having a new markup language if it's also complex/ugly.. :) So I'd like to keep this markup language as simple and clean as possible. -Edward From tim.one@home.com Wed Apr 11 21:10:45 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 11 Apr 2001 16:10:45 -0400 Subject: [Doc-SIG] lightweight markup: bullets In-Reply-To: <200104111933.f3BJXxp10083@gradient.cis.upenn.edu> Message-ID: [Edward D. Loper, on L<...> for list items] > Perhaps. The reason that I didn't do that is that the use of <...> or > L<...> for bullets is really very different from the use of X<...> for > coloring. It's barely different at all to me: it's markup, as opposed to not markup, and that's the *primary* distinction that needs to be learned. You overburden my biological pattern-recognition engine if I have to learn N different lexical conventions for N different categories of markup. 
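(To put that concretely: with one convention, a single throwaway pattern recognizes every piece of markup, whatever its category -- the tag letters below are invented for illustration, not a proposal::

    >>> import re
    >>> markup = re.compile(r'[A-Z]<[^<>]*>')
    >>> markup.findall("L<1> Emphasis is E<nice>; code is C<x+y>.")
    ['L<1>', 'E<nice>', 'C<x+y>']

Nesting aside, there is only the one shape to learn, for readers and for tools alike.)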
> X<...> coloring is something that happens within a paragraph. > L<...> is a structuring primitive.. For example, you can't say:: > > This makes L sense. > > But you can say:: > > This I make sense. So L has to appear at the start of a line. Fine: additional constraints on specific X<...> thingies are easy to live with. > ... > The main reason for not just using something established like XML or > LaTeX is that they're too complex/ugly. There's no point in having a > new markup language if it's also complex/ugly.. :) So I'd like to keep > this markup language as simple and clean as possible. I've got no particular use for list markup at all, but since people will insist that it's necessary, it's simpler and cleaner to reuse one lexical gimmick whenever it suffices to get the job done. WRT beauty, <1> and L<1> are *both* "ugly", but at least the latter is ugly in exactly the same way that E is uglier than *beautiful* Occam's Beautifier suggests that new varieties of ugliness not be multiplied beyond necessity . From dgoodger@atsautomation.com Wed Apr 11 21:29:27 2001 From: dgoodger@atsautomation.com (Goodger, David) Date: Wed, 11 Apr 2001 16:29:27 -0400 Subject: [Doc-SIG] backslashing Message-ID: > I can see this getting rediculously complicated if we're talking > about any regexps inline.. And regexps are hard enough to read > anyway.. :) Any inline regexps will be complicated, no matter what escaping mechanism you choose. It's the nature of the beast! > Although I would still be worried that > people would get confused with regexps:: > > >>> print another_confusing_docstring > ... > The regexp r"\\." matches a literal period. Inline literals would be better:: The regexp `r"\."` matches a literal period. Or a literal block:: The regexp:: r"\." matches a literal period. Two ways to look at backslash escapes (i.e., a way to selectively suppress markup recognition): as an occasional tool, or as a horrible wart. You seem to be looking at it as a wart: how ugly can it get? Very ugly, indeed. Try looking at it the other way: use it only when necessary, which should be quite infrequently. > So, for example, in:: > > Find the value of C{x > - y}. I haven't spoken up about this, but: ugh! Somewhat dismayed to see the X<> type of construct being taken seriously. It works fine for POD: long live POD! Want to use it in docstrings? Implement a POD parser for HappyDoc or pydoc. In my estimation, readability is the most important criterion; X<> fails miserably. > The second line wouldn't be a list item because it's in a colored > region.. We might also need a new "null" coloring that could be used > in examples like:: > > This is a sentence that ends in the number > N{1.} (Let's not get ridiculous here.) > Is this better or worse than:: > > This is a sentence that ends in the number > \1. Recently, I've come to the conclusion that requiring a blank line before the start of a list is reasonable and correct, even if we don't require blank lines between items. Minimizing ambiguity trumps minimizing vertical space. /DG From edloper@gradient.cis.upenn.edu Thu Apr 12 00:04:26 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Wed, 11 Apr 2001 19:04:26 EDT Subject: [Doc-SIG] backslashing In-Reply-To: Your message of "Wed, 11 Apr 2001 16:29:27 EDT." Message-ID: <200104112304.f3BN4Rp02123@gradient.cis.upenn.edu> > Any inline regexps will be complicated, no matter what escaping > mechanism you choose. It's the nature of the beast! 
If you used E<lb> and E<rb> or E{lb} and E{rb} or something like that, then regexps would generally look how they're supposed to (at least when you print them). > Inline literals would be better:: > > The regexp `r"\."` matches a literal period. But then we have to say that inline literals can't ever contain "'".. which in my mind is no better than saying that you can't backslash '{' and '}'. > I haven't spoken up about this, but: ugh! Somewhat dismayed to see > the X<> type of construct being taken seriously. It works fine for > POD: long live POD! Want to use it in docstrings? Implement a POD > parser for HappyDoc or pydoc. In my estimation, readability is the > most important criterion; X<> fails miserably. I asked about this before, and didn't get any negative feedback. Basically, I'd be happy to not use X<...> markup (or X{..} markup) if we restrict ourselves to only using:

    - either 'literal' or `literal`
    - maybe *single* *word* *emph*
    - no nesting

Guido has objected to `literal` (which doesn't mean we can't do it, of course). I think that if the reason we're rejecting X{} or X<> is because it's "not readable," then there's no reason to accept #code#, which to me is significantly less intuitive than C{code}. > Recently, I've come to the conclusion that requiring a blank line > before the start of a list is reasonable and correct, even if we > don't require blank lines between items. Minimizing ambiguity trumps > minimizing vertical space. That would make things easier. But we would also have to require that sublists are surrounded by blank lines. So instead of::

    text
    - item1
      - subitem 1.1
      - subitem 1.2
    - item2
    - item3
    text

We would have::

    text

    - item1

      - subitem 1.1
      - subitem 1.2

    - item2
    - item3

    text

Any objections to that? The way my markup language currently works, we don't have to worry about how to detect when a new list item starts, because list item contents are required to be indented::

    - this is a valid
      list item.
    - This is not a valid
    list item.

-Edward From fdrake@beowolf.digicool.com Thu Apr 12 05:39:34 2001 From: fdrake@beowolf.digicool.com (Fred Drake) Date: Thu, 12 Apr 2001 00:39:34 -0400 (EDT) Subject: [Doc-SIG] [development doc updates] Message-ID: <20010412043934.B61E12879C@beowolf.digicool.com> The development version of the documentation has been updated: http://python.sourceforge.net/devel-docs/ Almost to Python 2.1 release candidate 1 status. This includes a variety of small updates and a good bit more documentation on the PyUnit version that will be included with the final release (new text essentially converted from Steve Purcell's HTML docs). From ping@lfw.org Fri Apr 13 06:56:52 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Thu, 12 Apr 2001 22:56:52 -0700 (PDT) Subject: [Doc-SIG] Re: [Python-Dev] [development doc updates] In-Reply-To: <20010412043934.B61E12879C@beowolf.digicool.com> Message-ID: On Thu, 12 Apr 2001, Fred Drake wrote: > The development version of the documentation has been updated: > > http://python.sourceforge.net/devel-docs/ I had a browse through the reference manual. This file is empty: http://python.sourceforge.net/devel-docs/ref/unicode.html This file says "scopes do not nest" and doesn't mention the availability of nested scopes via __future__: http://python.sourceforge.net/devel-docs/ref/execframes.html This file still has a big XXX in it: http://python.sourceforge.net/devel-docs/ref/import.html Do you already have updates for these? I may be able to offer a little help, but i'm stretched pretty thin at the moment...
-- ?!ng From ping@lfw.org Fri Apr 13 12:41:50 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Fri, 13 Apr 2001 04:41:50 -0700 (PDT) Subject: [Doc-SIG] Doc nit: << and >> Message-ID: I just noticed on the page http://python.sourceforge.net/devel-docs/ref/summary.html that the shifting operators are not shown as << and >>, but as the double-angle-quote characters ("guillemets"?) that the French are fond of. This should probably get fixed, though i'm not enough of a TeX expert to know how (will a couple of backslashes do the trick?). -- ?!ng From ping@lfw.org Fri Apr 13 14:25:56 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Fri, 13 Apr 2001 06:25:56 -0700 (PDT) Subject: [Doc-SIG] pydoc.py: new help feature Message-ID: As well as fixes to pydoc which i have just checked in (the module actually got *smaller* this time!) i have also spent a good portion of the evening rewriting the Helper class in pydoc to try to do a better job of providing help. I haven't yet committed a version containing this new feature (though i was quite tempted!) as it would be irresponsible of me to check in such a big change the day before a deadline without at least asking first. I know my timing is really terrible, but i would like you to have a look at it. I think the help utility is in fairly good shape. Please try it out if you have time, and consider the possibility of allowing it into 2.1. The fancy version is at http://www.lfw.org/python/pydoc.py Aside from a big chunk of new code for the Helper class, it is otherwise the same as the version currently in CVS. *** In particular, i'm also looking for any suggestions about how to make Helper.__init__ more robust at finding the docs. *** If you really like it, we could even think about adding a few lines to site.py: class Helper: def __repr__(self): import pydoc pydoc.help() return '' def __call__(self, *args): import pydoc pydoc.help(*args) __builtin__.help = Helper() I know that's pushing it, but hey, i thought it wouldn't hurt to ask. :) -- ?!ng Here follows a transcript of a session, to show you some examples. I've inserted marks like this: ##1## to refer to commentary at the bottom. skuld[1052]% python Python 2.1b2 (#29, Apr 10 2001, 04:59:40) [GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2 Type "copyright", "credits" or "license" for more information. >>> from pydoc import help >>> help Welcome to Python 2.1! This is the online help utility. If this is your first time using Python, you should definitely check out the tutorial on the Internet at http://www.python.org/doc/tut/. Enter the name of any module, keyword, or topic to get help on writing Python programs and using Python modules. To quit this help utility and return to the interpreter, just type "quit". To get a list of available modules, keywords, or topics, type "modules", "keywords", or "topics". Each module also comes with a one-line summary of what it does; to list the modules whose summaries contain a given word such as "spam", type "modules spam". help> topics ##1## Here is a list of available topics. Enter any topic name to get more help. 
ASSERTION DELETION LOOPING SEQUENCEMETHODS2 ASSIGNMENT DICTIONARIES MAPPINGMETHODS SEQUENCES ATTRIBUTEMETHODS DICTIONARYLITERALS MAPPINGS SHIFTING ATTRIBUTES ELLIPSIS METHODS SLICINGS AUGMENTEDASSIGNMENT EXCEPTIONS MODULES SPECIALATTRIBUTES BACKQUOTES EXECUTION NAMESPACES SPECIALIDENTIFIERS BASICMETHODS EXPRESSIONS NONE SPECIALMETHODS BINARY FILES NUMBERMETHODS STRINGMETHODS BITWISE FLOAT NUMBERS STRINGS BOOLEAN FORMATTING OBJECTS SUBSCRIPTS CALLABLEMETHODS FRAMEOBJECTS OPERATORS TRACEBACKS CALLS FRAMES PACKAGES TRUTHVALUE CLASSES FUNCTIONS POWER TUPLELITERALS CODEOBJECTS IDENTIFIERS PRECEDENCE TUPLES COERCIONS IMPORTING PRINTING TYPEOBJECTS COMPARISON INTEGER PRIVATENAMES TYPES COMPLEX LISTLITERALS RETURNING UNARY CONDITIONAL LISTS SCOPING UNICODE CONVERSIONS LITERALS SEQUENCEMETHODS1 help> LISTS ##2## 2.1.5.4 Mutable Sequence Types List objects support additional operations that allow in-place modification of the object. These operations would be supported by other mutable sequence types (when added to the language) as well. Strings and tuples are immutable sequence types and such objects cannot be modified once created. The following operations are defined on mutable sequence types (where x is an arbitrary object): Operation Result Notes s[i] = x item i of s is replaced by x s[i:j] = t slice of s from i to j is replaced by t del s[i:j] same as s[i:j] = [] s.append(x) same as s[len(s):len(s)] = [x] (1) s.extend(x) same as s[len(s):len(s)] = x (2) s.count(x) return number of i's for which s[i] == x s.index(x) return smallest i such that s[i] == x (3) s.insert(i, x) same as s[i:i] = [x] if i >= 0 s.pop([i]) same as x = s[i]; del s[i]; return x (4) s.remove(x) same as del s[s.index(x)] (3) s.reverse() reverses the items of s in place (5) s.sort([cmpfunc]) sort the items of s in place (5), (6) Notes: (1) The C implementation of Python has historically accepted multiple parameters and implicitly joined them into a tuple; this no longer works in Python 2.0. Use of this misfeature has been deprecated since Python 1.4. (2) Raises an exception when x is not a list object. The extend() method is experimental and not supported by mutable sequence types other than lists. (3) Raises ValueError when x is not found in s. (4) The pop() method is only supported by the list and array types. The optional argument i defaults to -1, so that by default the last item is removed and returned. (5) The sort() and reverse() methods modify the list in place for economy of space when sorting or reversing a large list. They don't return the sorted or reversed list to remind you of this side effect. (6) The sort() method takes an optional argument specifying a comparison function of two arguments (list items) which should return -1, 0 or 1 depending on whether the first argument is considered smaller than, equal to, or larger than the second argument. Note that this slows the sorting process down considerably; e.g. to sort a list in reverse order it is much faster to use calls to the methods sort() and reverse() than to use the built-in function sort() with a comparison function that reverses the ordering of the elements. 
Related help topics: LISTLITERALS help> LISTLITERALS ##3## 5.2.4 List displays A list display is a possibly empty series of expressions enclosed in square brackets: list_display: "[" [listmaker] "]" listmaker: expression ( list_for | ( "," expression)* [","] ) list_iter: list_for | list_if list_for: "for" expression_list "in" testlist [list_iter] list_if: "if" test [list_iter] A list display yields a new list object. Its contents are specified by providing either a list of expressions or a list comprehension. When a comma-separated list of expressions is supplied, its elements are evaluated from left to right and placed into the list object in that order. When a list comprehension is supplied, it consists of a single expression followed by at least one for clause and zero or more for or if clauses. In this case, the elements of the new list are those that would be produced by considering each of the for or if clauses a block, nesting from left to right, and evaluating the expression to produce a list element each time the innermost block is reached. Related help topics: LITERALS help> LITERALS ##4## 5.2.2 Literals Python supports string literals and various numeric literals: literal: stringliteral | integer | longinteger | floatnumber | imagnumber Evaluation of a literal yields an object of the given type (string, integer, long integer, floating point number, complex number) with the given value. The value may be approximated in the case of floating point and imaginary (complex) literals. See section 2.4[1] for details. All literals correspond to immutable data types, and hence the object's identity is less important than its value. Multiple evaluations of literals with the same value (either the same occurrence in the program text or a different occurrence) may obtain the same object or a different object with the same value. Related help topics: STRINGS BACKQUOTES NUMBERS TUPLELITERALS LISTLITERALS DICTIONARYLITERALS help> NUMBERS 2.4.3 Numeric literals There are four types of numeric literals: plain integers, long integers, floating point numbers, and imaginary numbers. There are no complex literals (complex numbers can be formed by adding a real number and an imaginary number). Note that numeric literals do not include a sign; a phrase like -1 is actually an expression composed of the unary operator `-' and the literal 1. Related help topics: INTEGER FLOAT COMPLEX TYPES help> keywords ##5## Here is a list of the Python keywords. Enter any keyword to get more help. and elif global or assert else if pass break except import print class exec in raise continue finally is return def for lambda try del from not while help> if ##6## 7.1 The if statement The if statement is used for conditional execution: if_stmt: "if" expression ":" suite ("elif" expression ":" suite)* ["else" ":" suite] It selects exactly one of the suites by evaluating the expressions one by one until one is found to be true (see section 5.10[1] for the definition of true and false); then that suite is executed (and no other part of the if statement is executed or evaluated). If all expressions are false, the suite of the else clause, if present, is executed. Related help topics: TRUTHVALUE help> TRUTHVALUE 2.1.1 Truth Value Testing Any object can be tested for truth value, for use in an if or while condition or as operand of the Boolean operations below. The following values are considered false: None zero of any numeric type, for example, 0, 0L, 0.0, 0j. any empty sequence, for example, '', (), []. 
any empty mapping, for example, {}. instances of user-defined classes, if the class defines a __nonzero__() or __len__() method, when that method returns zero.2.2[1] All other values are considered true -- so objects of many types are always true. Operations and built-in functions that have a Boolean result always return 0 for false and 1 for true, unless otherwise stated. (Important exception: the Boolean operations "or" and "and" always return one of their operands.) ------------------------------------------------------------------------ Footnotes ... zero.2.2[2] Additional information on these special methods may be found in the Python Reference Manual[3]. Related help topics: if while and or not BASICMETHODS help> continue 6.10 The continue statement continue_stmt: "continue" continue may only occur syntactically nested in a for or while loop, but not nested in a function or class definition or try statement within that loop.6.1[1]It continues with the next cycle of the nearest enclosing loop. ------------------------------------------------------------------------ Footnotes ... loop.6.1[2] It may occur within an except or else clause. The restriction on occurring in the try clause is implementor's laziness and will eventually be lifted. Related help topics: while for help> while 7.2 The while statement The while statement is used for repeated execution as long as an expression is true: while_stmt: "while" expression ":" suite ["else" ":" suite] This repeatedly tests the expression and, if it is true, executes the first suite; if the expression is false (which may be the first time it is tested) the suite of the else clause, if present, is executed and the loop terminates. A break statement executed in the first suite terminates the loop without executing the else clause's suite. A continue statement executed in the first suite skips the rest of the suite and goes back to testing the expression. Related help topics: break continue if TRUTHVALUE help> modules color ##7## Here is a list of matching modules. Enter any module name to get more help. colorsys - Conversion functions between RGB and other color systems. tkColorChooser help> modules mail Here is a list of matching modules. Enter any module name to get more help. mailbox - Classes to handle Unix style, MMDF style, and MH style mailboxes. mailcap - Mailcap file handling. See RFC 1524. mimify - Mimification and unmimification of mail messages. test.test_mailbox help> colorsys ##8## Help on module colorsys: NAME colorsys - Conversion functions between RGB and other color systems. FILE /home/ping/dev/python/dist/src/Lib/colorsys.py DESCRIPTION This modules provides two functions for each color system ABC: rgb_to_abc(r, g, b) --> a, b, c abc_to_rgb(a, b, c) --> r, g, b All inputs and outputs are triples of floats in the range [0.0...1.0]. Inputs outside this range may cause exceptions or invalid outputs. Supported color systems: RGB: Red, Green, Blue components YIQ: used by composite video signals HLS: Hue, Luminance, Saturation HSV: Hue, Saturation, Value CONSTANTS ONE_SIXTH = 0.16666666666666666 ONE_THIRD = 0.33333333333333331 TWO_THIRD = 0.66666666666666663 __all__ = ['rgb_to_yiq', 'yiq_to_rgb', 'rgb_to_hls', 'hls_to_rgb', 'rg... __doc__ = 'Conversion functions between RGB and other color...uminance... __file__ = '/home/ping/dev/python/dist/src/Lib/colorsys.pyc' __name__ = 'colorsys' help> modules ##9## Please wait a moment while I gather a list of all available modules... 
BaseHTTPServer delegate multifile sndhdr Bastion difflib mutex socket CDROM dircache mytok (package) spam CGIHTTPServer dirctest neelk (package) sps Canvas dis netrc sre ConfigParser distutils (package) new sre_compile Cookie dl nis sre_constants Dialog doctest nntplib sre_parse FCNTL dospath ntpath stat FileDialog dumbdbm nturl2path statcache FixTk echatui oldgnut statvfs IN eggs operator string MimeWriter encodings (package) os stringold Queue errno parser strop ScrolledText exceptions pcre struct SimpleDialog fcntl pdb sunau SimpleHTTPServer festival pickle sunaudio SocketServer filecmp pipes symbol StringIO fileinput popen2 symtable TERMIOS findcode poplib sys Tix fnmatch posix syslog Tkconstants foo posixfile tabnanny Tkdnd foo (package) posixpath telnetlib Tkinter formatter ppm tempfile UserDict fpectl pprint termios UserList fpformat pre test (package) UserString ftplib profile testalias __builtin__ gc pstats tester __future__ gdbm pty tester15 _codecs getopt pwd tester1c _curses getpass py_compile tester2 _curses_panel gettext pyclbr thespark _locale glob pydoc thread _socket gnut pydoc-ell threading _sre gnutellalib pydoc-findmod time _symtable gopherlib pydoc-help timing _testcapi grp pydoc-nohelp tkColorChooser _tkinter gviz pyhints tkCommonDialog _weakref gzip pyscan tkFileDialog aifc html quopri tkFont alesis-old htmlentitydefs random tkMessageBox anydbm htmllib re tkSimpleDialog array http readline toaiff asynchat httpcli reconvert token asyncore httplib regex tokenize atexit icecream regex_syntax traceback audiodev ihooks regsub tree audioop imageop repr tty base64 imaplib resource turtle bdb imghdr reverb types binascii imgsize rexec tzparse binhex imp rfc822 unicodedata bisect imputil rgbimg unittest blech inspect rlcompleter urllib bsddb inspect-cvs robotparser urllib2 cPickle inspect-ping romandate urlparse cStringIO keyword rotor user calendar knee rxb uu cgen linecache rxb14 vchtml cgi linuxaudiodev rxb15 warnings chunk locale sched watcher cmath logo scopetest watcher1 cmd macpath scraper wave code macurl2path search weakref codecs mailbox select webbot codeop mailcap sequencer webbrowser collab makesums sequencer-badread webbrowser-mine collab2 marshal sequencer-types whichdb colorsys marshali sgmllib whrandom commands math sha wiki compileall md5 shelve worker copy memtest shlex xdrlib copy_reg mhlib shutil xml (package) coredump mimetools signal xmllib crypt mimetypes site xreadlines curses (package) mimify slk zipfile dbhash mmap smtpd zlib dbm mpz smtplib Enter any module name to get more help. Or, type "modules spam" to search for modules whose descriptions contain the word "spam". help> md5 Help on module md5: NAME md5 FILE /home/ping/dev/python/dist/src/build/lib.linux-i686-2.1/md5.so DESCRIPTION This module implements the interface to RSA's MD5 message digest algorithm (see also Internet RFC 1321). Its use is quite straightforward: use the new() to create an md5 object. You can now feed this object with arbitrary strings using the update() method, and at any point you can ask it for the digest (a strong kind of 128-bit checksum, a.k.a. ``fingerprint'') of the concatenation of the strings fed to it so far using the digest() method. Functions: new([arg]) -- return a new md5 object, initialized with arg if provided md5([arg]) -- DEPRECATED, same as new, but for compatibility Special Objects: MD5Type -- type object for md5 objects FUNCTIONS md5(...) new([arg]) -> md5 object Return a new md5 object. If arg is present, the method call update(arg) is made. new(...) 
new([arg]) -> md5 object Return a new md5 object. If arg is present, the method call update(arg) is made. CONSTANTS MD5Type = __doc__ = "This module implements the interface to RSA's MD...Objects:... __file__ = '/home/ping/dev/python/dist/src/build/lib.linux-i686-2.1/md... __name__ = 'md5' help> help ##10## Welcome to Python 2.1! This is the online help utility. If this is your first time using Python, you should definitely check out the tutorial on the Internet at http://www.python.org/doc/tut/. Enter the name of any module, keyword, or topic to get help on writing Python programs and using Python modules. To quit this help utility and return to the interpreter, just type "quit". To get a list of available modules, keywords, or topics, type "modules", "keywords", or "topics". Each module also comes with a one-line summary of what it does; to list the modules whose summaries contain a given word such as "spam", type "modules spam". help> abs ##11## Help on built-in function abs: abs(...) abs(number) -> number Return the absolute value of the argument. help> sys.getrefcount ##12## Help on built-in function getrefcount in sys: getrefcount(...) getrefcount(object) -> integer Return the current reference count for the object. This includes the temporary reference in the argument list, so it is at least 2. help> asdfadsf ##13## no Python documentation found for 'asdfadsf' help> quit ##14## You're now leaving help and returning to the Python interpreter. If you want to ask for help on a particular object directly from the interpreter, you can type "help(object)". Executing "help('string')" has the same effect as typing a particular string at the help> prompt. >>> help(3) ##15## Help on int: 3 >>> help([]) Help on list: [] >>> help([].append) ##16## Help on built-in function append: append(...) L.append(object) -- append object to end >>> import sys >>> help(sys.path) ##17## Help on list: ['', '/home/ping/python', '/home/ping/dev/python/dist/src/Lib', '/home/ping/dev/ python/dist/src/Lib/plat-linux2', '/home/ping/dev/python/dist/src/Lib/lib-tk', ' /home/ping/dev/python/dist/src/Modules', '/home/ping/dev/python/dist/src/build/l ib.linux-i686-2.1'] >>> help('sys.path') ##18## Help on list in sys: path = ['', '/home/ping/python', '/home/ping/dev/python/dist/src/Lib', '/home/pi ng/dev/python/dist/src/Lib/plat-linux2', '/home/ping/dev/python/dist/src/Lib/lib -tk', '/home/ping/dev/python/dist/src/Modules', '/home/ping/dev/python/dist/src/ build/lib.linux-i686-2.1'] >>> help('array') ##19## Help on module array: NAME array FILE /home/ping/dev/python/dist/src/build/lib.linux-i686-2.1/array.so DESCRIPTION This module defines a new object type which can efficiently represent an array of basic values: characters, integers, floating point numbers. Arrays are sequence types and behave very much like lists, except that the type of objects stored in them is constrained. The type is specified at object creation time by using a type code, which is a single character. The following type codes are defined: Type code C Type Minimum size in bytes 'c' character 1 'b' signed integer 1 'B' unsigned integer 1 'h' signed integer 2 'H' unsigned integer 2 'i' signed integer 2 'I' unsigned integer 2 'l' signed integer 4 'L' unsigned integer 4 'f' floating point 4 'd' floating point 8 Functions: array(typecode [, initializer]) -- create a new array Special Objects: ArrayType -- type object for array objects FUNCTIONS array(...) 
array(typecode [, initializer]) -> array Return a new array whose items are restricted by typecode, and initialized from the optional initializer value, which must be a list or a string. CONSTANTS ArrayType = __doc__ = 'This module defines a new object type which can ...cts:\n\n... __file__ = '/home/ping/dev/python/dist/src/build/lib.linux-i686-2.1/ar... __name__ = 'array' >>> ##1## Topic names are all in capital letters so that it's very unlikely they will collide with module or package names. The hope is that the user will understand they're supposed to enter them in capitals, as shown. ##2## Each topic is associated with one of the HTML files in the library documentation. The formatter module is used to turn HTML into text. A couple of small enhancements to the HTML parser allow tables to be crudely displayed; columns are separated by tabs, so they don't always line up, but at least that's a lot better than having the entire table get mushed into one paragraph. The generated text is displayed using the pager, like everything else. ##3## Each topic can also have a number of cross-references, which are shown following the help docs (after the pager is done). The cross-references can be other topics, keywords, or modules. ##4## You can surf through the docs by picking one of the related topics and typing it back in. Yes, i know the list of related topics isn't word-wrapped -- that's an easy change. ##5## A list of Python keywords is available. It's okay for them to be entered by the user in lowercase -- they're reserved words, so there will never be modules with these names. ##6## Each keyword is similarly associated with an HTML file and possibly some related topics. ##7## Typing "modules" followed by a search key does the same thing as "pydoc -k" from the shell. ##8## Typing in a module name is just like running "pydoc" from the shell on a module. ##9## Typing in "modules" by itself produces a list of all the modules and packages. It takes less than two seconds on my machine to gather the list. ##10## What happens if you type "help" in help? You get the intro. ##11## Built-in functions are available too. ##12## You can look things up with a dotted path of arbitrary depth. ##13## What happens if there's no such module? You get a message. ##14## "quit", "q", "QUIT", "Quit", "Q", and Ctrl-D all quit help. Even typing in '"quit"' with the quotation marks works, just to make sure beginners don't get stuck. ##15## What happens if you ask for help on a number? Nothing too useful, but at least it doesn't explode. ##16## Asking for help on a built-in method is actually useful. ##17## If you ask for help on an object, it just shows you the object. ##18## If you ask for help and give the path to get to an object, it shows you the object and also where it came from. ##19## You can get help directly from the interpreter level by invoking "help()" with an argument. From fdrake@acm.org Thu Apr 12 14:41:46 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Thu, 12 Apr 2001 09:41:46 -0400 (EDT) Subject: [Doc-SIG] Doc nit: << and >> In-Reply-To: References: Message-ID: <15061.45210.173196.47640@beowolf.digicool.com> Ka-Ping Yee writes: > I just noticed on the page > > http://python.sourceforge.net/devel-docs/ref/summary.html > > that the shifting operators are not shown as << and >>, but > as the double-angle-quote characters ("guillemets"?) that the > French are fond of. This should probably get fixed, though > i'm not enough of a TeX expert to know how (will a couple of > backslashes do the trick?). 
Fixed in CVS; thanks! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From jhe@webde-ag.de (Juergen Hermann) In-Reply-To: Message-ID: On Fri, 13 Apr 2001 04:41:50 -0700 (PDT), Ka-Ping Yee wrote: >I just noticed on the page > > http://python.sourceforge.net/devel-docs/ref/summary.html > >that the shifting operators are not shown as << and >>, but >as the double-angle-quote characters ("guillemets"?) that the >French are fond of. This should probably get fixed, though >i'm not enough of a TeX expert to know how (will a couple of >backslashes do the trick?). My TeX is rusted, but $<$$<$ should do the trick. Ciao, Jürgen -- Jürgen Hermann, Developer (jhe@webde-ag.de) WEB.DE AG, http://webde-ag.de/ From guido@digicool.com Thu Apr 12 20:38:45 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 12 Apr 2001 14:38:45 -0500 Subject: [Doc-SIG] pydoc.py: new help feature In-Reply-To: Your message of "Fri, 13 Apr 2001 06:25:56 MST." References: Message-ID: <200104121938.OAA21112@cj20424-a.reston1.va.home.com> OK, Ping, because pydoc is so new, we'll take the latest version. Please check it in ASAP!!! You're going to have to stay up the next 24 hours looking for bug reports, and also over the weekend once the release candidate is out. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From ping@lfw.org Fri Apr 13 21:13:35 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Fri, 13 Apr 2001 13:13:35 -0700 (PDT) Subject: [Doc-SIG] pydoc.py: new help feature In-Reply-To: <200104121938.OAA21112@cj20424-a.reston1.va.home.com> Message-ID: On Thu, 12 Apr 2001, Guido van Rossum wrote: > OK, Ping, because pydoc is so new, we'll take the latest version. > Please check it in ASAP!!! Done. > You're going to have to stay up the next 24 hours looking for bug > reports, and also over the weekend once the release candidate is > out. :-) You got it! -- ?!ng From fdrake@beowolf.digicool.com Fri Apr 13 06:10:02 2001 From: fdrake@beowolf.digicool.com (Fred Drake) Date: Fri, 13 Apr 2001 01:10:02 -0400 (EDT) Subject: [Doc-SIG] [development doc updates] Message-ID: <20010413051002.795BD2879C@beowolf.digicool.com> The development version of the documentation has been updated: http://python.sourceforge.net/devel-docs/ More description and explanation in the unittest documentation; update to match the final code and decisions from the pyunit-interest mailing list. Added information on urllib.FancyURLopener's handling of basic authentication and how to change the prompting behavior. Added documentation for the ColorPicker module for the Macintosh. From fdrake@acm.org Fri Apr 13 19:02:55 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri, 13 Apr 2001 14:02:55 -0400 (EDT) Subject: [Doc-SIG] Docs are frozen. Message-ID: <15063.16207.884585.823138@beowolf.digicool.com> The documentation tree is frozen for Python 2.1c1. All further changes should be submitted via the SourceForge patch manager until Python 2.1 has been released. Thanks! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From fdrake@beowolf.digicool.com Fri Apr 13 19:15:38 2001 From: fdrake@beowolf.digicool.com (Fred Drake) Date: Fri, 13 Apr 2001 14:15:38 -0400 (EDT) Subject: [Doc-SIG] [development doc updates] Message-ID: <20010413181538.7BA3F28A06@beowolf.digicool.com> The development version of the documentation has been updated: http://python.sourceforge.net/devel-docs/ Final documentation for Python 2.1c1. From edloper@gradient.cis.upenn.edu Fri Apr 13 20:47:33 2001 From: edloper@gradient.cis.upenn.edu (Edward D.
Loper) Date: Fri, 13 Apr 2001 15:47:33 EDT Subject: [Doc-SIG] lightweight markup: bullets In-Reply-To: Your message of "Wed, 11 Apr 2001 16:10:45 EDT." Message-ID: <200104131947.f3DJlXp14396@gradient.cis.upenn.edu> Tim Peters said: > [Edward D. Loper, on L<...> for list items] > > Perhaps. The reason that I didn't do that is that the use of <...> or > > L<...> for bullets is really very different from the use of X<...> for > > coloring. > > It's barely different at all to me: it's markup, as opposed to not markup, > and that's the *primary* distinction that needs to be learned. You > overburden my biological pattern-recognition engine if I have to learn N > different lexical conventions for N different categories of markup. Well, I guess that part of the idea behind a lightweight markup is that we should try to re-use regexps that are already in your brain. Which might be a good argument with just sticking with lists that look like: - list item - another list item or: 1. list item 2. another list item > I've got no particular use for list markup at all, What markup do you find that you do have use for (while writing docstrings)? I personally tend to just use C{code} regions (for identifiers, mainly); unordered lists; and literal blocks/doctest blocks. Oh, and fields for specifying info about specific parameters or the return value or what exceptions are thrown, etc. -Edward From klm@digicool.com Sat Apr 14 17:37:56 2001 From: klm@digicool.com (klm@digicool.com) Date: Sat, 14 Apr 2001 12:37:56 -0400 (EDT) Subject: [Doc-SIG] lightweight markup: bullets Message-ID: <15064.31972.597163.363329@serenade.digicool.com> First chance i've had to chime in this week, and only have a moment to sound my repeating refrain: i would like to see the structured-text-ish approach. Someone (was it you, edward?) mentioned the non-geek CP4E-type audience earlier this week - i'm dismayed to think that we're talking about exposing them to code in docstrings, eg C<> or Z{} or whatever, that's more cryptic than lots of python code. The docstrings should be more self-obvious, not less!! Truly, even as a *programmer* i find it helpful that docstring encoding is written-language encoding, which i can automatically decipher. **That's** the reason that the structured text approach makes sense - the overt meanings of the conventions for the reader, even the unitiated reader, are the intended ones. You don't need the secret codes or a tool to read the docstrings in the program text. Evidently, the trick is coming up with a decent set of structured text style rules that are unambiguous and "unsurprising" - in particular, conventions that don't collide with common writing practices. (Eg, collide ones recently discussed: use of '--' for description lists, or "1." at the end of a sentence but beginning of line translating to the start of an ordered list item.) Once again, it seems to me that we're close to this goal, but veering off to a new language, with C<> or whatever - totally at the expense of the reader. Really, it seems to me that such docstrings would make python code *less* readable, not more. Oh well. Ken klm@digicool.com From edloper@gradient.cis.upenn.edu Sat Apr 14 18:05:12 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Sat, 14 Apr 2001 13:05:12 EDT Subject: [Doc-SIG] lightweight markup: bullets In-Reply-To: Your message of "Sat, 14 Apr 2001 12:37:56 EDT." 
<15064.31972.597163.363329@serenade.digicool.com> Message-ID: <200104141705.f3EH5Cp06968@gradient.cis.upenn.edu> > i would like to see the structured-text-ish approach. In my mind, there are 2 things we're encoding here: structuring (lists, sections, literal blocks, etc), and colorizing (emphasized, inline literals, etc.). Colorizing only occurs within a paragraph.. I've been working on both designing & implementing a parser for a markup language for docstrings. The structuring is based in the structured-text-ish approach. I'm currently undecided about whether I want do do colorizing like E{this} or like *this*. The advantage of the former is that it means you can have more types of colorizing (e.g., colorizing for URIs, for code, for emphasis, for math, for definitions of terms that should be included in indeces, etc). The advantage of the later is that it's presumably more readable. But if we go with the later, I think we need to constrain ourselves to maybe 1 or 2 different colors (emph and code/identifier? or just identifier?). > Someone (was it you, edward?) mentioned the non-geek CP4E-type > audience earlier this week - I don't remember mentioning them, but I do think we need to keep them in mind. That would be one of my objections to some of the escaping proposals so far.. > i'm dismayed to think that we're talking about exposing them to code > in docstrings, eg C<> or Z{} or whatever, that's more cryptic than > lots of python code. The docstrings should be more self-obvious, not > less!! When I see it in context, it actually doesn't seem that cryptic to me. But then the people we should be asking about that are people who don't code. Maybe we should try encoding some docs with both kinds of markup, and see what they think. > You don't need the secret codes or a tool to read the docstrings in > the program text. In general, I think that the colorizing should *never* be necessary to understand what's being said.. i.e., you should be able to blindly ignore any X{}s (the "X{" and the "}", not the content). The only place where that wouldn't be true would be if X{}s were used to escape characters, which should hopefully be very rare. > Evidently, the trick is coming up with a decent set of structured > text style rules that are unambiguous and "unsurprising" - in > particular, conventions that don't collide with common writing > practices. (Eg, collide ones recently discussed: use of '--' for > description lists, or "1." at the end of a sentence but beginning of > line translating to the start of an ordered list item.) Once again, > it seems to me that we're close to this goal, but veering off to a > new language, with C<> or whatever - totally at the expense of the > reader. For structuring, I think I have a set of such rules. I'll send out mail about that when I've done more testing etc., but basically: 1. all paragraphs *must* be left-justified 2. all lists must be either indented or separated by a blank line. 3. The second and subsequent line of a list item must be indented further than the bullet. All lines but the first must be left-justified. Subsequent paragraphs in the same list item must line up with that indentation level. There are some more, but those are the basics required to avoid mis-interpreting bullets.. The only true ambiguities you get with rules like these are things like: 1. This is a list item whose second line begins with the number 1. Was that "1." a bullet or part of a sentence? 
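(A rough sketch of how those rules sort bullets from plain text -- the names are invented and the bullet pattern is simplified to '-' and 'N.' only, so this is an illustration, not the actual implementation::

    import re

    BULLET = re.compile(r'(-|\d+\.)(\s+|$)')

    def opens_list_item(lines, i, para_indent):
        # Rules 1-3 in miniature: a bulleted line starts a list item only
        # if it is indented past the enclosing paragraph, or is set off by
        # a blank line; a '-' or '1.' that merely got word-wrapped to the
        # start of a left-justified paragraph line satisfies neither test.
        line = lines[i]
        stripped = line.lstrip()
        if not BULLET.match(stripped):
            return 0                      # no bullet at all
        indent = len(line) - len(stripped)
        if indent > para_indent:
            return 1                      # indented list (rule 2)
        if i == 0 or lines[i-1].strip() == '':
            return 1                      # list set off by a blank line (rule 2)
        return 0                          # just a wrapped '-' or '1.' in a paragraph

    >>> para = ["Some people like the number", "1.  Some don't."]
    >>> opens_list_item(para, 1, 0)
    0

The one case it can't settle is the example just above, where a continuation line that rule 3 already requires to be indented happens to begin with "1.".)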
> Really, it seems to me that such docstrings would make python code > *less* readable, not more. Do you think that we should have any colorizing at all? If so, what colors? People usually talk about *emphasis*, although I really very rarely find it useful in docstrings (despite its usefulness in *email*). The color I most often want is something to mark a token as a python identifier (or, more generally, to mark a string as Python code). If we didn't do any colorizing, we would probably have: - paragraphs (in which word-wrapping is legal, etc.) - literal blocks (which are displayed as-is) - doctest blocks (which are displayed as-is, or possibly colorized) - lists (ordered and unordered) - sections (and subsections) If there was no colorizing, I'm pretty sure we could get away with no escaping mechanism (with carefully chosen structuring rules, it would never be necessary). -Edward From klm@digicool.com Sat Apr 14 18:41:04 2001 From: klm@digicool.com (Ken Manheimer) Date: Sat, 14 Apr 2001 13:41:04 -0400 (EDT) Subject: [Doc-SIG] lightweight markup: bullets In-Reply-To: <200104141705.f3EH5Cp06968@gradient.cis.upenn.edu> Message-ID: On Sat, 14 Apr 2001, Edward D. Loper wrote: > Do you think that we should have any colorizing at all? If so, what > colors? People usually talk about *emphasis*, although I really very > rarely find it useful in docstrings (despite its usefulness in > *email*). The color I most often want is something to mark a token as > a python identifier (or, more generally, to mark a string as Python > code). Huh - me too. And certainly, emphasis is not significant when it comes to auto-documentation considerations like function-name and variable indexes! (Without any special interpretation, the rare occasions that i use "*" emphasis in my docstrings will still show through - as "*" emphasis. Handy, that.-) Marking tokens is another matter - but just marking them doesn't add much info to the auto-index situation. Significant info would require more elaborate structuring conventions, which we're nowhere near discussing yet. Why *don't* we start without coloring, and get plenty of the other useful stuff that you mention in place? We can defer the controversy, and maybe when we get around to it we'll know more what kind of structuring we really need... Excellent. Ken klm@digicool.com From edloper@gradient.cis.upenn.edu Sat Apr 14 19:44:38 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Sat, 14 Apr 2001 14:44:38 EDT Subject: [Doc-SIG] lightweight markup: bullets In-Reply-To: Your message of "Sat, 14 Apr 2001 13:41:04 EDT." Message-ID: <200104141844.f3EIicp15860@gradient.cis.upenn.edu> > Why *don't* we start without coloring, and get plenty of the other > useful stuff that you mention in place? We can defer the > controversy, and maybe when we get around to it we'll know more what > kind of structuring we really need... Sounds like a good idea to me. The only slight problem is that, in theory, colorizing and structuring interact slightly. In particular, consider examples like:: - This is a list item talking about C{x - y}. C{x - y} is a good value. We all love it. Here, in principle, we should be able to tell from the fact that we're inside a colored region that the "-" is not a list bullet. But I'd be ok ignoring cases like that for now.. I've tried to be careful to design my structring rules so it can be as independant of colorizing as possible. -Edward From edloper@gradient.cis.upenn.edu Sat Apr 14 19:46:24 2001 From: edloper@gradient.cis.upenn.edu (Edward D. 
Loper) Date: Sat, 14 Apr 2001 14:46:24 EDT Subject: [Doc-SIG] Syntax for fields Message-ID: <200104141846.f3EIkOp16056@gradient.cis.upenn.edu> I've been actually using my markup language to document some things, and my current syntax for fields seems somewhat problematic. I've been using expressions like:: author: Edward Loper param n: The size of the list to return. type n: C{int} return: A list containing all of the prime numbers between 2 and C{x}, inclusive. (The syntax is essentially the same as it is for list items, except that "\w+ \w+:" is the bullet instead of "-" or "(\d+\.)+") The problem is that there's too much overlap between the form of such expressions and the form of natural language expressions like:: Consider this: blah blah.. A problem: .sdf dfs... However: ... And I don't feel comfortable forbidding people to use expressions like that (among other things, it's just not something people will remember not to do). I was using the "param n:" style mainly because it's easy to read. One other option is to mimic javadoc, and do something like:: @author Edward Loper @param n The size of the list to return or:: @author: Edward Loper @param n: The size of the list to return or:: @author Edward Loper @param(n) The size of the list to return Another option might be to only allow captizlied field names:: AUTHOR: Edward Loper PARAM n: The size of the list to return Although this looks somewhat ugly to me. Yet another option would be to only count ":" or " :" as a field if is one of a small finite set of reserved words.. But that means we can never expand the tag set in a backwards-compatible way, and that we can't have alternative tag-sets for code written in different languages. Other ideas? Which (if any) of these alternatives are appealing? -Edward From guido@digicool.com Sat Apr 14 21:56:20 2001 From: guido@digicool.com (Guido van Rossum) Date: Sat, 14 Apr 2001 15:56:20 -0500 Subject: [Doc-SIG] lightweight markup: bullets In-Reply-To: Your message of "Sat, 14 Apr 2001 13:05:12 EDT." <200104141705.f3EH5Cp06968@gradient.cis.upenn.edu> References: <200104141705.f3EH5Cp06968@gradient.cis.upenn.edu> Message-ID: <200104142056.PAA30488@cj20424-a.reston1.va.home.com> > For structuring, I think I have a set of such rules. I'll send out > mail about that when I've done more testing etc., but basically: > 1. all paragraphs *must* be left-justified > 2. all lists must be either indented or separated by a blank > line. > 3. The second and subsequent line of a list item must be indented > further than the bullet. All lines but the first must be > left-justified. > > Subsequent paragraphs in the same list item must line up with > that indentation level. I like this. > There are some more, but those are the basics required to avoid > mis-interpreting bullets.. The only true ambiguities you get with > rules like these are things like: > 1. This is a list item whose second line begins with the number > 1. Was that "1." a bullet or part of a sentence? Let's err on the side of caution and declare this is not a list item unless it's separated by a blank line. > > Really, it seems to me that such docstrings would make python code > > *less* readable, not more. > > Do you think that we should have any colorizing at all? If so, what > colors? People usually talk about *emphasis*, although I really very > rarely find it useful in docstrings (despite its usefulness in > *email*). 
The color I most often want is something to mark a token as > a python identifier (or, more generally, to mark a string as Python > code). Personally, I like having the *emphasis coloring*; I care less about coloring identifiers. My reasoning: sometimes it's *really* useful to be able to stress the importance of something without SHOUTING; but pieces of source code are easy enough to recognize without coloring: they just *look* different, e.g. foo(bar) is clearly a function call. When it's ambiguous, I'll put single or double quotes around it (e.g. when referencing the 'a' variable by itself) but I'm OK with seeing those quotes in the printed documentation as well; I'm *not* OK with seeing *emphasis* printed as "*emphasis*". One more thing: I'd like to argue against the use of a fixed-width font for in-line code examples. Typically this uses Courier, whose characters are *way* to wide for readability. I can understand why a fixed-width font is necessary in sample *blocks*, because *sometimes* (though not very often) there's code that is arranged in a tabular manner; but this argument doesn't apply to in-line code samples. > If we didn't do any colorizing, we would probably have: > - paragraphs (in which word-wrapping is legal, etc.) > - literal blocks (which are displayed as-is) > - doctest blocks (which are displayed as-is, or possibly colorized) > - lists (ordered and unordered) > - sections (and subsections) > > If there was no colorizing, I'm pretty sure we could get away with no > escaping mechanism (with carefully chosen structuring rules, it would > never be necessary). But I want an escaping mechanism. I want to be able to say e.g. "When I write "*foo*", all its characters are rendered, including the quotes and the stars, but when I write \*foo*, it is rendered in italics." (In other words, I want to be able to give an in-line example of the "*foo*" notation.) --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake@beowolf.digicool.com Sat Apr 14 21:09:33 2001 From: fdrake@beowolf.digicool.com (Fred Drake) Date: Sat, 14 Apr 2001 16:09:33 -0400 (EDT) Subject: [Doc-SIG] [development doc updates] Message-ID: <20010414200933.0218628A09@beowolf.digicool.com> The development version of the documentation has been updated: http://python.sourceforge.net/devel-docs/ Final Python 2.1 documentation. From edloper@gradient.cis.upenn.edu Sat Apr 14 21:16:41 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Sat, 14 Apr 2001 16:16:41 EDT Subject: [Doc-SIG] lightweight markup: bullets In-Reply-To: Your message of "Sat, 14 Apr 2001 15:56:20 CDT." <200104142056.PAA30488@cj20424-a.reston1.va.home.com> Message-ID: <200104142016.f3EKGfp24071@gradient.cis.upenn.edu> > > There are some more, but those are the basics required to avoid > > mis-interpreting bullets.. The only true ambiguities you get with > > rules like these are things like: > > 1. This is a list item whose second line begins with the number > > 1. Was that "1." a bullet or part of a sentence? > > Let's err on the side of caution and declare this is not a list item > unless it's separated by a blank line. Ok, I'll change it. But in any case, it will generate a warning, since it's potentially confusing (it will recommend that they move the "1." to the previous line or the "number" to the next line, or that they add a blank line if they intended to start a new list item). It also generates warnings for things like:: The following was probably a mistake: - This is not a list item. 
- Neither is this and:: The following was probably a mistake: - This is a list item; but this is a new paragraph, not a continuation of the list item. > > Do you think that we should have any colorizing at all? If so, what > > colors? People usually talk about *emphasis*, although I really very > > rarely find it useful in docstrings (despite its usefulness in > > *email*). The color I most often want is something to mark a token as > > a python identifier (or, more generally, to mark a string as Python > > code). > > Personally, I like having the *emphasis coloring*; I care less about > coloring identifiers. My reasoning: sometimes it's *really* useful to > be able to stress the importance of something without SHOUTING; Again, I very rarely find myself needing to do this in docstrings.. But maybe I'm not a representative sample. > but > pieces of source code are easy enough to recognize without coloring: > they just *look* different, e.g. foo(bar) is clearly a function call. > When it's ambiguous, I'll put single or double quotes around it > (e.g. when referencing the 'a' variable by itself) but I'm OK with > seeing those quotes in the printed documentation as well; It can be nice to have code colored for other reasons, but I don't think it's really a necessity.. > I'm *not* OK with seeing *emphasis* printed as "*emphasis*". How would you like to see *emphasis* rendered in a tty environment? Like "*this*"? Or just like "this", since emphasis should never really be *necessary* to make your point? This would apply to any tool that tries to print marked-up documentation from within Python, for example (similar to "help"). > One more thing: I'd like to argue against the use of a fixed-width > font for in-line code examples. Typically this uses Courier, whose > characters are *way* to wide for readability. I can understand why a > fixed-width font is necessary in sample *blocks*, because *sometimes* > (though not very often) there's code that is arranged in a tabular > manner; but this argument doesn't apply to in-line code samples. Yeah, I had been thinking about that, and I agree. But of course that's mainly a tool issue, not a markup language issue. (though not entirely). On a related note, I've been thinking that all spaces in in-line code should be soft. If you really need "x y" to come out with 2 spaces in it instead of one, you should use a literal block. I'm undecided about whether spaces in in-line code should be breakable.. Maybe leave that a tool issue. > > If we didn't do any colorizing, we would probably have: > > - paragraphs (in which word-wrapping is legal, etc.) > > - literal blocks (which are displayed as-is) > > - doctest blocks (which are displayed as-is, or possibly colorized) > > - lists (ordered and unordered) > > - sections (and subsections) I forgot to mention "fields," which allow you to do things like describe individual parameters, or the return value, or a class's instance variables, etc. > > If there was no colorizing, I'm pretty sure we could get away with no > > escaping mechanism (with carefully chosen structuring rules, it would > > never be necessary). > > But I want an escaping mechanism. I want to be able to say e.g. "When > I write "*foo*", all its characters are rendered, including the quotes > and the stars, but when I write \*foo*, it is rendered in italics." > (In other words, I want to be able to give an in-line example of the > "*foo*" notation.) Well, as I said *if there's no colorizing*, we don't need escaping. 
The second you introduce *emphasis* colorizing, or any other colorizing, we do need some type of escaping mechanism. And then we can talk about various escaping mechanisms (I've seen 3 workable ones: backslashing of some sort (e.g., \*); using X{..} notation (e.g., E{*} or E{lb}); and using a literal coloring (e.g. '*' or `*`).. Of course, the last one's not as complete, since you then can't include the literal character inline.. but at least that's 1 character instead of all the markup characters. But I think that, for now, it makes sense to postpone discussion of *both* colorizing and escaping (since they're clearly related) and to try to come up with a good definition for how we want structuring to work. Currently, the only open questions in my mind are where to draw the lines between errors and warnings, and how to write fields in such a way that they won't conflict with normal English usage. Any feedback would be much appreciated. I'll try to put up a link to my parser sometime soon, but it's getting towards the end of the semester, and I'm a bit swamped with projects. :) -Edward From guido@digicool.com Sun Apr 15 01:52:01 2001 From: guido@digicool.com (Guido van Rossum) Date: Sat, 14 Apr 2001 19:52:01 -0500 Subject: [Doc-SIG] lightweight markup: bullets In-Reply-To: Your message of "Sat, 14 Apr 2001 16:16:41 EDT." <200104142016.f3EKGfp24071@gradient.cis.upenn.edu> References: <200104142016.f3EKGfp24071@gradient.cis.upenn.edu> Message-ID: <200104150052.TAA30895@cj20424-a.reston1.va.home.com> > > Let's err on the side of caution and declare this is not a list item > > unless it's separated by a blank line. > > Ok, I'll change it. But in any case, it will generate a warning, since > it's potentially confusing (it will recommend that they move the "1." > to the previous line or the "number" to the next line, or that they > add a blank line if they intended to start a new list item). It also > generates warnings for things like:: > > The following was probably a mistake: > - This is not a list item. > - Neither is this > > and:: > > The following was probably a mistake: > > - This is a list item; > but this is a new paragraph, not > a continuation of the list item. Good. > > Personally, I like having the *emphasis coloring*; I care less about > > coloring identifiers. My reasoning: sometimes it's *really* useful to > > be able to stress the importance of something without SHOUTING; > > Again, I very rarely find myself needing to do this in docstrings.. But > maybe I'm not a representative sample. Grep through Lib/*.py for ' \*[a-z][a-z]*\* '. Lots of examples (some in comments, but those are also documentation :-). > > but > > pieces of source code are easy enough to recognize without coloring: > > they just *look* different, e.g. foo(bar) is clearly a function call. > > When it's ambiguous, I'll put single or double quotes around it > > (e.g. when referencing the 'a' variable by itself) but I'm OK with > > seeing those quotes in the printed documentation as well; > > It can be nice to have code colored for other reasons, but I don't > think it's really a necessity.. Agreed. > > I'm *not* OK with seeing *emphasis* printed as "*emphasis*". > > How would you like to see *emphasis* rendered in a tty environment? > Like "*this*"? Or just like "this", since emphasis should never > really be *necessary* to make your point? This would apply to any > tool that tries to print marked-up documentation from within > Python, for example (similar to "help"). 
Since I went to the trouble of typing it, I'd like to see it rendered one way or another. Rendering as *foo* is fine. (Much better than inverse video!) > > One more thing: I'd like to argue against the use of a fixed-width > > font for in-line code examples. Typically this uses Courier, whose > > characters are *way* to wide for readability. I can understand why a > > fixed-width font is necessary in sample *blocks*, because *sometimes* > > (though not very often) there's code that is arranged in a tabular > > manner; but this argument doesn't apply to in-line code samples. > > Yeah, I had been thinking about that, and I agree. But of course that's > mainly a tool issue, not a markup language issue. (though not entirely). I know, I just wanted to throw it out while I was thinking of it. > On a related note, I've been thinking that all spaces in in-line > code should be soft. If you really need "x y" to come out with 2 > spaces in it instead of one, you should use a literal block. I'm undecided > about whether spaces in in-line code should be breakable.. Maybe leave > that a tool issue. Agreed, and I do think spaces in in-line code should be breakable. I write a lot of email with in-line code samples, and I often have no choice in letting it break -- and if I don't want it to be broken, I'll make it a block. > > > If we didn't do any colorizing, we would probably have: > > > - paragraphs (in which word-wrapping is legal, etc.) > > > - literal blocks (which are displayed as-is) > > > - doctest blocks (which are displayed as-is, or possibly colorized) > > > - lists (ordered and unordered) > > > - sections (and subsections) > > I forgot to mention "fields," which allow you to do things like describe > individual parameters, or the return value, or a class's instance > variables, etc. The Javadoc-style @ notation makes sense to me here -- as you showed, trying to do this without markup can be plain confusing. > But I think that, for now, it makes sense to postpone discussion of > *both* colorizing and escaping (since they're clearly related) and to > try to come up with a good definition for how we want structuring to > work. Currently, the only open questions in my mind are where to draw > the lines between errors and warnings, I say be strict. The tool should always be available and we should tweak all our docstrings until the tool is happy. > and how to write fields in such a > way that they won't conflict with normal English usage. Any feedback > would be much appreciated. I'll try to put up a link to my parser > sometime soon, but it's getting towards the end of the semester, and I'm > a bit swamped with projects. :) --Guido van Rossum (home page: http://www.python.org/~guido/) From edloper@gradient.cis.upenn.edu Sun Apr 15 01:14:32 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Sat, 14 Apr 2001 20:14:32 EDT Subject: [Doc-SIG] __all__, and how it relates to doc tools Message-ID: <200104150014.f3F0EWp17579@gradient.cis.upenn.edu> I was looking through the changes made in Python 2.1, and noticed the "__all__" variable in modules. My understanding is that, when defined, this lists all of the variables that will be imported when you do "from xyz import *". Would it also be reasonable for a doc tool to look at this value, for an indication of which objects to document? This would be an easy way of preventing the doc tools from documenting: 1. "internal" objects (may be a good idea, may not be..) 2. 
imported modules and objects that were imported from other modules (most likely a good idea). (note that this is only an issue when we're documenting from within Python, not when we're parsing the file that we're documenting.) (I've had trouble documenting modules that run "from types import *", and seeing a bunch of Type objects defined by the module, etc..) Of course, if the __all__ variable is not defined, you'd still have to use whatever heuristics/rules you have to decide what to document.. And you'd probably want tools to have a flag that tells them to ignore the __all__ variable. But mainly I'm wondering whether this is consistant with the intended meaning of the __all__ variable? If not, tools shouldn't use it that way.. we've had enough trouble with people overloading variables already (pre-function-attribute __doc__ comes to mind). Also, would it be reasonable to only document the fields of a class listed in an __all__ class variable, if such a variable is defined? -Edward From edloper@gradient.cis.upenn.edu Sun Apr 15 01:15:27 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Sat, 14 Apr 2001 20:15:27 EDT Subject: [Doc-SIG] lightweight markup: bullets Message-ID: <200104150015.f3F0FRp17667@gradient.cis.upenn.edu> Guido said: > Grep through Lib/*.py for ' \*[a-z][a-z]*\* '. Lots of examples (some > in comments, but those are also documentation :-). Ok, you're right. But I *still* think that we should defer that issue, given that we can. I'd like to get a markup language that we can all play with for a little while first, and then talk about how to add colorizing.. The only way that colorizing should be non-backwards- compatible is when you need to escape things. > > How would you like to see *emphasis* rendered in a tty environment? > [...] > Since I went to the trouble of typing it, I'd like to see it rendered > one way or another. Rendering as *foo* is fine. (Much better than > inverse video!) Agreed (not to mention that sometimes such fancy features as inverse video are not available). > I do think spaces in in-line code should be breakable. I > write a lot of email with in-line code samples, and I often have no > choice in letting it break -- and if I don't want it to be broken, > I'll make it a block. Agreed. Although if the tool wants to be nice, and try to avoid breaking in-line code, it's free to. But the markup language says that any in-line code *can* get broken at spaces. > > I forgot to mention "fields," which allow you to do things like describe > > individual parameters, or the return value, or a class's instance > > variables, etc. > > The Javadoc-style @ notation makes sense to me here -- as you showed, > trying to do this without markup can be plain confusing. Ok. Does anyone have objections to using Javadoc-style @ notation? Any votes on which of the various notations I wrote out that we should I was looking at the emacs java-mode, to see how they do colorizing in docstrings. It looks like they just colorize things like "@param x" and "@author" *wherever* they occur (assuming that these will appear rarely, if ever, in actual source code; and that when they do, it will be fairly harmless anyway). Would we be ok with doing something like that? Of course, IDLE could in theory be much smarter about it.. (And if we did eventually put something like this in emacs python mode, it would certainly be something you can turn on/off). > I say be strict. The tool should always be available and we should > tweak all our docstrings until the tool is happy. Ok. 
I'll err on the side of being strict then. One advantage of being strict is that it greatly reduces the need to hand-check the parser's output.. as long as running:: pytext.check_docstrings(module) or whatever succeeds, you're most likely fine. -Edward From ping@lfw.org Sun Apr 15 07:42:05 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Sun, 15 Apr 2001 01:42:05 -0500 (CDT) Subject: [Doc-SIG] Where to find the docs Message-ID: Where does the documentation normally reside on Unix and Mac platforms? (Or, alternate question: why isn't the documentation part of the main distribution archive on these platforms?) At the moment, pydoc is looking in: os.environ.get('PYTHONDOCS') Setting the environment variable has highest priority. Then: os.path.join(os.environ.get('PYTHONHOME'), 'doc') os.path.join(os.path.basename(sys.executable), 'doc') These work for Windows, and work for Mac *if* you choose to unpack the docs into the 'doc' subdirectory of Python's home -- but you have to rename the unpacked folder manually. Then, for Unix: '/usr/doc/python-docs-' + split(sys.version)[0] '/usr/doc/python-' + split(sys.version)[0] '/usr/doc/python-docs-' + sys.version[:3] '/usr/doc/python-' + sys.version[:3] The most logical place i would expect the docs to reside is /usr/doc/python-2.1. But the last Python documentation RPMs i installed used /usr/doc/python-docs-1.5.2 and /usr/doc/python-docs-2.0. Hence the above. Other RPMs i have seen put the documentation in /usr/share/doc/python-docs-2.0. Should this be added? Is there a standard place to look? Thanks, -- ?!ng From tim.one@home.com Sun Apr 15 08:14:50 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 15 Apr 2001 03:14:50 -0400 Subject: [Doc-SIG] __all__, and how it relates to doc tools In-Reply-To: <200104150014.f3F0EWp17579@gradient.cis.upenn.edu> Message-ID: [Edward D. Loper] > I was looking through the changes made in Python 2.1, and noticed > the "__all__" variable in modules. My understanding is that, when > defined, this lists all of the variables that will be imported > when you do "from xyz import *". Correct! That's it's only enforced semantics. More generally, it's meant to identify which names a module intends to export regardless of means, and in that larger sense it's more of a doc gimmick than a language feature. > Would it also be reasonable for a doc tool to look at this value, for > an indication of which objects to document? Absolutely. In fact, that's probably the best use. > ... > And you'd probably want tools to have a flag that tells them to > ignore the __all__ variable. I wouldn't: if a module lies about what it intends to export, that's a bug in the module. > ... > Also, would it be reasonable to only document the fields of a > class listed in an __all__ class variable, if such a variable is > defined? __all__ was a marginal idea even at the module level; I'd prefer not to see it spread. The practical problem at the module level was that import xyz also acts as an export of xyz (from "import *"'s POV), and usually an unintended export. There's no such problem at the class level. From edloper@gradient.cis.upenn.edu Sun Apr 15 08:53:32 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Sun, 15 Apr 2001 03:53:32 EDT Subject: [Doc-SIG] __all__, and how it relates to doc tools In-Reply-To: Your message of "Sun, 15 Apr 2001 03:14:50 EDT." Message-ID: <200104150753.f3F7rWp28385@gradient.cis.upenn.edu> > > And you'd probably want tools to have a flag that tells them to > > ignore the __all__ variable. 
> > I wouldn't: if a module lies about what it intends to export, that's a bug > in the module. My guess is that most people won't put "private" functions/classes/ etc. in the __all__ list, but it still may be useful for a doc tool to be able to process the docstrings of the "private" objects.. This is similar to including a flag saying whether a doc tool should process private objects (ones starting with "_" or "__").. > __all__ was a marginal idea even at the module level; I'd prefer not > to see it spread. Ok. > The practical problem at the module level was that > > import xyz > > also acts as an export of xyz (from "import *"'s POV), and usually an > unintended export. There's no such problem at the class level. I would be surprised if people don't also use it to hide "private" objects. Is this something we want to discourage? (Of course, "private" objects are probably named with a _leading_underscore, and my understanding is that "from xyz import *" won't import such objects if __all__ is undefined.. so perhaps the question is moot..) Incidentally, if __all__ is defined, and it includes objects that begin with a "_", do those get imported (in "from xyz import *")? Or does the general rule that objects starting with "_" don't get imported override that? (I haven't had a chance to grab 2.1 and play with it yet..) -Edward From tim.one@home.com Sun Apr 15 09:26:04 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 15 Apr 2001 04:26:04 -0400 Subject: [Doc-SIG] __all__, and how it relates to doc tools In-Reply-To: <200104150753.f3F7rWp28385@gradient.cis.upenn.edu> Message-ID: [Edward D. Loper] > My guess is that most people won't put "private" functions/classes/ > etc. in the __all__ list, but it still may be useful for a doc tool to > be able to process the docstrings of the "private" objects.. This is > similar to including a flag saying whether a doc tool should process > private objects (ones starting with "_" or "__").. It's useful for a doc tool to have a notion of public and private class attributes, but naming conventions already exist to make those distinctions. It would be unPythonic to introduce another mechanism to do the same thing. > ... > I would be surprised if people don't also use it to hide "private" > objects. Is this something we want to discourage? Yes: the convention for module-private names has always been to begin them with an underscore. It wasn't the intent of __all__ to throw that rule away; although, frankly, I've never been clear on exactly why __all__ *was* added. The addition of "import name as _name" syntax made it convenient enough to do "non-exporting imports", as far as I was concerned. > ... > Incidentally, if __all__ is defined, and it includes objects that > begin with a "_", do those get imported (in "from xyz import *")? Yes, if an __all__ list is present, import* imports exactly the names it contains. From guido@digicool.com Sun Apr 15 14:12:23 2001 From: guido@digicool.com (Guido van Rossum) Date: Sun, 15 Apr 2001 08:12:23 -0500 Subject: [Doc-SIG] __all__, and how it relates to doc tools In-Reply-To: Your message of "Sun, 15 Apr 2001 03:14:50 -0400." References: Message-ID: <200104151312.IAA08960@cj20424-a.reston1.va.home.com> > > Would it also be reasonable for a doc tool to look at this value, for > > an indication of which objects to document? > > Absolutely. In fact, that's probably the best use. Hm. You may be right, but Ping told me that he had tried this in pydoc, and was unhappy with the result: too much stuff didn't get documented.
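To make the trade-off concrete, the filtering being debated amounts to something like the sketch below; it is illustrative only, not pydoc's (or any other tool's) actual code, and the helper name and the respect_all flag are invented::

    import types

    def names_to_document(module, respect_all=1):
        """Return the names a doc tool might document for `module`:
        honour __all__ when present (and when respect_all is true),
        otherwise fall back on the underscore convention and skip
        names that merely arrived via an import."""
        if respect_all and hasattr(module, '__all__'):
            return list(module.__all__)
        names = []
        for name in dir(module):
            if name.startswith('_'):
                continue                    # conventionally private
            value = getattr(module, name)
            if type(value) is types.ModuleType:
                continue                    # an imported module, not an export
            if getattr(value, '__module__', module.__name__) != module.__name__:
                continue                    # defined elsewhere, probably imported
            names.append(name)
        return names

With a conservative __all__, everything the list omits disappears from the generated docs, which is presumably the result Ping ran into.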
So we should at least be willing to retract this idea. --Guido van Rossum (home page: http://www.python.org/~guido/) From dgoodger@atsautomation.com Mon Apr 16 14:39:24 2001 From: dgoodger@atsautomation.com (Goodger, David) Date: Mon, 16 Apr 2001 09:39:24 -0400 Subject: [Doc-SIG] backslashing Message-ID: Edward D. Loper wrote: > If you used E and E or E{lb} and E{rb} or something like that, > then regexps would generally look how they're supposed to (at least > when you print them). So would *any other* convention -- when you print them. The point is, what do they look like when you read them? Another point: mark up "x > y" as an inline literal. If you use C<>, you need to escape. If you use C{}, you need to escape for some other case. [me] > > Inline literals would be better:: > > > > The regexp `r"\."` matches a literal period. [Edward] > But then we have to say that inline literals can't ever contain "'".. > which in my mind is no better than saying that you can't backslash > '{' and '}'. No mention of "'" (single quote). I used "`" (backquote). Your email font can't distinguish the two. > Guido has objected to `literal` On 30 March, Guido wrote: > In many fonts, backtick is hard to distinguish from apostrophe! Two aspects: reading and writing. If you're reading the raw marked-up docstring/email/whatever, it doesn't *matter* if it shows up as a backquote or a single quote. As long as it appears quoted in some manner, the quoting has served its purpose. If you're reading the processed docstring, the `inline literal` (note: backquotes used) will be formatted in some way which makes the context obvious. If you're reading the raw text and debating about it, you'd better be using a font which distinguishes clearly between all ASCII characters (there are common fonts in which "(" and "{" are hardly distinguishable either). If you're *writing* the markup (or writing *about* it! :) you'd better be using a suitable font. After all, one day you might receive email saying:: What's wrong with this code? >>> hello = 'Dolly' >>> print `hello`, 'hello' Dolly hello It should print "hello Dolly"! [Guido] > This seems to come from a confusion between two similar, but > different goals: > > - It should be easy to read without any knowledge of the markup > language > > - It should be possible to author without knowing the whole markup > language and without changing your habits > > I can agree with the first one, but I think the second will continue > to get us into trouble. Agreed. So change your habits, change your mindset, everyone! Or at least, change your email font! ;-) Documentation is data, and markup is the equivalent of code. [Edward] > I think that if the reason we're rejecting X{} or X<> is > because it's "not readable," then there's no reason to accept #code#, > which to me is signifigantly less intuitive than C{code}. Yes. /DG From dgoodger@atsautomation.com Mon Apr 16 14:39:26 2001 From: dgoodger@atsautomation.com (Goodger, David) Date: Mon, 16 Apr 2001 09:39:26 -0400 Subject: [Doc-SIG] lists & blank lines (was re: backslashing) Message-ID: (subject changed to separate the issues) [I wrote:] > > Recently, I've come to the conclusion that requiring a blank line > > before the start of a list is reasonable and correct, even if we > > don't require blank lines between items. Minimizing ambiguity trumps > > minimizing vertical space. Edward D. Loper wrote: > That would make things easier. But we would also have to require that > sublists are surrounded by blank lines. 
[examples omitted] > Any objections to that? None. > The way my markup language currently works, > we don't have to worry about how to detect when a new list item > starts, because list item contents are required to be indented:: > > - this is a valid list > item. > > - This is not a valid > list item. Explicit wins the day. For list items, blank line & indentation ambiguity will bite us all someday, so removing ambiguity is good. /DG From edloper@gradient.cis.upenn.edu Mon Apr 16 16:39:28 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Mon, 16 Apr 2001 11:39:28 EDT Subject: [Doc-SIG] lists & blank lines (was re: backslashing) In-Reply-To: Your message of "Mon, 16 Apr 2001 09:39:26 EDT." Message-ID: <200104161539.f3GFdSp09663@gradient.cis.upenn.edu> > > The way my markup language currently works, > > we don't have to worry about how to detect when a new list item > > starts, because list item contents are required to be indented:: > > > > - this is a valid list > > item. > > > > - This is not a valid > > list item. > > Explicit wins the day. For list items, blank line & indentation > ambiguity will bite us all someday, so removing ambiguity is good. Um.. I'm not sure whether that means you're agreeing with me or disagreeing.. But the basic reasoning here was that there are a number of structural forms that are "ambiguous", in the sense that people use them to convey different structures.. For lists, the "ambiguous" structures that I thought of are: - xxxx x xxxx (one list item or a list item xx xx x xxxxx followed by a paragraph?) xxx xx xx xxx (one paragraph or a paragraph - xx x xxxxxx followed by a list item?) - xx x x xxxx (a list item with one para or with - x xxxxx x one para and one sublist?) - xx xx x xxx (one list item with a dash in its - x xx xx x x para or two list items?) (where "-" represents any bullet character, and "xxx" is text.) The problem is that people will decide which choice to read something as based on the text.. Which will lead to errors in writing formatted docstrings. The solution? Make all "ambiguous" structures give either errors or warnings, and ask people to write them in unambiguous ways. To make any of them look like single paras, simply re-word-wrap so that the "-" is not at the beginning of the line. To make them look like list items, indent list items and separate them with blank lines; and indent the contents of list items. This also makes it clear whether: - xx xx x xxx xx x xx xxx x is a list item with one para followed by a paragraph, or a list item with two paragraphs. The only ambiguity that this doesn't deal with is the last one I listed. But I decided that we could probably ignore that ambiguity, because if anyone *does* try to make it one paragraph with an embedded bullet, it's unreadable anyway: 1. This is a list 2. This list item talks about the number 1. 1. is a good number. 3. That was confusing. If you disagree, we could require blank lines between list items. The main disadvantage there is that it would add a fair amount of blank space (fields obey the same rules as lists, so you'd have to say:: @param x: ... @param y: ... @returns: ... @raises: ... instead of:: @param x: ... @param y: ... @returns: ... @raises: ... ) The other advantage of this set of rules is that it allows us to completely separate colorizing from structuring.. Which means that we can temporarily put colorizing aside, and concentrate on what we want our structuring rules to do.
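For concreteness, the bullet recognition described above needs little more than a pair of regular expressions; the sketch below is illustrative only, not anyone's actual parser, and the patterns and names are made up::

    import re

    _BULLET = re.compile(r'(-|\d+\.)\s+')      # "- item" or "1. item"
    _FIELD = re.compile(r'@\w+(\s+\w+)?:')     # "@returns:", "@param x:"

    def classify(first_line):
        """Guess what kind of block a block-initial line starts."""
        line = first_line.lstrip()
        if _FIELD.match(line):
            return 'field'
        if _BULLET.match(line):
            return 'list item'
        return 'paragraph'

Note that a plain-English opener such as "However: ..." still falls through to 'paragraph', and a stacked field list like the "@param x:" example above stays unambiguous even without blank lines between the entries.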
-Edward From edloper@gradient.cis.upenn.edu Mon Apr 16 17:09:42 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Mon, 16 Apr 2001 12:09:42 EDT Subject: [Doc-SIG] backslashing In-Reply-To: Your message of "Mon, 16 Apr 2001 09:39:24 EDT." Message-ID: <200104161609.f3GG9hp12691@gradient.cis.upenn.edu> >> If you used E and E or E{lb} and E{rb} or something like that, >> then regexps would generally look how they're supposed to (at least >> when you print them). > > So would *any other* convention -- when you print them. The point > is, what do they look like when you read them? Not when you print them with "print foo.__doc__"; only when you use some tool to interpret them and print them.. > Another point: mark up "x > y" as an inline literal. If you use C<>, > you need to escape. If you use C{}, you need to escape for some > other case. True. But the advantage of C{} is that we can say that X{} is markup for X=[A-Z], but any other *nested* {}s will be printed as {}s (so, e.g., you can say C{ {1:'a', 2:'b'} })... Which means that you *almost* never need to use explicit escaping. (you need it if you want to talk about "{"s and "}"s themselves, as opposed to objects defined with them.. or if you want to put a capital letter before a "{"). In particular, I searched for all "{"s in the standard library and the other packages I have installed on my system, and found no examples where they would need to be escaped.. Once you introduce "\" as an escape character, though, all sorts of "\"s now need to be escaped.. And I don't really like the convention of keeping "\"s if they appear before something that doesn't require escaping.. It taxes my brain too much. I guess I'm trying to go on the principle of keeping the need to escape characters to a minimum, because whatever escaping mechanism we have, it'll be somewhat ugly/difficult to read. -Edward From edloper@gradient.cis.upenn.edu Mon Apr 16 17:57:35 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Mon, 16 Apr 2001 12:57:35 EDT Subject: [Doc-SIG] backslashing In-Reply-To: Your message of "Mon, 16 Apr 2001 09:39:24 EDT." Message-ID: <200104161657.f3GGvZp17069@gradient.cis.upenn.edu> David Quoted: > [I said] > > Guido has objected to `literal` > > [Guido said] > > In many fonts, backtick is hard to distinguish from apostrophe! > > [response omitted] That wasn't the objection I was refering to... I was referring to: [Guido said] > I don't like `...`, because (a) it means something very specific in > Python (and in the Unix shell), (b) it's hard to distinguish from > '...' in some fonts, and (c) except for the `...` Python and shell > notation, I expect ` to be closed with '. (well, I guess part (b) is the same, and I agree that that's not a real objection.. The only time when you'll be looking at docstrings as raw text will most likely be either in your code editor or in Python. And hopefully your font for both those environments distinguishes apostrophe from backtick, because otherwise you'll have a lot of trouble coding..) We might be able to convince Guido to let go of (a) and (c). I personally strongly favor using `...` (backticks) over '...' (apostrophes), since apostrophes are fairly overloaded in natural language already.. -Edward From dgoodger@atsautomation.com Mon Apr 16 19:00:10 2001 From: dgoodger@atsautomation.com (Goodger, David) Date: Mon, 16 Apr 2001 14:00:10 -0400 Subject: [Doc-SIG] lists & blank lines (was re: backslashing) Message-ID: [Edward D. 
Loper] > > > The way my markup language currently works, > > > we don't have to worry about how to detect when a new list item > > > starts, because list item contents are required to be indented:: > > > > > > - this is a valid list > > > item. > > > > > > - This is not a valid > > > list item. [me] > > Explicit wins the day. For list items, blank line & indentation > > ambiguity will bite us all someday, so removing ambiguity is good. [Edward] > Um.. I'm not sure whether that means you're agreeing with me or > disagreeing. Agreeing. Remove ambiguity. Require blank lines & intentation to make lists explicit. > But the basic reasoning here was that there are > a number of structural forms that are "ambiguous", in the sense > that people use them to convey different structures.. For lists, > the "ambiguous" structures that I thought of are: > > - xxxx x xxxx (one list item or a list item > xx xx x xxxxx followed by a paragraph?) Item followed by paragraph, with warning. Or error. > xxx xx xx xxx (one paragraph or a paragraph > - xx x xxxxxx followed by a list item?) One paragraph, with warning. > - xx x x xxxx (a list item with one para or with > - x xxxxx x one para and one sublist?) One para, no sublist, with warning. > - xx xx x xxx (one list item with a dash in its > - x xx xx x x para or two list items?) Two list items (assuming the list was started properly, of course). > The problem is that people will decide which choice to read > something as based on the text. > The solution? Make all "ambigous" structures give either errors > or warnings, and ask people to write them in unambiguous ways. Yes. > This also makes it clear whether: > > - xx xx x xxx > > xx x xx xxx x > > is a list item with one para followed by a paragraph, or a list > item with two paragraphs. The former. > The only ambiguity that this dosn't deal with is the last one I > listed. Which one was that? Unclear. > If you disagree, we could require blank lines between list items. Unnecessary (ie, I agree; I *don't* disagree ;). We don't need blank lines between items if the rules for list item indentation are explicit. Blank lines were required by StructuredText because it makes parsing easy, but there were many complaints about wasted vertical space. These rules make an unambiguous solution. /DG From dgoodger@atsautomation.com Mon Apr 16 19:12:33 2001 From: dgoodger@atsautomation.com (Goodger, David) Date: Mon, 16 Apr 2001 14:12:33 -0400 Subject: [Doc-SIG] field syntax (was re: lists & blank lines) Message-ID: [Edward Loper] > @param x: ... I'm not a big fan of the JavaDoc @ syntax, but I don't know of a better inline syntax for keyword-tagged values. (I did propose a [directive-based syntax]_; search for "keyword".) I propose that until a clearly superior syntax is discovered/revealed, we leave these out of the discussion (unnecessary complication). /DG .. _directive-based syntax: http://mail.python.org/pipermail/doc-sig/2000-November/001241.html From edloper@gradient.cis.upenn.edu Mon Apr 16 19:28:57 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Mon, 16 Apr 2001 14:28:57 EDT Subject: [Doc-SIG] lists & blank lines (was re: backslashing) In-Reply-To: Your message of "Mon, 16 Apr 2001 14:00:10 EDT." Message-ID: <200104161828.f3GISwp24578@gradient.cis.upenn.edu> > > - xxxx x xxxx (one list item or a list item > > xx xx x xxxxx followed by a paragraph?) > > Item followed by paragraph, with warning. Or error. Yes. 
(currently a warning in my parser -- asks you to add a blank line) > > xxx xx xx xxx (one paragraph or a paragraph > > - xx x xxxxxx followed by a list item?) > > One paragraph, with warning. Yes. (currently a warning in my parser -- asks you to re-word wrap the paragraph, or to separate & indent if you intended to start a list) > > - xx x x xxxx (a list item with one para or with > > - x xxxxx x one para and one sublist?) > > One para, no sublist, with warning. Yes. (currently a warning in my parser -- asks you to re-word wrap the paragraph, or to add a blank line if you intended to start a sublist) > > - xx xx x xxx (one list item with a dash in its > > - x xx xx x x para or two list items?) > > Two list items (assuming the list was started properly, of course). Yes. And if the 2 bullets are of different types, it's a warning, because lists should be separated by blank lines. > > This also makes it clear whether: > > > > - xx xx x xxx > > > > xx x xx xxx x > > > > is a list item with one para followed by a paragraph, or a list > > item with two paragraphs. > > The former. Yes. > > The only ambiguity that this dosn't deal with is the last one I > > listed. > > Which one was that? Unclear. In theory, someone could read the following as a single list item, even though our rules say its two:: 1. I like the number e. This number is approximately equal to 2.71828182846. But it's irrational, so that's an approximation. Similarly with something like:: - To find the result, simply take C{x - y}. or even:: - I like numbers that are prime, like 2. I also like odd numbers. But I would argue that these are so hard to read, that we can basically ignore them.. Note that when I say "ambiguous," I don't mean ambiguous according to the markup language rules.. I mean that it seems possible that someone would read it one way or the other, given that they don't know the rules of the markup language. It's also related to the question of whether it's possible to make 's word-wrapping work properly with the formatted documentation strings. > (ie, I agree; I *don't* disagree ;). Good. Does anyone else disagree, or can we tentatively move on? :) -Edward From edloper@gradient.cis.upenn.edu Mon Apr 16 19:45:35 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Mon, 16 Apr 2001 14:45:35 EDT Subject: [Doc-SIG] field syntax (was re: lists & blank lines) In-Reply-To: Your message of "Mon, 16 Apr 2001 14:12:33 EDT." Message-ID: <200104161845.f3GIjZp26051@gradient.cis.upenn.edu> > [Edward Loper] > > @param x: ... > > I'm not a big fan of the JavaDoc @ syntax, but I don't know of a better > inline syntax for keyword-tagged values. (I did propose a [directive-based > syntax]_; search for "keyword".) As I've said before, I'm not terribly attatched to the JavaDoc syntax. But it seems to me to make sense to handle field lists as follows: 1. Each field begins with a "bullet", like "@returns" or "returns:" or ".. Keywords::" or whatever we agree on. These should be recognizable using a regexp that doesn't depend on the actual words (i.e., *not* "@returns|@param|...", but "@\w+\b"). 2. Fields act just like list items. 3. The field list must be the last thing in a docstring, and it must be separated by a blank line (unless it's the only thing in the docstring). If you want, we could require that it be indented -- then, its syntax would be essentially identical to list syntax. As I understand it, your "directive-based syntax" would mainly fit this model.. 
Except that I require the contents of each directive to be indented. Note that you are not required to start a paragraph on the line that a list bullet is on.. You can write list items like this if you want:: 1. Paragraph one for list item (1). Paragraph two for list item (1). The only other difference would be that, under my scheme, the contents of a directive have to be properly formatted formatted text; where under your scheme it seems like they can be anything. As a side note, you called this "inline syntax," but I think of "inline" as being things that occur within a paragraph.. This is "structural syntax" in my mind. The reason I'd support JavaDoc rather than something like ".." is because it's no less readable, and it's already a somewhat established conventions (there are a fair number of javadoc-clones out there for other languages). On the other hand, we might not want people to get confused, and think that our markup language is the same as javadoc's... :) Also, "@\w+" occurs very rarely under natural circumstances (although perhaps the same can be said of ".. \w+::". > I propose that until a clearly superior > syntax is discovered/revealed, we leave these out of the discussion > (unnecessary complication). This is a feature that I'm very interested in making sure that the markup language includes. As such, I'd like to keep it on the table, even if it's off to the side. :) (I see this feature as being more important than the ability to use lists or colorizing..) -Edward From dgoodger@atsautomation.com Mon Apr 16 21:14:22 2001 From: dgoodger@atsautomation.com (Goodger, David) Date: Mon, 16 Apr 2001 16:14:22 -0400 Subject: [Doc-SIG] field syntax (was re: lists & blank lines) Message-ID: [Edward Loper] > 3. The field list must be the last thing in a docstring Why? What about PEPs? > I require the contents of each directive > to be indented. Why? And what do you mean by "directive" here? (Note that my proposed directive syntax was for arbitrary language extension. The keyword-tagged values directive was just an example.) > Note that you are not required to start a paragraph > on the line that a list bullet is on.. You can write list items > like this if you want:: > 1. > Paragraph one for list item (1). > > Paragraph two for list item (1). I'm confused. So what? And why would we want this? > The only other difference would be that, under my scheme, the contents > of a directive have to be properly formatted formatted text; where > under your scheme it seems like they can be anything. Not "anything", but directive-dependent. In other words, for the keywords example, given the directive:: .. keywords:: The next lines are expected to be of the form "keyword: value". (Beyond that, I didn't specify; it was only an example of what could be done.) > As a side note, you called this "inline syntax," but I think > of "inline" > as being things that occur within a paragraph.. This is "structural > syntax" in my mind. Sorry, my bad. I meant character-based syntax (in this case "@"-based), as opposed to explicit directive-based. > This is a feature that I'm very interested in making sure that the > markup language includes. Keyword-tagged values have been discussed in the past on Doc-SIG. If they're that important to you, I'd suggest you go through the archives, list up all proposed alternatives, analyze & summarize. Otherwise, history repeats. > I see this feature as being more > important than the ability to use lists or colorizing..) I don't. 
Everyone has their own agenda, their own priorities. Beware that yours don't become a stumbling block for others' acceptance. :) One problem with getting a Setext/StructuredText derivative to satisfy everyone's needs is that the more characters we use as markup, the more complex it becomes. Another is that the available characters are limited. Are keyword-tagged values important enough to warrant the use of another character for their syntax? Edward's answer is obviously "yes". Mine was "no" (also because "@" isn't obvious/intuitive), and so I proposed a general explicit solution to future extension. /DG From edloper@gradient.cis.upenn.edu Mon Apr 16 21:35:33 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Mon, 16 Apr 2001 16:35:33 EDT Subject: [Doc-SIG] field syntax (was re: lists & blank lines) In-Reply-To: Your message of "Mon, 16 Apr 2001 16:14:22 EDT." Message-ID: <200104162035.f3GKZYp06669@gradient.cis.upenn.edu> > [Edward Loper] > > 3. The field list must be the last thing in a docstring > > Why? What about PEPs? I've been trying to design a ML for formatted docstrings; if it works for other domains, great, if not, too bad. I don't want to impede progress because we're trying to solve a more general problem than we really need to. > > I require the contents of each directive > > to be indented. > > Why? And what do you mean by "directive" here? (Note that my proposed > directive syntax was for arbitrary language extension. The keyword-tagged > values directive was just an example.) Why: so we know when they end. I think we may be talking about 2 different things here, though, and that may be a stumbling block. The functionality that I want is basically what JavaDoc implements with "@tags". I've used JavaDoc, and other similar systems for other programming languages, and it is *very* useful. E.g., it lets you have a little "section" to describe each parameter, the return value, etc. I'm not sure that this is the same thing that you're calling "keyword-tagged values." Maybe it is.. But I'd like to be able to say something semantically equivalant to:: @param elt The initial element. @param n The size of the list. in my docstrings for functions/methods. Also, I'd like to be able to include multiple paragraphs, lists, etc., in the description of a parameter. > > Note that you are not required to start a paragraph > > on the line that a list bullet is on.. You can write list items > > like this if you want:: > > 1. > > Paragraph one for list item (1). > > > > Paragraph two for list item (1). > > I'm confused. So what? And why would we want this? If it confuses you, ignore it; it's not really important. > > The only other difference would be that, under my scheme, the contents > > of a directive have to be properly formatted formatted text; where > > under your scheme it seems like they can be anything. > > Not "anything", but directive-dependent. Yes. But from the parser's point of view, it can be anything, because it doesn't know what extensions you'll be using. Some later stage (after the parser) will put restrictions on it.. > > This is a feature that I'm very interested in making sure that the > > markup language includes. > > Keyword-tagged values have been discussed in the past on Doc-SIG. If they're > that important to you, I'd suggest you go through the archives, list up all > proposed alternatives, analyze & summarize. Otherwise, history repeats. 
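The earlier claim that "@\w+" very rarely starts a line of ordinary prose is easy to spot-check with a throwaway scan along these lines (the file pattern below is only an example)::

    import re, glob

    AT_WORD = re.compile(r'^\s*@\w+\b')

    def count_at_lines(pattern):
        """Count lines that begin with "@word" in the matching files --
        a rough proxy for how often the proposed field bullet would
        collide with ordinary text."""
        hits = 0
        for filename in glob.glob(pattern):
            for line in open(filename).readlines():
                if AT_WORD.match(line):
                    hits = hits + 1
        return hits

    # e.g.: count_at_lines('/usr/lib/python2.1/*.py')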
I've been going through the archives, on and off, and haven't seen that many *different* proposals that deal with what I'm trying to do.. But I guess I'll keep looking. > > I see this feature as being more > > important than the ability to use lists or colorizing..) > > I don't. Everyone has their own agenda, their own priorities. Beware that > yours don't become a stumbling block for others' acceptance. :) Fine, as long as we all agree that our main agenda is to develop a markup language for use with docstrings. > One problem with getting a Setext/StructuredText derivative to satisfy > everyone's needs is that the more characters we use as markup, the more > complex it becomes. Which is one of the reasons I'm trying to get as much mileage out of indentation as I can.. :) > Another is that the available characters are limited. True. Although, it's not a character we're taking away. It's the ability to start a paragraph with "@\w+\b". Just like bullets, the @ will be treated as @ in any other circumstance. > Are keyword-tagged values important enough to warrant the use of another > character for their syntax? Edward's answer is obviously "yes". I believe that having keyword-tagged values, or whatever we want to call them, is worth removing the ability to start paragraphs with "@\w+\b". > Mine was "no" (also because "@" isn't obvious/intuitive) The obvious/intuitive reason seems better to me, although I don't see starting paragraphs with ".." as being any more intuitive.. The problem is that if you use something intuitive, like:: author: Edward Loper param size: The radius of the planet, in miles Then you're much more likely to prevent people from saying things they want to say, like:: However: ... > and so I proposed a general explicit solution to future extension. Which may be a good thing (although I would argue that directives should end on a return to the indentation that introduced them, esp. since this is consistent with the other use of "::").. But I think that these "keyword-tagged values" are central enough to the task of writing docstrings (especially for functions and methods; but also for describing class variables, etc) that they can be given their own syntax.. (Well, not quite their own -- they're really just lists with funny looking bullets that must appear at the top level and at the end of the docstring) -Edward From dgoodger@atsautomation.com Mon Apr 16 21:48:18 2001 From: dgoodger@atsautomation.com (Goodger, David) Date: Mon, 16 Apr 2001 16:48:18 -0400 Subject: [Doc-SIG] field syntax (was re: lists & blank lines) Message-ID: [Edward Loper, referring to fields] > ... and at the end of the docstring) Again, why? Why restrict fields to the end of a docstring? Seems artificial to me. /DG From dgoodger@atsautomation.com Mon Apr 16 22:00:32 2001 From: dgoodger@atsautomation.com (Goodger, David) Date: Mon, 16 Apr 2001 17:00:32 -0400 Subject: [Doc-SIG] lists & blank lines (was re: backslashing) Message-ID: [Edward D. Loper] > In theory, someone could read the following as a single list item, > even though our rules say its two:: > > 1. I like the number e. This number is approximately equal to > 2.71828182846. But it's irrational, so that's an approximation. I'd say this is just another example of: > > > - xxxx x xxxx (one list item or a list item > > > xx xx x xxxxx followed by a paragraph?) > > > > Item followed by paragraph, with warning. Or error. > > Yes. (currently a warning in my parser -- asks you to add a > blank line) As is: > - I like numbers that are prime, like > 2.
I also like odd numbers. This one is two bulleted items: > - To find the result, simply take C{x > - y}. (Unless the C{} syntax is used, in which case it's a single malformed item [second line should be indented] or an item followed by a paragraph [should be a blank line, and "C{x" should trigger an error]. In any case, it warrants a warning.) > Note that when I say "ambiguous," I don't mean ambiguous according to > the markup language rules.. I mean that it seems possible that someone > would read it one way or the other, given that they don't know the > rules of the markup language. Humans can parse text much more flexibly than software. Make the software (markup rules) quite strict, so that a text passing through the software without errors or warnings has no chance for ambiguity at the human-level. The best you can do is make the software say, "I don't understand what you mean here." Timbot's rule 12: "In the face of ambiguity, refuse the temptation to guess." /DG From edloper@gradient.cis.upenn.edu Mon Apr 16 22:23:19 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Mon, 16 Apr 2001 17:23:19 EDT Subject: [Doc-SIG] lists & blank lines (was re: backslashing) In-Reply-To: Your message of "Mon, 16 Apr 2001 17:00:32 EDT." Message-ID: <200104162123.f3GLNJp12024@gradient.cis.upenn.edu> > [Edward D. Loper] > > In theory, someone could read the following as a single list item, > > even though our rules say its two:: > > > > 1. I like the number e. This number is approximately equal to > > 2.71828182846. But it's irrational, so that's an approximation. > > I'd say this is just another example of: > > > > > - xxxx x xxxx (one list item or a list item > > > > xx xx x xxxxx followed by a paragraph?) > > > > > > Item followed by paragraph, with warning. Or error. > > As is: > > > > - I like numbers that are prime, like > > > 2. I also like odd numbers. But there's an important difference here. A parser will give a warning for the second and third examples, but won't for the first example. I would prefer to be able to say "if something might be ambiguous to people, then we either issue a warning or an error." But in the example about liking e, that rule doesn't hold. > > - To find the result, simply take C{x > > - y}. > > (Unless the C{} syntax is used, in which case it's a single > malformed item [second line should be indented] or an item followed > by a paragraph [should be a blank line, and "C{x" should trigger an > error]. In any case, it warrants a warning.) The intention was that C{..} was used to stand for whatever colorizing we decide we like. In that case, I agree that it should be 2 errors (mismatched delimiters) and possibly a warning. > Humans can parse text much more flexibly than software. Make the > software (markup rules) quite strict, so that a text passing through > the software without errors or warnings has no chance for ambiguity > at the human-level. The best you can do is make the software say, > "I don't understand what you mean here." Timbot's rule 12: "In the > face of ambiguity, refuse the temptation to guess." That's been my goal so far. But the problem is deciding what's ambiguous... -Edward From edloper@gradient.cis.upenn.edu Mon Apr 16 22:26:20 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Mon, 16 Apr 2001 17:26:20 EDT Subject: [Doc-SIG] field syntax (was re: lists & blank lines) In-Reply-To: Your message of "Mon, 16 Apr 2001 16:48:18 EDT." 
Message-ID: <200104162126.f3GLQKp12374@gradient.cis.upenn.edu> > [Edward Loper, referring to fields] > > ... and at the end of the docstring) > > Again, why? Why restrict fields to the end of a docstring? Seems > artificial to me. It is somewhat artificial. The reasoning was as follows: the position of the fields does not convey any semantic information; tools are likely to disregard the position when formatting their output. If we let people put them wherever they want in the docstring, then they may assume that they will appear in that position in the output of doc formatting tools (i.e., that their position *does* convey semantic information). This is dangerous, and should be stamped out. :) So, put all fields at the end, so no one will get confused. -Edward From dgoodger@atsautomation.com Mon Apr 16 23:23:35 2001 From: dgoodger@atsautomation.com (Goodger, David) Date: Mon, 16 Apr 2001 18:23:35 -0400 Subject: [Doc-SIG] lists & blank lines (was re: backslashing) Message-ID: [Edward D. Loper] > > > In theory, someone could read the following as a single list item, > > > even though our rules say its two:: > > > > > > 1. I like the number e. This number is approximately equal to > > > 2.71828182846. But it's irrational, so that's an > approximation. > > > > I'd say this is just another example of: > > > > > > > - xxxx x xxxx (one list item or a list item > > > > > xx xx x xxxxx followed by a paragraph?) > > > > > > > > Item followed by paragraph, with warning. Or error. > > But there's an important difference here. A parser will give a > warning for the second and third examples, but won't for the > first example. I would prefer to be able to say "if something > might be ambiguous to people, then we either issue a warning > or an error." But in the example about liking e, that rule > doesn't hold. Sure it does. It's an enumerated list item ("1.") followed by an unindented line, therefore another paragraph not part of the first item (this should trigger a warning unless it's another item in the same list). The second line is not an enumerated list item, since: (a) the label isn't of a standard pattern the same as the first item ("\d+\. "; no space after the "2."; I don't think we should allow floating-point enumerators, hm? :); (b) the label isn't sequential with the first item's label (1 + 1 != 2.718...); (c) if we permit nested lists through compound enumerators, sublists must start with "1" or equivalent, and this one doesn't. Convinced? If not, why *would* the parser pass the second line through unchallenged? Please show your work ;-) /DG From dgoodger@atsautomation.com Mon Apr 16 23:32:02 2001 From: dgoodger@atsautomation.com (Goodger, David) Date: Mon, 16 Apr 2001 18:32:02 -0400 Subject: [Doc-SIG] field syntax (was re: lists & blank lines) Message-ID: We've monopolized Doc-SIG all day, might as well continue... [Edward Loper, referring to fields] > > > ... and at the end of the docstring) > > > > Again, why? Why restrict fields to the end of a docstring? Seems > > artificial to me. > > It is somewhat artificial. The reasoning was as follows: the > position of the fields does not convey any semantic information; > tools are likely to disregard the position when formatting > their output. If we let people put them wherever they want in > the docstring, then they may assume that they will appear in > that position in the output of doc formatting tools (i.e., that > their position *does* convey semantic information). This is > dangerous, and should be stamped out. 
:) So, put all fields > at the end, so no one will get confused. I don't see why your first assumption should hold true. It is the foundation of the rest of your argument. I think you need to define your concept of "fields" better for us here on the SIG (note: assume no previous knowledge of JavaDoc). Give a detailed example. Why isn't position significant? What about field order? Sounds like you're describing a dictionary-like structure associated with each docstring. Can a field be used more than once, or must each field be unique per docstring? /DG From dgoodger@atsautomation.com Mon Apr 16 23:41:41 2001 From: dgoodger@atsautomation.com (Goodger, David) Date: Mon, 16 Apr 2001 18:41:41 -0400 Subject: [Doc-SIG] backslashing Message-ID: [Edward D. Loper] > >> If you used E<lb> and E<rb> or E{lb} and E{rb} or > something like that, > >> then regexps would generally look how they're supposed to (at least > >> when you print them). > > > > So would *any other* convention -- when you print them. The point > > is, what do they look like when you read them? > > Not when you print them with "print foo.__doc__"; only when you use > some tool to interpret them and print them.. Not following you. This argues in favour of plaintext-transparent markup like backquotes, not E<..> etc. Perhaps some examples of what you mean? > Once you introduce "\" as an escape character, though, all sorts of > "\"s now need to be escaped.. And I don't really like the convention > of keeping "\"s if they appear before something that doesn't require > escaping.. It taxes my brain too much. Please show (with examples requiring internal escaping) the alternatives. > I guess I'm trying to go on the principle of keeping the need to > escape characters to a minimum, because whatever escaping mechanism we > have, it'll be somewhat ugly/difficult to read. I think that's inevitable. Please prove me wrong! However, although I'm sure an escape mechanism is needed, I'm also sure it will only rarely be needed. /DG From edloper@gradient.cis.upenn.edu Mon Apr 16 23:50:25 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Mon, 16 Apr 2001 18:50:25 EDT Subject: [Doc-SIG] field syntax (was re: lists & blank lines) In-Reply-To: Your message of "Mon, 16 Apr 2001 18:32:02 EDT." Message-ID: <200104162250.f3GMoPp19516@gradient.cis.upenn.edu> > > > 1. I like the number e. This number is approximately equal to > > > 2.71828182846. But it's irrational, so that's an > (a) the label isn't of a standard pattern the same as the first item > ("\d+\. "; no space after the "2."; I don't think we should allow > floating-point enumerators, hm? :); > (b) the label isn't sequential with the first item's label > (1 + 1 != 2.718...); > (c) if we permit nested lists through compound enumerators, sublists must > start with "1" or equivalent, and this one doesn't. > > Convinced? If not, why *would* the parser pass the second line through > unchallenged? Please show your work ;-) Sorry, you're right, I wasn't being explicit enough. I was assuming that ordered list bullets were "(\d+\.)+", because that's what we decided last time around the loop.. The idea was that people might want to say "2.1." or something. But I don't have any problem with restricting ordered list bullets to "\d+\.". But the problem still exists, albeit in a more rare form:

    1. I like the number 3. It comes right after the number
    2. It comes right before the number 4.

But I think that really we agree.
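A minimal sketch, assuming Python, of the bullet-recognition rules just described: a line only continues an enumerated list if it carries a "\d+\. " label (digits, period, space) and its number follows the previous item's number. The helper name and its behaviour are illustrative only, not part of any proposed parser::

    import re

    _ENUM_BULLET = re.compile(r'(\d+)\. ')  # the "\d+\. " label: digits, period, space

    def continues_enumeration(line, prev_number):
        """Return (is_item, number) for a candidate enumerated-list line."""
        match = _ENUM_BULLET.match(line)
        if match is None:
            # Rule (a): no well-formed label (e.g. "2.71828182846. ..." has no
            # space after the period), so this reads as an ordinary paragraph line.
            return False, None
        number = int(match.group(1))
        if number != prev_number + 1:
            # Rule (b): label out of sequence with the previous item; a parser
            # would warn and treat the line as text.
            return False, None
        return True, number

    print(continues_enumeration("2.71828182846. But it's irrational...", 1))
    # -> (False, None): the "e" example is caught.
    print(continues_enumeration("2. It comes right before the number 4.", 1))
    # -> (True, 2): the rarer example above passes both checks, which is
    #    exactly the residual ambiguity under discussion.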
I'm just saying that *in principle* it's ambiguous to a reader, but that any sane reader would complain about it anyway.. so we can ignore that ambiguity. On a side note, I'm not sure whether we should enforce (b) and (c). I guess my gut instinct would be to generate a warning for them, but not an error.. They prevent people from having an enumerated list that's intersperced with text (e.g., that's normally done in math papers with the math formulas..). I guess that's not a great loss, though, in the context of writing docstrings. -Edward From edloper@gradient.cis.upenn.edu Tue Apr 17 17:27:28 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Tue, 17 Apr 2001 12:27:28 EDT Subject: [Doc-SIG] field syntax (was re: lists & blank lines) In-Reply-To: Your message of "Mon, 16 Apr 2001 18:32:02 EDT." Message-ID: <200104171627.f3HGRSp28953@gradient.cis.upenn.edu> > I think you need to define your concept of "fields" better for us > here on the SIG (note: assume no previous knowledge of > JavaDoc). Give a detailed example. Why isn't position significant? > What about field order? Sounds like you're describing a > dictionary-like structure associated with each docstring. Can a > field be used more than once, or must each field be unique per > docstring? Sorry, you're right. If you have time, you can look at the JavaDoc home page: or at a sample of the output of JavaDoc: >From the JavaDoc page: A doc comment is made up of two parts -- a description followed by zero or more tags, with a blank line (containing a single asterisk "*") between these two sections: /** * This is the description part of a doc comment * * @tag Comment for the tag */ The first part describes the object being documented; the second part essentially sets up a multi-map from keys to formatted doc strings. - Certain tags are paramatrized, such as "@param", which takes a parameter, and gives a description of it. - Some tags can be repeated (e.g., "@see"); others can't (e.g., you can't have 2 "@param"'s with the same parameter. - It is assumed that when these "fields" (=tag+value) are output, they will be put in special sections. (see thesample output of JavaDoc). An example of a formatted doc string with a field (from the formatted doc string parser I've been writing) is:: def _tokenize_literal(lines, start, block_indent, tokens, warnings): """ Construct a C{Token} containing the literal block starting at C{lines[start]}, and append it to C{tokens}. C{block_indent} should be the indentation of the literal block. Any warnings generated while tokenizing the literal block will be appended to C{warnings}. @param lines: The list of lines to be tokenized. @param start: The index into C{lines} of the first line of the literal block to be tokenized. @param block_indent: The indentation of C{lines[start]}. This is the indentation of the literal block. @param warnings: A list of the warnings generated by parsing. Any new warnings generated while tokenizing this literal block will be appended to this list. @return: The line number of the first line following the literal block. @type lines: C{list} of C{string} @type start: C{int} @type block_indent: C{int} @type warnings: C{list} of C{ParseError} @rtype: C{int} """ It doesn't matter to me what syntax we use. Another alternative that's been suggested is to do something like:: ... Arguments: lines -- The list of lines to be tokenized. start -- The index into C{lines} of the first line of the literal block to be tokenized. block_indent -- The indentation of C{lines[start]}. 
This is the indentation of the literal block. warnings -- A list of the warnings generated by parsing. Any new warnings generated while tokenizing this literal block will be appended to this list. return -- The line number of the first line following the literal block. ... But semantically, the idea is to associate a description with each of a number of pre-defined entities, such as the parameters of a method. Tags defined by Javadoc are: @see (a single see-also link; can repeat) @author (an author; can repeat) @version (the object's version) @param (a function/method param; takes an argument) @return (the return value of a function/method) @exception (a description of an exception that a function/method can raise; takes an argument (the exception)) @since (minimum version needed to use it) @deprecated (object is deprecated; description of why) I think there are a few more, but that's probably a representative sample.. I find that the output you can produce with fields is easier to read/use than the output you can produce without them. (See the HTML and LaTeX versions of the Java library API).. Of course, we don't really *need* them. In my mind, the only necessary features for a formatted docstring language are: - paragraphs - literal blocks - maybe doctest blocks But I'd like to see them included. Of course, you don't have to use them if you don't want to. But I think that most people will find them useful if they try using them.. -Edward From edloper@gradient.cis.upenn.edu Tue Apr 17 21:04:29 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Tue, 17 Apr 2001 16:04:29 EDT Subject: [Doc-SIG] backslashing In-Reply-To: Your message of "Mon, 16 Apr 2001 18:41:41 EDT." Message-ID: <200104172004.f3HK4Tp20629@gradient.cis.upenn.edu> Basically what I'm trying to avoid here is having the escaping mechanism itself be responsible for most of the cases where we need to use escaping. I would argue that with the following rules, you almost never need escaping. Of course, this is only relevant if we end up using colorizing like C{this}; if we decide on some other colorizing mechanism, then the following is moot.. 1. All curly braces ({}) *must* be properly nested 2. If an open curly brace is preceded by a capital letter, then it and its matching brace signify colorizing. 3. If an open curly brace is not preceded by a capital letter, then it and its matching brace should be rendered as braces. Given these rules, when do we need to do escaping? 1. If we want to use unmatched curly braces. Generally this is only true when we're talking about the braces themselves, not when we're using them to talk about something else (e.g., using them to write a Python dictionary). 2. If we want to precede an open brace by a capital letter. I can't think of any case where this would be necessary, other than when you're talking about the markup language itself, or something similar? How can we escape in these situations? 1. By putting the entity in a literal block. This seems to me more applicable to (2) above than to (1). 2. By using an escape like E{lb}. How do we evaluate whether this is a good solution? 1. How ugly is it to do escaping? 2. How often do we need to do escaping? I would argue that this solution has a relatively high value for (1) (higher than backslashing, anyway), but a very low value for (2). In particular, I was unable to find *any* occurrences in the docstrings in the standard library of characters that needed to be escaped..
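A minimal sketch, assuming Python, of how a checker might apply the three rules above: every brace must nest (rule 1), an open brace preceded by a capital letter opens colorizing (rule 2), and any other brace is a literal brace (rule 3). The function name and error messages are illustrative, not from any proposed implementation::

    def classify_braces(text):
        """Classify each '{' as colorizing or literal and check nesting.

        Returns a list of (index, kind) pairs; a '}' inherits the kind
        of the '{' it matches.  Raises ValueError on unmatched braces.
        """
        stack, kinds = [], {}
        for i, ch in enumerate(text):
            if ch == '{':
                # Rules 2 and 3: a capital letter right before '{' means
                # colorizing (as in C{...}); otherwise a literal brace.
                kinds[i] = 'colorize' if i and text[i - 1].isupper() else 'literal'
                stack.append(i)
            elif ch == '}':
                if not stack:
                    raise ValueError("unmatched '}' at column %d" % i)  # rule 1
                kinds[i] = kinds[stack.pop()]
        if stack:
            raise ValueError("unmatched '{' at column %d" % stack[0])   # rule 1
        return sorted(kinds.items())

    # "C{x}" is colorizing; the dictionary braces stay literal; an unmatched
    # "C{x" on its own would raise an error.
    print(classify_braces("Take C{x} and a dict like {1: 2}."))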
Now compare to backslashes. There are 2 possible ways to do backslashing: 1. Preceding anything by a backslash escapes it. 2. Preceding any escapable character by a backslash escapes it. Preceding anything else by a backslash gives a literal backslash (similar to Python's way of doing things). Note, however, that the stated reasons for Python doing it that way have to do with making it easier to see mistakes in your strings. My problem with (2) is that it taxes my brain to remember which characters I can put one backslash before, and which I have to put two backslashes before. But in any case, for both (1) and (2), "\\" translates to a single backslash. I believe (though I haven't yet had time to check) that "\\" does occur in docstrings. It certainly wouldn't be that uncommon if one wanted to talk about regexps, or to use regexps to talk about something else, for example. So now, we evaluate backslashing as an escaping solution, using the same criteria: 1. How ugly is it to do escaping? 2. How often do we need to do escaping? I believe that backslashing has a lower value for (1), but a higher value for (2). So how do we decide? Well, it's not objective, because we need to decide how much we care about each criterion, how *much* higher/lower we think backslashing scores, etc. But in my opinion, I'd rather use the {} solution.. But again, as I said, all this is moot if we're not using {} to do colorizing. > > I guess I'm trying to go on the principle of keeping the need to > > escape characters to a minimum, because whatever escaping mechanism we > > have, it'll be somewhat ugly/difficult to read. > > I think that's inevitable. Please prove me wrong! I don't know whether this was a convincing enough case, but I tried to show that escaping is needed *less* often with the E{} approach than with backslashing.. > However, although I'm sure an escape mechanism is needed, I'm also sure it > will only rarely be needed. I agree, which is why I want to be wary about the escape mechanism itself becoming the major reason for using the escape mechanism (backslashing backslashes, etc). -Edward From dgoodger@atsautomation.com Wed Apr 18 16:42:33 2001 From: dgoodger@atsautomation.com (Goodger, David) Date: Wed, 18 Apr 2001 11:42:33 -0400 Subject: [Doc-SIG] directives and fields Message-ID: [Edward D. Loper] > > > The only other difference would be that, under my scheme, > the contents > > > of a directive have to be properly formatted text; where > > > under your scheme it seems like they can be anything. > > > > Not "anything", but directive-dependent. > > Yes. But from the parser's point of view, it can be anything, because > it doesn't know what extensions you'll be using. Some later stage > (after the parser) will put restrictions on it.. Not true. I'd like to clear up this concept of directives. It's completely different from your proposed field concept, though not necessarily incompatible. Directives are a parser-control mechanism and can be used as an extension mechanism. The reStructuredText directive proposal is similar to extension modules in Python. Inevitably, someone will want to add a feature or some behaviour to the reStructuredText parser which cannot be easily added through character-construct syntax, because: 1. There's no natural or obvious candidate characters or constructs for syntax. 2. We've run out of characters to use as syntax. 3. The new feature or behaviour is too narrowly application- or domain-dependent. 4.
The new feature or behaviour cannot be added to the standard due to lack of consensus (basically the same as case 3). With one construct (regexp '^\.\. ', which comes from Setext) we have comments, internal hyperlink targets, external URL hyperlinks, footnotes, and directives. Directives were proposed as a mechanism for adding explicit syntax that the parser can recognize, triggering parser extension code. Say we add an 'SQL' extension to the parser, which performs a database query and inserts the results. The extension would consist of an entry in the directives dispatch table and support code to handle the query itself. This code would be run by the parser as it is parsing, not afterward. The semantics of the extension construct are up to the extension, but they could easily include the processing of properly formatted text. For example, we could add a set of admonition extensions:: .. warning:: Don't *ever* press the `Self-Destruct` button. If you do, you'll be sorry. The 'warning' extension would tell the parser to process its text block as usual, and simply wrap it in a new DOM object (hypothetically). The *emphasis* and `literals` would be processed as usual. Your field concept could be implemented using the '@' syntax as proposed, or using the extension mechanism. If it's important enough, *and* the syntax is natural enough, using the JavaDoc '@' syntax is no problem. The '@' syntax doesn't strike me as natural though. The cornerstone of the Setext/StructuredText-like approach is that the raw text should be as readable as possible, even to the uninitiated. To quote Jim Fulton's StructuredTextWiki, If you don't buy into this idea, you're probably wasting your time. I think that '@' and especially 'C<>' stray from this ideal. I don't think they belong in a Setext/StructuredText-like markup language. (Note: I'm not saying you're wasting your time, Edward; far from it, these discussions have been very helpful in many ways.) /DG From edloper@gradient.cis.upenn.edu Wed Apr 18 18:24:30 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Wed, 18 Apr 2001 13:24:30 EDT Subject: [Doc-SIG] Re: directives and fields In-Reply-To: Your message of "Wed, 18 Apr 2001 11:42:33 EDT." Message-ID: <200104181724.f3IHOUp28200@gradient.cis.upenn.edu> > > Yes. But from the parser's point of view, it can be anything, because > > it doesn't know what extensions you'll be using. Some later stage > > (after the parser) will put restrictions on it.. > > Not true. I'd like to clear up this concept of directives. It's completely > different from your proposed field concept, though not necessarily > incompatible. Well at least there should be rules in the "generic parser" that say when directives end, so that a parser can ignore a directive if it doesn't understand it. As I understood your original proposal, directives ended with blank lines. I think that they should end with a dedent back to the indent they started at, because then they can include blank lines.. And I think that it should be *possible* to handle directives in a second pass. I.e., I don't think we should have any directives that change the syntax of subsequent parts of the string, like:: This is *emph* .. switch-emph-and-literal This is *literal* Basically, it seems like you should be able to make a "generic" parser which outputs a DOM tree for the formatted docstring, with "directive" elements containing #CDATA (=character data, i.e., a string) like:: ... 
Then a specialized parser could run the generic parser, and then replace all the directive elements with some other elements.. > Inevitably, someone will want to add a feature or some behaviour to > the reStructuredText parser which cannot be easily added through > character-construct syntax, because: I think we should -try- to keep feature-adding to a minimum, because it tends to result in incompatibilities.. But that said, it does make sense to me to have a generic extention mechanism, as long as we keep in mind that we should be careful about not over-using it. Also, people adding new directives should keep in mind that "raw text should be as readable as possible." (or whatever variant of that we decide we like; see below). I saw fields as being an extension mechanism, but a *much* more constrained one than directives. I think it makes sense to put *some* constraints on directives (e.g., that they don't affect anything outside themselves). But maybe just using fields places too many constraints. > 1. There's no natural or obvious candidate characters or constructs > for syntax. > 2. We've run out of characters to use as syntax. > 3. The new feature or behaviour is too narrowly application- or > domain-dependent. The only domain I care about is formatted docstrings. Given, there are subdomains of formatted docstrings (some types of programs/programming style will make use of some features, others not). But I'm not sure that they vary enough that we want a nearly arbitrarily powerful extension mechanism.. As for running out of characters to use as syntax, that's one of the reasons I don't like *colorizing* `like this`... > With one construct (regexp '^\.\. ', which comes from Setext) we > have comments, internal hyperlink targets, external URL hyperlinks, > footnotes, and directives. Directives were proposed as a mechanism > for adding explicit syntax that the parser can recognize, triggering > parser extension code. I think that my target is a much more lightweight markup language than you're talking about.. or at least less powerful. I really don't see the need for most of those things in docstrings. > Say we add an 'SQL' extension to the parser, which performs a > database query and inserts the results. Wouldn't this totally violate making the docstring readable? And when would you ever want to use this when writing a docstring?? > .. warning:: > > Don't *ever* press the `Self-Destruct` button. > If you do, you'll be sorry. This could be implemented as a field. I think that external URL hyperlinks should be implemented with colorizing, if at all. I don't think that internal hyperlink targets make sense for docstrings. I don't think that comments are necessary for docstrings. If you really want, you can include a Python comment before or after the docstring. Alternatively, comments could be done via colorizing.. > Your field concept could be implemented using the '@' syntax as > proposed, or using the extension mechanism. If it's important > enough, *and* the syntax is natural enough, using the JavaDoc '@' > syntax is no problem. The '@' syntax doesn't strike me as natural > though. I agree that the "@" syntax isn't very natural (except for the extent to which it's natural simply because it's an established convention; similar to the way that "\" is a "natural" way to escape a character). I'd be just as happy writing fields like:: .. param size: The number of elements in the list. or:: .. parameters:: size: The number of elements in the list. 
Although that seems no less readable to me than "@". But I question whether we want/need something as powerful as directives... > The cornerstone of the Setext/StructuredText-like approach is that > the raw text should be as readable as possible, even to the > uninitiated. I don't see how directives win here. If anything, it seems like they will make it harder to read by the uninitiated, given the power of directives to use almost arbitrary syntax.. However, the idea that "raw text should be as readable as possible, even to the uninitiated" is a *goal* of mine, but not a cornerstone. Perhaps a cornerstone would be:: Raw text should be readable, even by the uninitiated. There are a lot of conflicting goals in designing a markup language, and making it as readable as possible is by no means my most fundamental goal. In the case of colorizing, I believe that colorizing should *never* be necessary to the understanding of a docstring.. i.e., you should be able to strip away all colorizing, and still understand what it says. I think that the uninitiated will be able to do that (and indeed I think it would be their first instinct). When I first read perldoc comments, I didn't know what the C<..>s meant, but I ignored them, and was able to read the comments with no trouble (well, the =.. directives were a bit confusing). I guess that perhaps what it comes down to is that I am *not* necessarily trying to design a Setext/StructuredText-like language. I'm trying to design a markup language that is optimal for writing Python docstrings. The problem with colorizing like *this* is that there are very few conventions about what such colorizing means. Indeed, I'd say that *emph*, _underline_, and "quoting" 'of' `some' `sort` are the only contentional ways of colorizing (well, maybe angle braces for ). And none of the quoting mechanisms have conventional "colors" associated with them. In my mind, the only advantage of using `quotes` over C{curly braces} is that quotes are easier to ignore.. In both cases, the uninitiated will (maybe) know that the region is "colorized" in some way, but not what way it's colorized in. -Edward From mwh21@cam.ac.uk Wed Apr 18 19:06:22 2001 From: mwh21@cam.ac.uk (Michael Hudson) Date: 18 Apr 2001 19:06:22 +0100 Subject: [Doc-SIG] Where to find the docs In-Reply-To: Ka-Ping Yee's message of "Sun, 15 Apr 2001 01:42:05 -0500 (CDT)" References: Message-ID: Since noone else has responded... Ka-Ping Yee writes: > Is there a standard place to look? http://www.python.org/doc/current ? Works for me, though those without permanent 'net connections may feel differently. More seriously: no. I haven't got built html docs anywhere on my system at the moment; if I did they'd be in /usr/local/src/python/dist/src/Doc/html/, which is hardly canonical. I doubt you can come up with sufficiently clever heuristics to get all cases - the one's you posted sounded reasonable. You could always fall back to the python.org URLs... Cheers, M. -- M-x psych[TAB][RETURN] -- try it From edloper@gradient.cis.upenn.edu Wed Apr 18 20:30:00 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Wed, 18 Apr 2001 15:30:00 EDT Subject: [Doc-SIG] Structuring: a summary; and an attempt at EBNF.. Message-ID: <200104181930.f3IJU0p10299@gradient.cis.upenn.edu> I figured I'd give a summary of all the structuring features that I think we've agreed on, so we can tentatively take those as a given. If anyone objects, please say so.. 1. Paragraphs are left-justified and separated by blank lines. 2. 
Literal blocks start with a paragraph that ends with "::" and continue to the next line whose indentation is equal to or less than that of the paragraph that started them. Literal blocks should be indented and separated by blank lines. 3. Doctest blocks start with ">>> " and continue to the next blank line. Doctest blocks should be indented and separated by blank lines. 4. Lists should be indented and separated by blank lines. List items within a list don't need to be separated by blank lines. List items start with bullets, which are either "-" or a single number followed by a period, like "1." or "12.". 5. The second and subsequent lines of a list item are indented. This includes list items with multiple paragraphs, sublists, etc. 6. Sections begin with headings, which are underlined with "=", "-", or "~" (for level 1, 2, or 3 headings, respectively). 7. Colorizing takes place entirely within paragraphs, and does not interact with structuring. In my mind, the major questions left to resolve are: 1. how to do colorizing? Two main proposals: like C{this} and like `this`/*this*. 2. how to do escaping? 3. do we need any other structuring constructs (e.g., fields, directives, footnotes, etc)? If so, which ones, and how should we add them? ===== Below is my first attempt at an EBNF-like formalism for these rules. You should probably pay more attention to the "one-minute summary" above than to the rules below -- I almost certainly didn't get the rules below quite right (although if you want to point out ways that I got it wrong, please do! :) ). IND and DED are indent and dedent (by a sinlge space); I use the notation IND[n] to mean n IND tokens. Note that the rule:: x = a IND[n] b DED[n] c is really just shorthand for:: x = a y c y = IND y DED | b However, I also use the foo[n] notation in one place where it can't be simplified. That's because in list items like: - this is a list item. Here's a second paragraph. there are crossing dependancies. In particular, the IND/DED need to match up, but assuming that we want "this is a list item" to be result of a "paragraph" production, they can't. Don't worry if you don't understand what I just said, I think it should still be relatively easy to understand the EBNF below. I assume that, as part of the preprocessing, all indents/dedents have been changed to IND/DED tokens. This process ignores blank lines, which are simply reduced to be empty. ================================================================ The top-level production:: # pytext = (BlankLine NL)* # IND[n] # (Para | List | Section | DocTestBlk) # ((COLON COLON NL LitBlk) | # (NL BlankLine NL (Para | List | Section | DocTestBlk)))* # DED[n] # (NL BlankLine)* (pytext is just a convenient name, we'll probably want another) This production assumes that the first-line-might-not-be-indented problem has already been taken care of. It says that a formatted docstring consists of any number of blank lines, followed by an indented section containing at least one paragraph, list, section, or doctest block, followed by zero or more literal blocks, paragraphs, lists, sections, or doctest blocks.. And there can be extra blank lines at the end. The productions "Para", "List", "Section", etc. generally do *not* include thier trailing NL, because that makes it easier to detect paragraphs that end with COLON COLON. Some useful types of lines are: - BlankLine: consists only of spaces. - TextLine: non-blank line. 
- StartLine: doesn't start with a Python prompt or a bullet - ContLine: anything - EndLine: doesn't end with "::"; doesn't include trailing spaces? - StartEndLine: doesn't start with PyPrompt or Bullet, and doesn't end with "::". We can define them as:: # BlankLine = (empty) # TextLine = [^ NL IND DED]+ # StartLine = (?! PyPrompt | Bullet) TextLine # EndLine = [^ NL IND DED]* [^ NL IND DED COLON] [^ NL IND DED] | # [^ NL IND DED]* [^ NL IND DED] [^ NL IND DED COLON] | # [^ NL IND DED] # StartEndLine = (?! PyPrompt | Bullet) EndLine As I said above, paragraphs don't include the trailing newline. Paragraphs ending in "::" don't include the "::".:: # SimplePara = StartLine (NL ContLine)* EndLine | # StartEndLine Lists are indented (n>1):: # List = IND[n] LI (BlankLine+ LI)* DED[n] We need special list-starting paragraphs. These don't include trailing newlines, either:: # LS_IndPara[n] = ContLine NL IND[n] ContLine (NL ContLine)* # LS_OneLinePara = EndLine There are 3 types of list item:: # LI = LI1 | LI2 | LI3 This production gives the contents of a list item, *after* its first paragraph:: # LI_Rest = ((COLON COLON NL LitBlk) | # (NL BlankLine NL (Para | List | DocTestBlk)))+ List Item, form 1: start with a one-line pagraph, then indentation, contents, and corresponding dedents. The indentation/contents/dedent is optional, so this also covers list items with just a one-line para (no indent):: # LI1 = Bullet LS_OneLinePara # (IND[n] # (BlankLine+ (Para | List | DocTestBlock | LitBlk))+ # DED[n])? List Item, form 2: start with a paragrpah containing indentation, then contents, then corresponding dedent:: # LI2 = Bullet IndPara[n] # (BlankLine+ (Para | List | DocTestBlock | LitBlk))+ # DED[n] List Item, form 3: this is used when the bullet's on a line by itself:: # LI3 = Bullet NL # (IND[n] # (BlankLine+ (Para | List | DocTestBlock | LitBlk))+ # DED[n])? Sections consist of a heading, followed by an indended section that can contain anything (i.e., epytext):: # Section = Heading NL epytext DocTestBlocks are terminated by blank lines. They must be indented:: # DocTestBlk = IND[n] PyPrompt (ContLine NL)+ DED[n] Literal blocks. Within the literal block, all indents/dedents must be matched:: # LitBlk = IND LitBlkContents DED # LitBlkContents = [^ IND DED]+ | IND LitBlkContents DED ================================================================ Anyway, I'm sure I didn't get that quite right, but it's a start, anyway. -Edward From tim.one@home.com Thu Apr 19 01:08:40 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 18 Apr 2001 20:08:40 -0400 Subject: [Doc-SIG] __all__, and how it relates to doc tools In-Reply-To: <200104151312.IAA08960@cj20424-a.reston1.va.home.com> Message-ID: [Edward] > Would it also be reasonable for a doc tool to look at this value, for > an indication of which objects to document? [Tim] > Absolutely. In fact, that's probably the best use. [Guido] > Hm. You may be right, but Ping told me that he had tried this in > pyoc, and was unhapy with the result: too much stuff didn't get > documented. So we should at least be willing to retract this idea. Well, every time you or I test pydoc under Windows, the first thing we do is type "random" at it. Because "_" appears early in the alphabet, the first four methods it displays are: _Random__whseed __getstate__ __init__ __setstate__ The first 8 functions: _acos _cos _exp _log _sin _sqrt _test _test_generator and then vrbls like _e, _inst and _pi. 
Almost none of that is of any interest to end users, while random.__all__ lists exactly what *is* interesting to users. However, random.__all__ is redundant, because random.py uses the underscore *convention* with care, and __all__ merely contains the names "import *" would import if __all__ didn't exist. Some old modules are much sloppier in their use of underscores, and Skip put a lot of work (when adding __all__ to them) into figuring out which names they *did* intend to export. pydoc can't do a better job of guessing *that* than Skip did by hand, and by ignoring both __all__ *and* the underscore conventions, pydoc shows too much irrelevant implementation detail. You eventually need an option to show "private" stuff too, but that's a poor default choice except for people working on a module's implementation. I'm happy to live with the underscore conventions alone to make the public-private distinction, but since history shows that few others are willing to live with that, something like __all__ does serve a purpose and should be respected. From ping@lfw.org Thu Apr 19 03:06:44 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Wed, 18 Apr 2001 21:06:44 -0500 (CDT) Subject: [Doc-SIG] __all__, and how it relates to doc tools In-Reply-To: Message-ID: > > [Edward] > > Would it also be reasonable for a doc tool to look at this value, for > > an indication of which objects to document? > > [Tim] > > Absolutely. In fact, that's probably the best use. > > [Guido] > > Hm. You may be right, but Ping told me that he had tried this in > > pyoc, and was unhapy with the result: too much stuff didn't get > > documented. So we should at least be willing to retract this idea. Tim Peters wrote: > Well, every time you or I test pydoc under Windows, the first thing we do is > type "random" at it. ...why, because "random" has those weird bound methods at the top-level that used to throw pydoc for a loop? :) > Because "_" appears early in the alphabet, the first > four methods it displays are: > > _Random__whseed > __getstate__ > __init__ > __setstate__ Well, you definitely want to know about __init__. I can see why you might not want to see private methods like _Random__whseed, though. As for __getstate__ and __setstate__, it's probably nice to know that they exist ("oh, it's possible to pickle this"). > and by ignoring both __all__ *and* the underscore > conventions, pydoc shows too much irrelevant implementation detail. I should note that pydoc *did* try both of those things already. In a previous incarnation, pydoc avoided top-level names beginning with _, but Guido was unhappy that it did this at the module level and not at the class level, so i changed it. In an even earlier incarnation, pydoc only displayed names listed in __all__, and so many things were missing from the output that it wasn't useful any more (e.g. errors in httplib, useful functions in cgi, constants like keyword.kwlist). Perhaps if the value of __all__ were different (or if it's changed in the past couple of weeks) it would be okay, but at the moment it just hides too much. -- ?!ng From pf@artcom-gmbh.de Thu Apr 19 08:11:33 2001 From: pf@artcom-gmbh.de (Peter Funk) Date: Thu, 19 Apr 2001 09:11:33 +0200 (MEST) Subject: Meta: EBNF notation (was Re: [Doc-SIG] Structuring: a summary; and an attempt at EBNF..) In-Reply-To: <200104181930.f3IJU0p10299@gradient.cis.upenn.edu> from "Edward D. Loper" at "Apr 18, 2001 3:30: 0 pm" Message-ID: Hi, Edward D. Loper: [...] > Below is my first attempt at an EBNF-like formalism for these rules. [...] 
> IND and DED are indent and dedent (by a single space); I use > the notation IND[n] to mean n IND tokens. Note that the rule:: [...] Why don't you simply use INDENT and DEDENT tokens, which may represent any arbitrary number of spaces as long as they match up? Don't forget: This is Python and anyone seriously interested in Python should be already familiar with this concept from the Python Grammar file and will probably understand this at the first glance. This might help to get rid of your `[n]' meta notation. In EBNF the square brackets `[' and `]' are normally used as meta symbols to enclose optional terms (see below). So the notation you invented here irritates because it suggests that `IND[n]' is an `IND' token followed by an optional term `n' ;-). For your entertainment I like to quote a small passage from science report No.36 written by Niklaus Wirth, ETH Eidgenössische Technische Hochschule Zürich, Institut für Informatik, introducing the programming language MODULA-2 in March 1980: """Notation for syntactic description ---------------------------------- To describe the syntax, an Extended Backus-Naur Formalism called EBNF is used. .. Each factor F is either a (terminal or non-terminal) symbol, or it is of the form [ E ] denoting the union of the set E and the empty sentence, or { E } denoting the union of the empty sequence and E, EE, EEE, ... . Parentheses may be used for grouping terms and factors. .. EBNF is capable of describing its own syntax. We use it here as an example:

    syntax = { production } .
    production = NTSym "=" expression "." .
    expression = term {"|" term} .
    term = factor {factor} .
    factor = TSym | NTSym | "(" expression ")" |
             "[" expression "]" | "{" expression "}"

""" As a student I was very impressed by this short and precise description of the EBNF formalism. The most common variations of this notation are to use `::=', `:=' or `<-' instead of `=' in productions or to use `(' expression `)+' instead of the square brackets to mark optional terms or to use `(' expression `)*' instead of the curly braces to mark [0..n] repetition. For example the Python Grammar file uses the asterisk notation for repetitions. IMO the {} notation as used by N.Wirth is easier to read. > Anyway, I'm sure I didn't get that quite right, but it's a > start, anyway. Yes. That's fine. I will try to have a deeper look into it later. Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen, Germany) From tim.one@home.com Thu Apr 19 08:41:53 2001 From: tim.one@home.com (Tim Peters) Date: Thu, 19 Apr 2001 03:41:53 -0400 Subject: [Doc-SIG] __all__, and how it relates to doc tools In-Reply-To: Message-ID: [Tim] >> Well, every time you or I test pydoc under Windows, the first >> thing we do is type "random" at it. [Ping] > ...why, because "random" has those weird bound methods at the > top-level that used to throw pydoc for a loop? :) Na, it's that when pydoc was busted completely on Windows, I shouted out to Guido "hey, bring up the pydoc GUI and search for a module". "Which module?" "Doesn't matter -- pick one at random." "OK, I pick random." Now it's a ritual. > ... > I should note that pydoc *did* try both of those things already. > In a previous incarnation, pydoc avoided top-level names beginning > with _, but Guido was unhappy that it did this at the module level > and not at the class level, so i changed it.
Changed it to what? To avoid them at both levels, or to avoid them at neither? I expect he intended the former, not the latter. Names that both begin and end with (at least) two underscores don't count as "beginning with '_'" for this purpose, though (as you said but I snipped, things like __init__ and __getstate__ are potentially interesting to end users). > In an even earlier incarnation, pydoc only displayed names listed > in __all__, and so many things were missing from the output that > it wasn't useful any more (e.g. errors in httplib, useful functions > in cgi, constants like keyword.kwlist). Perhaps if the value of > __all__ were different (or if it's changed in the past couple of > weeks) it would be okay, but at the moment it just hides too much. __all__ is supposed to list all and only the "public" names in the module. When it doesn't, that's a bug to be fixed in the module. I agree there are lots of bugs. In the meantime, it would be better to suppress names that pass name[:1] == "_" and not name[:2] == "__" == name[-2:] That would, e.g., expose httplib's error classes, but suppress its internal state-machine constants (_CS_IDLE etc) and non-user-callable methods (like HTTPConnection._set_hostport). Longer term, we should fix __all__ or get rid of it; the former is better, because the latter leaves us documenting accidental exports (like httplib.mimetools) forever; but the former is also real work. From ping@lfw.org Thu Apr 19 09:36:46 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Thu, 19 Apr 2001 03:36:46 -0500 (CDT) Subject: [Doc-SIG] __all__, and how it relates to doc tools In-Reply-To: Message-ID: On Thu, 19 Apr 2001, Tim Peters wrote: > > In a previous incarnation, pydoc avoided top-level names beginning > > with _, but Guido was unhappy that it did this at the module level > > and not at the class level, so i changed it. > > Changed it to what? To avoid them at both levels, or to avoid them at > neither? Neither, as you can see now. I didn't think we had the time to debate the starts-with-one-underscore-but-not-two rule then. > name[:1] == "_" and not name[:2] == "__" == name[-2:] Yup, looks like a good rule to me. -- ?!ng From edloper@gradient.cis.upenn.edu Thu Apr 19 09:15:30 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Thu, 19 Apr 2001 04:15:30 EDT Subject: Meta: EBNF notation (was Re: [Doc-SIG] Structuring: a summary; and an attempt at EBNF..) In-Reply-To: Your message of "Thu, 19 Apr 2001 09:11:33 +0200." Message-ID: <200104190815.f3J8FUp19153@gradient.cis.upenn.edu> > Why don't you simply use INDENT and DEDENT tokens, which may > represent any arbitrary number of spaces as long as they match up? > Don't forget: This is Python and anyone seriously interested in > Python should be already familar with this concept from the Python > Grammar file and will probably understand this at the first glance. Because these assume that there is no single indent that corresponds to multiple dedents. Which is true in Python, but not necessarily in the markup language we're talking about. In particular, consider:: - This is a list item. - This is a sublist item. This is another paragraph in the main list item. According to python's rules for generating INDENT and DEDENT tokens, the dedent before "this is another..." would be illegal because it doesn't line up with anything. But according to my EBNF (assuming that I got it right), it comes out correctly:: IND IND - this is a list item IND IND - this is a sublist item. 
DED DED - This is another paragraph in the main list item. DED DED Also, I should apologize for being very fast and loose with notation. I'll clean that up before I make anything formal (e.g., before putting anything in a PEP). There are indeed several variations on EBNF. The basic one I was using uses the kleene star (x*) to mean 0 or more repetitions of x, and the kleene cross (x+) to mean 1 or more repetitions of x; I think I may have also used x? to mean 0 or 1 x's.. Basically the productions I wrote should read roughly as regexps (with the VERBOSE flag). I agree that x[n] isn't the best choice of notation, especially given that I think I may have used things like "[^ NL S]" to mean "any character that's not a newline or a space.. Perhaps x? One thing to note here is that the language I'm using is strictly more powerful than EBNF. The reason, as I said before, is because I have crossing dependancies. It would be possible to express the same *string* language without crossing dependancies, but only if we allow the first paragraph of a list item to be split across two different nonterminals. Also, incidentally, I used "(?! ..)", too, which is also strictly more powerful than EBNFs (it's not context free; you can generate a^n b^n c^n with it)... But I used it just as a matter of convenience -- everything I wrote with it could be re-written without it. -Edward From hernan@orgmf.com.ar Thu Apr 19 10:24:10 2001 From: hernan@orgmf.com.ar (Hernan Martinez Foffani) Date: Thu, 19 Apr 2001 11:24:10 +0200 Subject: [Doc-SIG] got a Mac and 20 minutes? In-Reply-To: <200104190815.f3J8FUp19153@gradient.cis.upenn.edu> Message-ID: If you found this a bit off topic, please apologize (me, not you :-) and just ignore it. If anybody around there got a Mac with Internet Explorer (version 4.x or 5.x) can I ask you to download the "Python Shelf" at http://www.orgmf.com.ar/condor/pytstuff.html (it's a zip file that's almost 5MB) and see if the Microsoft HTML Help files (the ones with extension .chm) work? I didn't found any official reference that it should work. Apparently the format is platform independent (but coming from Microsoft...) In case it does work, any suggestion about installing those files on a Mac are welcome. (mmm... why i'm feeling pessimistic?) Thanks in advance, -Hernán -- Hernán Martínez Foffani hernan@orgmf.com.ar http://www.orgmf.com.ar/condor/ From tony@lsl.co.uk Thu Apr 19 11:27:27 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Thu, 19 Apr 2001 11:27:27 +0100 Subject: [Doc-SIG] Ho hum - back to work... In-Reply-To: <200104181930.f3IJU0p10299@gradient.cis.upenn.edu> Message-ID: <004b01c0c8bb$54ad0a80$f05aa8c0@lslp7o.int.lsl.co.uk> Well, I recovered from my flu (eventually) and am now back to "normal". One of the interesting side-effects of the flu, though, was its ability to purge the mind. I'm afraid I've come out of the illness with much less interest in the Doc-SIG than I went in with - it's very difficult to see, from the standpoint of now, why I was insane enough to devote so much time to something that, perhaps, not so many people really care about, when I could instead have been reading, ironing, making my Debian system work, talking to Joan - oh, all sorts of things. This means I am unlikely to be as active as I was, particularly since I'm expecting to be quite busy with some interesting things at work as well. It's also why I've been refraining from comment on the "structure" discussion - I just don't have the time at the moment to spend an hour in the morning on Doc-SIG. 
Anyway, to the point. I'm taking tomorrow (and maybe a day next week) off to do *some* work for the effort. It's a bit short notice to ask this, but given all the work that Edward and David are doing (I don't necessarily *agree* with them, but that's another matter), I figured I'd seek an opinion on how my time might best be spent. There are two main options: 1. My original promise - get a version of docutils/fat.py working as a testbed. It would come with lost of command line switches to try out various ideas, and would try to incorporate some of Edward's structuring options (although note that *I* am not going to code *anything* that supports C{something} or C markup, as I consider this an abomination). It would be suitable for running over the standard library to see how well *that* renders when passed through a markup engine (this seems like a very important point to me!). 2. Work on the Doc-SIG archives, to try to produce summaries of the arguments from its lifetime. Note that (technically) we may need this for any PEPs we produce! (and it would clearly be useful to be able to *point* to who said what and why, given the history of the group). Option 1 (a) probably needs doing anyway, and (b) fat.py is probably likely to be the only tool that supports multiple ways of doing things, to allow users to *compare* them (which seems valuable to me). Option 2 is actually more tempting (I've done this sort of thing before, and it's a lot of work, but can be very worthwhile). I think this *needs* doing at some point - we don't want to lose useful wisdom from the past. Two separate questions, as well (if answering these, please start a separate thread for each?) A. Content markup pedagogy. I still don't understand why Edward (and Guido, although I think he's less likely to answer!) object to "simple" markup like ST and relatives use - why they consider it a Bad Thing to (a) use punctuation characters for markup, and (b) use them in a context dependent manner. The last, in particular, bugs me, as I *really* don't understand what the problem is (after all, I *read* text in a context dependent manner). An explanation of the object, in simple terms, would be a nice thing to have for me, and might be useful pedadgogy in the eventual PEP discussions. (As a subpoint, I don't *quite* understand why Edward wants to separate structuring and colourising so much - this seems to me to be implementation detail (for this purpose, I consider the EBNF to be "implementation" as well) - real people don't have trouble with fuzzy distinctions about such things.) B. Reasons to be doing this The Types SIG defines several different (possible) reasons for wanting to produce type annotation, etc. I think it might be useful to produce similar distinctions for Doc-SIG. So here is a tentative list of *why* we might do this work: NOT --- We might *not* do this work because we think that informal plain text, with pydoc *guessing* what to link to what, is sufficient. This is a not entirely unreasonable point, as pydoc does a reasonably decent job (I've been looking at the HTML it produces, and why it's too small to read, which is why I didn't say "excellent job"!) of presenting the plain text from doc strings. DOC --- I personally want to be able to markup the text to get across more meaning (e.g., I *do* want emphasis, but I also want to be able to annotate an argument list as such, and indicate what is literal text, etc.). This is tool independent. 
It is an advantage to standardise on one form of markup, even for DOC, because that makes it easier for other people to read my marked up text. REP --- It is nice to be able to present a DOC string with a little more intelligence than is possible if it is treated as just plain text. The main thing I want here is actually distinction of literal text (be it inline or not) from "plain" text. Given I like to have emphasis, it would be nice if that is recognised as well. Note Eddy's point that we are *not* after "professional" quality of presentation here - just something easier on the eye than plain text. STRUC --- One might imagine that there are uses for marked up text, since one could extract information from it. This relies on use of "Arguments:" and other tags, as well as (perhaps) using hints like `#..#` to indicate what one *does* want links generated for. Of course, it is only if the markup scheme is widely adopted (and used consistently) that one gets much benefit from this. Have I missed any options? To me, DOC is the most important, with REP following. I'm not sure I actually believe that we're going to get a lot from STRUC (*except* making it easier to guess that I *didn't* want this "London" to refer to a class, but instead just meant it as plain text). Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From hernan@orgmf.com.ar Thu Apr 19 12:08:23 2001 From: hernan@orgmf.com.ar (Hernan Martinez Foffani) Date: Thu, 19 Apr 2001 13:08:23 +0200 Subject: [Doc-SIG] pydoc small letters. was: Ho hum - back to work... In-Reply-To: <004b01c0c8bb$54ad0a80$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: >de Tony J Ibbs (Tibs): > >..., as pydoc does a >reasonably decent job >(I've been looking at the HTML it produces, and why it's too small to >read, which is why I didn't say "excellent job"!) of presenting the >plain text from doc strings. > It used to be "more" difficult to increase the font size on Ping's pydoc HTML output. (By "more" I mean that you have to look for the <small> tag around the code.) In the pydoc.py that's included in 2.1 it's only a one line change that's logically located: the "small()" function. On line 382 of Lib/pydoc.py change:

    def small(self, text): return '<small>%s</small>' % text

to:

    def small(self, text): return text

...and you'll see the difference. :-) Regards, -Hernán -- Hernán Martínez Foffani hernan@orgmf.com.ar http://www.orgmf.com.ar/condor/ From hernan@orgmf.com.ar Thu Apr 19 12:28:41 2001 From: hernan@orgmf.com.ar (Hernan Martinez Foffani) Date: Thu, 19 Apr 2001 13:28:41 +0200 Subject: [Doc-SIG] RE: pydoc small letters. was: Ho hum - back to work... In-Reply-To: Message-ID: I said: >... >the "small()" function. > It's a method obviously... -H. From tony@lsl.co.uk Thu Apr 19 13:17:29 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Thu, 19 Apr 2001 13:17:29 +0100 Subject: [Doc-SIG] pydoc small letters. was: Ho hum - back to work... In-Reply-To: Message-ID: <004c01c0c8ca$b3bf1f90$f05aa8c0@lslp7o.int.lsl.co.uk> Hernan Martinez Foffani wrote: > It used to be "more" difficult to increase the font size on Ping's > pydoc HTML output. (By "more" I mean that you have to look for the > <small> tag around the code.) It was the use of <small> tags that I disliked - they're a Bad Idea! > In the pydoc.py that's included in 2.1 it's only a one line change > that's logically located: > the "small()" method Ah - thanks.
I haven't got that version yet (still using 1.5.2 Python, and haven't updated pydoc for a little while). One day I'll "officially" grumble that using <small> is Bad, and should not be the default (but only when I've worked out why he wanted it, and what one can do to alleviate the "problem" that was trying to be solved!). Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From hernan@orgmf.com.ar Thu Apr 19 14:00:45 2001 From: hernan@orgmf.com.ar (Hernan Martinez Foffani) Date: Thu, 19 Apr 2001 15:00:45 +0200 Subject: [Doc-SIG] pydoc small letters. was: Ho hum - back to work... In-Reply-To: <004c01c0c8ca$b3bf1f90$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: >de Tony J Ibbs (Tibs) > >I haven't got that version yet (still using 1.5.2 Python, and haven't >updated pydoc for a little while). One day I'll "officially" grumble >that using <small> is Bad, and should not be the default >(but only when >I've worked out why he wanted it, and what one can do to >alleviate the >"problem" that was trying to be solved!). > Taken from a comment in pydoc.py: # Note: this module is designed to deploy instantly and run under any # version of Python from 1.5 and up. That's why it's a single file and # some 2.0 features (like string methods) are conspicuously absent. So it seems that you can download the 2.1 version from CVS and use it to browse your 1.5.2 Python. Since I'm leaving my office now, I'm emailing you the "unrequested" pydoc.py version 2.1 file. Regards, -Hernan From dgoodger@atsautomation.com Thu Apr 19 14:41:38 2001 From: dgoodger@atsautomation.com (Goodger, David) Date: Thu, 19 Apr 2001 09:41:38 -0400 Subject: [Doc-SIG] Structuring: a summary; and an attempt at EBNF.. Message-ID: [Edward D. Loper] > I figured I'd give a summary of all the structuring features that > I think we've agreed on, so we can tentatively take those as > a given. If anyone objects, please say so.. OK, I'll bite! > 3. Doctest blocks start with ">>> " and continue to the next blank > line. Doctest blocks should be indented and separated by blank > lines. Do Doctest blocks have to be preceded by "::"? I.e., are Doctest blocks simply a special case of literal blocks, or are they detected by indentation & ">>> " alone? > 4. Lists should be indented and separated by blank lines. Why should lists be indented? What's wrong with

- a list
- like this?

No indentation is necessary. I suggest that if there *is* indentation, an alternate interpretation is possible. > 7. Colorizing takes place entirely within paragraphs, and does not > interact with structuring. (As an aside: where does this term "colourizing" come from? It was first used on Doc-SIG by Tibs last November. I've otherwise never seen it used in this sense wrt markup. I have seen it used in the sense of syntax colouring (i.e. IDEs changing the colour of text in code). I believe the correct term here would be something like "inline markup" or "mixed markup".) /DG From tony@lsl.co.uk Thu Apr 19 15:12:12 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Thu, 19 Apr 2001 15:12:12 +0100 Subject: [Doc-SIG] Structuring: a summary; and an attempt at EBNF.. In-Reply-To: Message-ID: <005401c0c8da$ba5963a0$f05aa8c0@lslp7o.int.lsl.co.uk> Goodger, David wrote: > (As an aside: where does this term "colourizing" come from?
I think it may have been mind-pollution from my initial looking at STNG code (it's probably the only thing that remained! - certainly none of the algorithms). If it *does* come from ST, it *may* have been around in the Doc-SIG for a while (I think the initial few messages of Doc-SIG went something like: 1. I'm here - who else is? 2. What are we doing? 3. Here's setext 4. Here's StructuredText (that was Jim Fulton) 5. Yuck - why not use (I think it was TeX) - that one was me... There were doubtless a few other messages mixed in there, too...) I use the term interchangably with "markup", and although the latter is probably more standard a term, I quite like it (as to whether one is "colourising" in the IDE sense, well, that would be one use of the resultant information). I'm sure there's probably some half-assed pun in the back of things, but I can't see it for the nonce. I probably hadn't come across its use in that manner before, either (although colour analogies in data structures are not new things). Edward has certainly tended to use "inline markup" when he's being formal, I believe. (of course, it's also nice to use a word whose spelling is unlikely to be agreed on - but that's an incidental benefit...) [nb: my personal vote is that *obviously* doctest blocks don't need a "::" in front of them. Their detection should be *identical* to the means used by doctest.py - otherwise people really *will* get confused... - hmm, of course, that actually doesn't work already, as doctest.py will happily "see" a ">>>" inside a literal block. Ho hum.] Whilst I'm here... > 2. Literal blocks start with a paragraph that ends with "::" Pedantry - they start with the first non-blank line *after* the "::" paragraph, *if* it is indented more than that paragraph (and presumably in Edward's terms, a relatively unindented paragraph after a "::" paragraph would be an error - unless he wants to allow indentation 0). So:: This here:: Is clearly OK but what about:: This here:: Is this literal? and:: Some text. This here:: Is this literal? In the first case, we're OK. In the second, it's either non-literal (and for Edward an error?), or literal with indentation 0. In the third, it's clearly non-literal - but does Edward want an error or not? > and continue to the next line whose indentation is equal > to or less than that of the paragraph that started them. Surely that should (for a start) be "next non-blank line" (and possibly even "next non-blank line following a blank line", for pedantry). And terms like "the paragraph that started them" is why I like terms like "parent paragraph" - it's a lot easier to work with. > Literal blocks should be indented and separated by blank lines. So that answers the "indentation by 0" question. But they can't be separated by blank lines, 'cos those are part of the literal block (this is *quite* important - as is preservation of the correct *number* of (internal) blank lines). Damn - I was trying not to get involved... Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From dgoodger@atsautomation.com Thu Apr 19 16:22:52 2001 From: dgoodger@atsautomation.com (Goodger, David) Date: Thu, 19 Apr 2001 11:22:52 -0400 Subject: [Doc-SIG] RE: directives and fields Message-ID: [Edward D. 
Loper] > Well at least there should be rules in the "generic parser" that say > when directives end, so that a parser can ignore a directive if it > doesn't understand it. As I understood your original proposal, > directives ended with blank lines. I think that they should end with > a dedent back to the indent they started at, because then they can > include blank lines.. >From the reStructuredText spec, first draft: """ A comment/directive block is a text block: - whose first line begins with '.. ' in column 1, - whose second and subsequent lines are indented relative to the first, and - which ends with a blank or unindented line. ... Actions taken in response to directives and the interpretation of data in the directive block or subsequent text block(s) are directive- and implementation-dependent. """ I would only change the third list item to 'which ends with an unindented line'. > And I think that it should be *possible* to handle directives in a > second pass. Sure, if that's what the extension wants to do. The extension itself is called during parsing. If it's tied to a post-parse process, that's its own business. There are essentially two types of directives: extensions, which apply to their blocks only; and plugins, which may change the behaviour of the parser for some defined part of the input (may be for the adjacent text block, may be globally). Justification for plugins: it would be useful to modify the parser's behaviour on the fly, without having to subclass. For example, a 'fields' plugin could add support for the '@' syntax, allowing experimentation & testing. Kind of like the 'from __future__ import' hack. ;-> > I.e., I don't think we should have any directives that > change the syntax of subsequent parts of the string, like:: > > This is *emph* > > .. switch-emph-and-literal > > This is *literal* Of course, no such directive would be part of the standard package. Only a lunatic would play games like this. But it would be a great way for people to play with alternate syntax. > Basically, it seems like you should be able to make a "generic" parser > which outputs a DOM tree for the formatted docstring, with "directive" > elements containing #CDATA (=character data, i.e., a string) like:: > > ... > > Then a specialized parser could run the generic parser, and then > replace all the directive elements with some other elements.. If the extension/directive wants to do this, fine. But what if it just wants to wrap the normal behaviour of the parser with a new tag? > The only domain I care about is formatted docstrings. That's a big enough domain with enough controversy to make the feature necessary. See the archives. See this discussion! :-) It's been going on for years, you know. > As for running out of characters to use as syntax, that's one of the > reasons I don't like *colorizing* `like this`... Then implement a POD-like language or a JavaDoc-like language or whatever. This is clearly the dividing line: do you "buy in" to the Setext/StructuredText concept or not? > I think that my target is a much more lightweight markup language than > you're talking about.. or at least less powerful. I really don't see > the need for most of those things in docstrings. Again, read through the archives. Everyone has different opinions, everyone wants different levels of control. If you don't want to use a particular feature, don't. But someone else does. Please don't limit *me*. 
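(Returning to the block definition quoted from the spec draft: the three rules are mechanical enough that a rough sketch shows how a parser could skip a directive it does not understand. This is purely illustrative -- the function name is invented and this is not the actual reStructuredText code -- and it uses the amended rule that only an unindented line ends the block, so blank lines stay inside it::

    def find_explicit_blocks(text):
        """Return (first_line, body_lines) pairs for each '.. ' block."""
        blocks = []
        lines = text.split('\n')
        i = 0
        while i < len(lines):
            if lines[i][:3] == '.. ':             # rule 1: '.. ' in column 1
                first, body, i = lines[i], [], i + 1
                # rules 2 and 3: body lines are indented (blank lines are
                # kept); the first unindented non-blank line ends the block
                while i < len(lines) and (not lines[i].strip()
                                          or lines[i][0] in ' \t'):
                    body.append(lines[i])
                    i = i + 1
                blocks.append((first, body))
            else:
                i = i + 1
        return blocks

A tool that doesn't recognize the name on the first line can simply throw the whole (first_line, body_lines) pair away, which is exactly the "ignore what you don't understand" behaviour asked for above.)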
It is my opinion that incomplete, minimal markup schemes are doomed to failure, because *your* minimal set of features doesn't match *my* set or *anybody else's*. At least at the discussion level. ;-) > > Say we add an 'SQL' extension to the parser, which performs a > > database query and inserts the results. > > Wouldn't this totally violate making the docstring readable? And when > would you ever want to use this when writing a docstring?? Just an example, not a serious proposal. C'mon, lighten up! > > .. warning:: > > > > Don't *ever* press the `Self-Destruct` button. > > If you do, you'll be sorry. > > This could be implemented as a field. Then fields can't be restricted to the ends of docstrings -- I want a warning in the middle! And what do fields *do*? Seems to me they're simply descriptive, not functional. Maybe they are all we need, but please come up with a more complete description! > I think that external URL > hyperlinks should be implemented with colorizing, if at all. They're definitely required. I used readability as the overriding criterion in making that decision. Which is more readable? 1. A hyperlink in StructuredText, inline:: I love using the "Python":http//www.python.org programming language! (The URL has to be stuck next to the reference, whether it flows or not. The raw text looks very different from the processed!) 2. A hyperlink in reStructuredText (based on the Setext style), indirect:: I love using the Python_ programming language! (Note that the URL can be anywhere: next to the reference, at the end of the section, or at the end of the document. And the URL can be referred to multiple times: Python_.) .. _Python: http://www.python.org > I don't > think that internal hyperlink targets make sense for docstrings. This comes back to the semantics or usage of docstrings, something that I'm trying to avoid. How long can a docstring be? > I don't think that comments are necessary for docstrings. If you really > want, you can include a Python comment before or after the docstring. Comments are a freebie from the '.. ' syntax. Not necessary, but useful. > Alternatively, comments could be done via colorizing.. Please, no. > > The cornerstone of the Setext/StructuredText-like approach is that > > the raw text should be as readable as possible, even to the > > uninitiated. > > I don't see how directives win here. > > If anything, it seems like they > will make it harder to read by the uninitiated, given the power of > directives to use almost arbitrary syntax.. You seem to think that typing '.. some-directive::' will magically make something happen. Not so. You'd have to first *implement* the directive, not a trivial task. I was referring to '@' and (especially) 'X<>', about the readability cornerstone. OTOH, directives are readable by way of being explicit. If we want a digibloofer construct, we say '.. digibloofer::' (having paid the price for such impertinence by implementing the digibloofer-parsing extension first, of course ;-). > However, the idea that "raw text should be as readable as possible, > even to the uninitiated" is a *goal* of mine, but not a cornerstone. > Perhaps a cornerstone would be:: > > Raw text should be readable, even by the uninitiated. I don't see the distinction. > There are a lot of conflicting goals in designing a markup language, > and making it as readable as possible is by no means my most > fundamental goal. I'd say, for the Setext/StructuredText approach, it *is* the most fundamental goal. 
If it's not yours, you'll save yourself a lot of grief by using XML or TeX. > In the case of colorizing, I believe that > colorizing should *never* be necessary to the understanding of a > docstring.. i.e., you should be able to strip away all colorizing, and > still understand what it says. In the Setext/StructuredText approach, you shouldn't *have* to strip away anything. It should just be obvious, or at least unobtrusive. > I guess that perhaps what it comes down to is that I am *not* > necessarily trying to design a Setext/StructuredText-like language. Aha! :-) > I'm trying to design a markup language that is optimal for writing > Python docstrings. A noble goal. Please use a different name for what you're doing and let's be done with it. Lots of room for competition (the field's wide open right now! ;-). The more the merrier. > In my mind, the only advantage of using > `quotes` over C{curly braces} is that quotes are easier to ignore.. Precisely. Also, `quotes` have the connotation of, well, quoting. ... And a vigorous debate was had by all. Me and Edward, anyway. Thank you, sir. /DG From edloper@gradient.cis.upenn.edu Fri Apr 20 04:06:15 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Thu, 19 Apr 2001 23:06:15 EDT Subject: [Doc-SIG] Re: Ho hum - back to work... In-Reply-To: Your message of "Thu, 19 Apr 2001 11:27:27 BST." <004b01c0c8bb$54ad0a80$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200104200306.f3K36Fp19203@gradient.cis.upenn.edu> > Well, I recovered from my flu (eventually) and am now back to > "normal". That's good to hear. I was beginning to worry that you didn't like us anymore. :) > Anyway, to the point. I'm taking tomorrow (and maybe a day next > week) off to do *some* work for the effort. It's a bit short notice > to ask this, but given all the work that Edward and David are doing > (I don't necessarily *agree* with them, but that's another matter), > I figured I'd seek an opinion on how my time might best be spent. It does seem like it would be nice to have a parser with which we can try a number of different rules.. And since you've already spent a fair amount of time on that, that seems like a reasonable thing to work on. > 2. Work on the Doc-SIG archives, to try to produce summaries of the > arguments from its lifetime. Note that (technically) we may need > this for any PEPs we produce! (and it would clearly be useful to be > able to *point* to who said what and why, given the history of the > group). I tried to do this a few weeks back, (including copius pointers to individual articles), but gave up because I don't have *that* much free time. :) But it would be *really* useful to have, I think, and it you're more familiar with the archives, then maybe it wouldn't take as long.. At least getting a start on it would be nice. Overall, I'd say to work on docutils/fat.py, but mainly because you've already invested a fair amount of work in it. Maybe we can convince someone else to do the doc-sig summary stuff? :) > I still don't understand why Edward (and Guido, although I think > he's less likely to answer!) object to "simple" markup like ST and > relatives use - why they consider it a Bad Thing to (a) use > punctuation characters for markup, and (b) use them in a context > dependent manner. I actually don't object to either (a) or (b), strictly speaking. What I object to is markup that I think will be "unsafe." 
For example, I have no problem with using *one* *word* *emph*, or saying that backticks around any valid Python identifier mark it as a Python object. My biggest pet peeve about ST-like markup is having a markup be context-dependent, with a basically unbounded context.. For example, if "*" starts an emph region only if there's another "*" later in the string somewhere; and otherwise is an asterisk. This seems very dangerous to me. I want to be able to tell (under most circumstances), by looking at a character and its immediate context, whether it's markup or not.. So, as long as we keep our contexts relatively small, I don't object to context-dependent markup. (In fact, both bullets and "::" are definitely context-sensitive markup, and I think they're very intuitive.) As for using punctuation characters, that's fine (what else would you use??), but if possible we should try to keep the need for escaping to a minimum, because escaping will be ugly and non-intuitive, no matter how we do it. So we should try to keep the number of punctuation characters we use to a minimum. > (As a subpoint, I don't *quite* understand why Edward wants to > separate structuring and colourising so much - this seems to me to > be implementation detail (for this purpose, I consider the EBNF to > be "implementation" as well) - real people don't have trouble with > fuzzy distinctions about such things.) There are really three reasons: 1. A general divide-and-conquer approach to the problem of coming up with a markup language. I'm more confident that we'll be able to come to consensus on smaller issues/domains than larger ones. This reason has nothing to do with the final markup language, and everything to do with how we get there. 2. A side-effect of dividing structuring and colorizing is eliminating a number of issues, such as how to tell whether a line in a paragraph starting with "1." is a bullet or a continuation of the previous line. 3. I think that the markup language will be easier to understand if colorizing and structuring don't interact much. My original reasons were (1) and (3). (2) was something that happily fell out. > B. Reasons to be doing this [Summarized:] - NOT: we don't need to invent a markup language - DOC: we want to be more expressive in our docstrings - REP: we want to be smarter about displaying docstrings - STRUC: we want to be able to do smart things with our docstrings (other than displaying them). I would say that for me, your REP and DOC would be my most important reasons for this work, probably in that order. The reason that I put REP above DOC is because I think that the need for standardization is much less for DOC than it is for REP. > I'm not sure I actually believe that we're going to get a lot from > STRUC One thing we get is the ability to check for certain completeness criteria in our documentation.. e.g., did I specify a return value/type for everything that returns something? did I describe every parameter? -Edward From edloper@gradient.cis.upenn.edu Fri Apr 20 04:25:20 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Thu, 19 Apr 2001 23:25:20 EDT Subject: [Doc-SIG] Structuring: a summary; and an attempt at EBNF.. In-Reply-To: Your message of "Thu, 19 Apr 2001 09:41:38 EDT." Message-ID: <200104200325.f3K3PKp21027@gradient.cis.upenn.edu> > Do Doctest blocks have to be preceded by "::"? I.e., are Doctest > blocks simply a special case of literal blocks, or are they detected > by indentation & ">>> " alone?
I would say that they're detected by indentation and ">>>" alone (and should be separated by leading and trailing blank lines). That's to be consistant with the doctest module. We could also say that they have to appear in literal blocks, but then I'd want the parser to generate a warning whenever it sees a paragraph that starts with ">>>". > Why should lists be indented? What's wrong with > > - a list > - like this? In theory, *either* indentation *or* separation by a blank line would suffice. In fact, right now my parser would accept your example with a warning. I think that we should enforce both because it makes the docstrings easier to read, and is more consistant. The one problem with this is if you want to include a list directly after a literal block. There's no way to do it if lists are required to be indented. Maybe we could allow it in that case. :) > No indentation is necessary. I suggest that if there *is* > indentation, an alternate interpretation is possible. When I read them, *I* don't interpret them differently (as an uninitiated reader). So I don't think we should be encoding any semantic content in the difference, if we *do* allow unindented lists. Doing so seems to me to go against the principle of making sure that the uninitiated can understand the docstring.. Well, actually, there is one obvious interpretation: actually indent the list in the output. But I don't think that we should be giving people that much control over the output. If they *need* that much control, they should be using LaTeX or something like it.. I like to *think* that the markup language we're talking about is mainly a semantic one.. > As an aside: where does this term "colourizing" come from? I picked it up here on doc-sig (or maybe on some page that I was pointed to from here). I'll try to remember to talk about "inline markup" or "local markup" or "intraparagraph markup" or some such when writing up a PEP (if we ever get there...). From edloper@gradient.cis.upenn.edu Fri Apr 20 05:17:09 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Fri, 20 Apr 2001 00:17:09 EDT Subject: [Doc-SIG] Re: directives and fields In-Reply-To: Your message of "Thu, 19 Apr 2001 11:22:52 EDT." Message-ID: <200104200417.f3K4H9p25988@gradient.cis.upenn.edu> [These responses are a bit out-of-order.. They're in the order that I felt like responding to them in.] [David said:] > Please use a different name for what you're doing and let's be done > with it. Lots of room for competition (the field's wide open right > now! ;-). The more the merrier. I don't think I've used any name related to ST to refer to the markup language I'm talking about for two or three weeks (ever since we decided that we didn't need to try to maintain compatibility with ST). If you like, we can call "my" language "epytext," because that's what I called the parser module I've been writing (edloper's version of pytext). But really, I'm not just trying to design a markup language that *I* like. If that were my goal, I'd write a parser and be done with it. My goal is to produce a markup language for docstrings that the Python community can embrace as a whole. You may say that it's not going to happen, but at least 2 languages (Perl and Java) have managed it, and I don't see why we can't come up with a good standard ML for Python. Of course, not everyone would use the same features of the markup langauge as everyone else. Some people might use emph inline markup, some might not; some might use fields, and some might not. 
But they would all be using the same markup language.. Just like I can write in LaTeX and decide not to use \emph{}. And as a result, people can write tools for the markup language. >>> "raw text should be as readable as possible, even to the >>> uninitiated" > > I'd say, for the Setext/StructuredText approach, it *is* the most > fundamental goal. If it's not yours, you'll save yourself a lot of > grief by using XML or TeX. In designing a good markup language for docstrings, I think we really need to balance a number of goals. XML and TeX do well with some goals (e.g., they're formal, and XML is simple).. But not so well with other goals (they're not very easy to write, and not easy for the uninitiated to read). I think it's dangerous to concentrate too much on any one goal. > Then implement a POD-like language or a JavaDoc-like language or > whatever. This is clearly the dividing line: do you "buy in" to the > Setext/StructuredText concept or not? Do I have to "buy in" to all of it? For example, to things like saying that "*" is an asterisk if it appears once in a paragraph, but an emph delimiter if it appears twice? I appreciate many of the features of ST-like languages. I think that there's great potential for clean/simple structuring, using them. I think there's good potential for simple colorizing, as long as we restrict it so that it's "safe." I think that we could potentially use one of those without using the other. > Again, read through the archives. Everyone has different opinions, > everyone wants different levels of control. If you don't want to use > a particular feature, don't. But someone else does. Please don't > limit *me*. But constructing a standard embraced by the community is really all about limiting *you* (the user). Without limitations on the user, we can't write compatible tools. One option, if you like it, is to say that any paragraph starting with ".. " will generate an error unless it starts a directive that a parser knows about... And anyone who uses directives should know that they are making their docstrings less standard and less portable across tools.. And, perhaps, "standard" directives can be added to the language as time goes on, which will *not* result in less standard/portable docstrings. > That's a big enough domain with enough controversy to make the > feature necessary. See the archives. See this discussion! :-) It's > been going on for years, you know. I know it has. I thought maybe we could end it. :) But if you manage to convince me otherwise, I guess I *will* go off and write my own parser/docstring tools. ;) > > I think that external URL > > hyperlinks should be implemented with colorizing, if at all. > > They're definitely required. I used readability as the overriding > criterion in making that decision. Which is more readable? I would argue for either:: I love using the Python programming language (http://www.python.org). or:: I love using the Python programming language (U{http://www.python.org}). ... But I know you'll disagree. :) > It is my opinion that incomplete, minimal markup schemes are doomed > to failure, because *your* minimal set of features doesn't match > *my* set or *anybody else's*. At least at the discussion level. ;-) I was trying to base my minimal set on previous successful docstring markup languages (POD and JavaDoc).. If you think we need to add more features, then we should talk about what features to add.
But only if we're still trying to work towards coming up with a "community standard" markup language (i.e., something we can put in a PEP). Otherwise, we might as well just go off and implement our own little markup languages. :) But at the end of the day, (perhaps I should say end of the year? ;), I would like to have a simple, streight-forward, *bounded* markup language. > - whose first line begins with '.. ' in column 1, > - whose second and subsequent lines are indented relative to the first, and > - which ends with a blank or unindented line. Hm.. My bad. I skipped past the "Comments and Directives" section to the "Directives" subsection. From the example in that subsection (which was presumably not correctly formatted), I assumed that the second and subsequent lines didn't need to be indented:: .. keywords:: Author: Anne Elk (Miss) Revision: 1 So I guess we basically agree. Is it ok with you to change that to "and ends with an unindented line" for now (in our discussion of directives)? > There are essentially two types of directives: extensions, which > apply to their blocks only; and plugins, which may change the > behaviour of the parser for some defined part of the input. I would like to *only* allow "extensions." If we allow "plugins," then a parser that doesn't recognize a directive really has no choice but to fail. I really don't see the need for plugins.. One advantage of just using fields is that we can deal with unknown fields in a reasonable way: put thier contents in a section labeled with the name of the field. -Edward From edloper@gradient.cis.upenn.edu Fri Apr 20 05:32:38 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Fri, 20 Apr 2001 00:32:38 EDT Subject: [Doc-SIG] Structuring: a summary; and an attempt at EBNF.. In-Reply-To: Your message of "Thu, 19 Apr 2001 15:12:12 BST." <005401c0c8da$ba5963a0$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200104200432.f3K4Wcp27309@gradient.cis.upenn.edu> > [nb: my personal vote is that *obviously* doctest blocks don't need > a "::" in front of them. Their detection should be *identical* to > the means used by doctest.py - otherwise people really *will* get > confused... - hmm, of course, that actually doesn't work already, as > doctest.py will happily "see" a ">>>" inside a literal block. Ho > hum.] I agree that the detection of ">>>" should be identical to doctest's algorithms. But I think that if we ever *do* manage to come up with a standard markup language, and get a PEP accepted, etc, we could probably get the doctest module changed so it ignores any ">>>" that's within a literal block (it should be pretty easy to scan for that). So I wouldn't worry about the "doctest in literal block" problem for now. > > 2. Literal blocks start with a paragraph that ends with "::" > > Pedantry - they start with the first non-blank line *after* the "::" Um, yeah, that's what I meant. And actually I think we should strip leading and trailing blank lines (but not internal blank lines) from literal blocks. > So:: > > This here:: > > Is clearly OK > Literal block. > but what about:: > > This here:: > > Is this literal? Maybe a warning, more likely an error. > and:: > > Some text. > > This here:: > > Is this literal? Maybe a warning, more likely an error. In my EBNF, the 2nd and 3rd would be errors. >> and continue to the next line whose indentation is equal >> to or less than that of the paragraph that started them. > > Surely that should (for a start) be "next non-blank line" Yes.. let's change it to:: 2. 
Literal blocks start after a paragraph that ends with "::", and end before the next (non-blank) line whose indentation is less than or equal to that of the paragraph that introduced them. or something like that, anyway.. The language could still use some cleaning-up. If anyone on the group doesn't understand, say so, and I'll try to explain it better. > And terms like "the paragraph that started them" is why I like terms > like "parent paragraph" - it's a lot easier to work with. But it's not great for a 1-minute overview that's supposed to be accessible to anyone. :) (And, actually, I would say that that's the previous sister paragraph, not the parent, at least in the resultant DOM tree). > So that answers the "indentation by 0" question. But they can't be > separated by blank lines, 'cos those are part of the literal block > (this is *quite* important - as is preservation of the correct > *number* of (internal) blank lines). As I said, I think that leading and trailing blank lines should be stripped.. (but not any internal blank lines) Do you disagree? What do other people think? > Damn - I was trying not to get involved... Don't let us drag you into this too much.. Feel free not to respond to anything I send... Sanity can be a nice thing. (I've been close to going sane, myself, a few times over the last month). -Edward From edloper@gradient.cis.upenn.edu Sat Apr 21 21:19:29 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Sat, 21 Apr 2001 16:19:29 EDT Subject: [Doc-SIG] Finding cannonical names for objects Message-ID: <200104212019.f3LKJTp10196@gradient.cis.upenn.edu> When writing a documentation tool, it would be nice to be able to figure out what the "parent" of an object is, where by parent I mean: - for a module in a package, its package - for a function or class, the module it was originally defined in - for a member function, its class Among other things, this is useful for trying to establish a unique "canonical" name for something that we're documenting, so we can make sure that inter-documentation pointers are correct (e.g., if we're converting docs to HTML). However, it's not clear how to do this in several cases. The cases where it *is* straightforward to do it are: - for non-builtin modules, extract the package information from the __name__ field. Will this work for built-in packages, too? What's an example of a built-in package? - for non-builtin classes, consult the __module__ field - for non-builtin member functions, consult the im_class field - built-in classes all seem to have a __module__ field (e.g., exceptions.Exception or sys.last_type). Is this always true? In the case of non-builtin functions, I can think of two ways to do it::

    import sys, inspect

    def find_function_module_1(func):
        # scan sys.modules for the module whose namespace this function
        # uses (entries can be None, so guard against that)
        for module in sys.modules.values():
            if module is not None and func.func_globals is module.__dict__:
                return module.__name__
        raise ValueError("Couldn't find the module for this function")

    def find_function_module_2(func):
        # derive the module name from the file the function was defined in
        from os.path import basename, splitext
        try:
            return splitext(basename(inspect.getabsfile(func)))[0]
        except:
            raise ValueError("Couldn't find the module for this func")

Is one of these approaches preferable? Will they ever give different results? Is there a reason that non-builtin functions don't have a __module__ field, like classes do? (Or a reason that built-in methods *do* have the __module__ field?) The other difficult cases are built-in objects. In general, I don't see any way to get parents for built-in objects.
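(For the straightforward cases in the list above, a hypothetical helper -- the name find_parent_name and its shape are invented here, and it only uses the attributes already mentioned (__name__, __module__, im_class, func_globals) -- might look roughly like this::

    import sys, types

    def find_parent_name(obj):
        # module in a package: the package is everything before the last '.'
        if type(obj) is types.ModuleType:
            return '.'.join(obj.__name__.split('.')[:-1]) or None
        # (old-style) class: the module it was defined in
        if type(obj) is types.ClassType:
            return obj.__module__
        # bound or unbound method: the class it was defined in
        if type(obj) is types.MethodType:
            return obj.im_class.__name__
        # plain function: scan sys.modules, as in find_function_module_1
        if type(obj) is types.FunctionType:
            for module in sys.modules.values():
                if module is not None and vars(module) is obj.func_globals:
                    return module.__name__
        return None    # built-in functions and methods: no obvious answer

None of this helps with the built-in objects, though, which is the open question.)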
The relevant built-in objects that I know of are: - built-in functions (e.g., len, min, sys.settrace) - built-in methods (e.g., [].append, file(...).read) - non-builtin methods with underlying builtin functions (e.g., Exception.__str__) Is there any way to get the "parents" for these objects? (It would be *nice* if doctools could process built-in objects as well as non-builtin ones.) Another possible approach to finding cannonical names for objects is to use their ids (as returned by the builtin function id()). This wouldn't be as nice, since it would result in basically arbitrary names, but at least everything we document could be given a unique, cannonical name (within a given session). But I'm somewhat confused about id(). In particular, it seems to return a value for integers.. But since the returned value is an integer, it seems like that implies that at least 2 *different* values will have the same id.. Am I missing something? Is there somewhere I can read about what guarantees are given about whether two values' ids will be different? (e.g., if a value is GC'ed, can its id be recycled? I assume yes..) -Edward p.s., Is there a reason that __builtins__.__name__ == '__builtin__' instead of '__builtins__'? From fdrake@acm.org Sun Apr 22 03:11:52 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Sat, 21 Apr 2001 22:11:52 -0400 (EDT) Subject: [Doc-SIG] Finding cannonical names for objects In-Reply-To: <200104212019.f3LKJTp10196@gradient.cis.upenn.edu> References: <200104212019.f3LKJTp10196@gradient.cis.upenn.edu> Message-ID: <15074.15848.187550.147390@cj42289-a.reston1.va.home.com> Edward D. Loper writes: > p.s., Is there a reason that __builtins__.__name__ == '__builtin__' > instead of '__builtins__'? Yes; the same reason that "import __builtin__" works but "import __builtins__" does not. ;-) -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From edloper@gradient.cis.upenn.edu Sun Apr 22 03:59:58 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Sat, 21 Apr 2001 22:59:58 EDT Subject: [Doc-SIG] Finding cannonical names for objects In-Reply-To: Your message of "Sat, 21 Apr 2001 22:11:52 EDT." <15074.15848.187550.147390@cj42289-a.reston1.va.home.com> Message-ID: <200104220259.f3M2xwp15626@gradient.cis.upenn.edu> > Edward D. Loper writes: > > p.s., Is there a reason that __builtins__.__name__ == '__builtin__' > > instead of '__builtins__'? > > Yes; the same reason that "import __builtin__" works but "import > __builtins__" does not. ;-) Ok. I guess maybe my question should have been why the default global (?) variable to access them is called "__builtins__" rather than "__builtin__":: Python 2.1 (#1, Apr 21 2001, 20:23:34) [GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2 Type "copyright", "credits" or "license" for more information. >>> dir() ['__builtins__', '__doc__', '__name__'] >>> type(__builtins__), __builtins__.__name__ (, '__builtin__') -Edward From fdrake@acm.org Sun Apr 22 06:21:42 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Sun, 22 Apr 2001 01:21:42 -0400 (EDT) Subject: [Doc-SIG] Finding cannonical names for objects In-Reply-To: <200104220259.f3M2xwp15626@gradient.cis.upenn.edu> References: <15074.15848.187550.147390@cj42289-a.reston1.va.home.com> <200104220259.f3M2xwp15626@gradient.cis.upenn.edu> Message-ID: <15074.27238.314564.581602@cj42289-a.reston1.va.home.com> Edward D. Loper writes: > Ok. I guess maybe my question should have been why the default global > (?) 
variable to access them is called "__builtins__" rather than > "__builtin__":: __builtins__ is an implementation detail, nothing more. It is used to obtain the built-in functions; for all namespaces other than __main__, __builtins__ is a dictionary rather than a module. The identity of the __builtins__ dict is also used to determine if code is running in restricted execution mode; if the name bound to __builtins__ is not the __builtin__ module or the dictionary for that module, the restricted execution rules are in place. The name is different so it doesn't clash, and allows a minor performance improvement over using the __builtin__ module when accessing the built-in namespace. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From fdrake@beowolf.digicool.com Sun Apr 22 07:08:22 2001 From: fdrake@beowolf.digicool.com (Fred Drake) Date: Sun, 22 Apr 2001 02:08:22 -0400 (EDT) Subject: [Doc-SIG] [maintenance doc updates] Message-ID: <20010422060822.A3E4428A0B@beowolf.digicool.com> The development version of the documentation has been updated: http://python.sourceforge.net/maint-docs/ First attempt to push maintenance docs to the SourceForge site. From fdrake@beowolf.digicool.com Sun Apr 22 07:12:15 2001 From: fdrake@beowolf.digicool.com (Fred Drake) Date: Sun, 22 Apr 2001 02:12:15 -0400 (EDT) Subject: [Doc-SIG] [maintenance doc updates] Message-ID: <20010422061215.5C87D28A0B@beowolf.digicool.com> The development version of the documentation has been updated: http://python.sourceforge.net/maint-docs/ Second attempt to push maintenance docs to the SourceForge site. From fdrake@beowolf.digicool.com Sun Apr 22 07:15:52 2001 From: fdrake@beowolf.digicool.com (Fred Drake) Date: Sun, 22 Apr 2001 02:15:52 -0400 (EDT) Subject: [Doc-SIG] [maintenance doc updates] Message-ID: <20010422061552.5A99628A0B@beowolf.digicool.com> The development version of the documentation has been updated: http://python.sourceforge.net/maint-docs/ Third attempt to push maintenance docs to the SourceForge site. Sheesh! From tony@lsl.co.uk Mon Apr 23 11:54:29 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Mon, 23 Apr 2001 11:54:29 +0100 Subject: [Doc-SIG] Structuring: a summary; and an attempt at EBNF.. In-Reply-To: <200104200432.f3K4Wcp27309@gradient.cis.upenn.edu> Message-ID: <006d01c0cbe3$c54349f0$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: > 2. Literal blocks start after a paragraph that ends with "::", > and end before the next (non-blank) line whose indentation is > less than or equal to that of the paragraph that introduced them. > > or something like that, anyway.. The language could still use some > cleaning-up. If anyone on the group doesn't understand, say so, and > I'll try to explain it better. I think it still needs work (!) but it's getting there... > > And terms like "the paragraph that started them" is why I like terms > > like "parent paragraph" - it's a lot easier to work with. > > But it's not great for a 1-minute overview that's supposed to be > accessible to anyone. :) (And, actually, I would say that that's the > previous sister paragraph, not the parent, at least in the resultant > DOM tree). Well, in the DOM tree, yes (I forgot that). The term "preceding non-literal paragraph" is indeed a bit cumbersome... - perhaps we're best off with "the '""' paragraph", which is (sort of) fairly obvious. > As I said, I think that leading and trailing blank lines should be > stripped.. (but not any internal blank lines) Do you disagree? 
No - it is clearly correct to strip preceding and trailing blank lines, mea culpa for not being pedantic about that! I also think that we agree on the "awkward examples" (which is why I didn't copy them again). > > Damn - I was trying not to get involved... > > Don't let us drag you into this too much.. Feel free not to respond to > anything I send... Sanity can be a nice thing. (I've been close to > going sane, myself, a few times over the last month). I'm trying to maintain the attitude that I can be quieter on the grounds that you and David are working on things - with TWO people arguing (erm, discussing) their way towards something, I'm sort-of happy. Not that I'm necessarily happy with the final result *in total*, but if I don't have time to argue and/or implement, that's just tough luck. Besides, I think the *structuring* is getting there (although I am a little worried that an implicit goal of being able to cope with "text as it were wrote" that already exists may be being lost - not a problem if it's not an aim, but grist for an email at another time, I think). Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Mon Apr 23 12:16:49 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Mon, 23 Apr 2001 12:16:49 +0100 Subject: Context dependent markup (was [Doc-SIG] Re: Ho hum - back to work...) In-Reply-To: <200104200306.f3K36Fp19203@gradient.cis.upenn.edu> Message-ID: <006e01c0cbe6$e42aa310$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote (well, in response to me): > > I still don't understand why Edward (and Guido, although I think > > he's less likely to answer!) object to "simple" markup like ST and > > relatives use - why they consider it a Bad Thing to (a) use > > punctuation characters for markup, and (b) use them in a context > > dependent manner. > > I actually don't object to either (a) or (b), strictly speaking. What > I object to is markup that I think will be "unsafe." For example, I > have no problem with using *one* *word* *emph*, or saying that > backticks around any valid Python identifier mark it as a Python > object. My biggest pet peve about ST-like markup is having a markup > be context-dependant, with a basically unbounded context.. For > example, if "*" starts an emph region only if there's another "*" > later in the string somewhere; and otherwise is an asterisk. Which I don't think I've ever proposed... > This > seems very dangerous to me. I want to be able to tell (under most > circumstances), by looking at a character and its immediate context, > whether it's markup or not.. But my problem is that I think this has always been possible (and I think you disagree) so there is clearly some leeway on this clarity, which is what I'm trying to track down. > So, as long as we keep our contexts > relatively small, I don't object to context-dependant markup. (In > fact, both bullets and "::" are definitely context-sensitive markup, > and I think they're very intuitive.) > > As for using punctuation characters, that's fine (what else would you > use??), but if possible we should try to keep the need for escaping to > a minimum, because escaping will be ugly and non-intuitive, no matter > how we do it. So we should try to keep the number of punctuation > characters we use to a minimum. 
Ooh - there's that nasty POD-clone (hmm - bad puns about pod-people narrowly averted) > > (As a subpoint, I don't *quite* understand why Edward wants to > > separate structuring and colourising so much - this seems to me to > > be implementation detail (for this purpose, I consider the EBNF to > > be "implementation" as well) - real people don't have trouble with > > fuzzy distinctions about such things.) > > There are really 2 reasons: > > 1. A general divide-and-conquor approach to the problem of coming > up with a markup language. I'm more confident that we'll be able > to come to consensus on smaller issues/domains than larger ones. > This reason has nothing to do with the final markup language, and > everything to do with how we get there. > > 2. A side-effect of dividing structuring and colorizing is > eliminating a number of issues, such as how to tell whether > a line in a paragraph starting with "1." is a bullet or a > continuation of the previous line. > > 3. I think that the markup language will be easier to understand > if colorizing and structuring don't interact much. > > My original reasons were (1) and (3). (2) was something that happily > fell out. I like 1. I suspect that 3 doesn't always split *quite* that way, but point taken. I think 2 addresses exactly that issue about how it doesn't always split that way (and I tend to agree with Tim Peters' point some while back that "if it looks like markup..." (to paraphrase aggressively)). OK. Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Mon Apr 23 12:16:51 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Mon, 23 Apr 2001 12:16:51 +0100 Subject: Reasons to do this (was [Doc-SIG] Re: Ho hum - back to work...) In-Reply-To: <200104200306.f3K36Fp19203@gradient.cis.upenn.edu> Message-ID: <006f01c0cbe6$e5484810$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: > > B. Reasons to be doing this > [Summarized:] > - NOT: we don't need to invent a markup language > - DOC: we want to be more expressive in our docstrings > - REP: we want to be smarter about displaying docstrings > - STRUC: we want to be able to do smart things with our > docstrings (other than displaying them). > > I would say that for me, your REP and DOC would be my most important > reasons for this work, probably in that order. The reason that I put > REP above DOC is because I think that the need for standardization is > much less for DOC than it is for REP. Interesting. For me, the order is quite close anyway, so I think this is agreement. > > I'm not sure I actually believe that we're going to get a lot from > > STRUC > > One thing we get is the ability to check for certain completeness > criteria in our documentation.. e.g., did I specify a return > value/type for everything that returns something? did I describe > every parameter? Which is *not* the classic thing people claim to want from STRUC - they normally seem to be asking for the ability to extract information for querying. *If* we are primarily interested in DOC and REP (survey of 2 people, so terribly significant) then I think that has repercussions. I need to develop my ideas on this a bit more. 
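(The completeness check is easy to picture even before any field syntax is settled. A hypothetical sketch (the name undocumented_args is invented, and "mentioned anywhere in the docstring" is only a crude stand-in for whatever field markup gets adopted) might be::

    import inspect

    def undocumented_args(func):
        """Return argument names that func's docstring never mentions."""
        doc = func.__doc__ or ''
        args, varargs, varkw, defaults = inspect.getargspec(func)
        missing = []
        for name in args:
            # nested tuple arguments show up as sub-lists; skip them here
            if type(name) is type('') and name not in doc:
                missing.append(name)
        return missing

Anything smarter -- checking that a return value is described, or extracting typed field values for querying -- is where the real STRUC machinery would come in.)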
(It seems worth noting, to me, that if STRUC were the main reason, then the use of __version__, __author__ (and potential others), and the way that the Types SIG looks like allowing string annotation of typed argument values, would seem to make its case a LOT less strong. And if REP/DOC is our main aim, we need to consider what it is that we object to about what pydoc does (which is a brutal rendering of the text as written, with URIs guessed at) - an extreme position would say that structuring was unnecessary and only the markup/colourising was needed... Hmm - I'll think about trying to expand on these things, since they seem to be useful potential ammo for (a) convincing ourselves, (b) convincing others...) Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Mon Apr 23 12:16:55 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Mon, 23 Apr 2001 12:16:55 +0100 Subject: [Doc-SIG] Re: Ho hum - back to work... In-Reply-To: <200104200306.f3K36Fp19203@gradient.cis.upenn.edu> Message-ID: <007001c0cbe6$e7749df0$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: > > Well, I recovered from my flu (eventually) and am now back to > > "normal". > > That's good to hear. I was beginning to worry that you didn't like us > anymore. :) Oh no - silence is golden, and all that. Seriously, I have lost a lot of the drive to steal time away from other things (and we had a *major* backlog of ironing, too), so I suspect that I am going to do much less implementation work. Which should not be a problem if you and David are willing to code. > It does seem like it would be nice to have a parser with which we can > try a number of different rules.. And since you've already spent a > fair amount of time on that, that seems like a reasonable thing to > work on. Unfortunately, I used some of Friday up on non-Python things (isn't that just the way), and it's clear that there is more than a couple of days work needed on fat.py. I may get round to work on it, but it's likely to be slow... (if someone else wants the current code, I'll update the web stuff - but of course working on someone else's code at this stage in development isn't necessarily unalloyed joy) > > 2. Work on the Doc-SIG archives, to try to produce summaries of the > > arguments from its lifetime. Note that (technically) we may need > > this for any PEPs we produce! (and it would clearly be useful to be > > able to *point* to who said what and why, given the history of the > > group). > > I tried to do this a few weeks back, (including copius pointers to > individual articles), but gave up because I don't have *that* much > free time. :) But it would be *really* useful to have, I think, and it > you're more familiar with the archives, then maybe it wouldn't take as > long.. At least getting a start on it would be nice. In the end that's what I started work on. It would be easier if my modem/internet connection were reliable to download the whole archive, but I've got up to beginning of December 2000 as one file, and have got *most* of the way through removing non-relevant messages (damn - why do so many of them have to be interesting) and starting to populate the inside of my head with what previous arguments have said. 
I think there will be some serious issues (particularly about the grand scope/applicability of what Doc-SIG is trying to do for docstrings) that emerge, so it does seem important to do. > Overall, I'd say to work on docutils/fat.py, but mainly because you've > already invested a fair amount of work in it. Maybe we can convince > someone else to do the doc-sig summary stuff? :) Oh well, I chose the other one for now. The issues to be resolved include: * strip out the support for the "initial one line summary" stuff that Guido *doesn't* need, after all * remove them from docstrings (both of these are trivial, of course) * change the literal quote character to be backtick (trivial) * add support for underlined headers (hmm) * add (optional) support for your "lists must be indented after the first line" * add (optional) support for "blank lines between list items" (trivial) * add (optional) support for "blank lines needed before and after lists" (more complex, and I'm not convinced useful) * add support for your "markup significant regardless of placement" (strikes me as hard given the way the program currently works - needs more thought) * add (optional) support for requiring URIs to be in "<" and ">" I toyed with doing everything except the markup one (since I'm still not convinced on that issue), but it looked like more work than I'd obviously have time for, so I put it off. Other issues in other threads... Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From gherman@darwin.in-berlin.de Wed Apr 25 08:36:49 2001 From: gherman@darwin.in-berlin.de (Dinu Gherman) Date: Wed, 25 Apr 2001 09:36:49 +0200 Subject: [Doc-SIG] Issues with 2.1 doc PDF files Message-ID: <3AE67E91.2DADD89D@darwin.in-berlin.de> Hello, I've just noticed that there are considerable differences between the PDF files documenting Python 2.1 and those for 2.0. Basically, these are: - file sizes are much bigger for 2.1 (50-100%) - fonts are approximated with pixelized bitmaps As a result you get: - much longer page building times in PDF readers - considerable longer search times - much longer print times (probably - haven't checked) I've not verified this for each PDF file (it's prominently obvious for the tutorial, though), but I assume the same effects can be observed for all of them, when comparing with corresponding files from the previous release. So, I'm really curious what the reason for this phenomenon could be? If there isn't any, I suggest reproducing the files to no longer show the described effects as they will definitely distract people from reading the files online and simply lead to a bad overall impression about their generation process, if not even their content. Regards, Dinu -- Dinu C. Gherman ReportLab Consultant - http://www.reportlab.com ................................................................ "The only possible values [for quality] are 'excellent' and 'in- sanely excellent', depending on whether lives are at stake or not. Otherwise you don't enjoy your work, you don't work well, and the project goes down the drain." 
(Kent Beck, "Extreme Programming Explained") From gherman@darwin.in-berlin.de Wed Apr 25 11:08:11 2001 From: gherman@darwin.in-berlin.de (Dinu Gherman) Date: Wed, 25 Apr 2001 12:08:11 +0200 Subject: [Doc-SIG] Re: Issues with 2.1 doc PDF files References: <3AE67E91.2DADD89D@darwin.in-berlin.de> Message-ID: <3AE6A20B.D44D745F@darwin.in-berlin.de> I wrote: > > I've just noticed that there are considerable differences > between the PDF files documenting Python 2.1 and those for > 2.0. [...] After running a diff over tut.tex for both versions I get an even more interesting difference concerning the *content* (left 2.0, right 2.1, excerpts only): 2603c2620 < '[31.4, 40000]' --- > '[31.400000000000002, 40000]' 2611c2628 < "(31.4, 40000, ('spam', 'eggs'))" --- > "(31.400000000000002, 40000, ('spam', 'eggs'))" Being curious, I typed the following into Pythonwin and am quite baffled by the results: PythonWin 2.0 (#8, Oct 16 2000, 17:27:58) [MSC 32 bit (Intel)] [...] >>> x = 10 * 3.14 >>> x 31.400000000000002 >>> x == 31.4 0 >>> 3.14 3.1400000000000001 >>> Aparently the doc for 2.0 might have been generated with 1.5.2, but I don't have a running 1.5.2 to cross-check this quickly. I guess for many this must look like something unexpected, isn't it? But is it a reason to panic? I don't find anything explaining this behaviour in the FAQ. If it has to do with internal floating point representation limits it might be an issue of general interest and worth being documented somewhere. If this observation was discussed long ago on comp.lang.python please forgive me (and point me to it), but I haven't had the time to follow this group very much recently... Regards, Dinu -- Dinu C. Gherman ReportLab Consultant - http://www.reportlab.com ................................................................ "The only possible values [for quality] are 'excellent' and 'in- sanely excellent', depending on whether lives are at stake or not. Otherwise you don't enjoy your work, you don't work well, and the project goes down the drain." (Kent Beck, "Extreme Programming Explained") From gherman@darwin.in-berlin.de Wed Apr 25 20:36:36 2001 From: gherman@darwin.in-berlin.de (Dinu Gherman) Date: Wed, 25 Apr 2001 21:36:36 +0200 Subject: [Doc-SIG] Re: Issues with 2.1 doc PDF files References: <3AE67E91.2DADD89D@darwin.in-berlin.de> <3AE6A20B.D44D745F@darwin.in-berlin.de> Message-ID: <3AE72744.8FF44DD7@darwin.in-berlin.de> I wrote: > > If this observation was discussed long ago on comp.lang.python > please forgive me (and point me to it), but I haven't had the > time to follow this group very much recently... Ok, I learned that this is said to be expected behaviour in 2.x. Still, I think it should be documented in the main FAQ and not only in this one: http://www.python.org/cgi-bin/moinmoin/FrequentlyAskedQuestions#line24 Dinu From guido@digicool.com Wed Apr 25 21:41:03 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 25 Apr 2001 15:41:03 -0500 Subject: [Doc-SIG] Re: Issues with 2.1 doc PDF files In-Reply-To: Your message of "Wed, 25 Apr 2001 12:08:11 +0200." <3AE6A20B.D44D745F@darwin.in-berlin.de> References: <3AE67E91.2DADD89D@darwin.in-berlin.de> <3AE6A20B.D44D745F@darwin.in-berlin.de> Message-ID: <200104252041.PAA15107@cj20424-a.reston1.va.home.com> > Being curious, I typed the following into Pythonwin and am > quite baffled by the results: > > PythonWin 2.0 (#8, Oct 16 2000, 17:27:58) [MSC 32 bit (Intel)] > [...] 
> >>> x = 10 * 3.14 > >>> x > 31.400000000000002 > >>> x == 31.4 > 0 > >>> 3.14 > 3.1400000000000001 > >>> Yes, this is baffling at first, and a FAQ, if you knwo where to look: http://www.python.org/cgi-bin/moinmoin/FrequentlyAskedQuestions#line24 --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Wed Apr 25 21:48:55 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 25 Apr 2001 15:48:55 -0500 Subject: [Doc-SIG] Re: Issues with 2.1 doc PDF files In-Reply-To: Your message of "Wed, 25 Apr 2001 21:36:36 +0200." <3AE72744.8FF44DD7@darwin.in-berlin.de> References: <3AE67E91.2DADD89D@darwin.in-berlin.de> <3AE6A20B.D44D745F@darwin.in-berlin.de> <3AE72744.8FF44DD7@darwin.in-berlin.de> Message-ID: <200104252048.PAA15230@cj20424-a.reston1.va.home.com> > Ok, I learned that this is said to be expected behaviour > in 2.x. Still, I think it should be documented in the > main FAQ and not only in this one: > > http://www.python.org/cgi-bin/moinmoin/FrequentlyAskedQuestions#line24 Rather than complaining, you can do it yourself. The main FAQ's password is "Spam". --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake@acm.org Wed Apr 25 21:10:34 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 25 Apr 2001 16:10:34 -0400 (EDT) Subject: [Doc-SIG] Issues with 2.1 doc PDF files In-Reply-To: <3AE67E91.2DADD89D@darwin.in-berlin.de> References: <3AE67E91.2DADD89D@darwin.in-berlin.de> Message-ID: <15079.12090.738565.710150@cj42289-a.reston1.va.home.com> Dinu Gherman writes: > I've just noticed that there are considerable differences > between the PDF files documenting Python 2.1 and those for > 2.0. Basically, these are: > > - file sizes are much bigger for 2.1 (50-100%) Yeah, I thought they looked a little large, but wasn't sure why. I've also noticed (and this isn't new) that the A4 versions are quite a bit larger than the US-Letter versions: about 50% for the PDF, not so much for the PostScript. I have no idea why this would be the case. I presume you were looking at the A4 version. > - fonts are approximated with pixelized bitmaps That's not good. > As a result you get: > > - much longer page building times in PDF readers > - considerable longer search times > - much longer print times (probably - haven't checked) > > I've not verified this for each PDF file (it's prominently > obvious for the tutorial, though), but I assume the same > effects can be observed for all of them, when comparing with > corresponding files from the previous release. This would be something to look at -- recall that we added some magic to the tutorial to control interpretation of the document encoding, so that some Latin-1 characters would be typeset correctly. (This was at your prodding, as I recall! ;-) Could you please look at at least one of the other documents to see if they exhibit the same symptoms? > So, I'm really curious what the reason for this phenomenon > could be? If there isn't any, I suggest reproducing the > files to no longer show the described effects as they will If you can tell me how to control these things, I'm sure we can build another distribution. I have no idea how to control this -- this goes deeper into the LaTeX/pdfLaTeX magic than I'm familiar with. > definitely distract people from reading the files online > and simply lead to a bad overall impression about their > generation process, if not even their content. Do people really use the PDF onscreen? 
I've always imagined Windows users use them to print from, since PostScript printers are less common under Windows than under Linux & Unix. I'd be curious as to whether onscreen display or printing is widespread for the PDF version -- for onscreen viewing I'd expect a very different layout. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From fdrake@acm.org Wed Apr 25 21:15:14 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 25 Apr 2001 16:15:14 -0400 (EDT) Subject: [Doc-SIG] Re: Issues with 2.1 doc PDF files In-Reply-To: <3AE6A20B.D44D745F@darwin.in-berlin.de> References: <3AE67E91.2DADD89D@darwin.in-berlin.de> <3AE6A20B.D44D745F@darwin.in-berlin.de> Message-ID: <15079.12370.524256.761513@cj42289-a.reston1.va.home.com> Dinu Gherman writes: > After running a diff over tut.tex for both versions I get an > even more interesting difference concerning the *content* > (left 2.0, right 2.1, excerpts only): The Python 2.0 documentation was not properly updated. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From gherman@darwin.in-berlin.de Wed Apr 25 22:05:55 2001 From: gherman@darwin.in-berlin.de (Dinu Gherman) Date: Wed, 25 Apr 2001 23:05:55 +0200 Subject: [Doc-SIG] Re: Issues with 2.1 doc PDF files References: <3AE67E91.2DADD89D@darwin.in-berlin.de> <3AE6A20B.D44D745F@darwin.in-berlin.de> <3AE72744.8FF44DD7@darwin.in-berlin.de> <200104252048.PAA15230@cj20424-a.reston1.va.home.com> Message-ID: <3AE73C33.98DFD478@darwin.in-berlin.de> Guido van Rossum wrote: > > > Ok, I learned that this is said to be expected behaviour > > in 2.x. Still, I think it should be documented in the > > main FAQ and not only in this one: > > > > http://www.python.org/cgi-bin/moinmoin/FrequentlyAskedQuestions#line24 > > Rather than complaining, you can do it yourself. The main FAQ's > password is "Spam". Great, I didn't know that! What is still unclear to me is if there is a real need for two FAQs (and maybe more in the future)? Especially, as the MoinMoin one seems to be unreachable from python.org and python.org/search. To me right now it looks like leading people astray with- out a good reason. Also, given more frequent releases, would it make sense, perhaps, to indicate the Python version that a specific feature/module/whatever is available from in some of the standard Python documentation files? Dinu From dfan@harmonixmusic.com Wed Apr 25 22:08:29 2001 From: dfan@harmonixmusic.com (Dan Schmidt) Date: 25 Apr 2001 17:08:29 -0400 Subject: [Doc-SIG] Issues with 2.1 doc PDF files In-Reply-To: <15079.12090.738565.710150@cj42289-a.reston1.va.home.com> References: <3AE67E91.2DADD89D@darwin.in-berlin.de> <15079.12090.738565.710150@cj42289-a.reston1.va.home.com> Message-ID: "Fred L. Drake, Jr." writes: | Dinu Gherman writes: | | > definitely distract people from reading the files online | > and simply lead to a bad overall impression about their | > generation process, if not even their content. | | Do people really use the PDF onscreen? I've always imagined | Windows users use them to print from, since PostScript printers are | less common under Windows than under Linux & Unix. I'd be curious | as to whether onscreen display or printing is widespread for the PDF | version -- for onscreen viewing I'd expect a very different layout. I don't view the Python docs with PDF, but I read many other PDF files 'onscreen' rather than printing them out. -- http://www.dfan.org From fdrake@acm.org Wed Apr 25 22:25:44 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) 
Date: Wed, 25 Apr 2001 17:25:44 -0400 (EDT) Subject: [Doc-SIG] Issues with 2.1 doc PDF files In-Reply-To: References: <3AE67E91.2DADD89D@darwin.in-berlin.de> <15079.12090.738565.710150@cj42289-a.reston1.va.home.com> Message-ID: <15079.16600.483240.541905@cj42289-a.reston1.va.home.com> Dan Schmidt writes: > I don't view the Python docs with PDF, but I read many other PDF files > 'onscreen' rather than printing them out. Would you *like* to be able to read the Python PDF version onscreen, or is one of the other versions preferable for you? I guess what I'd like to figure out is how many people would like to use a version that they're not using because there's some impediment (files are too large to d/l, lacks functionality, has formatting problems, etc.). -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From gherman@darwin.in-berlin.de Wed Apr 25 22:30:32 2001 From: gherman@darwin.in-berlin.de (Dinu Gherman) Date: Wed, 25 Apr 2001 23:30:32 +0200 Subject: [Doc-SIG] Issues with 2.1 doc PDF files References: <3AE67E91.2DADD89D@darwin.in-berlin.de> <15079.12090.738565.710150@cj42289-a.reston1.va.home.com> <15079.16600.483240.541905@cj42289-a.reston1.va.home.com> Message-ID: <3AE741F8.2C4DF4FF@darwin.in-berlin.de> "Fred L. Drake, Jr." wrote: > > Dan Schmidt writes: > > I don't view the Python docs with PDF, but I read many other PDF files > > 'onscreen' rather than printing them out. > > Would you *like* to be able to read the Python PDF version onscreen, > or is one of the other versions preferable for you? > I guess what I'd like to figure out is how many people would like to > use a version that they're not using because there's some impediment > (files are too large to d/l, lacks functionality, has formatting > problems, etc.). I'll add only that much here: I like viewing PDFs onscreen because unlike HTML they allow to search stuff and nicely print parts quickly. I'm less bothered by the paper layout which is not ideal for the screen. I'll come back to your previous comments tomorrow... Dinu From fdrake@acm.org Wed Apr 25 22:32:18 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 25 Apr 2001 17:32:18 -0400 (EDT) Subject: [Doc-SIG] Re: Issues with 2.1 doc PDF files In-Reply-To: <3AE73C33.98DFD478@darwin.in-berlin.de> References: <3AE67E91.2DADD89D@darwin.in-berlin.de> <3AE6A20B.D44D745F@darwin.in-berlin.de> <3AE72744.8FF44DD7@darwin.in-berlin.de> <200104252048.PAA15230@cj20424-a.reston1.va.home.com> <3AE73C33.98DFD478@darwin.in-berlin.de> Message-ID: <15079.16994.274877.42370@cj42289-a.reston1.va.home.com> Dinu Gherman writes: > Also, given more frequent releases, would it make sense, > perhaps, to indicate the Python version that a specific > feature/module/whatever is available from in some of the > standard Python documentation files? There is an increasing number of annotations providing versioning information in the documentation. If you find anything specific that lacks information that would have been helpful, please let me know. This is sufficient reason to file a documentation bug report: http://sourceforge.net/tracker/?func=add&group_id=5470&atid=105470 Be sure to set the "Category" field to "Documentation"; the bug report will be automatically assigned to me. -Fred -- Fred L. Drake, Jr. 
PythonLabs at Digital Creations

From dfan@harmonixmusic.com Wed Apr 25 22:37:49 2001
From: dfan@harmonixmusic.com (Dan Schmidt)
Date: 25 Apr 2001 17:37:49 -0400
Subject: [Doc-SIG] Issues with 2.1 doc PDF files
In-Reply-To: <15079.16600.483240.541905@cj42289-a.reston1.va.home.com>
References: <3AE67E91.2DADD89D@darwin.in-berlin.de> <15079.12090.738565.710150@cj42289-a.reston1.va.home.com> <15079.16600.483240.541905@cj42289-a.reston1.va.home.com>
Message-ID:

"Fred L. Drake, Jr." writes:
| Dan Schmidt writes:
| > I don't view the Python docs with PDF, but I read many other PDF files
| > 'onscreen' rather than printing them out.
|
| Would you *like* to be able to read the Python PDF version onscreen,
| or is one of the other versions preferable for you?
|
| I guess what I'd like to figure out is how many people would like to
| use a version that they're not using because there's some impediment
| (files are too large to d/l, lacks functionality, has formatting
| problems, etc.).

I guess the HTML version is the most useful for me right now. The Info version would be, if it were up to date. So in the specific case of Python, I don't really need the PDF. However, as a general principle, when I do download a .pdf file, I expect to be able to read it on-screen. That may not be relevant to the question you're asking, though.

-- http://www.dfan.org

From guido@digicool.com Thu Apr 26 00:19:01 2001
From: guido@digicool.com (Guido van Rossum)
Date: Wed, 25 Apr 2001 18:19:01 -0500
Subject: [Doc-SIG] Re: Issues with 2.1 doc PDF files
In-Reply-To: Your message of "Wed, 25 Apr 2001 23:05:55 +0200." <3AE73C33.98DFD478@darwin.in-berlin.de>
References: <3AE67E91.2DADD89D@darwin.in-berlin.de> <3AE6A20B.D44D745F@darwin.in-berlin.de> <3AE72744.8FF44DD7@darwin.in-berlin.de> <200104252048.PAA15230@cj20424-a.reston1.va.home.com> <3AE73C33.98DFD478@darwin.in-berlin.de>
Message-ID: <200104252319.SAA15920@cj20424-a.reston1.va.home.com>

> Guido van Rossum wrote:
> >
> > > Ok, I learned that this is said to be expected behaviour
> > > in 2.x. Still, I think it should be documented in the
> > > main FAQ and not only in this one:
> > >
> > > http://www.python.org/cgi-bin/moinmoin/FrequentlyAskedQuestions#line24
> >
> > Rather than complaining, you can do it yourself. The main FAQ's
> > password is "Spam".
>
> Great, I didn't know that! What is still unclear to me
> is if there is a real need for two FAQs (and maybe more
> in the future)? Especially, as the MoinMoin one seems to
> be unreachable from python.org and python.org/search.
> To me right now it looks like leading people astray without
> a good reason.

The MoinMoin FAQ was an experiment because the main FAQ appeared cumbersome to maintain. I'm not convinced that it worked. The problem is, somebody needs to own the FAQ and it ain't gonna be me, so until someone picks it up, it's like my backyard -- a wasteland with great potential but mostly collecting piles of dead leaves...

> Also, given more frequent releases, would it make sense,
> perhaps, to indicate the Python version that a specific
> feature/module/whatever is available from in some of the
> standard Python documentation files?

Carefully study the official Python docs -- they already indicate the version where something is introduced or changed.
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Thu Apr 26 00:26:56 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 25 Apr 2001 18:26:56 -0500 Subject: [Doc-SIG] Issues with 2.1 doc PDF files In-Reply-To: Your message of "Wed, 25 Apr 2001 17:25:44 -0400." <15079.16600.483240.541905@cj42289-a.reston1.va.home.com> References: <3AE67E91.2DADD89D@darwin.in-berlin.de> <15079.12090.738565.710150@cj42289-a.reston1.va.home.com> <15079.16600.483240.541905@cj42289-a.reston1.va.home.com> Message-ID: <200104252326.SAA15953@cj20424-a.reston1.va.home.com> > Would you *like* to be able to read the Python PDF version onscreen, > or is one of the other versions preferable for you? I would surmise that for almost everybody, HTML wins big over PDF onscreen, and PDF wins big over HTML for printing. Soon, I bet we won't have to distribute PostScript any more, because everyone can use PDF for printing. But for on-screen viewing, the pagination of PDF quickly gets annoying. This is quite independent from the content, and applies to any kind of documentation, not just Python's. Now, some *producers* of information prefer to only give you PDF even for on-screen viewing, because it gives them more control over fonts and lay-out. Occasionally (e.g. with detailed drawings where zooming in is actually useful) I see the point, but usually browsing PDF just annoys me. --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake@acm.org Wed Apr 25 23:31:35 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 25 Apr 2001 18:31:35 -0400 (EDT) Subject: [Doc-SIG] Issues with 2.1 doc PDF files In-Reply-To: <200104252326.SAA15953@cj20424-a.reston1.va.home.com> References: <3AE67E91.2DADD89D@darwin.in-berlin.de> <15079.12090.738565.710150@cj42289-a.reston1.va.home.com> <15079.16600.483240.541905@cj42289-a.reston1.va.home.com> <200104252326.SAA15953@cj20424-a.reston1.va.home.com> Message-ID: <15079.20551.752990.597538@cj42289-a.reston1.va.home.com> Guido van Rossum writes: > I would surmise that for almost everybody, HTML wins big over PDF One reason for asking questions like these is to determine how useful what we surmise is, compared with other peoples' expectations. I have a pretty good idea what you & I surmise on this topic, but that's different from knowing what others are looking for. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From mwh21@cam.ac.uk Wed Apr 25 23:54:31 2001 From: mwh21@cam.ac.uk (Michael Hudson) Date: 25 Apr 2001 23:54:31 +0100 Subject: [Doc-SIG] Issues with 2.1 doc PDF files In-Reply-To: Guido van Rossum's message of "Wed, 25 Apr 2001 18:26:56 -0500" References: <3AE67E91.2DADD89D@darwin.in-berlin.de> <15079.12090.738565.710150@cj42289-a.reston1.va.home.com> <15079.16600.483240.541905@cj42289-a.reston1.va.home.com> <200104252326.SAA15953@cj20424-a.reston1.va.home.com> Message-ID: Guido van Rossum writes: > I would surmise that for almost everybody, HTML wins big over PDF > onscreen, and PDF wins big over HTML for printing. Soon, I bet we > won't have to distribute PostScript any more, because everyone can use > PDF for printing. Well, if I were to print out the python docs anytime soon (I'm not) I'd definitely reach for the postscript. OTOH, I'd also probably build it locally. But as long as it's not much burden to have both, it's not that much of an issue. Cheers, M. -- I have a feeling that any simple problem can be made arbitrarily difficult by imposing a suitably heavy administrative process around the development. 
-- Joe Armstrong, comp.lang.functional From guido@digicool.com Thu Apr 26 01:13:06 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 25 Apr 2001 19:13:06 -0500 Subject: [Doc-SIG] Issues with 2.1 doc PDF files In-Reply-To: Your message of "25 Apr 2001 23:54:31 +0100." References: <3AE67E91.2DADD89D@darwin.in-berlin.de> <15079.12090.738565.710150@cj42289-a.reston1.va.home.com> <15079.16600.483240.541905@cj42289-a.reston1.va.home.com> <200104252326.SAA15953@cj20424-a.reston1.va.home.com> Message-ID: <200104260013.TAA16112@cj20424-a.reston1.va.home.com> [me] > > I would surmise that for almost everybody, HTML wins big over PDF > > onscreen, and PDF wins big over HTML for printing. Soon, I bet we > > won't have to distribute PostScript any more, because everyone can use > > PDF for printing. [MH] > Well, if I were to print out the python docs anytime soon (I'm not) > I'd definitely reach for the postscript. Yeah, but you're lucky to have a PS capable printer. Heck, you're probably on Linux. Most Windows and even many Mac users don't! PDF can be printed from anywhere. Even if your printer talks PostScript, on Windows it's a pain to figure out how to print a PS file! AcroRead does it for you with PDF. > OTOH, I'd also probably build it locally. Lucky you. This may be news for you, but most Python users don't know how to use those tools any more, even if they have access. Python is a success -- meaning it has lots of unsophisticated users! (Unsophisticated in their hacking abilities, not in their intelligence, for sure -- but people who don't want to waste time figuring out how to do something that the computer should be able to do without their help.) > But as long as it's not much burden to have both, > it's not that much of an issue. Agreed. --Guido van Rossum (home page: http://www.python.org/~guido/) From mwh21@cam.ac.uk Thu Apr 26 00:31:45 2001 From: mwh21@cam.ac.uk (Michael Hudson) Date: 26 Apr 2001 00:31:45 +0100 Subject: [Doc-SIG] Issues with 2.1 doc PDF files In-Reply-To: Guido van Rossum's message of "Wed, 25 Apr 2001 19:13:06 -0500" References: <3AE67E91.2DADD89D@darwin.in-berlin.de> <15079.12090.738565.710150@cj42289-a.reston1.va.home.com> <15079.16600.483240.541905@cj42289-a.reston1.va.home.com> <200104252326.SAA15953@cj20424-a.reston1.va.home.com> <200104260013.TAA16112@cj20424-a.reston1.va.home.com> Message-ID: Guido van Rossum writes: > [me] > > > I would surmise that for almost everybody, HTML wins big over PDF > > > onscreen, and PDF wins big over HTML for printing. Soon, I bet we > > > won't have to distribute PostScript any more, because everyone can use > > > PDF for printing. > > [MH] > > Well, if I were to print out the python docs anytime soon (I'm not) > > I'd definitely reach for the postscript. > > Yeah, but you're lucky to have a PS capable printer. Heck, you're > probably on Linux. Most Windows and even many Mac users don't! PDF > can be printed from anywhere. Even if your printer talks PostScript, > on Windows it's a pain to figure out how to print a PS file! AcroRead > does it for you with PDF. Oh, I know, I know. But your comment I was replying to said "Soon, I bet we won't have to distribute PostScript any more...". > > OTOH, I'd also probably build it locally. > > Lucky you. This may be news for you, but most Python users don't know > how to use those tools any more, even if they have access. Wot, typing "make ps"? I'm aware I'm hardly a typical Python user. Cheers, M. 
-- Those who have deviant punctuation desires should take care of their own perverted needs. -- Erik Naggum, comp.lang.lisp

From tim.one@home.com Thu Apr 26 00:43:20 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 25 Apr 2001 19:43:20 -0400
Subject: [Doc-SIG] Issues with 2.1 doc PDF files
In-Reply-To: <200104252326.SAA15953@cj20424-a.reston1.va.home.com>
Message-ID:

[Guido]
> ...
> But for on-screen viewing, the pagination of PDF quickly gets
> annoying. This is quite independent from the content, and applies
> to any kind of documentation, not just Python's.

I confess PDF grows on me over time, especially since I figured out how to tell Acrobat Reader to view stuff in "continuous mode" (== the page boundaries are still there, but scrolling pays no attention to them). The one great advantage of PDF over HTML is whole-document searching, although (like it or not) Microsoft .chm format adds a form of that to HTML-based docs too.

From gherman@darwin.in-berlin.de Thu Apr 26 09:19:34 2001
From: gherman@darwin.in-berlin.de (Dinu Gherman)
Date: Thu, 26 Apr 2001 10:19:34 +0200
Subject: [Doc-SIG] Issues with 2.1 doc PDF files
References: <3AE67E91.2DADD89D@darwin.in-berlin.de> <15079.12090.738565.710150@cj42289-a.reston1.va.home.com>
Message-ID: <3AE7DA16.8387BB44@darwin.in-berlin.de>

Hi Fred,

ok, I did some investigation into that topic, basically some comparisons of PDF files across different releases. The preliminary result is, maybe, quite interesting.

"Fred L. Drake, Jr." wrote:
>
> Dinu Gherman writes:
> >
> > - file sizes are much bigger for 2.1 (50-100%)
>
> Yeah, I thought they looked a little large, but wasn't sure why.
> I've also noticed (and this isn't new) that the A4 versions are quite
> a bit larger than the US-Letter versions: about 50% for the PDF, not
> so much for the PostScript. I have no idea why this would be the
> case. I presume you were looking at the A4 version.

I think there shouldn't be any reason why this has to be like that! In fact, if you compare a file like ref.pdf for 2.0 and 2.1 by listing the used fonts in Acrobat Reader via the menu File -> Document Info -> Fonts, you'll see that in the A4 versions the 2.1 file lists Helvetica, Helvetica-Oblique and Times-Roman. But the 2.0 version lists a whole bunch of CM* fonts, which are TeX's ancient Computer Modern family. You can actually *see* the difference if you sufficiently magnify, say, the document title on the front page and the version number below it.

For me the reason why this 2.1 A4 version of ref.pdf is about twice the size of the corresponding 2.0 file is that these CM fonts are embedded in the PDF! The reason why this effect cannot be observed for the corresponding PDF letter versions of the same document is that they both do not use embedded CM fonts! I haven't verified this for all other documents, but it seems to me like a good explanation for the general difference in size between A4 and letter PDFs. Now, *why* the A4 files do contain embedded fonts is an entirely different question! ;-)

> > - fonts are approximated with pixelized bitmaps
> That's not good.

This is also an entirely different issue, as the tut.pdf in A4 for 2.1 doesn't contain any normal fonts at all, but only bitmaps, as you can also find out doing the same research in Acrobat Reader!

> > As a result you get:
> >
> > - much longer page building times in PDF readers
> > - considerably longer search times
> > - much longer print times (probably - haven't checked)
> >
> > I've not verified this for each PDF file (it's prominently
> > obvious for the tutorial, though), but I assume the same
> > effects can be observed for all of them, when comparing with
> > corresponding files from the previous release.
>
> This would be something to look at -- recall that we added some
> magic to the tutorial to control interpretation of the document
> encoding, so that some Latin-1 characters would be typeset correctly.
> (This was at your prodding, as I recall! ;-) Could you please look
> at at least one of the other documents to see if they exhibit the same
> symptoms?

Apparently, the only line you added to do this is this one:

  \usepackage[T1]{fontenc}

which should do the job. I also use this:

  \usepackage[latin1]{inputenc}

but I just found out I can do without. In general I use this line when running pdf(La)TeX over the sources:

  \usepackage[pdftex,
              plainpages=false,
              colorlinks=true,
              bookmarks=true,
              bookmarksnumbered=true,
              linkcolor=blue]{hyperref}

I haven't seen anything equivalent in the official LaTeX sources, though, so I'm not quite sure how these are built...

> > So, I'm really curious what the reason for this phenomenon
> > could be? If there isn't any, I suggest reproducing the
> > files to no longer show the described effects as they will
>
> If you can tell me how to control these things, I'm sure we can
> build another distribution. I have no idea how to control this --
> this goes deeper into the LaTeX/pdfLaTeX magic than I'm familiar
> with.

Well, I can tell you what I do to create these documents on my box. I'm using vanilla MiKTeX 2 on Win2K, which gives me the following version number for pdftex:

  C:\>pdftex
  This is pdfTeX, Version 3.14159-14f-released-20000525 (MiKTeX 2)
  **

Using this I get PDFs without any pixelized or embedded fonts, and without applying any additional magic. For all the official PDF documentation files I've checked I get via File -> Document Info -> General this: pdfTeX-0.13d. Those that I'm producing myself say pdfTeX-0.14f. This might be a reason and it might not be - I don't know, probably not.

In any case something strange was going on when the A4 PDFs were produced (leading to embedded CM fonts), and something very strange happened for the 2.1 A4 PDF tutorial (bitmaps throughout). BTW, the former (including the file size differences) can also be observed in the official PS files.

This is about all I can do in a reasonable time frame to provide a good starting point for further research. I'm not considering myself a TeX guru or something like that, so I'll need to pass this on to somebody else here...

Regards,

Dinu
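A minimal pdfLaTeX preamble sketch, assembled from the packages Dinu mentions above, may help anyone who wants to reproduce the experiment. The \usepackage{ae} line is an assumption on the editor's side -- one commonly suggested way at the time to keep T1 output encoding while still getting scalable Type 1 fonts rather than bitmapped Computer Modern -- and is not necessarily what the official documentation build uses; the hyperref options are the ones quoted above.

  % Sketch only: "ae" is an assumption, not part of the official doc sources.
  \documentclass[a4paper]{article}
  \usepackage[T1]{fontenc}      % T1 output encoding, so Latin-1 glyphs are typeset natively
  \usepackage[latin1]{inputenc} % accept Latin-1 characters in the .tex source
  \usepackage{ae}               % Type 1 substitutes for Computer Modern (avoids bitmap EC fonts)
  \usepackage[pdftex,
              plainpages=false,
              colorlinks=true,
              bookmarks=true,
              bookmarksnumbered=true,
              linkcolor=blue]{hyperref}
  \begin{document}
  Accented-text check: na\"{\i}ve, fa\c{c}ade, Stra\ss e.
  \end{document}

Running pdflatex over a file like this and then checking File -> Document Info -> Fonts in Acrobat Reader, as Dinu describes, shows whether the embedded fonts came out as Type 1 outlines or as bitmaps.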
From hernan@orgmf.com.ar Fri Apr 27 10:32:58 2001
From: hernan@orgmf.com.ar (Hernan Martinez Foffani)
Date: Fri, 27 Apr 2001 11:32:58 +0200
Subject: [Doc-SIG] Issues with 2.1 doc PDF files
In-Reply-To:
Message-ID:

From Tim Peters:
> .... The one great advantage of PDF over HTML is whole-document
> searching, although
> (like it or not) Microsoft .chm format adds a form of that to
> HTML-based docs too.

That's why I built them. The other great advantage of PDF is that it is portable. It's a pity that the Microsoft .chm format isn't portable even to the platforms that can run their browser. (Or at least, that's what I've been told.) I tried JavaHelp and it is too slow for practical use on big packages. Maybe in the near future Mozilla can drive a suitable online Help engine.

-H.