From fdrake@acm.org Sun Apr 1 16:55:48 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Sun, 1 Apr 2001 11:55:48 -0400 (EDT) Subject: [Doc-SIG] Parro-DL -- new documentation project Message-ID: <15047.20356.586507.158341@beowolf.pythonlabs.org> Announcing a new joint documentation effort... Parro-DL Parrot Documentation Language With the public announcement of the development of Parrot (see http://use.perl.org/article.pl?sid=01/03/31/206248 and http://www.python.org/parrot.htm), a new documentation effort is being initiated to provide developer information on the new language and its libraries. Guido van Rossum and Larry Wall, joint creators of the new language, are both aware of the significance of quality documentation in the adoption of Parrot. Shortly after the decision to create Parrot, they enlisted Fred Drake and Tom Christiansen to begin work on the documentation system for Parrot. The two advocates of language and library documentation have collaborated privately for the past six months to design a new markup language that can be embedded into the language or used independently, similar to POD, but which allows richer semantic markup similar to the LaTeX-based markup used by the Python documentation project. Drake and Christiansen expect to release the reference manual for the new markup language, called Parro-DL (for "Parrot Documentation Language"), within two weeks. The specification, which weighs in at about 150 typeset pages, was written in Parro-DL and is processed by new tools written using an early prototype interpreter for the Parrot language. The specification includes information on syntax, linguistic integration, and processing expectations. ISO standardization is expected to be complete in the 3rd quarter of 2006. Drake and Christiansen are joining their efforts to organize a documentation project dedicated to producing free documentation for Parrot, to avoid a monopoly on the reference documentation by the technical publisher O'Reilly. The effort will be subsidized by their new joint venture, Interpolated Documentation Systems. Offices for the new firm will be located in Chicago. Drake's separation from PythonLabs came as a surprise to his colleagues there. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From vbruand@infonie.fr Mon Apr 2 15:17:36 2001 From: vbruand@infonie.fr (vbruand) Date: Mon, 2 Apr 2001 16:17:36 +0200 Subject: [Doc-SIG] LaTeX question References: Message-ID: <001f01c0bb7f$ab138dc0$f375f2c3@letitbeii> Thanks a lot... I have already tried \~{} and a symbol called \sim (or something like that; I found it later in the help). I really do think the problem is with the \url thing. Bye. ----- Original Message ----- From: "Edward Welbourne" To: Cc: Sent: Saturday, March 31, 2001 5:05 PM Subject: Re: [Doc-SIG] LaTeX question > > I don't understand what I should exactly write instead of tilde > > (because %7e counts as a comment, and \symbol{"7e} doesn't work either). > Have you tried \~{} > > > the tilde character ('~') is mis-handled; > hrm. By the \url{} directive ? > > Anyhow, being confused by what you're saying, here's what's special > about tilde in TeX: > > The ~ character is TeX's non-breaking space. You can obtain a ~ accent > on a letter, e.g. n, by writing \~n or \~{n} and, in the second form, > you can use \~{} to give \~ nothing to put its accent on, so it gives > you a ~ character, of sorts. > > There may also be something like \tilde somewhere in TeX's huge > vocabulary of defined names, but I don't know it. 
> > However, the problem with \url{url} may be that the \url command does > some weird things to its arguments which make a mess of the results. > The answer in such a case would be to fix the definition of \url ... > anyone tell me where the relevant definition is in a .sty file or > similar and I'll see what I can do to it. LaTeX is infinitely flexible, > albeits internals nearly unmaintainably ugly. > > Eddy. > From guido@digicool.com Mon Apr 2 22:20:57 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 02 Apr 2001 16:20:57 -0500 Subject: [Doc-SIG] syntax vs semantics: implicit --> explicit In-Reply-To: Your message of "Fri, 30 Mar 2001 13:57:03 EST." References: Message-ID: <200104022120.QAA04326@cj20424-a.reston1.va.home.com> > (Last year I wrote a chapter on Python for Wrox Press' "Professional > Linux Programming". I would have been much happier using a complete > ST-like markup than futzing around in MSWord.) I've a feeling that one reason the doc-sig is going around in circles is the tension between the needs of formatting docstrings and the needs of formatting larger documents. For docstrings, there's an explicit requirement (although not everybody gives it the same weight) that the source is pleasantly readable as plain text. For authoring larger documents like your Wrox chapter, that argument doesn't have the same importance: wat you want is easy authoring, which is subtly but importantly different. I propose that this time around, we should focus on docstrings only, and not on authoring other documents, lest we never reach an agreement. (Aside: I'd like to know more about why you think Word didn't work for you; I wonder if it could be unfamiliarity with advanced Word features? When using styles properly, Word is quite a capable authoring tool -- depending, of course, on the processing done by the publisher.) --Guido van Rossum (home page: http://www.python.org/~guido/) From dgoodger@atsautomation.com Mon Apr 2 21:46:43 2001 From: dgoodger@atsautomation.com (Goodger, David) Date: Mon, 2 Apr 2001 16:46:43 -0400 Subject: [Doc-SIG] syntax vs semantics: implicit --> explicit Message-ID: GvR wrote: > I propose that this time around, we should focus on docstrings only, > and not on authoring other documents, lest we never reach an > agreement. Agreed. But how long is a docstring? :> I like to put mini-to-full man-pages in my programs (accessible through --help), with section headers; perhaps I'm the exception. > (Aside: I'd like to know more about why you think Word didn't work for > you; I wonder if it could be unfamiliarity with advanced Word > features? No, I'm quite comfortable with Word and its features (having actually *read* the fine manual ... some versions ago, I admit). > When using styles properly, Word is quite a capable > authoring tool -- depending, of course, on the processing done by the > publisher.) It was a matter of ill-defined styles in the publisher's stylesheet, Word quirks, cross-platform mangling, inter-version incompatibilities (I refuse to make M$ richer whenever they change a digit), and editorial bungling (I made sure all my curly quotes were right, and somebody converted them all to straight-quotes). Plus the stress of an all-nighter. Add a crash or two. I've never lost work due to an emacs crash! 
/DG From guido@digicool.com Tue Apr 3 03:05:24 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 02 Apr 2001 21:05:24 -0500 Subject: [Doc-SIG] syntax vs semantics: implicit --> explicit In-Reply-To: Your message of "Mon, 02 Apr 2001 16:46:43 -0400." References: Message-ID: <200104030205.VAA05100@cj20424-a.reston1.va.home.com> > > I propose that this time around, we should focus on docstrings only, > > and not on authoring other documents, lest we never reach an > > agreement. > > Agreed. But how long is a docstring? :> I like to put mini-to-full man-pages > in my programs (accessible through --help), with section headers; perhaps > I'm the exception. How important is it that those mini-man pages are readable as part of the source? I've sometimes put arbitrary data in Python programs, for bundling reasons, but not necessarily cared too much about how it looks in the Python source (or even how easily editable it is). I'm just trying to figure out if your requirements really fit in the requirements for docstrings. It still sounds like you're stretching things a bit. I'm trying to argue for a smaller set of requirements, so we can make progress. > It was a matter of ill-defined styles in the publisher's stylesheet, Word > quirks, cross-platform mangling, inter-version incompatibilities (I refuse > to make M$ richer whenever they change a digit), and editorial bungling (I > made sure all my curly quotes were right, and somebody converted them all to > straight-quotes). Plus the stress of an all-nighter. Add a crash or two. Sounds like a stretch to blame it on Word. --Guido van Rossum (home page: http://www.python.org/~guido/) From dgoodger@atsautomation.com Tue Apr 3 14:46:00 2001 From: dgoodger@atsautomation.com (Goodger, David) Date: Tue, 3 Apr 2001 09:46:00 -0400 Subject: [Doc-SIG] syntax vs semantics: implicit --> explicit Message-ID: > How important is it that those mini-man pages are readable as part of > the source? In the case of the man-pages, not terribly. They could be in a separate file. I guess I'm a proponent of the "keep it all together" school. One of the advantages of docstrings: they're *there*, easily accessible, no dependency on possibly nonexistant external files and all those headaches. > I'm trying to argue for a smaller set of requirements, so we can make > progress. You've convinced me. I hope to spend a few hours over the next week or so revising my spec, maybe even turning it into a PEP (to complement/compete with Tibs' & Edward L's work). Watch this space... > Sounds like a stretch to blame it on Word. Such an easy target! /DG From support@internetdiscovery.com Fri Apr 6 08:42:41 2001 From: support@internetdiscovery.com (Mike Clarkson) Date: Fri, 06 Apr 2001 00:42:41 -0700 Subject: [Doc-SIG] Python documentation in info? Message-ID: <3.0.6.32.20010406004241.007de100@popd.ix.netcom.com> Is there a version of the Python documentation in info format? I looked in the canonical place but couldn't find it. I tried regenerating the info from the html in the Doc tree, but the perl script to do it is missing some HTML:: packages. Does anyone know the right packages and versions for these? Will it work with these packages under Perl 5? Many thanks, Mike From edloper@gradient.cis.upenn.edu Fri Apr 6 19:52:21 2001 From: edloper@gradient.cis.upenn.edu (Edward D. 
Loper) Date: Fri, 06 Apr 2001 14:52:21 EDT Subject: [Doc-SIG] which characters to use for docstring markup Message-ID: <200104061852.f36IqMp07364@gradient.cis.upenn.edu> I've been a bit busy lately, but I'm still working on coming up with a good markup language for docstrings... I was trying to figure out which characters should be used for markup.. (e.g., to delimit colored regions, etc.). And so I wrote a script to see how often different characters are used in docstrings, using all the docstrings in the standard library (well, actually, in /usr/local/lib/python2.0/*.py) as a "representative" sample. Here are the results:

    Character Count    Module Count    Character
    --------------------------------------------------
          1                1            ^H
         10                3            ^M
         11                4            ^
         12                5            ~
         13               10            {
         13               10            }
         16                6            %
         28                7            $
         48               12            ?
         50               20            !
         70               16            `
         75                8            &
         87               12            \
        108               18            +
        130               12            |
        197                7            @
        222               22            *
        229               20            #
        269               35            ]
        277               36            [
        313               44            =
        331               53            ;
        421               48            /
        441               46            "
        514               23            <
        663               67            :
        779               54            _
        875               28            >
       1302               75            '
       1858               94            (
       1874               94            )
       2145               97            ,
       2277               92            -
       3413              110            .

1. Any character(s) that are used for markup will have to be either backslashed/quoted whenever they are used, or will have to be only allowed in literal blocks. Clearly, we want to keep either of these to a minimum.
2. These results suggest that using perldoc style coloring, like B<this>, may not be the best idea, given that '<' and '>' are used so often. This is because people often talk about orderings between elements, like x>y. We might be better off using B{this} instead. '<' and '>' are used 53 times more frequently than '{' and '}'.
3. It makes much more sense to use "`" rather than "'" for literals, since "'" occurs 18 times more often. Of course, we would probably want to use *either* "`" for literals *or* something like L{literal} or C{code} or whatever.
4. You should keep in mind that any of these characters will be used in the docstring for *something* (well, actually, I was surprised to see a backspace in a docstring..). So, for the most part, it's a matter of inconveniencing the least number of people the least amount of time..
I'm leaning towards using either:: C{code}, E{emph} etc. or:: `literal` and *one* *word* *emph* (and that's it) to color code in my markup. Any comments? -Edward p.s., I'll probably have a preliminary description of my proposed markup language in about 2 weeks.. I hope. :) From tim.one@home.com Fri Apr 6 20:10:21 2001 From: tim.one@home.com (Tim Peters) Date: Fri, 6 Apr 2001 15:10:21 -0400 Subject: [Doc-SIG] which characters to use for docstring markup In-Reply-To: <200104061852.f36IqMp07364@gradient.cis.upenn.edu> Message-ID: [Edward D. Loper] > ... > 3. It makes much more sense to use "`" rather than "'" for ... In the font I'm using to view this email, I can't see the difference between those suggestions <wink>. > ... > I'm leaning towards using either:: > > C{code}, E{emph} etc. > > or:: > > `literal` and *one* *word* *emph* (and that's it) > > to color code in my markup. Any comments? I happen to like the former better, because it's extensible and unambiguous. It wasn't suitable for Perl because, e.g., $C{$i} is legit Perl code. And they invented the C<$i> notation before "->" (as in $A->[$i]) thingies were added to the language. X{...} is never legit Python syntax today, and seems very unlikely it ever will be.
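A rough sketch of the kind of character-census script Edward describes above -- hypothetical code, not his actual script; the ast-based docstring extraction and the modern-Python idiom are assumptions, and only the glob path is taken from his message:

    import ast, collections, glob

    def docstring_census(pattern="/usr/local/lib/python2.0/*.py"):
        """Count how often each character appears in docstrings, and in
        how many modules it appears at all."""
        totals = collections.Counter()
        module_counts = collections.Counter()
        for path in glob.glob(pattern):
            try:
                tree = ast.parse(open(path, errors="replace").read())
            except SyntaxError:
                continue
            seen = set()
            for node in ast.walk(tree):
                if isinstance(node, (ast.Module, ast.ClassDef, ast.FunctionDef)):
                    doc = ast.get_docstring(node, clean=False)
                    if doc:
                        totals.update(doc)   # per-character totals
                        seen.update(doc)     # characters seen in this module
            module_counts.update(seen)
        for char, count in sorted(totals.items(), key=lambda kv: kv[1]):
            if not char.isalnum() and not char.isspace():
                print("%6d %6d   %r" % (count, module_counts[char], char))

    if __name__ == "__main__":
        docstring_census()

The output is sorted by total count, mirroring the table above.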
From guido@digicool.com Fri Apr 6 22:06:19 2001 From: guido@digicool.com (Guido van Rossum) Date: Fri, 06 Apr 2001 16:06:19 -0500 Subject: [Doc-SIG] which characters to use for docstring markup In-Reply-To: Your message of "Fri, 06 Apr 2001 14:52:21 EDT." <200104061852.f36IqMp07364@gradient.cis.upenn.edu> References: <200104061852.f36IqMp07364@gradient.cis.upenn.edu> Message-ID: <200104062106.QAA15927@cj20424-a.reston1.va.home.com> > I've been a bit busy lately, but I'm still working on coming up with > a good markup language for docstrings... > > I was trying to figure out which characters should be used for > markup.. (e.g., to delimit colored regions, etc.). And so I wrote > a script to see how often different characters are used in > docstrings, using all the docstrings in the standard library (well, > actually, in /usr/local/lib/python2.0/*.py) as a "representative" > sample. You should also look into /usr/local/lib/python2.0/*/*.py -- that's a vast collection of code, e.g. Tkinter.py. [Table omitted] > 1. Any character(s) that are used for markup will have to be either > backslashed/quoted whenever they are used, or will have to be > only allowed in literal blocks. Clearly, we want to keep either > of these to a minimum. > 2. These results suggest that using perldoc style coloring, like > B<this>, may not be the best idea, given that '<' and '>' are > used so often. This is because people often talk about orderings > between elements, like x>y. We might be better off using B{this} > instead. '<' and '>' are used 53 times more frequently than > '{' and '}'. But you counted single characters. I grepped for '[A-Z]<' and found none that occurred in docstrings. (The actual re should be r'\B[A-Z]<'; I believe the POD rules ask for a single upper case letter before the <.) Now, there's one significant use of [A-Z]< that might trip us up: the regular expression syntax (?P<...>...). I certainly could see this being useful in docstrings for methods that take a regular expression argument. There's also one use of [A-Z]{: \N{...} means something in Unicode literal syntax. > 3. It makes much more sense to use "`" rather than "'" for > literals, since "'" occurs 18 times more often. Of course, we > would probably want to use *either* "`" for literals *or* > something like L{literal} or C{code} or whatever. I don't like `...`, because (a) it means something very specific in Python (and in the Unix shell), (b) it's hard to distinguish from '...' in some fonts, and (c) except for the `...` Python and shell notation, I expect ` to be closed with '. > 4. You should keep in mind that any of these characters will be used > in the docstring for *something* (well, actually, I was surprised > to see a backspace in a docstring..). Where? > So, for the most part, it's > a matter of inconveniencing the least number of people the least > amount of time.. > > I'm leaning towards using either:: > > C{code}, E{emph} etc. > > or:: > > `literal` and *one* *word* *emph* (and that's it) > > to color code in my markup. Any comments? I still like C<code> and *multi word emph* better. :-) --Guido van Rossum (home page: http://www.python.org/~guido/)
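A hedged sketch of the check Guido describes, applied to a single docstring rather than grepping files; the sample text and the added curly-brace pattern are illustrative assumptions:

    import re

    # The pattern Guido grepped for, plus the curly-brace counterpart;
    # he suggests r'\B[A-Z]<' as a refinement of the first one.
    ANGLE_OPENER = re.compile(r"[A-Z]<")
    BRACE_OPENER = re.compile(r"[A-Z]\{")

    def markup_collisions(docstring):
        """Return snippets of a docstring that already look like markup
        openers and would therefore need escaping."""
        hits = []
        for rx in (ANGLE_OPENER, BRACE_OPENER):
            hits.extend(docstring[m.start():m.start() + 12]
                        for m in rx.finditer(docstring))
        return hits

    if __name__ == "__main__":
        doc = r"Takes a pattern such as (?P<name>...) and a Unicode escape like \N{DIGIT ONE}."
        print(markup_collisions(doc))   # ['P<name>...) ', 'N{DIGIT ONE}']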
From gward@python.net Fri Apr 6 22:22:17 2001 From: gward@python.net (Greg Ward) Date: Fri, 6 Apr 2001 17:22:17 -0400 Subject: [Doc-SIG] which characters to use for docstring markup In-Reply-To: <200104061852.f36IqMp07364@gradient.cis.upenn.edu>; from edloper@gradient.cis.upenn.edu on Fri, Apr 06, 2001 at 02:52:21PM -0400 References: <200104061852.f36IqMp07364@gradient.cis.upenn.edu> Message-ID: <20010406172217.B2749@gerg.ca> On 06 April 2001, Edward D. Loper said: > I'm leaning towards using either:: > > C{code}, E{emph} etc. > > or:: > > `literal` and *one* *word* *emph* (and that's it) > > to color code in my markup. Any comments? I definitely prefer C{code} or E<emph> or E{emph} or whatever. (I'm not terribly concerned about the shape of the brackets used.) Guido's point about C<> being OK because the (opening) regex would be r'[A-Z]<' missed one nit: code samples like C<x > 5> are ambiguous; the ">" would have to be escaped. But then, with curly braces, C{d = {'a': 37}} is ambiguous. So whatever delimiter you pick -- and I can live with <> or {} -- there must be a simple escaping mechanism. I definitely prefer backslash to POD's E<gt> hack: C<x \> 5> vs. C<x E<gt> 5>. The former is yucky, the latter is super-yucky. Backslash also means you can escape anything; with POD's E<> escaping mechanism, there has to be an alternate spelling for any character you want to escape, e.g. "gt" for ">". Yuck. Greg -- Greg Ward - programmer-at-big gward@python.net http://starship.python.net/~gward/ I used to be a FUNDAMENTALIST, but then I heard about the HIGH RADIATION LEVELS and bought an ENCYCLOPEDIA!! From guido@digicool.com Fri Apr 6 23:31:38 2001 From: guido@digicool.com (Guido van Rossum) Date: Fri, 06 Apr 2001 17:31:38 -0500 Subject: [Doc-SIG] which characters to use for docstring markup In-Reply-To: Your message of "Fri, 06 Apr 2001 17:22:17 -0400." <20010406172217.B2749@gerg.ca> References: <200104061852.f36IqMp07364@gradient.cis.upenn.edu> <20010406172217.B2749@gerg.ca> Message-ID: <200104062231.RAA21298@cj20424-a.reston1.va.home.com> > I definitely prefer C{code} or E<emph> or E{emph} or whatever. (I'm not > terribly concerned about the shape of the brackets used.) > > Guido's point about C<> being OK because the (opening) regex would be > r'[A-Z]<' missed one nit: code samples like C<x > 5> are ambiguous; the > ">" would have to be escaped. But then, with curly braces, > C{d = {'a': 37}} is ambiguous. Good point! > So whatever delimiter you pick -- and I can live with <> or {} -- there > must be a simple escaping mechanism. I definitely prefer backslash to > POD's E<gt> hack: C<x \> 5> vs. C<x E<gt> 5>. The former is yucky, the > latter is super-yucky. Backslash also means you can escape anything; > with POD's E<> escaping mechanism, there has to be an alternate spelling > for any character you want to escape, e.g. "gt" for ">". Yuck. I think I'd prefer to have to write \> for > than \} for }, so I *still* prefer C<> over C{}. --Guido van Rossum (home page: http://www.python.org/~guido/) From hernan@orgmf.com.ar Fri Apr 6 23:21:38 2001 From: hernan@orgmf.com.ar (Hernan Martinez Foffani) Date: Sat, 7 Apr 2001 00:21:38 +0200 Subject: [Doc-SIG] which characters to use for docstring markup In-Reply-To: <200104062231.RAA21298@cj20424-a.reston1.va.home.com> Message-ID: Edward, on a mini-markup language for docstrings, proposes: "... either C{code}, E{emph} etc. or `literal` and *one* *word* *emph* (and that's it)..." Guido: "... I still like C<code> and *multi word emph* better. :-)" Tim: "... C{code}, E{emph} ..." Greg: "... C{code} or E<emph> or E{emph} or whatever ..." And it's just a two-tag markup language! :-) Sorry guys, it's almost 1:00AM and I couldn't resist! :-) Seriously now. Here are my 2 cents, just to fill the email with something other than a stupid joke. I don't care about the tag syntax if: a) there are either not many more than the "two-tags ML", or I don't have to be aware of the rest of the tags while commenting code. b) escape = backslash Again :-) Have a nice weekend! 
-Hernan -- Hernán Martínez Foffani hernan@orgmf.com.ar http://www.orgmf.com.ar/condor/ From fdrake@cj42289-a.reston1.va.home.com Sat Apr 7 06:45:43 2001 From: fdrake@cj42289-a.reston1.va.home.com (Fred Drake) Date: Sat, 7 Apr 2001 01:45:43 -0400 (EDT) Subject: [Doc-SIG] [development doc updates] Message-ID: <20010407054543.226DD2879A@cj42289-a.reston1.va.home.com> The development version of the documentation has been updated: http://python.sourceforge.net/devel-docs/ Lots of small fixes, but also the first installment of the unittest documentation. From support@internetdiscovery.com Sat Apr 7 22:21:48 2001 From: support@internetdiscovery.com (Mike Clarkson) Date: Sat, 07 Apr 2001 14:21:48 -0700 Subject: [Doc-SIG] which characters to use for docstring markup In-Reply-To: <200104062106.QAA15927@cj20424-a.reston1.va.home.com> References: <200104061852.f36IqMp07364@gradient.cis.upenn.edu> Message-ID: <3.0.6.32.20010407142148.00847660@popd.ix.netcom.com> At 04:06 PM 4/6/01 -0500, Guido van Rossum wrote: >> I've been a bit busy lately, but I'm still working on coming up with >> a good markup language for docstrings... FYI, I have the HappyDoc formatter/docstring extractor (happydoc.sourceforge.net), generating standard Python documentation LaTeX from docstrings. It's kind of nice, because it means that I immediately have all of the python.sty features available to me for crosseferencing etc., plus it immediately gives me my docstring derived documents in PDF, PS, HTML, and info (if I can get info working again). Because the delimiters are the standard \backslash and curlys, it means you can convieniently take advantage of r"""strings"" to protect the backslash markup in docstrings: def foo(): r""" My \code{foo} function \emph{breaks} the \module{bar} module.\index{Foos and Bars} """ pass Seems that this is legal Python, and allows HappyDoc to use Python to get the class structure with the docstrings. Also, the TeX convention of a blank line to mark a paragraph is really useful. Would this work for what you are wanting to do with docstring markup? The advantages I see is that: 1) It's quite complete for all of the entended uses (\emph(...}) Because it's more or less TexInfo compatible, most people know it or can learn it easily, even if you don't know LaTeX. 2) It means you can cut and paste between docstrings and the formal module documentation for Doc/. 3) The macros/commands are already completely documented, and the documentation for them ships with the core distribution. 4) It would reinforce the use of the Doc/ tools. 5) It would reinforce the use of HappyDoc (semi-literate programmming). 6) It would avoid yet another documentation markup language. 7) It's likely to be mainly backward compatible - I doubt many docstrings use \ a lot. On the other hand, I bet a lot of them use blank line as a paragraph seperator. 8) No need for syntax discussions, as it's already done :-) Mike. From guido@digicool.com Sat Apr 7 23:54:48 2001 From: guido@digicool.com (Guido van Rossum) Date: Sat, 07 Apr 2001 17:54:48 -0500 Subject: [Doc-SIG] which characters to use for docstring markup In-Reply-To: Your message of "Sat, 07 Apr 2001 14:21:48 MST." 
<3.0.6.32.20010407142148.00847660@popd.ix.netcom.com> References: <200104061852.f36IqMp07364@gradient.cis.upenn.edu> <3.0.6.32.20010407142148.00847660@popd.ix.netcom.com> Message-ID: <200104072254.RAA24765@cj20424-a.reston1.va.home.com> > At 04:06 PM 4/6/01 -0500, Guido van Rossum wrote: > >> I've been a bit busy lately, but I'm still working on coming up with > >> a good markup language for docstrings... That's a bogus attribution. I said no such thing. Ed Loper did. --Guido van Rossum (home page: http://www.python.org/~guido/) From edloper@gradient.cis.upenn.edu Sun Apr 8 20:20:33 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Sun, 08 Apr 2001 15:20:33 EDT Subject: [Doc-SIG] which characters to use for docstring markup In-Reply-To: Your message of "Fri, 06 Apr 2001 16:06:19 CDT." <200104062106.QAA15927@cj20424-a.reston1.va.home.com> Message-ID: <200104081920.f38JKXp25694@gradient.cis.upenn.edu> > But you counted single characters. I grepped for '[A-Z]<' and found > none that occurred in docstrings. (The actual re should be > r'\B[A-Z]<'; I believe the POD rules ask for a single upper case > letter before the <.) Well, presumably the occurrence of '[A-Z]{' will be comparably small. However, it's not just the open delimiters that we have to worry about. You can't include a close delimiter in a colored region. For example, if you want to put "x > y" in a "code" region, then you can't:: C<x > y> There's no way for it to know that the first ">" isn't a close delimiter. Similarly for bold, etc. Also, there's a question of how context-sensitive we want our delimiters to be. It may confuse people that they can say "x<y" in some contexts but not in others. > Now, there's one significant use of [A-Z]< that might trip us up: the > regular expression syntax (?P<...>...). I certainly could see this > being useful in docstrings for methods that take a regular expression > argument. This may be important, but I see it as less important than the issues of using < and > to mean less-than and greater-than. > There's also one use of [A-Z]{: \N{...} means something in > Unicode literal syntax. I agree that there will be cases where any character gets used. But I would argue that, in these cases, we should either use literal blocks (do you really need to say "\N{...}" in a paragraph? Maybe..) or use some sort of backslashing. (But again, let's come back to the discussions of backslashing.) > I don't like `...`, because (a) it means something very specific in > Python (and in the Unix shell), (b) it's hard to distinguish from > '...' in some fonts, and (c) except for the `...` Python and shell > notation, I expect ` to be closed with '. I'm leaning more towards the L{...} syntax anyway. Although I would argue against (b) on the grounds that, if you're viewing it in a non-parsed form, then you're viewing it in your source-code editor, and presumably you chose a font for your source-code editor in which you can distinguish "'" and "`", since they mean very different things in Python. Even if you did choose a different font for your docstring comments, you'd still like to be able to distinguish "'x'" and "`x`" when you read a doctest block.. so presumably you'd pick a font in which you can..? > > 4. You should keep in mind that any of these characters will be used > > in the docstring for *something* (well, actually, I was surprised > > to see a backspace in a docstring..). > > Where? In the sre module docstring, if I remember correctly. > I still like C<code> and *multi word emph* better. :-) I was thinking of these as mutually exclusive. If we're going to use C<code> or C{code} or whatever, we might as well use E<emph>. No need to go removing even more characters from docstring writers' repertoires. (Incidentally, multiword emph can be somewhat dangerous.. Especially if you let people have docstrings like "*hi" and "bye*", where the '*'s get parsed as normal asterisks.. People will get confused.. The problem is made worse if *multi word emphs* can span lines.. But then, if they can't, then word-rewrapping won't work as expected. etc...) -Edward
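A small, hypothetical illustration of the ambiguity Edward describes -- a naive scanner that ends a region at the first '>' it sees:

    import re

    # Deliberately naive: a capital letter, '<', then everything up to
    # the *first* '>'.  This is exactly the failure mode described above.
    NAIVE_REGION = re.compile(r"([A-Z])<([^>]*)>")

    def naive_regions(text):
        """Return (tag, contents) pairs found by the naive scanner."""
        return NAIVE_REGION.findall(text)

    if __name__ == "__main__":
        # The C<...> region is cut off at the first '>', leaving a stray
        # " y>" behind in the surrounding text.
        print(naive_regions("the test C<x > y> holds"))   # [('C', 'x ')]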
From edloper@gradient.cis.upenn.edu Sun Apr 8 20:40:39 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Sun, 08 Apr 2001 15:40:39 EDT Subject: [Doc-SIG] which characters to use for docstring markup In-Reply-To: Your message of "Sat, 07 Apr 2001 14:21:48 PDT." <3.0.6.32.20010407142148.00847660@popd.ix.netcom.com> Message-ID: <200104081940.f38Jedp27592@gradient.cis.upenn.edu> > FYI, I have the HappyDoc formatter/docstring extractor > (happydoc.sourceforge.net), > generating standard Python documentation LaTeX from docstrings. It's kind > of nice, because it means that I immediately have all of the python.sty > features available > to me for cross-referencing etc., plus it immediately gives me my docstring > derived documents in PDF, PS, HTML, and info (if I can get info working > again). The only formatters I could find for HappyDoc use StructuredTextClassic, or some variant. And many people (incl. Guido) are not terribly happy with ST. Does the formatter you're talking about use something else? What does it do about lists, etc.? > def foo(): > r""" > My \code{foo} function \emph{breaks} the > \module{bar} module.\index{Foos and Bars} > """ > pass Most people have objected to "heavyweight" markup for docstrings.. i.e., they don't want to have to write docstrings in LaTeX or XML or whatever.. It *looks* like you're basically just writing docstrings using some subset of LaTeX? If so, we'd have to carefully define *which* subset, and what everything means, etc., before I would accept it. We don't want people assuming that, just because they can use \emph{...}, they can use all their other favorite LaTeX commands (we do, after all, want it to be possible to convert this to HTML, info pages, etc.) > 1) It's quite complete for all of the intended uses (\emph{...}). > Because it's more or less TeXinfo compatible, most people know it > or can learn it easily, even if you don't know LaTeX. But you can't make it *too* "complete," or it won't be a standard that people can write tools to process anymore.. We don't want to just reimplement LaTeX here... > 2) It means you can cut and paste between docstrings and the formal > module documentation for Doc/. > 3) The macros/commands are already completely documented, and the > documentation for them ships with the core distribution. > 4) It would reinforce the use of the Doc/ tools. I actually am not too familiar with the Doc/ tools.. Can you give me a pointer to them? But copy/paste does seem useful. (Although it should at least be possible to write conversion tools, in any case, given a good standard.) > 5) It would reinforce the use of HappyDoc (semi-literate programming). HappyDoc seems like a nice tool. Whatever markup language we settle on (if we ever do), a HappyDoc formatter will probably be implemented.. > 7) It's likely to be mainly backward compatible - I doubt many > docstrings use \ a lot. On the other hand, I bet a lot of > them use a blank line as a paragraph separator. 
I believe that trying to be "backward compatible" with a markup language is an extremely dangerous thing to do, esp. if your markup language is relatively "forgiving," because you probably won't *notice* the places where it gets confused. I would rather be explicitly non-backward-compatible. -Edward From edloper@gradient.cis.upenn.edu Sun Apr 8 20:59:12 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Sun, 08 Apr 2001 15:59:12 EDT Subject: [Doc-SIG] backslashing Message-ID: <200104081959.f38JxCp29343@gradient.cis.upenn.edu> Backslashing has been coming up lately. So let's go over it again. :) I used to be strongly in favor of backslashing, using '\'. After all, it's *the* standard backslashing character, and clearly we need a backslashing character, etc. etc... Now I'm not so sure anymore. The one big problem with backslashing, as I see it, arises from the following principle, which has been guiding a lot of docstring ML design: * Intuitiveness: The meaning of a well-formed formatted documentation string should be obvious to a reader, even if that reader is not familiar with the formatting conventions. Now consider what happens if a newbie user prints out a formatted docstring:: >>> print somefunc.__doc__ Somefunc will start an interactive session. When you want to exit the session, simply type "\\exit" >>> Now, the user gets confused, and types "\\exit" instead of "\exit". Another reasonable example might be:: >>> print otherfunc.__doc__ [...] C<regexp> should use "\\d" for digits and "\\s" for whitespace. C<regexp> should not include "\\w". Example use: >>> otherfunc("\\s\\d") hi there! >>> Note that this is entirely separate from issues of whether to use r"..." strings for docstrings, etc.. The problem is that, when a docstring is *printed*, it should be easy to interpret it. Of course, this problem might very well go away if people stopped printing docstrings, and started using tools (pydoc, etc.), and those tools decided to take care of these things for them.. But that's probably a bit distant in the future right now. So what's the alternative? Don't allow markup characters in paragraphs, and force docstring writers to put them in literal blocks if they want to use them. Instead of:: def f(s): """ Return the number of occurrences of the string "{}" in s. you would have to write:: def f(s): """ Return the number of occurrences of the string:: {} in s. Is this better? I'm not sure. As someone else mentioned, it's possible (like perldoc does?) to say {lb} and {rb} or <lb> and <rb> or some such. But that would probably be just as non-intuitive as using backslashes.. Of course, if we decide that "intuitiveness" is not as important to us, then maybe we can go ahead and use backslashing anyway... -Edward From edloper@gradient.cis.upenn.edu Sun Apr 8 21:06:32 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Sun, 08 Apr 2001 16:06:32 EDT Subject: [Doc-SIG] which characters to use for docstring markup Message-ID: <200104082006.f38K6Wp00128@gradient.cis.upenn.edu> Greg: >> code samples like C<x > 5> are ambiguous Hm, I guess I should make sure I've read *all* my email before I start writing. :) Guido: >> I think I'd prefer to have to write \> for > than \} for }, so I >> *still* prefer C<> over C{}. Is this just because you think it looks nicer? Or for some semblance of compatibility with perldoc? Or is there some other reason? I guess I'm not strongly committed to C{}, but I think that it would inconvenience fewer people, since talking about order relationships (gt and lt) is fairly common.. 
Also, it makes for a simpler system that's easier to understand/use if you just say that delimiter characters are always delimiter characters. -Edward From edloper@gradient.cis.upenn.edu Sun Apr 8 22:46:49 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Sun, 08 Apr 2001 17:46:49 EDT Subject: [Doc-SIG] Structuring rules Message-ID: <200104082146.f38Lknp10218@gradient.cis.upenn.edu> I've been working on designing a docstring markup language, based losely on ST and a few other sources.. And I wanted to see what people thought of the structuring rules so far. Note that these rules only talk about how to indicate the *structure* of a docstring, not coloring (things like emph and inline code). There are 9 structural blocks: - basic block: these blocks do not contain other blocks - paragraph: a paragraph of text. paragraphs are the only place where coloring (emph, etc) can occur. - literal block: a block of unprocessed text, which will be displayed as-is. - doctest block: a block containing python code, which can be used by doctest. - heading: a single line of text, providing the heading for a section. - hierarchical blocks: these blocks contain other blocks - list item: a single item of a list. List items can contain paragraphs, literal blocks, doctest blocks, and lists. - list: a list. Lists contain one or more list items. - section: a section or subsection of text. Contains a heading followed by paragraphs, literal blocks, doctest blocks, lists, and sections). - field: a semantically tagged section of text. It is used to describe specific aspects of an object, like the return value, a parameter to a function, or the authors of a module. Contains paragraphs, literal blocks, doctest blocks, and lists. - top: the top-level. contains paragraphs, literal blocks, doctest blocks, lists, sections, and fields. In case you're not familiar with these blocks, here's an example:: This is a one-line paragraph. This is a multi-line paragraph. Paragraphs are usually separated by blank lines. - This is a list. - Lists consist of list items. List items may span multiple lines. List items may contain multiple paragraphs. Blocks ====== That was a top-level heading. Here's a subheading: Literal Blocks -------------- Literal blocks are introduced with double-colons, like this:: Literal / / Block And end on the first line whose indentation is equal to or less than the indentation of the paragraph that introduced them. Doctest Blocks -------------- Doctest blocks start with '>>> '. Here's a doctest block: >>> print 1+2 3 author: This is a field. This particular field should be used to describe the author of the object documented. param x: Fields can take arguments. So.. the markup language I'm defining makes a fair amount of use of the concept of "indentation." Instead of defining it right now, I'll just show it by example. I'll worry about formalizing it later:: This paragraph has an indentation of 3, since it is preceeded by 3 spaces. >>> # This doctest block has an indentation of 6 >>> print(" even if some of its lines are indented more") even if some of its lines are indented more >>> # Indentation of a doctest block is the indentation of >>> # its first line. The following literal block has an indentation of 4:: Literal Block! That's one plus the indentation of the paragraph that introduced it. - This list has an indentation of six - That's because each list item is preceeded by 6 spaces. This list item has an indentation of 8. That's because each of its paragraphs has an indentation of 8. 
Heading ======= That heading had an indentation of 3, since it was preceeded by 3 spaces. This section has an indentation of 6, since each of its paragraphs has an indentation of 6. Heading2 ======== This section has an indentation of 3. author: Field indentation works just like list item indentation. Now we can discuss what rules to put on indentation. These rules can be used when parsing to figure out where blocks start/end etc. I propose: - all paragraphs must be left-justified. i.e., the indentation of each line in a paragraph must be the same. - the indentation of a paragraph must be equal to the indentation of the block that contains it. - the indentation of a list must be greater than or equal to the indentation of the block that contains it. Although I might consider changing this to strictly greater than. - the indentation of a list item must be strictly greater than the indentation of the list. In other words, the following type of list item is not allowed:: - a list item where the indentation of the paragraph is equal to the indentation of the list. - the indentation of a field must be strictly greater than the indentation of the block that contains it. Thus, the following is not allowed: field: a field where the indentation of the field is equal to the indentation of the block that contains it. - the indentation of a section must be greater than or equal to the indentation of the block that contains it. But this leaves open the question of how to figure out the indentation of certain entities, such as: - a paragraph starting on the first line of a docstring - list-items with one-line paragraphs - list-items with one-line paragraphs followed only by sublists - fields with one-line paragraphs - fields with one-line paragraphs followed only by sublists For now, I'll set aside the issue of dealing with the first line of a docstring. I see two basic options for dealing with the rest of the issues. 1. The indentation of a list item is the number of characters before the first non-space character following the bullet. Thus, the following would be an invalid list item:: - This is a list item, where the number of characters before the first non-space non-bullet character on the first line doesn't match the indentation of the subsequent lines. But you could say, for example:: - List item - sublist item 1 - sublist item 2 Another paragraph in the top-level list-item. Note that its identation matches "List item"'s indentation. You could also say something like:: 1. A list item that spans multiple lines [...] 10. Another list item. Note that the use of an extra space in (1) makes this line up prettily. 2. The indentation of a list item is indeterminant unless there is a paragraph that constrains it. Thus, for example, you could say:: - This is a multiline list item. - List item - sublist item 1 - sublist item 2 Another paragraph in the top list-item. I see 2 main problems with approach (1): - it doesn't work well if you try to use a non-monospaced font for docstrings, since it's hard to tell if it's "lined up." - it may not be convenient for labels:: param x: you have to line up with the first line, like this. You can't go like this:: param x: a multiline description of parameter x return: a multiline description of the return value I see 1 main problem with approach (2): - if a list item contains a one-line paragraph, then the list item's indentation is indeterminant, so you can't figure out the indentation of a child literal block. 
E.g.:: - list item:: What's the indentation of this literal block?? Is this another paragraph in the list item, or part of the literal block? Thoughts/comments? -Edward p.s., requiring paragraphs to be justified, and requiring lists to be indented, gets rid of the problem of accidentally word-wrapping a sentence ending in 1. There's still a minor problem if we go with approach (2), since you can't tell if the second line is a list item or a continuation of the first line in:: - a list item with a sentence that ends in 1. That's not easy for humans to parse, either, though. :) From support@internetdiscovery.com Mon Apr 9 00:08:18 2001 From: support@internetdiscovery.com (Mike Clarkson) Date: Sun, 08 Apr 2001 16:08:18 -0700 Subject: [Doc-SIG] which characters to use for docstring markup In-Reply-To: <200104081940.f38Jedp27592@gradient.cis.upenn.edu> References: Message-ID: <3.0.6.32.20010408160818.007c3db0@popd.ix.netcom.com> >At 03:40 PM 4/8/01 EDT, Edward D. Loper wrote: >> At 04:06 PM 4/6/01 -0500, Guido van Rossum didn't write :-) >> At 02:21 PM 4/7/01 -0700, I wrote: >> FYI, I have the HappyDoc formatter/docstring extractor >> (happydoc.sourceforge.net), >> generating standard Python documentation LaTeX from docstrings. It's kind >> of nice, because it means that I immediately have all of the python.sty >> features available >> to me for crosseferencing etc., plus it immediately gives me my docstring >> derived documents in PDF, PS, HTML, and info (if I can get info working >> again). > >The only formatters I could find for HappyDoc use StructuredTextClassic, >or some variant. And many people (incl. Guido) are not terribly happy >with ST. Does the formatter you're talking about use something else? >What does it do about lists, etc? The formatter is an extension that I've added to HappyDoc. I'm working with the author to get the changes back into the distribution; with luck they may be done RSN (days). I hope they will be adopted into the next version (the changes are small, and it really just introduces a new hdformatter). >> def foo(): >> r""" >> My \code{foo} function \emph{breaks} the >> \module{bar} module.\index{Foos and Bars} >> """ >> pass > >Most people have objected to "heavyweight" markup for docstrings.. >i.e., they don't want to have to write docstrings in LaTeX or XML >or whatever.. It *looks* like you're basically just writing >docstrings using some subset of LaTeX? Yes it's a subset - I should have made that clear. There is a subset of LaTeX implicitly defined by the Python \file{Doc/} tools, by virtue of the constraint that the output be generatable in HTML and info as well. It's really the subset of LaTeX that is equivalent to TeXinfo, (give or take some minor naming differences). For the sake of discussion, let me call this LaTeXinfo. The subset contains all of what you need for docstring highlighting etc., plus, and in my eyes a big plus, everything you need for cross-referencing, TOC and indexing of a group of modules. For the sake of discussion, we'll say it contains nothing else. Heavyweight is a relative term of course, and I think most users of TeXinfo feel it's not too heavy. It's a fair balance between light and complete. > If so, we'd have to carefully >define *which* subset, and what everything means, etc., before I >would accept it. We don't want people assuming that, just because >they can use \emph{...}, they can use all their other favorite LaTeX >commands (we do, after all, want it to be possible to convert this >to HTML, info pages, etc.) Agreed. 
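A hedged sketch of how such a subset, once defined, could be checked mechanically; the ALLOWED set below is a made-up placeholder, not the actual markup from "Documenting Python":

    import re

    # Hypothetical allowed subset -- the real list would come from the
    # agreed-upon "Documenting Python" markup.
    ALLOWED = {"code", "emph", "module", "index", "samp", "file", "citetitle"}

    COMMAND = re.compile(r"\\([A-Za-z]+)")

    def unknown_commands(docstring):
        """Return LaTeX-style commands used in a docstring that are not
        in the agreed-upon subset."""
        return sorted(set(COMMAND.findall(docstring)) - ALLOWED)

    if __name__ == "__main__":
        doc = r"""My \code{foo} function \emph{breaks} the \module{bar}
        module.\footnote{so there}"""
        print(unknown_commands(doc))   # ['footnote']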
The subset is well defined and documented already, and in widespread use as the current documentation standard for Python. The current installed base equals the installed base of Python. >> 1) It's quite complete for all of the entended uses (\emph(...}) >> Because it's more or less TexInfo compatible, most people know it >> or can learn it easily, even if you don't know LaTeX. > >But you can't make it *too* "complete," or it won't be a standard that >people can write tools to process anymore.. We don't want to just >reimplement LaTeX here... You're right. I find the subset to be complete enough, especially for docstrings, and the tools are already written. It has to be small for info. >> 2) It means you can cut and paste between docstrings and the formal >> module documentation for Doc/. >> 3) The macros/commands are already completely documented, and the >> documentation for them ships with the core distribution. >> 4) It would reinforce the use of the Doc/ tools. > >I actually am not too familiar with the Doc/ tools.. Can you give >me a pointer to them? They are with every Python distribution, or take a look at \citetitle[http://www.python.org/doc/current/doc/doc.html]{Documenting Python} In my view, the documentation is one of Python's strengths, and the benefits of having standardized the documentation early are huge. But documentation is always a painful task, and I think there are real benefits to a documentation approach that is scalable, from docstrings all the way up to the reference documentation. >But copy/paste does seem useful. (Although it >should at least be possible to write conversion tools, in any case, >given a good standard). It's \emph{really} nice to have the \key{PASTE} key as a conversion tool. I find myself documenting modules a lot, classes a little, etc. and by then, a first draft of the reference documentation is already done. >> 5) It would reinforce the use of HappyDoc (semi-literate programmming). > >Happydoc seems like a nice tool. Whatever markup language we settle >on (if we ever do), a HappyDoc formatter will probably be implemented.. I only wrote the LaTeXinfo extention to HappyDoc last week, and already I'm very Happy \grin. But the LaTeXinfo version is by far the most advanced: having my entire module and class structure documented with indexing, Table of Contents and cross-references, in HTML, info and PDF is huge.* >> 7) It's likely to be mainly backward compatible - I doubt many >> docstrings use \ a lot. On the other hand, I bet a lot of >> them use blank line as a paragraph seperator. > >I believe that trying to be "backward compatible" with a markup language >is an extremely dangerous thing to do, esp. if your markup language is >relatively "forgiving," because you probably won't *notice* the places >where it gets confused. I would rather be explicitly non-backward- >compatible. Sorry, what I meant was backward compatible with all the current docstrings in the existing Python library. It is backward compatible in the sense: \begin{enumerate} \item There are very few occurences of \textbackslash. \item A blank line implies a paragraph. \end{enumerate} This would not be true if we went to an HTML markup system for example: all current docstrings in existing code would require the insersion of \code{
<P>
} for the blank lines, plus worrying about the more frequently used \samp{<} and \samp{>} characters. Small details, but nice. The whole docstring implementation in this way could be done simply by: \begin{enumerate} \item Define a subset of LaTeXinfo commands that would be admissible in docstrings. You could start very small, and add to them in time. \item Change the page where these command are documented in the Python documentation to seperate out the docstring subset on their own page, and tell people about the \code{r"""markup"""} trick (see below). \item Implement the tty parser for docstrings so that they look pretty at the terminal. For this, it it important to note that there are existing reference implementations of a tty representation (info in C, info in Emacs), so presumably you could blindly copy the info representations. That way you would be compatible with the primordial Python IDE: Emacs. \end{enumerate} Note also, that a lot of the mileage of this approach is gained from the fortutious coincidence that r""" docstring with \LaTeX\ markup""" works for docstrings too, which makes it easy to use backslashes: def foo(): r""" My \code{foo} function \emph{breaks} the \module{bar} module.\index{Foos and Bars} """ pass Mike. * \footnote{If people want, I can put a copy of a HappyDoc LaTeXinfo generated PDF file up on starship for people to browse.} PS: My apologies if anyone was mislead by the apparent misattribution of my previous post; it was Edward D. Loper I was quoting. From guido@digicool.com Mon Apr 9 01:24:53 2001 From: guido@digicool.com (Guido van Rossum) Date: Sun, 08 Apr 2001 19:24:53 -0500 Subject: [Doc-SIG] which characters to use for docstring markup In-Reply-To: Your message of "Sun, 08 Apr 2001 16:06:32 EDT." <200104082006.f38K6Wp00128@gradient.cis.upenn.edu> References: <200104082006.f38K6Wp00128@gradient.cis.upenn.edu> Message-ID: <200104090024.TAA31945@cj20424-a.reston1.va.home.com> > Guido: > >> I think I'd prefer to have to write \> for > than \} for }, so I > >> *still* prefer C<> over C{}. > > Is this just because you think it looks nicer? Or for some semblance > of compatability with perldoc? Or is there some other reason? I guess because I feel that dict displays are probably more common than comparisons -- but I haven't made a study. The argument could be made, though, that inside C{}, one could allow unescaped {} to *nest*, and this would make C{{'1': 2, '2': 1}} unambiguous. We can't do this for < and >, because they occur unpaired much more often than } and {. So I withdraw that objection. (My other reason was that I thought that C{} would be harder to type than C<>, but I can't find an objective reason for that.) --Guido van Rossum (home page: http://www.python.org/~guido/) From edloper@gradient.cis.upenn.edu Mon Apr 9 00:36:16 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Sun, 08 Apr 2001 19:36:16 EDT Subject: [Doc-SIG] which characters to use for docstring markup In-Reply-To: Your message of "Sun, 08 Apr 2001 16:08:18 PDT." <3.0.6.32.20010408160818.007c3db0@popd.ix.netcom.com> Message-ID: <200104082336.f38NaGp20864@gradient.cis.upenn.edu> > The formatter is an extension that I've added to HappyDoc. I'm working with > the author to get the changes back into the distribution; with luck they > may be done RSN (days). I hope they will be adopted into the next version > (the changes are small, and it really just introduces a new hdformatter). Could you put it on the web someplace? 
Incidentally, I have a question about HappyDoc terminology -- they seem to use the word formatter to refer to both what I would call a "parser" (convert representation to an interlingua) and an "outputter" (convert interlingua to output representations). Do they do all the translations in one step? If so, doesn't that make it a pain to write outputters for each output format? Or do they just use terminology differently than I expect them to (I would think that a "formatter" would be what I would call an "outputter"??) > Heavyweight is a relative term of course, and I think most users of TeXinfo > feel it's not too heavy. It's a fair balance between light and complete. I think that many people on this sig would say that its syntax for lists is too heavy-weight. I myself would be ok using XML, so I'm noot really one of the ones strongly lobbying for lightweight.. but I want somethign that people will accept. And I think it's much more likely that people will type: - lists like this than: \begin{itemize} \item lists like this \end{itemize} (which is not to say that I'd support any sort of hybrid.. if you're using the subset of LaTeX supported by Doc, then you should use just that) > I only wrote the LaTeXinfo extention to HappyDoc last week, and already > I'm very Happy \grin. But the LaTeXinfo version is by far the most advanced: > having my entire module and class structure documented with indexing, Table > of Contents and cross-references, in HTML, info and PDF is huge.* I assume you use somethign like \label{foo} and \ref{foo} for cross-referencing? > Sorry, what I meant was backward compatible with all the current docstrings > in the existing Python library. It is backward compatible in the sense: > > \begin{enumerate} > \item There are very few occurences of \textbackslash. > \item A blank line implies a paragraph. > \end{enumerate} Presumably you also have to worry about '{' and '}' because LaTeX will treat '{hi}' as equivalant to 'hi', etc. > \item Implement the tty parser for docstrings so that they look pretty > at the terminal. For me, this would be a pretty essential precondition to accepting a markup language like the one you propose. > * \footnote{If people want, I can put a copy of a HappyDoc LaTeXinfo generated > PDF file up on starship for people to browse.} I'd like to see what the HTML output looks like too. -Edward From dgoodger@atsautomation.com Mon Apr 9 16:12:32 2001 From: dgoodger@atsautomation.com (Goodger, David) Date: Mon, 9 Apr 2001 11:12:32 -0400 Subject: [Doc-SIG] backslashing Message-ID: Edward D. Loper wrote: > Now consider what happens if a newbie user prints out a formatted > docstring:: > > >>> print somefunc.__doc__ > Somefunc will start an interactive session. When you want > to exit the session, simply type "\\exit" > >>> > > Now, the user gets confused, and types "\\exit" instead of "\exit". I assume you meant that the user should type "\exit", and you doubled-up the backslashes in order to avoid escaping the "e" in "exit". Correct? In a nutshell: - No matter what characters are chosen for markup, some day someone will want to write documentation *about* that markup (hopefully sooner than later :). Therefore, any complete markup language must have an escaping or encoding mechanism. - If we want a lightweight markup system, encoding mechanisms like SGML/XML's '*' are out. So an escaping mechanism is in. The backslash is the only viable candidate IMO. 
- However, with carefully chosen markup, it should be necessary to use the escaping mechanism only infrequently. - As in many systems with escaping, we can define the escape character to have the "escaping" meaning only for specific characters (the markup characters themselves). (Example: in Python, len('\t') == 1, len('\T') == 2.) So '\*' would escape the asterisk (evaluates to '*', but not processed as markup), but '\e' would be a backslash and an 'e', two characters. No '\\e' required. - In extreme cases, or when we want to be absolutely clear, we can use a literal block instead:: When you want to exit the session, simply type:: \exit > So what's the alternative? Don't allow markup characters in > paragraphs, and force docstring writers to put them in literal > blocks if they want to use them. Have fun enforcing that! :> I would change that to: allow markup characters in paragraphs via the escaping mechanism, but encourage authors to put them in literal blocks instead. /DG From edloper@gradient.cis.upenn.edu Wed Apr 11 18:06:41 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Wed, 11 Apr 2001 13:06:41 EDT Subject: [Doc-SIG] lightweight markup: bullets Message-ID: <200104111706.f3BH6fp26132@gradient.cis.upenn.edu> So I'm still playing around with developing a lightweight markup language for docstrings, and wanted to bounce an idea off the list.. Background ========== Traditionally, there has been some difficulty in deciding how to do lightweight lists. The simplest idea is to do something like::

    - this is a list item
    - this is another list item
    - This is a multiline
      list item.

Where list items are lines that start with a bullet character. But then there's an issue of whether that makes it safe to include dashes (surrounded by spaces) in paragraphs.. because they could get word-wrapped such that the dash appears at the beginning of the line, which would then make it into a list item. So a paragraph containing the sentence::

    This is a paragraph - it contains a dash.

Might get word-wrapped at some point to::

    This is a paragraph
    - it contains a dash.

The problem also applies to ordered lists: the paragraph::

    Some people like the number 1. Some don't.

Might get word-wrapped to::

    Some people like the number
    1. Some don't.

There are some ways to get around this *most* of the time with indentation (by requiring that lists be indented), but they don't work all of the time. For example, with a paragraph like::

    return: Some real number that's greater than
      1. The number should also be less than 2.

... it doesn't help, because there's no way to tell whether that's a list item or part of the first paragraph.. My Question =========== So the approach that I wanted to get people's opinion on is using bullets that look like::

    <-> This is an unordered list item.
    <1> This is an ordered list item.
    <term> This is a description list item.

Here, I'm assuming that we're already using C<...> etc. to delimit colored regions, so <...> without a letter before it should never appear in a paragraph. (Alternatively, replace '<' with '{' and '>' with '}'. I've been going back and forth on which one I like better.) The advantages of this approach are: - it's consistent between list types - it's very easy to detect (for coloring, etc.) - it's safe with respect to word-wrapping paragraphs - it easily allows for a wide variety of bullets (e.g., for description list items) The main disadvantage that I see is: - It's uglier than just using '-' or '1.'. Is it too ugly? Do you see any other problems with it? 
Do you have any better ideas? -Edward From edloper@gradient.cis.upenn.edu Wed Apr 11 18:16:36 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Wed, 11 Apr 2001 13:16:36 EDT Subject: [Doc-SIG] which characters to use for docstring markup In-Reply-To: Your message of "Sun, 08 Apr 2001 16:08:18 PDT." <3.0.6.32.20010408160818.007c3db0@popd.ix.netcom.com> Message-ID: <200104111716.f3BHGap27199@gradient.cis.upenn.edu> Mike said: > The formatter is an extension that I've added to HappyDoc. I'm > working with the author to get the changes back into the > distribution; with luck they may be done RSN (days). I hope they > will be adopted into the next version (the changes are small, and it > really just introduces a new hdformatter). Does it have any provisions for specifying descriptions of parameters, etc? Like "@param" and "@return" etc. in javadoc? > Heavyweight is a relative term of course, and I think most users of > TeXinfo feel it's not too heavy. It's a fair balance between light > and complete. It's more heavyweight than what I was aiming for, but that's not to say that it's too heavyweight. I was just reacting to what I percieved as the desire of most people.. > It's \emph{really} nice to have the \key{PASTE} key as a conversion > tool. I find myself documenting modules a lot, classes a little, > etc. and by then, a first draft of the reference documentation is > already done. It still seems to me like this won't *quite* work with your system, since you'll allow comments that include unbackslashed &'s and \\s and whatever other characters LaTeX treats funnily.. But it's certainly much closer than the markup languages we've been talking about on docsig. > I only wrote the LaTeXinfo extention to HappyDoc last week, and > already I'm very Happy \grin. But the LaTeXinfo version is by far > the most advanced: having my entire module and class structure > documented with indexing, Table of Contents and cross-references, in > HTML, info and PDF is huge.* It seems like a lot of the indexing and table-of-contents stuff is really a tool issue, not a markup language issue.. How much explicit info do you put in? The markup language I've been thinking about is mainly intended for API-level documentation, which is arguably different from reference docs.. (I'm not one of the people who says that the reference docs should be included inline). I wouldn't imagine most docstrings using sections, etc. -Edward From edloper@gradient.cis.upenn.edu Wed Apr 11 18:22:21 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Wed, 11 Apr 2001 13:22:21 EDT Subject: [Doc-SIG] which characters to use for docstring markup In-Reply-To: Your message of "Sun, 08 Apr 2001 19:24:53 CDT." <200104090024.TAA31945@cj20424-a.reston1.va.home.com> Message-ID: <200104111722.f3BHMLp27608@gradient.cis.upenn.edu> > I guess because I feel that dict displays are probably more common > than comparisons -- but I haven't made a study. The fact that only 13 '{'s appear in the reference docs I searched suggests otherwise.. ;) But really, the question isn't how often dicts are included, but how often they're included inline. I would *think* that dictionary displays tend not to be inline.. But I don't trust my intuitions on such things, though.. > The argument could be made, though, that inside C{}, one could allow > unescaped {} to *nest*, and this would make C{{'1': 2, '2': 1}} > unambiguous. We can't do this for < and >, because they occur > unpaired much more often than } and {. 
We'd want to be careful with this, but that could work. > (My other reason was that I thought that C{} would be harder to type > than C<>, but I can't find an objective reason for that.) Which is something I hadn't considered, but is actually a fairly good reason.. It *is* easier to type C<> than to type C{} (for me anyway), since '<' and '>' are closer to the home row than '{' and '}'.. :) I think I'm still leaning towards '{}', but I'm not strongly set on it.. For now, I'll just remain uncommitted, until I've finished implementing something, and we can play with them and see which one looks/feels better. -Edward From edloper@gradient.cis.upenn.edu Wed Apr 11 18:49:43 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Wed, 11 Apr 2001 13:49:43 EDT Subject: [Doc-SIG] backslashing In-Reply-To: Your message of "Mon, 09 Apr 2001 11:12:32 EDT." Message-ID: <200104111749.f3BHnhp29835@gradient.cis.upenn.edu> > - As in many systems with escaping, we can define the escape > character to have the "escaping" meaning only for specific > characters (the markup characters themselves). (Example: in Python, > len('\t') == 1, len('\T') == 2.) So '\*' would escape the asterisk > (evaluates to '*', but not processed as markup), but '\e' would be a > backslash and an 'e', two characters. No '\\e' required. I can see this getting ridiculously complicated if we're talking about any regexps inline.. And regexps are hard enough to read anyway.. :)

    >>> print my_confusing_docstring
    ...
    ...defaults to the regexp "\s\*", which will match zero or more spaces... :)

Of course, I'm currently planning to use E{emph} instead of *emph*, but you get the idea.. :) Hm.. So in my current markup language, there are two coloring characters ('{' and '}' or '<' and '>') and the following structuring characters:

    - '-': a bullet, when it occurs at the start of a line
    - '([0-9]+.)+': a bullet, when at the start of a line
    - '::': introduces a literal block, when at the end of a para
    - '=': used for underlining headings
    - '-': used for underlining headings
    - '~': used for underlining headings

If we're doing escaping, then clearly we need to be able to escape '{' and '}'. We might be able to get away with not escaping any of the structuring characters by saying that when they appear within a colored region, they don't count. So, for example, in::

    Find the value of C{x
    - y}.

The second line wouldn't be a list item because it's in a colored region.. We might also need a new "null" coloring that could be used in examples like::

    This is a sentence that ends in the number
    N{1.}

Is this better or worse than::

    This is a sentence that ends in the number
    \1.

Of course, if we require bullets to be in a special colored region, then we don't have to worry about them.. And we don't have to worry about '::', since it's only interpreted when it comes at the end of a paragraph (not at the end of a line)... And presumably people will never write::

    x = y

as::

    x
    =
    y

(which would be read as a heading "x" followed by a paragraph containing "y"). In that case, we could say that the only characters that you can backslash are '\{' and '\}'. So then I might feel better about saying that '\' is interpreted as a literal backslash except before '\', '{', or '}'.. Although I would still be worried that people would get confused with regexps::

    >>> print another_confusing_docstring
    ...
    The regexp r"\\." matches a literal period.
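(For concreteness, here is a throwaway sketch of that rule -- the function name is invented and this is an illustration only, not code from any actual tool::

    def unescape_inline(text):
        # Resolve escapes under the proposed rule: a backslash followed
        # by '\', '{' or '}' yields that character; any other backslash
        # is kept as an ordinary literal character.
        out = []
        i = 0
        while i < len(text):
            if text[i] == '\\' and i + 1 < len(text) and text[i+1] in '\\{}':
                out.append(text[i+1])    # escaped markup character
                i += 2
            else:
                out.append(text[i])      # ordinary character, including a lone '\'
                i += 1
        return ''.join(out)

so "C\{x\}" comes out as "C{x}" and a regexp like "\s*" passes through untouched, but "\\." still collapses to "\." -- exactly the confusion illustrated just above.)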
-Edward From tim.one@home.com Wed Apr 11 19:45:59 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 11 Apr 2001 14:45:59 -0400 Subject: [Doc-SIG] lightweight markup: bullets In-Reply-To: <200104111706.f3BH6fp26132@gradient.cis.upenn.edu> Message-ID: [Edward D. Loper] > ... > So the approach that I wanted to get peoples' opinion on is using > bullets that look like:: > > <-> This is an unordered list item. > <1> This is an ordered list item. > This is a description list item. > > Here, I'm assuming that we're already using C<...> etc. to delimit > colored regions, ... > ... > Is it too ugly? Do you see any other problems with it? Do you have > any better ideas? If we're reserving X<...> notation, let's use it uniformly: L<-> This is an unordered list item. L<1> This is an ordered list item. L This is a description list item. L> This is a descriptive item with embedded code in the description. > ... > The main disadvantage that I see is: > - It's uglier than just using '-' or '1.'. Ya, and I'm uglier than my sisters, but that's no argument for letting them write your docstrings . From edloper@gradient.cis.upenn.edu Wed Apr 11 20:33:59 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Wed, 11 Apr 2001 15:33:59 EDT Subject: [Doc-SIG] lightweight markup: bullets In-Reply-To: Your message of "Wed, 11 Apr 2001 14:45:59 EDT." Message-ID: <200104111933.f3BJXxp10083@gradient.cis.upenn.edu> > If we're reserving X<...> notation, let's use it uniformly: > > L<-> This is an unordered list item. > L<1> This is an ordered list item. > L This is a description list item. > L> This is a descriptive item with embedded code in > the description. Perhaps. The reason that I didn't do that is that the use of <...> or L<...> for bullets is really very different from the use of X<...> for coloring. X<...> coloring is something that happens within a paragraph.. L<...> is a structuring primitive.. For example, you can't say:: This makes L sense. But you can say:: This I make sense. Of course, if we decided to use '{' and '}' instead of '<' and '>', and used 'L{...}' instead of '{...}', then we could say that '{...}' when not preceeded by a capitalized letter will have the '{' and '}' rendered as braces (c.f., Guido's suggestion to allow things like 'C{x={1:2, 3:4}}'.. > > ... > > The main disadvantage that I see is: > > - It's uglier than just using '-' or '1.'. > > Ya, and I'm uglier than my sisters, but that's no argument for > letting them write your docstrings . The main reason for not just using something established like XML or LaTeX is that they're too complex/ugly. There's no point in having a new markup language if it's also complex/ugly.. :) So I'd like to keep this markup language as simple and clean as possible. -Edward From tim.one@home.com Wed Apr 11 21:10:45 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 11 Apr 2001 16:10:45 -0400 Subject: [Doc-SIG] lightweight markup: bullets In-Reply-To: <200104111933.f3BJXxp10083@gradient.cis.upenn.edu> Message-ID: [Edward D. Loper, on L<...> for list items] > Perhaps. The reason that I didn't do that is that the use of <...> or > L<...> for bullets is really very different from the use of X<...> for > coloring. It's barely different at all to me: it's markup, as opposed to not markup, and that's the *primary* distinction that needs to be learned. You overburden my biological pattern-recognition engine if I have to learn N different lexical conventions for N different categories of markup. 
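(To put that concretely: with one convention, a single throwaway pattern recognizes every piece of markup, whatever its category -- the tag letters below are invented for illustration, not a proposal::

    >>> import re
    >>> markup = re.compile(r'[A-Z]<[^<>]*>')
    >>> markup.findall("L<1> Emphasis is E<nice>; code is C<x+y>.")
    ['L<1>', 'E<nice>', 'C<x+y>']

Nesting aside, there is only the one shape to learn, for readers and for tools alike.)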
> X<...> coloring is something that happens within a paragraph. > L<...> is a structuring primitive.. For example, you can't say:: > > This makes L sense. > > But you can say:: > > This I make sense. So L has to appear at the start of a line. Fine: additional constraints on specific X<...> thingies are easy to live with. > ... > The main reason for not just using something established like XML or > LaTeX is that they're too complex/ugly. There's no point in having a > new markup language if it's also complex/ugly.. :) So I'd like to keep > this markup language as simple and clean as possible. I've got no particular use for list markup at all, but since people will insist that it's necessary, it's simpler and cleaner to reuse one lexical gimmick whenever it suffices to get the job done. WRT beauty, <1> and L<1> are *both* "ugly", but at least the latter is ugly in exactly the same way that E is uglier than *beautiful* Occam's Beautifier suggests that new varieties of ugliness not be multiplied beyond necessity . From dgoodger@atsautomation.com Wed Apr 11 21:29:27 2001 From: dgoodger@atsautomation.com (Goodger, David) Date: Wed, 11 Apr 2001 16:29:27 -0400 Subject: [Doc-SIG] backslashing Message-ID: > I can see this getting rediculously complicated if we're talking > about any regexps inline.. And regexps are hard enough to read > anyway.. :) Any inline regexps will be complicated, no matter what escaping mechanism you choose. It's the nature of the beast! > Although I would still be worried that > people would get confused with regexps:: > > >>> print another_confusing_docstring > ... > The regexp r"\\." matches a literal period. Inline literals would be better:: The regexp `r"\."` matches a literal period. Or a literal block:: The regexp:: r"\." matches a literal period. Two ways to look at backslash escapes (i.e., a way to selectively suppress markup recognition): as an occasional tool, or as a horrible wart. You seem to be looking at it as a wart: how ugly can it get? Very ugly, indeed. Try looking at it the other way: use it only when necessary, which should be quite infrequently. > So, for example, in:: > > Find the value of C{x > - y}. I haven't spoken up about this, but: ugh! Somewhat dismayed to see the X<> type of construct being taken seriously. It works fine for POD: long live POD! Want to use it in docstrings? Implement a POD parser for HappyDoc or pydoc. In my estimation, readability is the most important criterion; X<> fails miserably. > The second line wouldn't be a list item because it's in a colored > region.. We might also need a new "null" coloring that could be used > in examples like:: > > This is a sentence that ends in the number > N{1.} (Let's not get ridiculous here.) > Is this better or worse than:: > > This is a sentence that ends in the number > \1. Recently, I've come to the conclusion that requiring a blank line before the start of a list is reasonable and correct, even if we don't require blank lines between items. Minimizing ambiguity trumps minimizing vertical space. /DG From edloper@gradient.cis.upenn.edu Thu Apr 12 00:04:26 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Wed, 11 Apr 2001 19:04:26 EDT Subject: [Doc-SIG] backslashing In-Reply-To: Your message of "Wed, 11 Apr 2001 16:29:27 EDT." Message-ID: <200104112304.f3BN4Rp02123@gradient.cis.upenn.edu> > Any inline regexps will be complicated, no matter what escaping > mechanism you choose. It's the nature of the beast! 
If you used E<lb> and E<rb> or E{lb} and E{rb} or something like that, then regexps would generally look how they're supposed to (at least when you print them). > Inline literals would be better:: > > The regexp `r"\."` matches a literal period. But then we have to say that inline literals can't ever contain "'".. which in my mind is no better than saying that you can't backslash '{' and '}'. > I haven't spoken up about this, but: ugh! Somewhat dismayed to see > the X<> type of construct being taken seriously. It works fine for > POD: long live POD! Want to use it in docstrings? Implement a POD > parser for HappyDoc or pydoc. In my estimation, readability is the > most important criterion; X<> fails miserably. I asked about this before, and didn't get any negative feedback. Basically, I'd be happy to not use X<...> markup (or X{..} markup) if we restrict ourselves to only using:

    - either 'literal' or `literal`
    - maybe *single* *word* *emph*
    - no nesting

Guido has objected to `literal` (which doesn't mean we can't do it, of course). I think that if the reason we're rejecting X{} or X<> is because it's "not readable," then there's no reason to accept #code#, which to me is significantly less intuitive than C{code}. > Recently, I've come to the conclusion that requiring a blank line > before the start of a list is reasonable and correct, even if we > don't require blank lines between items. Minimizing ambiguity trumps > minimizing vertical space. That would make things easier. But we would also have to require that sublists are surrounded by blank lines. So instead of::

    text
    - item1
      - subitem 1.1
      - subitem 1.2
    - item2
    - item3
    text

We would have::

    text

    - item1

      - subitem 1.1
      - subitem 1.2

    - item2
    - item3

    text

Any objections to that? The way my markup language currently works, we don't have to worry about how to detect when a new list item starts, because list item contents are required to be indented::

    - this is a valid
      list item.
    - This is not a valid
    list item.

-Edward From fdrake@beowolf.digicool.com Thu Apr 12 05:39:34 2001 From: fdrake@beowolf.digicool.com (Fred Drake) Date: Thu, 12 Apr 2001 00:39:34 -0400 (EDT) Subject: [Doc-SIG] [development doc updates] Message-ID: <20010412043934.B61E12879C@beowolf.digicool.com> The development version of the documentation has been updated: http://python.sourceforge.net/devel-docs/ Almost to Python 2.1 release candidate 1 status. This includes a variety of small updates and a good bit more documentation on the PyUnit version that will be included with the final release (new text essentially converted from Steve Purcell's HTML docs). From ping@lfw.org Fri Apr 13 06:56:52 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Thu, 12 Apr 2001 22:56:52 -0700 (PDT) Subject: [Doc-SIG] Re: [Python-Dev] [development doc updates] In-Reply-To: <20010412043934.B61E12879C@beowolf.digicool.com> Message-ID: On Thu, 12 Apr 2001, Fred Drake wrote: > The development version of the documentation has been updated: > > http://python.sourceforge.net/devel-docs/ I had a browse through the reference manual. This file is empty: http://python.sourceforge.net/devel-docs/ref/unicode.html This file says "scopes do not nest" and doesn't mention the availability of nested scopes via __future__: http://python.sourceforge.net/devel-docs/ref/execframes.html This file still has a big XXX in it: http://python.sourceforge.net/devel-docs/ref/import.html Do you already have updates for these? I may be able to offer a little help, but i'm stretched pretty thin at the moment...
-- ?!ng From ping@lfw.org Fri Apr 13 12:41:50 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Fri, 13 Apr 2001 04:41:50 -0700 (PDT) Subject: [Doc-SIG] Doc nit: << and >> Message-ID: I just noticed on the page http://python.sourceforge.net/devel-docs/ref/summary.html that the shifting operators are not shown as << and >>, but as the double-angle-quote characters ("guillemets"?) that the French are fond of. This should probably get fixed, though i'm not enough of a TeX expert to know how (will a couple of backslashes do the trick?). -- ?!ng From ping@lfw.org Fri Apr 13 14:25:56 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Fri, 13 Apr 2001 06:25:56 -0700 (PDT) Subject: [Doc-SIG] pydoc.py: new help feature Message-ID: As well as fixes to pydoc which i have just checked in (the module actually got *smaller* this time!) i have also spent a good portion of the evening rewriting the Helper class in pydoc to try to do a better job of providing help. I haven't yet committed a version containing this new feature (though i was quite tempted!) as it would be irresponsible of me to check in such a big change the day before a deadline without at least asking first. I know my timing is really terrible, but i would like you to have a look at it. I think the help utility is in fairly good shape. Please try it out if you have time, and consider the possibility of allowing it into 2.1. The fancy version is at http://www.lfw.org/python/pydoc.py Aside from a big chunk of new code for the Helper class, it is otherwise the same as the version currently in CVS. *** In particular, i'm also looking for any suggestions about how to make Helper.__init__ more robust at finding the docs. *** If you really like it, we could even think about adding a few lines to site.py: class Helper: def __repr__(self): import pydoc pydoc.help() return '' def __call__(self, *args): import pydoc pydoc.help(*args) __builtin__.help = Helper() I know that's pushing it, but hey, i thought it wouldn't hurt to ask. :) -- ?!ng Here follows a transcript of a session, to show you some examples. I've inserted marks like this: ##1## to refer to commentary at the bottom. skuld[1052]% python Python 2.1b2 (#29, Apr 10 2001, 04:59:40) [GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2 Type "copyright", "credits" or "license" for more information. >>> from pydoc import help >>> help Welcome to Python 2.1! This is the online help utility. If this is your first time using Python, you should definitely check out the tutorial on the Internet at http://www.python.org/doc/tut/. Enter the name of any module, keyword, or topic to get help on writing Python programs and using Python modules. To quit this help utility and return to the interpreter, just type "quit". To get a list of available modules, keywords, or topics, type "modules", "keywords", or "topics". Each module also comes with a one-line summary of what it does; to list the modules whose summaries contain a given word such as "spam", type "modules spam". help> topics ##1## Here is a list of available topics. Enter any topic name to get more help. 
ASSERTION DELETION LOOPING SEQUENCEMETHODS2 ASSIGNMENT DICTIONARIES MAPPINGMETHODS SEQUENCES ATTRIBUTEMETHODS DICTIONARYLITERALS MAPPINGS SHIFTING ATTRIBUTES ELLIPSIS METHODS SLICINGS AUGMENTEDASSIGNMENT EXCEPTIONS MODULES SPECIALATTRIBUTES BACKQUOTES EXECUTION NAMESPACES SPECIALIDENTIFIERS BASICMETHODS EXPRESSIONS NONE SPECIALMETHODS BINARY FILES NUMBERMETHODS STRINGMETHODS BITWISE FLOAT NUMBERS STRINGS BOOLEAN FORMATTING OBJECTS SUBSCRIPTS CALLABLEMETHODS FRAMEOBJECTS OPERATORS TRACEBACKS CALLS FRAMES PACKAGES TRUTHVALUE CLASSES FUNCTIONS POWER TUPLELITERALS CODEOBJECTS IDENTIFIERS PRECEDENCE TUPLES COERCIONS IMPORTING PRINTING TYPEOBJECTS COMPARISON INTEGER PRIVATENAMES TYPES COMPLEX LISTLITERALS RETURNING UNARY CONDITIONAL LISTS SCOPING UNICODE CONVERSIONS LITERALS SEQUENCEMETHODS1 help> LISTS ##2## 2.1.5.4 Mutable Sequence Types List objects support additional operations that allow in-place modification of the object. These operations would be supported by other mutable sequence types (when added to the language) as well. Strings and tuples are immutable sequence types and such objects cannot be modified once created. The following operations are defined on mutable sequence types (where x is an arbitrary object): Operation Result Notes s[i] = x item i of s is replaced by x s[i:j] = t slice of s from i to j is replaced by t del s[i:j] same as s[i:j] = [] s.append(x) same as s[len(s):len(s)] = [x] (1) s.extend(x) same as s[len(s):len(s)] = x (2) s.count(x) return number of i's for which s[i] == x s.index(x) return smallest i such that s[i] == x (3) s.insert(i, x) same as s[i:i] = [x] if i >= 0 s.pop([i]) same as x = s[i]; del s[i]; return x (4) s.remove(x) same as del s[s.index(x)] (3) s.reverse() reverses the items of s in place (5) s.sort([cmpfunc]) sort the items of s in place (5), (6) Notes: (1) The C implementation of Python has historically accepted multiple parameters and implicitly joined them into a tuple; this no longer works in Python 2.0. Use of this misfeature has been deprecated since Python 1.4. (2) Raises an exception when x is not a list object. The extend() method is experimental and not supported by mutable sequence types other than lists. (3) Raises ValueError when x is not found in s. (4) The pop() method is only supported by the list and array types. The optional argument i defaults to -1, so that by default the last item is removed and returned. (5) The sort() and reverse() methods modify the list in place for economy of space when sorting or reversing a large list. They don't return the sorted or reversed list to remind you of this side effect. (6) The sort() method takes an optional argument specifying a comparison function of two arguments (list items) which should return -1, 0 or 1 depending on whether the first argument is considered smaller than, equal to, or larger than the second argument. Note that this slows the sorting process down considerably; e.g. to sort a list in reverse order it is much faster to use calls to the methods sort() and reverse() than to use the built-in function sort() with a comparison function that reverses the ordering of the elements. 
Related help topics: LISTLITERALS help> LISTLITERALS ##3## 5.2.4 List displays A list display is a possibly empty series of expressions enclosed in square brackets: list_display: "[" [listmaker] "]" listmaker: expression ( list_for | ( "," expression)* [","] ) list_iter: list_for | list_if list_for: "for" expression_list "in" testlist [list_iter] list_if: "if" test [list_iter] A list display yields a new list object. Its contents are specified by providing either a list of expressions or a list comprehension. When a comma-separated list of expressions is supplied, its elements are evaluated from left to right and placed into the list object in that order. When a list comprehension is supplied, it consists of a single expression followed by at least one for clause and zero or more for or if clauses. In this case, the elements of the new list are those that would be produced by considering each of the for or if clauses a block, nesting from left to right, and evaluating the expression to produce a list element each time the innermost block is reached. Related help topics: LITERALS help> LITERALS ##4## 5.2.2 Literals Python supports string literals and various numeric literals: literal: stringliteral | integer | longinteger | floatnumber | imagnumber Evaluation of a literal yields an object of the given type (string, integer, long integer, floating point number, complex number) with the given value. The value may be approximated in the case of floating point and imaginary (complex) literals. See section 2.4[1] for details. All literals correspond to immutable data types, and hence the object's identity is less important than its value. Multiple evaluations of literals with the same value (either the same occurrence in the program text or a different occurrence) may obtain the same object or a different object with the same value. Related help topics: STRINGS BACKQUOTES NUMBERS TUPLELITERALS LISTLITERALS DICTIONARYLITERALS help> NUMBERS 2.4.3 Numeric literals There are four types of numeric literals: plain integers, long integers, floating point numbers, and imaginary numbers. There are no complex literals (complex numbers can be formed by adding a real number and an imaginary number). Note that numeric literals do not include a sign; a phrase like -1 is actually an expression composed of the unary operator `-' and the literal 1. Related help topics: INTEGER FLOAT COMPLEX TYPES help> keywords ##5## Here is a list of the Python keywords. Enter any keyword to get more help. and elif global or assert else if pass break except import print class exec in raise continue finally is return def for lambda try del from not while help> if ##6## 7.1 The if statement The if statement is used for conditional execution: if_stmt: "if" expression ":" suite ("elif" expression ":" suite)* ["else" ":" suite] It selects exactly one of the suites by evaluating the expressions one by one until one is found to be true (see section 5.10[1] for the definition of true and false); then that suite is executed (and no other part of the if statement is executed or evaluated). If all expressions are false, the suite of the else clause, if present, is executed. Related help topics: TRUTHVALUE help> TRUTHVALUE 2.1.1 Truth Value Testing Any object can be tested for truth value, for use in an if or while condition or as operand of the Boolean operations below. The following values are considered false: None zero of any numeric type, for example, 0, 0L, 0.0, 0j. any empty sequence, for example, '', (), []. 
any empty mapping, for example, {}. instances of user-defined classes, if the class defines a __nonzero__() or __len__() method, when that method returns zero.2.2[1] All other values are considered true -- so objects of many types are always true. Operations and built-in functions that have a Boolean result always return 0 for false and 1 for true, unless otherwise stated. (Important exception: the Boolean operations "or" and "and" always return one of their operands.) ------------------------------------------------------------------------ Footnotes ... zero.2.2[2] Additional information on these special methods may be found in the Python Reference Manual[3]. Related help topics: if while and or not BASICMETHODS help> continue 6.10 The continue statement continue_stmt: "continue" continue may only occur syntactically nested in a for or while loop, but not nested in a function or class definition or try statement within that loop.6.1[1]It continues with the next cycle of the nearest enclosing loop. ------------------------------------------------------------------------ Footnotes ... loop.6.1[2] It may occur within an except or else clause. The restriction on occurring in the try clause is implementor's laziness and will eventually be lifted. Related help topics: while for help> while 7.2 The while statement The while statement is used for repeated execution as long as an expression is true: while_stmt: "while" expression ":" suite ["else" ":" suite] This repeatedly tests the expression and, if it is true, executes the first suite; if the expression is false (which may be the first time it is tested) the suite of the else clause, if present, is executed and the loop terminates. A break statement executed in the first suite terminates the loop without executing the else clause's suite. A continue statement executed in the first suite skips the rest of the suite and goes back to testing the expression. Related help topics: break continue if TRUTHVALUE help> modules color ##7## Here is a list of matching modules. Enter any module name to get more help. colorsys - Conversion functions between RGB and other color systems. tkColorChooser help> modules mail Here is a list of matching modules. Enter any module name to get more help. mailbox - Classes to handle Unix style, MMDF style, and MH style mailboxes. mailcap - Mailcap file handling. See RFC 1524. mimify - Mimification and unmimification of mail messages. test.test_mailbox help> colorsys ##8## Help on module colorsys: NAME colorsys - Conversion functions between RGB and other color systems. FILE /home/ping/dev/python/dist/src/Lib/colorsys.py DESCRIPTION This modules provides two functions for each color system ABC: rgb_to_abc(r, g, b) --> a, b, c abc_to_rgb(a, b, c) --> r, g, b All inputs and outputs are triples of floats in the range [0.0...1.0]. Inputs outside this range may cause exceptions or invalid outputs. Supported color systems: RGB: Red, Green, Blue components YIQ: used by composite video signals HLS: Hue, Luminance, Saturation HSV: Hue, Saturation, Value CONSTANTS ONE_SIXTH = 0.16666666666666666 ONE_THIRD = 0.33333333333333331 TWO_THIRD = 0.66666666666666663 __all__ = ['rgb_to_yiq', 'yiq_to_rgb', 'rgb_to_hls', 'hls_to_rgb', 'rg... __doc__ = 'Conversion functions between RGB and other color...uminance... __file__ = '/home/ping/dev/python/dist/src/Lib/colorsys.pyc' __name__ = 'colorsys' help> modules ##9## Please wait a moment while I gather a list of all available modules... 
BaseHTTPServer delegate multifile sndhdr Bastion difflib mutex socket CDROM dircache mytok (package) spam CGIHTTPServer dirctest neelk (package) sps Canvas dis netrc sre ConfigParser distutils (package) new sre_compile Cookie dl nis sre_constants Dialog doctest nntplib sre_parse FCNTL dospath ntpath stat FileDialog dumbdbm nturl2path statcache FixTk echatui oldgnut statvfs IN eggs operator string MimeWriter encodings (package) os stringold Queue errno parser strop ScrolledText exceptions pcre struct SimpleDialog fcntl pdb sunau SimpleHTTPServer festival pickle sunaudio SocketServer filecmp pipes symbol StringIO fileinput popen2 symtable TERMIOS findcode poplib sys Tix fnmatch posix syslog Tkconstants foo posixfile tabnanny Tkdnd foo (package) posixpath telnetlib Tkinter formatter ppm tempfile UserDict fpectl pprint termios UserList fpformat pre test (package) UserString ftplib profile testalias __builtin__ gc pstats tester __future__ gdbm pty tester15 _codecs getopt pwd tester1c _curses getpass py_compile tester2 _curses_panel gettext pyclbr thespark _locale glob pydoc thread _socket gnut pydoc-ell threading _sre gnutellalib pydoc-findmod time _symtable gopherlib pydoc-help timing _testcapi grp pydoc-nohelp tkColorChooser _tkinter gviz pyhints tkCommonDialog _weakref gzip pyscan tkFileDialog aifc html quopri tkFont alesis-old htmlentitydefs random tkMessageBox anydbm htmllib re tkSimpleDialog array http readline toaiff asynchat httpcli reconvert token asyncore httplib regex tokenize atexit icecream regex_syntax traceback audiodev ihooks regsub tree audioop imageop repr tty base64 imaplib resource turtle bdb imghdr reverb types binascii imgsize rexec tzparse binhex imp rfc822 unicodedata bisect imputil rgbimg unittest blech inspect rlcompleter urllib bsddb inspect-cvs robotparser urllib2 cPickle inspect-ping romandate urlparse cStringIO keyword rotor user calendar knee rxb uu cgen linecache rxb14 vchtml cgi linuxaudiodev rxb15 warnings chunk locale sched watcher cmath logo scopetest watcher1 cmd macpath scraper wave code macurl2path search weakref codecs mailbox select webbot codeop mailcap sequencer webbrowser collab makesums sequencer-badread webbrowser-mine collab2 marshal sequencer-types whichdb colorsys marshali sgmllib whrandom commands math sha wiki compileall md5 shelve worker copy memtest shlex xdrlib copy_reg mhlib shutil xml (package) coredump mimetools signal xmllib crypt mimetypes site xreadlines curses (package) mimify slk zipfile dbhash mmap smtpd zlib dbm mpz smtplib Enter any module name to get more help. Or, type "modules spam" to search for modules whose descriptions contain the word "spam". help> md5 Help on module md5: NAME md5 FILE /home/ping/dev/python/dist/src/build/lib.linux-i686-2.1/md5.so DESCRIPTION This module implements the interface to RSA's MD5 message digest algorithm (see also Internet RFC 1321). Its use is quite straightforward: use the new() to create an md5 object. You can now feed this object with arbitrary strings using the update() method, and at any point you can ask it for the digest (a strong kind of 128-bit checksum, a.k.a. ``fingerprint'') of the concatenation of the strings fed to it so far using the digest() method. Functions: new([arg]) -- return a new md5 object, initialized with arg if provided md5([arg]) -- DEPRECATED, same as new, but for compatibility Special Objects: MD5Type -- type object for md5 objects FUNCTIONS md5(...) new([arg]) -> md5 object Return a new md5 object. If arg is present, the method call update(arg) is made. new(...) 
new([arg]) -> md5 object Return a new md5 object. If arg is present, the method call update(arg) is made. CONSTANTS MD5Type = __doc__ = "This module implements the interface to RSA's MD...Objects:... __file__ = '/home/ping/dev/python/dist/src/build/lib.linux-i686-2.1/md... __name__ = 'md5' help> help ##10## Welcome to Python 2.1! This is the online help utility. If this is your first time using Python, you should definitely check out the tutorial on the Internet at http://www.python.org/doc/tut/. Enter the name of any module, keyword, or topic to get help on writing Python programs and using Python modules. To quit this help utility and return to the interpreter, just type "quit". To get a list of available modules, keywords, or topics, type "modules", "keywords", or "topics". Each module also comes with a one-line summary of what it does; to list the modules whose summaries contain a given word such as "spam", type "modules spam". help> abs ##11## Help on built-in function abs: abs(...) abs(number) -> number Return the absolute value of the argument. help> sys.getrefcount ##12## Help on built-in function getrefcount in sys: getrefcount(...) getrefcount(object) -> integer Return the current reference count for the object. This includes the temporary reference in the argument list, so it is at least 2. help> asdfadsf ##13## no Python documentation found for 'asdfadsf' help> quit ##14## You're now leaving help and returning to the Python interpreter. If you want to ask for help on a particular object directly from the interpreter, you can type "help(object)". Executing "help('string')" has the same effect as typing a particular string at the help> prompt. >>> help(3) ##15## Help on int: 3 >>> help([]) Help on list: [] >>> help([].append) ##16## Help on built-in function append: append(...) L.append(object) -- append object to end >>> import sys >>> help(sys.path) ##17## Help on list: ['', '/home/ping/python', '/home/ping/dev/python/dist/src/Lib', '/home/ping/dev/ python/dist/src/Lib/plat-linux2', '/home/ping/dev/python/dist/src/Lib/lib-tk', ' /home/ping/dev/python/dist/src/Modules', '/home/ping/dev/python/dist/src/build/l ib.linux-i686-2.1'] >>> help('sys.path') ##18## Help on list in sys: path = ['', '/home/ping/python', '/home/ping/dev/python/dist/src/Lib', '/home/pi ng/dev/python/dist/src/Lib/plat-linux2', '/home/ping/dev/python/dist/src/Lib/lib -tk', '/home/ping/dev/python/dist/src/Modules', '/home/ping/dev/python/dist/src/ build/lib.linux-i686-2.1'] >>> help('array') ##19## Help on module array: NAME array FILE /home/ping/dev/python/dist/src/build/lib.linux-i686-2.1/array.so DESCRIPTION This module defines a new object type which can efficiently represent an array of basic values: characters, integers, floating point numbers. Arrays are sequence types and behave very much like lists, except that the type of objects stored in them is constrained. The type is specified at object creation time by using a type code, which is a single character. The following type codes are defined: Type code C Type Minimum size in bytes 'c' character 1 'b' signed integer 1 'B' unsigned integer 1 'h' signed integer 2 'H' unsigned integer 2 'i' signed integer 2 'I' unsigned integer 2 'l' signed integer 4 'L' unsigned integer 4 'f' floating point 4 'd' floating point 8 Functions: array(typecode [, initializer]) -- create a new array Special Objects: ArrayType -- type object for array objects FUNCTIONS array(...) 
array(typecode [, initializer]) -> array Return a new array whose items are restricted by typecode, and initialized from the optional initializer value, which must be a list or a string. CONSTANTS ArrayType = __doc__ = 'This module defines a new object type which can ...cts:\n\n... __file__ = '/home/ping/dev/python/dist/src/build/lib.linux-i686-2.1/ar... __name__ = 'array' >>> ##1## Topic names are all in capital letters so that it's very unlikely they will collide with module or package names. The hope is that the user will understand they're supposed to enter them in capitals, as shown. ##2## Each topic is associated with one of the HTML files in the library documentation. The formatter module is used to turn HTML into text. A couple of small enhancements to the HTML parser allow tables to be crudely displayed; columns are separated by tabs, so they don't always line up, but at least that's a lot better than having the entire table get mushed into one paragraph. The generated text is displayed using the pager, like everything else. ##3## Each topic can also have a number of cross-references, which are shown following the help docs (after the pager is done). The cross-references can be other topics, keywords, or modules. ##4## You can surf through the docs by picking one of the related topics and typing it back in. Yes, i know the list of related topics isn't word-wrapped -- that's an easy change. ##5## A list of Python keywords is available. It's okay for them to be entered by the user in lowercase -- they're reserved words, so there will never be modules with these names. ##6## Each keyword is similarly associated with an HTML file and possibly some related topics. ##7## Typing "modules" followed by a search key does the same thing as "pydoc -k" from the shell. ##8## Typing in a module name is just like running "pydoc" from the shell on a module. ##9## Typing in "modules" by itself produces a list of all the modules and packages. It takes less than two seconds on my machine to gather the list. ##10## What happens if you type "help" in help? You get the intro. ##11## Built-in functions are available too. ##12## You can look things up with a dotted path of arbitrary depth. ##13## What happens if there's no such module? You get a message. ##14## "quit", "q", "QUIT", "Quit", "Q", and Ctrl-D all quit help. Even typing in '"quit"' with the quotation marks works, just to make sure beginners don't get stuck. ##15## What happens if you ask for help on a number? Nothing too useful, but at least it doesn't explode. ##16## Asking for help on a built-in method is actually useful. ##17## If you ask for help on an object, it just shows you the object. ##18## If you ask for help and give the path to get to an object, it shows you the object and also where it came from. ##19## You can get help directly from the interpreter level by invoking "help()" with an argument. From fdrake@acm.org Thu Apr 12 14:41:46 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Thu, 12 Apr 2001 09:41:46 -0400 (EDT) Subject: [Doc-SIG] Doc nit: << and >> In-Reply-To: References: Message-ID: <15061.45210.173196.47640@beowolf.digicool.com> Ka-Ping Yee writes: > I just noticed on the page > > http://python.sourceforge.net/devel-docs/ref/summary.html > > that the shifting operators are not shown as << and >>, but > as the double-angle-quote characters ("guillemets"?) that the > French are fond of. This should probably get fixed, though > i'm not enough of a TeX expert to know how (will a couple of > backslashes do the trick?). 
Fixed in CVS; thanks! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From jhe@webde-ag.de (Juergen Hermann) In-Reply-To: Message-ID: On Fri, 13 Apr 2001 04:41:50 -0700 (PDT), Ka-Ping Yee wrote: >I just noticed on the page > > http://python.sourceforge.net/devel-docs/ref/summary.html > >that the shifting operators are not shown as << and >>, but >as the double-angle-quote characters ("guillemets"?) that the >French are fond of. This should probably get fixed, though >i'm not enough of a TeX expert to know how (will a couple of >backslashes do the trick?). My TeX is rusted, but $<$$<$ should do the trick. Ciao, Jürgen -- Jürgen Hermann, Developer (jhe@webde-ag.de) WEB.DE AG, http://webde-ag.de/ From guido@digicool.com Thu Apr 12 20:38:45 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 12 Apr 2001 14:38:45 -0500 Subject: [Doc-SIG] pydoc.py: new help feature In-Reply-To: Your message of "Fri, 13 Apr 2001 06:25:56 MST." References: Message-ID: <200104121938.OAA21112@cj20424-a.reston1.va.home.com> OK, Ping, because pydoc is so new, we'll take the latest version. Please check it in ASAP!!! You're going to have to stay up the next 24 hours looking for bug reports, and also over the weekend once the release candidate is out. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From ping@lfw.org Fri Apr 13 21:13:35 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Fri, 13 Apr 2001 13:13:35 -0700 (PDT) Subject: [Doc-SIG] pydoc.py: new help feature In-Reply-To: <200104121938.OAA21112@cj20424-a.reston1.va.home.com> Message-ID: On Thu, 12 Apr 2001, Guido van Rossum wrote: > OK, Ping, because pydoc is so new, we'll take the latest version. > Please check it in ASAP!!! Done. > You're going to have to stay up the next 24 hours looking for bug > reports, and also over the weekend once the release candidate is > out. :-) You got it! -- ?!ng From fdrake@beowolf.digicool.com Fri Apr 13 06:10:02 2001 From: fdrake@beowolf.digicool.com (Fred Drake) Date: Fri, 13 Apr 2001 01:10:02 -0400 (EDT) Subject: [Doc-SIG] [development doc updates] Message-ID: <20010413051002.795BD2879C@beowolf.digicool.com> The development version of the documentation has been updated: http://python.sourceforge.net/devel-docs/ More description and explanation in the unittest documentation; update to match the final code and decisions from the pyunit-interest mailing list. Added information on urllib.FancyURLopener's handling of basic authentication and how to change the prompting behavior. Added documentation for the ColorPicker module for the Macintosh. From fdrake@acm.org Fri Apr 13 19:02:55 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri, 13 Apr 2001 14:02:55 -0400 (EDT) Subject: [Doc-SIG] Docs are frozen. Message-ID: <15063.16207.884585.823138@beowolf.digicool.com> The documentation tree is frozen for Python 2.1c1. All further changes should be submitted via the SourceForge patch manager until Python 2.1 has been released. Thanks! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From fdrake@beowolf.digicool.com Fri Apr 13 19:15:38 2001 From: fdrake@beowolf.digicool.com (Fred Drake) Date: Fri, 13 Apr 2001 14:15:38 -0400 (EDT) Subject: [Doc-SIG] [development doc updates] Message-ID: <20010413181538.7BA3F28A06@beowolf.digicool.com> The development version of the documentation has been updated: http://python.sourceforge.net/devel-docs/ Final documentation for Python 2.1c1. From edloper@gradient.cis.upenn.edu Fri Apr 13 20:47:33 2001 From: edloper@gradient.cis.upenn.edu (Edward D.
Loper) Date: Fri, 13 Apr 2001 15:47:33 EDT Subject: [Doc-SIG] lightweight markup: bullets In-Reply-To: Your message of "Wed, 11 Apr 2001 16:10:45 EDT." Message-ID: <200104131947.f3DJlXp14396@gradient.cis.upenn.edu> Tim Peters said: > [Edward D. Loper, on L<...> for list items] > > Perhaps. The reason that I didn't do that is that the use of <...> or > > L<...> for bullets is really very different from the use of X<...> for > > coloring. > > It's barely different at all to me: it's markup, as opposed to not markup, > and that's the *primary* distinction that needs to be learned. You > overburden my biological pattern-recognition engine if I have to learn N > different lexical conventions for N different categories of markup. Well, I guess that part of the idea behind a lightweight markup is that we should try to re-use regexps that are already in your brain. Which might be a good argument with just sticking with lists that look like: - list item - another list item or: 1. list item 2. another list item > I've got no particular use for list markup at all, What markup do you find that you do have use for (while writing docstrings)? I personally tend to just use C{code} regions (for identifiers, mainly); unordered lists; and literal blocks/doctest blocks. Oh, and fields for specifying info about specific parameters or the return value or what exceptions are thrown, etc. -Edward From klm@digicool.com Sat Apr 14 17:37:56 2001 From: klm@digicool.com (klm@digicool.com) Date: Sat, 14 Apr 2001 12:37:56 -0400 (EDT) Subject: [Doc-SIG] lightweight markup: bullets Message-ID: <15064.31972.597163.363329@serenade.digicool.com> First chance i've had to chime in this week, and only have a moment to sound my repeating refrain: i would like to see the structured-text-ish approach. Someone (was it you, edward?) mentioned the non-geek CP4E-type audience earlier this week - i'm dismayed to think that we're talking about exposing them to code in docstrings, eg C<> or Z{} or whatever, that's more cryptic than lots of python code. The docstrings should be more self-obvious, not less!! Truly, even as a *programmer* i find it helpful that docstring encoding is written-language encoding, which i can automatically decipher. **That's** the reason that the structured text approach makes sense - the overt meanings of the conventions for the reader, even the unitiated reader, are the intended ones. You don't need the secret codes or a tool to read the docstrings in the program text. Evidently, the trick is coming up with a decent set of structured text style rules that are unambiguous and "unsurprising" - in particular, conventions that don't collide with common writing practices. (Eg, collide ones recently discussed: use of '--' for description lists, or "1." at the end of a sentence but beginning of line translating to the start of an ordered list item.) Once again, it seems to me that we're close to this goal, but veering off to a new language, with C<> or whatever - totally at the expense of the reader. Really, it seems to me that such docstrings would make python code *less* readable, not more. Oh well. Ken klm@digicool.com From edloper@gradient.cis.upenn.edu Sat Apr 14 18:05:12 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Sat, 14 Apr 2001 13:05:12 EDT Subject: [Doc-SIG] lightweight markup: bullets In-Reply-To: Your message of "Sat, 14 Apr 2001 12:37:56 EDT." 
<15064.31972.597163.363329@serenade.digicool.com> Message-ID: <200104141705.f3EH5Cp06968@gradient.cis.upenn.edu> > i would like to see the structured-text-ish approach. In my mind, there are 2 things we're encoding here: structuring (lists, sections, literal blocks, etc), and colorizing (emphasized, inline literals, etc.). Colorizing only occurs within a paragraph.. I've been working on both designing & implementing a parser for a markup language for docstrings. The structuring is based in the structured-text-ish approach. I'm currently undecided about whether I want do do colorizing like E{this} or like *this*. The advantage of the former is that it means you can have more types of colorizing (e.g., colorizing for URIs, for code, for emphasis, for math, for definitions of terms that should be included in indeces, etc). The advantage of the later is that it's presumably more readable. But if we go with the later, I think we need to constrain ourselves to maybe 1 or 2 different colors (emph and code/identifier? or just identifier?). > Someone (was it you, edward?) mentioned the non-geek CP4E-type > audience earlier this week - I don't remember mentioning them, but I do think we need to keep them in mind. That would be one of my objections to some of the escaping proposals so far.. > i'm dismayed to think that we're talking about exposing them to code > in docstrings, eg C<> or Z{} or whatever, that's more cryptic than > lots of python code. The docstrings should be more self-obvious, not > less!! When I see it in context, it actually doesn't seem that cryptic to me. But then the people we should be asking about that are people who don't code. Maybe we should try encoding some docs with both kinds of markup, and see what they think. > You don't need the secret codes or a tool to read the docstrings in > the program text. In general, I think that the colorizing should *never* be necessary to understand what's being said.. i.e., you should be able to blindly ignore any X{}s (the "X{" and the "}", not the content). The only place where that wouldn't be true would be if X{}s were used to escape characters, which should hopefully be very rare. > Evidently, the trick is coming up with a decent set of structured > text style rules that are unambiguous and "unsurprising" - in > particular, conventions that don't collide with common writing > practices. (Eg, collide ones recently discussed: use of '--' for > description lists, or "1." at the end of a sentence but beginning of > line translating to the start of an ordered list item.) Once again, > it seems to me that we're close to this goal, but veering off to a > new language, with C<> or whatever - totally at the expense of the > reader. For structuring, I think I have a set of such rules. I'll send out mail about that when I've done more testing etc., but basically: 1. all paragraphs *must* be left-justified 2. all lists must be either indented or separated by a blank line. 3. The second and subsequent line of a list item must be indented further than the bullet. All lines but the first must be left-justified. Subsequent paragraphs in the same list item must line up with that indentation level. There are some more, but those are the basics required to avoid mis-interpreting bullets.. The only true ambiguities you get with rules like these are things like: 1. This is a list item whose second line begins with the number 1. Was that "1." a bullet or part of a sentence? 
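(A rough sketch of how those rules sort bullets from plain text -- the names are invented and the bullet pattern is simplified to '-' and 'N.' only, so this is an illustration, not the actual implementation::

    import re

    BULLET = re.compile(r'(-|\d+\.)(\s+|$)')

    def opens_list_item(lines, i, para_indent):
        # Rules 1-3 in miniature: a bulleted line starts a list item only
        # if it is indented past the enclosing paragraph, or is set off by
        # a blank line; a '-' or '1.' that merely got word-wrapped to the
        # start of a left-justified paragraph line satisfies neither test.
        line = lines[i]
        stripped = line.lstrip()
        if not BULLET.match(stripped):
            return 0                      # no bullet at all
        indent = len(line) - len(stripped)
        if indent > para_indent:
            return 1                      # indented list (rule 2)
        if i == 0 or lines[i-1].strip() == '':
            return 1                      # list set off by a blank line (rule 2)
        return 0                          # just a wrapped '-' or '1.' in a paragraph

    >>> para = ["Some people like the number", "1.  Some don't."]
    >>> opens_list_item(para, 1, 0)
    0

The one case it can't settle is the example just above, where a continuation line that rule 3 already requires to be indented happens to begin with "1.".)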
> Really, it seems to me that such docstrings would make python code > *less* readable, not more. Do you think that we should have any colorizing at all? If so, what colors? People usually talk about *emphasis*, although I really very rarely find it useful in docstrings (despite its usefulness in *email*). The color I most often want is something to mark a token as a python identifier (or, more generally, to mark a string as Python code). If we didn't do any colorizing, we would probably have: - paragraphs (in which word-wrapping is legal, etc.) - literal blocks (which are displayed as-is) - doctest blocks (which are displayed as-is, or possibly colorized) - lists (ordered and unordered) - sections (and subsections) If there was no colorizing, I'm pretty sure we could get away with no escaping mechanism (with carefully chosen structuring rules, it would never be necessary). -Edward From klm@digicool.com Sat Apr 14 18:41:04 2001 From: klm@digicool.com (Ken Manheimer) Date: Sat, 14 Apr 2001 13:41:04 -0400 (EDT) Subject: [Doc-SIG] lightweight markup: bullets In-Reply-To: <200104141705.f3EH5Cp06968@gradient.cis.upenn.edu> Message-ID: On Sat, 14 Apr 2001, Edward D. Loper wrote: > Do you think that we should have any colorizing at all? If so, what > colors? People usually talk about *emphasis*, although I really very > rarely find it useful in docstrings (despite its usefulness in > *email*). The color I most often want is something to mark a token as > a python identifier (or, more generally, to mark a string as Python > code). Huh - me too. And certainly, emphasis is not significant when it comes to auto-documentation considerations like function-name and variable indexes! (Without any special interpretation, the rare occasions that i use "*" emphasis in my docstrings will still show through - as "*" emphasis. Handy, that.-) Marking tokens is another matter - but just marking them doesn't add much info to the auto-index situation. Significant info would require more elaborate structuring conventions, which we're nowhere near discussing yet. Why *don't* we start without coloring, and get plenty of the other useful stuff that you mention in place? We can defer the controversy, and maybe when we get around to it we'll know more what kind of structuring we really need... Excellent. Ken klm@digicool.com From edloper@gradient.cis.upenn.edu Sat Apr 14 19:44:38 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Sat, 14 Apr 2001 14:44:38 EDT Subject: [Doc-SIG] lightweight markup: bullets In-Reply-To: Your message of "Sat, 14 Apr 2001 13:41:04 EDT." Message-ID: <200104141844.f3EIicp15860@gradient.cis.upenn.edu> > Why *don't* we start without coloring, and get plenty of the other > useful stuff that you mention in place? We can defer the > controversy, and maybe when we get around to it we'll know more what > kind of structuring we really need... Sounds like a good idea to me. The only slight problem is that, in theory, colorizing and structuring interact slightly. In particular, consider examples like:: - This is a list item talking about C{x - y}. C{x - y} is a good value. We all love it. Here, in principle, we should be able to tell from the fact that we're inside a colored region that the "-" is not a list bullet. But I'd be ok ignoring cases like that for now.. I've tried to be careful to design my structring rules so it can be as independant of colorizing as possible. -Edward From edloper@gradient.cis.upenn.edu Sat Apr 14 19:46:24 2001 From: edloper@gradient.cis.upenn.edu (Edward D. 
Loper) Date: Sat, 14 Apr 2001 14:46:24 EDT Subject: [Doc-SIG] Syntax for fields Message-ID: <200104141846.f3EIkOp16056@gradient.cis.upenn.edu> I've been actually using my markup language to document some things, and my current syntax for fields seems somewhat problematic. I've been using expressions like:: author: Edward Loper param n: The size of the list to return. type n: C{int} return: A list containing all of the prime numbers between 2 and C{x}, inclusive. (The syntax is essentially the same as it is for list items, except that "\w+ \w+:" is the bullet instead of "-" or "(\d+\.)+") The problem is that there's too much overlap between the form of such expressions and the form of natural language expressions like:: Consider this: blah blah.. A problem: .sdf dfs... However: ... And I don't feel comfortable forbidding people to use expressions like that (among other things, it's just not something people will remember not to do). I was using the "param n:" style mainly because it's easy to read. One other option is to mimic javadoc, and do something like:: @author Edward Loper @param n The size of the list to return or:: @author: Edward Loper @param n: The size of the list to return or:: @author Edward Loper @param(n) The size of the list to return Another option might be to only allow captizlied field names:: AUTHOR: Edward Loper PARAM n: The size of the list to return Although this looks somewhat ugly to me. Yet another option would be to only count ":" or " :" as a field if is one of a small finite set of reserved words.. But that means we can never expand the tag set in a backwards-compatible way, and that we can't have alternative tag-sets for code written in different languages. Other ideas? Which (if any) of these alternatives are appealing? -Edward From guido@digicool.com Sat Apr 14 21:56:20 2001 From: guido@digicool.com (Guido van Rossum) Date: Sat, 14 Apr 2001 15:56:20 -0500 Subject: [Doc-SIG] lightweight markup: bullets In-Reply-To: Your message of "Sat, 14 Apr 2001 13:05:12 EDT." <200104141705.f3EH5Cp06968@gradient.cis.upenn.edu> References: <200104141705.f3EH5Cp06968@gradient.cis.upenn.edu> Message-ID: <200104142056.PAA30488@cj20424-a.reston1.va.home.com> > For structuring, I think I have a set of such rules. I'll send out > mail about that when I've done more testing etc., but basically: > 1. all paragraphs *must* be left-justified > 2. all lists must be either indented or separated by a blank > line. > 3. The second and subsequent line of a list item must be indented > further than the bullet. All lines but the first must be > left-justified. > > Subsequent paragraphs in the same list item must line up with > that indentation level. I like this. > There are some more, but those are the basics required to avoid > mis-interpreting bullets.. The only true ambiguities you get with > rules like these are things like: > 1. This is a list item whose second line begins with the number > 1. Was that "1." a bullet or part of a sentence? Let's err on the side of caution and declare this is not a list item unless it's separated by a blank line. > > Really, it seems to me that such docstrings would make python code > > *less* readable, not more. > > Do you think that we should have any colorizing at all? If so, what > colors? People usually talk about *emphasis*, although I really very > rarely find it useful in docstrings (despite its usefulness in > *email*). 
The color I most often want is something to mark a token as > a python identifier (or, more generally, to mark a string as Python > code). Personally, I like having the *emphasis coloring*; I care less about coloring identifiers. My reasoning: sometimes it's *really* useful to be able to stress the importance of something without SHOUTING; but pieces of source code are easy enough to recognize without coloring: they just *look* different, e.g. foo(bar) is clearly a function call. When it's ambiguous, I'll put single or double quotes around it (e.g. when referencing the 'a' variable by itself) but I'm OK with seeing those quotes in the printed documentation as well; I'm *not* OK with seeing *emphasis* printed as "*emphasis*". One more thing: I'd like to argue against the use of a fixed-width font for in-line code examples. Typically this uses Courier, whose characters are *way* to wide for readability. I can understand why a fixed-width font is necessary in sample *blocks*, because *sometimes* (though not very often) there's code that is arranged in a tabular manner; but this argument doesn't apply to in-line code samples. > If we didn't do any colorizing, we would probably have: > - paragraphs (in which word-wrapping is legal, etc.) > - literal blocks (which are displayed as-is) > - doctest blocks (which are displayed as-is, or possibly colorized) > - lists (ordered and unordered) > - sections (and subsections) > > If there was no colorizing, I'm pretty sure we could get away with no > escaping mechanism (with carefully chosen structuring rules, it would > never be necessary). But I want an escaping mechanism. I want to be able to say e.g. "When I write "*foo*", all its characters are rendered, including the quotes and the stars, but when I write \*foo*, it is rendered in italics." (In other words, I want to be able to give an in-line example of the "*foo*" notation.) --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake@beowolf.digicool.com Sat Apr 14 21:09:33 2001 From: fdrake@beowolf.digicool.com (Fred Drake) Date: Sat, 14 Apr 2001 16:09:33 -0400 (EDT) Subject: [Doc-SIG] [development doc updates] Message-ID: <20010414200933.0218628A09@beowolf.digicool.com> The development version of the documentation has been updated: http://python.sourceforge.net/devel-docs/ Final Python 2.1 documentation. From edloper@gradient.cis.upenn.edu Sat Apr 14 21:16:41 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Sat, 14 Apr 2001 16:16:41 EDT Subject: [Doc-SIG] lightweight markup: bullets In-Reply-To: Your message of "Sat, 14 Apr 2001 15:56:20 CDT." <200104142056.PAA30488@cj20424-a.reston1.va.home.com> Message-ID: <200104142016.f3EKGfp24071@gradient.cis.upenn.edu> > > There are some more, but those are the basics required to avoid > > mis-interpreting bullets.. The only true ambiguities you get with > > rules like these are things like: > > 1. This is a list item whose second line begins with the number > > 1. Was that "1." a bullet or part of a sentence? > > Let's err on the side of caution and declare this is not a list item > unless it's separated by a blank line. Ok, I'll change it. But in any case, it will generate a warning, since it's potentially confusing (it will recommend that they move the "1." to the previous line or the "number" to the next line, or that they add a blank line if they intended to start a new list item). It also generates warnings for things like:: The following was probably a mistake: - This is not a list item. 
- Neither is this and:: The following was probably a mistake: - This is a list item; but this is a new paragraph, not a continuation of the list item. > > Do you think that we should have any colorizing at all? If so, what > > colors? People usually talk about *emphasis*, although I really very > > rarely find it useful in docstrings (despite its usefulness in > > *email*). The color I most often want is something to mark a token as > > a python identifier (or, more generally, to mark a string as Python > > code). > > Personally, I like having the *emphasis coloring*; I care less about > coloring identifiers. My reasoning: sometimes it's *really* useful to > be able to stress the importance of something without SHOUTING; Again, I very rarely find myself needing to do this in docstrings.. But maybe I'm not a representative sample. > but > pieces of source code are easy enough to recognize without coloring: > they just *look* different, e.g. foo(bar) is clearly a function call. > When it's ambiguous, I'll put single or double quotes around it > (e.g. when referencing the 'a' variable by itself) but I'm OK with > seeing those quotes in the printed documentation as well; It can be nice to have code colored for other reasons, but I don't think it's really a necessity.. > I'm *not* OK with seeing *emphasis* printed as "*emphasis*". How would you like to see *emphasis* rendered in a tty environment? Like "*this*"? Or just like "this", since emphasis should never really be *necessary* to make your point? This would apply to any tool that tries to print marked-up documentation from within Python, for example (similar to "help"). > One more thing: I'd like to argue against the use of a fixed-width > font for in-line code examples. Typically this uses Courier, whose > characters are *way* to wide for readability. I can understand why a > fixed-width font is necessary in sample *blocks*, because *sometimes* > (though not very often) there's code that is arranged in a tabular > manner; but this argument doesn't apply to in-line code samples. Yeah, I had been thinking about that, and I agree. But of course that's mainly a tool issue, not a markup language issue. (though not entirely). On a related note, I've been thinking that all spaces in in-line code should be soft. If you really need "x y" to come out with 2 spaces in it instead of one, you should use a literal block. I'm undecided about whether spaces in in-line code should be breakable.. Maybe leave that a tool issue. > > If we didn't do any colorizing, we would probably have: > > - paragraphs (in which word-wrapping is legal, etc.) > > - literal blocks (which are displayed as-is) > > - doctest blocks (which are displayed as-is, or possibly colorized) > > - lists (ordered and unordered) > > - sections (and subsections) I forgot to mention "fields," which allow you to do things like describe individual parameters, or the return value, or a class's instance variables, etc. > > If there was no colorizing, I'm pretty sure we could get away with no > > escaping mechanism (with carefully chosen structuring rules, it would > > never be necessary). > > But I want an escaping mechanism. I want to be able to say e.g. "When > I write "*foo*", all its characters are rendered, including the quotes > and the stars, but when I write \*foo*, it is rendered in italics." > (In other words, I want to be able to give an in-line example of the > "*foo*" notation.) Well, as I said *if there's no colorizing*, we don't need escaping. 
The second you introduce *emphasis* colorizing, or any other colorizing, we do need some type of escaping mechanism. And then we can talk about various escaping mechanisms (I've seen 3 workable ones: backslashing of some sort (e.g., \*); using X{..} notation (e.g., E{*} or E{lb}); and using a literal coloring (e.g. '*' or `*`).. Of course, the last one's not as complete, since you then can't include the literal character inline.. but at least that's 1 character instead of all the markup characters. But I think that, for now, it makes sense to postpone discussion of *both* colorizing and escaping (since they're clearly related) and to try to come up with a good definition for how we want structuring to work. Currently, the only open questions in my mind are where to draw the lines between errors and warnings, and how to write fields in such a way that they won't conflict with normal English usage. Any feedback would be much appreciated. I'll try to put up a link to my parser sometime soon, but it's getting towards the end of the semester, and I'm a bit swamped with projects. :) -Edward From guido@digicool.com Sun Apr 15 01:52:01 2001 From: guido@digicool.com (Guido van Rossum) Date: Sat, 14 Apr 2001 19:52:01 -0500 Subject: [Doc-SIG] lightweight markup: bullets In-Reply-To: Your message of "Sat, 14 Apr 2001 16:16:41 EDT." <200104142016.f3EKGfp24071@gradient.cis.upenn.edu> References: <200104142016.f3EKGfp24071@gradient.cis.upenn.edu> Message-ID: <200104150052.TAA30895@cj20424-a.reston1.va.home.com> > > Let's err on the side of caution and declare this is not a list item > > unless it's separated by a blank line. > > Ok, I'll change it. But in any case, it will generate a warning, since > it's potentially confusing (it will recommend that they move the "1." > to the previous line or the "number" to the next line, or that they > add a blank line if they intended to start a new list item). It also > generates warnings for things like:: > > The following was probably a mistake: > - This is not a list item. > - Neither is this > > and:: > > The following was probably a mistake: > > - This is a list item; > but this is a new paragraph, not > a continuation of the list item. Good. > > Personally, I like having the *emphasis coloring*; I care less about > > coloring identifiers. My reasoning: sometimes it's *really* useful to > > be able to stress the importance of something without SHOUTING; > > Again, I very rarely find myself needing to do this in docstrings.. But > maybe I'm not a representative sample. Grep through Lib/*.py for ' \*[a-z][a-z]*\* '. Lots of examples (some in comments, but those are also documentation :-). > > but > > pieces of source code are easy enough to recognize without coloring: > > they just *look* different, e.g. foo(bar) is clearly a function call. > > When it's ambiguous, I'll put single or double quotes around it > > (e.g. when referencing the 'a' variable by itself) but I'm OK with > > seeing those quotes in the printed documentation as well; > > It can be nice to have code colored for other reasons, but I don't > think it's really a necessity.. Agreed. > > I'm *not* OK with seeing *emphasis* printed as "*emphasis*". > > How would you like to see *emphasis* rendered in a tty environment? > Like "*this*"? Or just like "this", since emphasis should never > really be *necessary* to make your point? This would apply to any > tool that tries to print marked-up documentation from within > Python, for example (similar to "help"). 
Since I went to the trouble of typing it, I'd like to see it rendered one way or another. Rendering as *foo* is fine. (Much better than inverse video!) > > One more thing: I'd like to argue against the use of a fixed-width > > font for in-line code examples. Typically this uses Courier, whose > > characters are *way* to wide for readability. I can understand why a > > fixed-width font is necessary in sample *blocks*, because *sometimes* > > (though not very often) there's code that is arranged in a tabular > > manner; but this argument doesn't apply to in-line code samples. > > Yeah, I had been thinking about that, and I agree. But of course that's > mainly a tool issue, not a markup language issue. (though not entirely). I know, I just wanted to throw it out while I was thinking of it. > On a related note, I've been thinking that all spaces in in-line > code should be soft. If you really need "x y" to come out with 2 > spaces in it instead of one, you should use a literal block. I'm undecided > about whether spaces in in-line code should be breakable.. Maybe leave > that a tool issue. Agreed, and I do think spaces in in-line code should be breakable. I write a lot of email with in-line code samples, and I often have no choice in letting it break -- and if I don't want it to be broken, I'll make it a block. > > > If we didn't do any colorizing, we would probably have: > > > - paragraphs (in which word-wrapping is legal, etc.) > > > - literal blocks (which are displayed as-is) > > > - doctest blocks (which are displayed as-is, or possibly colorized) > > > - lists (ordered and unordered) > > > - sections (and subsections) > > I forgot to mention "fields," which allow you to do things like describe > individual parameters, or the return value, or a class's instance > variables, etc. The Javadoc-style @ notation makes sense to me here -- as you showed, trying to do this without markup can be plain confusing. > But I think that, for now, it makes sense to postpone discussion of > *both* colorizing and escaping (since they're clearly related) and to > try to come up with a good definition for how we want structuring to > work. Currently, the only open questions in my mind are where to draw > the lines between errors and warnings, I say be strict. The tool should always be available and we should tweak all our docstrings until the tool is happy. > and how to write fields in such a > way that they won't conflict with normal English usage. Any feedback > would be much appreciated. I'll try to put up a link to my parser > sometime soon, but it's getting towards the end of the semester, and I'm > a bit swamped with projects. :) --Guido van Rossum (home page: http://www.python.org/~guido/) From edloper@gradient.cis.upenn.edu Sun Apr 15 01:14:32 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Sat, 14 Apr 2001 20:14:32 EDT Subject: [Doc-SIG] __all__, and how it relates to doc tools Message-ID: <200104150014.f3F0EWp17579@gradient.cis.upenn.edu> I was looking through the changes made in Python 2.1, and noticed the "__all__" variable in modules. My understanding is that, when defined, this lists all of the variables that will be imported when you do "from xyz import *". Would it also be reasonable for a doc tool to look at this value, for an indication of which objects to document? This would be an easy way of preventing the doc tools from documenting: 1. "internal" objects (may be a good idea, may not be..) 2. 
imported modules and objects that were imported from other modules (most likely a good idea). (note that this is only an issue when we're documenting from within Python, not when we're parsing the file that we're documenting.) (I've had trouble documenting modules that run "from types import *", and seeing a bunch of Type objects defined by the module, etc..) Of course, if the __all__ variable is not defined, you'd still have to use whatever heuristics/rules you have to decide what to document.. And you'd probably want tools to have a flag that tells them to ignore the __all__ variable. But mainly I'm wondering whether this is consistant with the intended meaning of the __all__ variable? If not, tools shouldn't use it that way.. we've had enough trouble with people overloading variables already (pre-function-attribute __doc__ comes to mind). Also, would it be reasonable to only document the fields of a class listed in an __all__ class variable, if such a variable is defined? -Edward From edloper@gradient.cis.upenn.edu Sun Apr 15 01:15:27 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Sat, 14 Apr 2001 20:15:27 EDT Subject: [Doc-SIG] lightweight markup: bullets Message-ID: <200104150015.f3F0FRp17667@gradient.cis.upenn.edu> Guido said: > Grep through Lib/*.py for ' \*[a-z][a-z]*\* '. Lots of examples (some > in comments, but those are also documentation :-). Ok, you're right. But I *still* think that we should defer that issue, given that we can. I'd like to get a markup language that we can all play with for a little while first, and then talk about how to add colorizing.. The only way that colorizing should be non-backwards- compatible is when you need to escape things. > > How would you like to see *emphasis* rendered in a tty environment? > [...] > Since I went to the trouble of typing it, I'd like to see it rendered > one way or another. Rendering as *foo* is fine. (Much better than > inverse video!) Agreed (not to mention that sometimes such fancy features as inverse video are not available). > I do think spaces in in-line code should be breakable. I > write a lot of email with in-line code samples, and I often have no > choice in letting it break -- and if I don't want it to be broken, > I'll make it a block. Agreed. Although if the tool wants to be nice, and try to avoid breaking in-line code, it's free to. But the markup language says that any in-line code *can* get broken at spaces. > > I forgot to mention "fields," which allow you to do things like describe > > individual parameters, or the return value, or a class's instance > > variables, etc. > > The Javadoc-style @ notation makes sense to me here -- as you showed, > trying to do this without markup can be plain confusing. Ok. Does anyone have objections to using Javadoc-style @ notation? Any votes on which of the various notations I wrote out that we should I was looking at the emacs java-mode, to see how they do colorizing in docstrings. It looks like they just colorize things like "@param x" and "@author" *wherever* they occur (assuming that these will appear rarely, if ever, in actual source code; and that when they do, it will be fairly harmless anyway). Would we be ok with doing something like that? Of course, IDLE could in theory be much smarter about it.. (And if we did eventually put something like this in emacs python mode, it would certainly be something you can turn on/off). > I say be strict. The tool should always be available and we should > tweak all our docstrings until the tool is happy. Ok. 
I'll err on the side of being strict then. One advantage of being strict is that it greatly reduces the need to hand-check the parser's output.. as long as running:: pytext.check_docstrings(module) or whatever succeeds, you're most likely fine. -Edward From ping@lfw.org Sun Apr 15 07:42:05 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Sun, 15 Apr 2001 01:42:05 -0500 (CDT) Subject: [Doc-SIG] Where to find the docs Message-ID: Where does the documentation normally reside on Unix and Mac platforms? (Or, alternate question: why isn't the documentation part of the main distribution archive on these platforms?) At the moment, pydoc is looking in: os.environ.get('PYTHONDOCS') Setting the environment variable has highest priority. Then: os.path.join(os.environ.get('PYTHONHOME'), 'doc') os.path.join(os.path.basename(sys.executable), 'doc') These work for Windows, and work for Mac *if* you choose to unpack the docs into the 'doc' subdirectory of Python's home -- but you have to rename the unpacked folder manually. Then, for Unix: '/usr/doc/python-docs-' + split(sys.version)[0] '/usr/doc/python-' + split(sys.version)[0] '/usr/doc/python-docs-' + sys.version[:3] '/usr/doc/python-' + sys.version[:3] The most logical place i would expect the docs to reside is /usr/doc/python-2.1. But the last Python documentation RPMs i installed used /usr/doc/python-docs-1.5.2 and /usr/doc/python-docs-2.0. Hence the above. Other RPMs i have seen put the documentation in /usr/share/doc/python-docs-2.0. Should this be added? Is there a standard place to look? Thanks, -- ?!ng From tim.one@home.com Sun Apr 15 08:14:50 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 15 Apr 2001 03:14:50 -0400 Subject: [Doc-SIG] __all__, and how it relates to doc tools In-Reply-To: <200104150014.f3F0EWp17579@gradient.cis.upenn.edu> Message-ID: [Edward D. Loper] > I was looking through the changes made in Python 2.1, and noticed > the "__all__" variable in modules. My understanding is that, when > defined, this lists all of the variables that will be imported > when you do "from xyz import *". Correct! That's it's only enforced semantics. More generally, it's meant to identify which names a module intends to export regardless of means, and in that larger sense it's more of a doc gimmick than a language feature. > Would it also be reasonable for a doc tool to look at this value, for > an indication of which objects to document? Absolutely. In fact, that's probably the best use. > ... > And you'd probably want tools to have a flag that tells them to > ignore the __all__ variable. I wouldn't: if a module lies about what it intends to export, that's a bug in the module. > ... > Also, would it be reasonable to only document the fields of a > class listed in an __all__ class variable, if such a variable is > defined? __all__ was a marginal idea even at the module level; I'd prefer not to see it spread. The practical problem at the module level was that import xyz also acts as an export of xyz (from "import *"'s POV), and usually an unintended export. There's no such problem at the class level. From edloper@gradient.cis.upenn.edu Sun Apr 15 08:53:32 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Sun, 15 Apr 2001 03:53:32 EDT Subject: [Doc-SIG] __all__, and how it relates to doc tools In-Reply-To: Your message of "Sun, 15 Apr 2001 03:14:50 EDT." Message-ID: <200104150753.f3F7rWp28385@gradient.cis.upenn.edu> > > And you'd probably want tools to have a flag that tells them to > > ignore the __all__ variable. 
> > I wouldn't: if a module lies about what it intends to export, that's a bug > in the module. My guess is that most people won't put "private" functions/classes/ etc. in the __all__ list, but it still may be useful for a doc tool to be able to process the docstrings of the "private" objects.. This is similar to including a flag saying whether a doc tool should process private objects (ones starting with "_" or "__").. > __all__ was a marginal idea even at the module level; I'd prefer not > to see it spread. Ok. > The practical problem at the module level was that > > import xyz > > also acts as an export of xyz (from "import *"'s POV), and usually an > unintended export. There's no such problem at the class level. I would be surprised if people don't also use it to hide "private" objects. Is this something we want to discourage? (Of course, "private" objects are probably named with a _leading_underscore, and my understanding is that "from xyz import *" won't import such objects if __all__ is undefined.. so perhaps the question is moot..) Incidentally, if __all__ is defined, and it includes objects that begin with a "_", do those get imported (in "from xyz import *")? Or does the general rule that objects starting with "_" don't get imported override that? (I haven't had a chance to grab 2.1 and play with it yet..) -Edward From tim.one@home.com Sun Apr 15 09:26:04 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 15 Apr 2001 04:26:04 -0400 Subject: [Doc-SIG] __all__, and how it relates to doc tools In-Reply-To: <200104150753.f3F7rWp28385@gradient.cis.upenn.edu> Message-ID: [Edward D. Loper] > My guess is that most people won't put "private" functions/classes/ > etc. in the __all__ list, but it still may be useful for a doc tool to > be able to process the docstrings of the "private" objects.. This is > similar to including a flag saying whether a doc tool should process > private objects (ones starting with "_" or "__").. It's useful for a doc tool to have a notion of public and private class attributes, but naming conventions already exist to make those distinctions. It would be unPythonic to introduce another mechanism to do the same thing. > ... > I would be surprised if people don't also use it to hide "private" > objects. Is this something we want to discourage? Yes: the convention for module-private names has always been to begin them with an underscore. It wasn't the intent of __all__ to throw that rule away; although, frankly, I've never been clear on exactly why __all__ *was* added. The addition of "import name as _name" syntax made it convenient enough to do "non-exporting imports", as far as I was concerned. > ... > Incidentally, if __all__ is defined, and it includes objects that > begin with a "_", do those get imported (in "from xyz import *")? Yes, if an __all__ list is present, import* imports exactly the names it contains. From guido@digicool.com Sun Apr 15 14:12:23 2001 From: guido@digicool.com (Guido van Rossum) Date: Sun, 15 Apr 2001 08:12:23 -0500 Subject: [Doc-SIG] __all__, and how it relates to doc tools In-Reply-To: Your message of "Sun, 15 Apr 2001 03:14:50 -0400." References: Message-ID: <200104151312.IAA08960@cj20424-a.reston1.va.home.com> > > Would it also be reasonable for a doc tool to look at this value, for > > an indication of which objects to document? > > Absolutely. In fact, that's probably the best use. Hm. You may be right, but Ping told me that he had tried this in pydoc, and was unhappy with the result: too much stuff didn't get documented.
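To make the trade-off concrete, the filtering being debated amounts to something like the sketch below; it is illustrative only, not pydoc's (or any other tool's) actual code, and the helper name and the respect_all flag are invented::

    import types

    def names_to_document(module, respect_all=1):
        """Return the names a doc tool might document for `module`:
        honour __all__ when present (and when respect_all is true),
        otherwise fall back on the underscore convention and skip
        names that merely arrived via an import."""
        if respect_all and hasattr(module, '__all__'):
            return list(module.__all__)
        names = []
        for name in dir(module):
            if name.startswith('_'):
                continue                    # conventionally private
            value = getattr(module, name)
            if type(value) is types.ModuleType:
                continue                    # an imported module, not an export
            if getattr(value, '__module__', module.__name__) != module.__name__:
                continue                    # defined elsewhere, probably imported
            names.append(name)
        return names

With a conservative __all__, everything the list omits disappears from the generated docs, which is presumably the result Ping ran into.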
So we should at least be willing to retract this idea. --Guido van Rossum (home page: http://www.python.org/~guido/) From dgoodger@atsautomation.com Mon Apr 16 14:39:24 2001 From: dgoodger@atsautomation.com (Goodger, David) Date: Mon, 16 Apr 2001 09:39:24 -0400 Subject: [Doc-SIG] backslashing Message-ID: Edward D. Loper wrote: > If you used E and E or E{lb} and E{rb} or something like that, > then regexps would generally look how they're supposed to (at least > when you print them). So would *any other* convention -- when you print them. The point is, what do they look like when you read them? Another point: mark up "x > y" as an inline literal. If you use C<>, you need to escape. If you use C{}, you need to escape for some other case. [me] > > Inline literals would be better:: > > > > The regexp `r"\."` matches a literal period. [Edward] > But then we have to say that inline literals can't ever contain "'".. > which in my mind is no better than saying that you can't backslash > '{' and '}'. No mention of "'" (single quote). I used "`" (backquote). Your email font can't distinguish the two. > Guido has objected to `literal` On 30 March, Guido wrote: > In many fonts, backtick is hard to distinguish from apostrophe! Two aspects: reading and writing. If you're reading the raw marked-up docstring/email/whatever, it doesn't *matter* if it shows up as a backquote or a single quote. As long as it appears quoted in some manner, the quoting has served its purpose. If you're reading the processed docstring, the `inline literal` (note: backquotes used) will be formatted in some way which makes the context obvious. If you're reading the raw text and debating about it, you'd better be using a font which distinguishes clearly between all ASCII characters (there are common fonts in which "(" and "{" are hardly distinguishable either). If you're *writing* the markup (or writing *about* it! :) you'd better be using a suitable font. After all, one day you might receive email saying:: What's wrong with this code? >>> hello = 'Dolly' >>> print `hello`, 'hello' Dolly hello It should print "hello Dolly"! [Guido] > This seems to come from a confusion between two similar, but > different goals: > > - It should be easy to read without any knowledge of the markup > language > > - It should be possible to author without knowing the whole markup > language and without changing your habits > > I can agree with the first one, but I think the second will continue > to get us into trouble. Agreed. So change your habits, change your mindset, everyone! Or at least, change your email font! ;-) Documentation is data, and markup is the equivalent of code. [Edward] > I think that if the reason we're rejecting X{} or X<> is > because it's "not readable," then there's no reason to accept #code#, > which to me is signifigantly less intuitive than C{code}. Yes. /DG From dgoodger@atsautomation.com Mon Apr 16 14:39:26 2001 From: dgoodger@atsautomation.com (Goodger, David) Date: Mon, 16 Apr 2001 09:39:26 -0400 Subject: [Doc-SIG] lists & blank lines (was re: backslashing) Message-ID: (subject changed to separate the issues) [I wrote:] > > Recently, I've come to the conclusion that requiring a blank line > > before the start of a list is reasonable and correct, even if we > > don't require blank lines between items. Minimizing ambiguity trumps > > minimizing vertical space. Edward D. Loper wrote: > That would make things easier. But we would also have to require that > sublists are surrounded by blank lines. 
[examples omitted] > Any objections to that? None. > The way my markup language currently works, > we don't have to worry about how to detect when a new list item > starts, because list item contents are required to be indented:: > > - this is a valid list > item. > > - This is not a valid > list item. Explicit wins the day. For list items, blank line & indentation ambiguity will bite us all someday, so removing ambiguity is good. /DG From edloper@gradient.cis.upenn.edu Mon Apr 16 16:39:28 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Mon, 16 Apr 2001 11:39:28 EDT Subject: [Doc-SIG] lists & blank lines (was re: backslashing) In-Reply-To: Your message of "Mon, 16 Apr 2001 09:39:26 EDT." Message-ID: <200104161539.f3GFdSp09663@gradient.cis.upenn.edu> > > The way my markup language currently works, > > we don't have to worry about how to detect when a new list item > > starts, because list item contents are required to be indented:: > > > > - this is a valid list > > item. > > > > - This is not a valid > > list item. > > Explicit wins the day. For list items, blank line & indentation > ambiguity will bite us all someday, so removing ambiguity is good. Um.. I'm not sure whether that means you're agreeing with me or disagreeing.. But the basic reasoning here was that there are a number of structural forms that are "ambiguous", in the sense that people use them to convey different structures.. For lists, the "ambiguous" structures that I thought of are: - xxxx x xxxx (one list item or a list item xx xx x xxxxx followed by a paragraph?) xxx xx xx xxx (one paragraph or a paragraph - xx x xxxxxx followed by a list item?) - xx x x xxxx (a list item with one para or with - x xxxxx x one para and one sublist?) - xx xx x xxx (one list item with a dash in its - x xx xx x x para or two list items?) (where "-" represents any bullet character, and "xxx" is text.) The problem is that people will decide which choice to read something as based on the text.. Which will lead to errors in writing formatted docstrings. The solution? Make all "ambiguous" structures give either errors or warnings, and ask people to write them in unambiguous ways. To make any of them look like single paras, simply re-word-wrap so that the "-" is not at the beginning of the line. To make them look like list items, indent list items and separate them with blank lines; and indent the contents of list items. This also makes it clear whether: - xx xx x xxx xx x xx xxx x is a list item with one para followed by a paragraph, or a list item with two paragraphs. The only ambiguity that this doesn't deal with is the last one I listed. But I decided that we could probably ignore that ambiguity, because if anyone *does* try to make it one paragraph with an embedded bullet, it's unreadable anyway: 1. This is a list 2. This list item talks about the number 1. 1. is a good number. 3. That was confusing. If you disagree, we could require blank lines between list items. The main disadvantage there is that it would add a fair amount of blank space (fields obey the same rules as lists, so you'd have to say:: @param x: ... @param y: ... @returns: ... @raises: ... instead of:: @param x: ... @param y: ... @returns: ... @raises: ... ) The other advantage of this set of rules is that it allows us to completely separate colorizing from structuring.. Which means that we can temporarily put colorizing aside, and concentrate on what we want our structuring rules to do.
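For concreteness, the bullet recognition described above needs little more than a pair of regular expressions; the sketch below is illustrative only, not anyone's actual parser, and the patterns and names are made up::

    import re

    _BULLET = re.compile(r'(-|\d+\.)\s+')      # "- item" or "1. item"
    _FIELD = re.compile(r'@\w+(\s+\w+)?:')     # "@returns:", "@param x:"

    def classify(first_line):
        """Guess what kind of block a block-initial line starts."""
        line = first_line.lstrip()
        if _FIELD.match(line):
            return 'field'
        if _BULLET.match(line):
            return 'list item'
        return 'paragraph'

Note that a plain-English opener such as "However: ..." still falls through to 'paragraph', and a stacked field list like the "@param x:" example above stays unambiguous even without blank lines between the entries.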
-Edward From edloper@gradient.cis.upenn.edu Mon Apr 16 17:09:42 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Mon, 16 Apr 2001 12:09:42 EDT Subject: [Doc-SIG] backslashing In-Reply-To: Your message of "Mon, 16 Apr 2001 09:39:24 EDT." Message-ID: <200104161609.f3GG9hp12691@gradient.cis.upenn.edu> >> If you used E and E or E{lb} and E{rb} or something like that, >> then regexps would generally look how they're supposed to (at least >> when you print them). > > So would *any other* convention -- when you print them. The point > is, what do they look like when you read them? Not when you print them with "print foo.__doc__"; only when you use some tool to interpret them and print them.. > Another point: mark up "x > y" as an inline literal. If you use C<>, > you need to escape. If you use C{}, you need to escape for some > other case. True. But the advantage of C{} is that we can say that X{} is markup for X=[A-Z], but any other *nested* {}s will be printed as {}s (so, e.g., you can say C{ {1:'a', 2:'b'} })... Which means that you *almost* never need to use explicit escaping. (you need it if you want to talk about "{"s and "}"s themselves, as opposed to objects defined with them.. or if you want to put a capital letter before a "{"). In particular, I searched for all "{"s in the standard library and the other packages I have installed on my system, and found no examples where they would need to be escaped.. Once you introduce "\" as an escape character, though, all sorts of "\"s now need to be escaped.. And I don't really like the convention of keeping "\"s if they appear before something that doesn't require escaping.. It taxes my brain too much. I guess I'm trying to go on the principle of keeping the need to escape characters to a minimum, because whatever escaping mechanism we have, it'll be somewhat ugly/difficult to read. -Edward From edloper@gradient.cis.upenn.edu Mon Apr 16 17:57:35 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Mon, 16 Apr 2001 12:57:35 EDT Subject: [Doc-SIG] backslashing In-Reply-To: Your message of "Mon, 16 Apr 2001 09:39:24 EDT." Message-ID: <200104161657.f3GGvZp17069@gradient.cis.upenn.edu> David Quoted: > [I said] > > Guido has objected to `literal` > > [Guido said] > > In many fonts, backtick is hard to distinguish from apostrophe! > > [response omitted] That wasn't the objection I was refering to... I was referring to: [Guido said] > I don't like `...`, because (a) it means something very specific in > Python (and in the Unix shell), (b) it's hard to distinguish from > '...' in some fonts, and (c) except for the `...` Python and shell > notation, I expect ` to be closed with '. (well, I guess part (b) is the same, and I agree that that's not a real objection.. The only time when you'll be looking at docstrings as raw text will most likely be either in your code editor or in Python. And hopefully your font for both those environments distinguishes apostrophe from backtick, because otherwise you'll have a lot of trouble coding..) We might be able to convince Guido to let go of (a) and (c). I personally strongly favor using `...` (backticks) over '...' (apostrophes), since apostrophes are fairly overloaded in natural language already.. -Edward From dgoodger@atsautomation.com Mon Apr 16 19:00:10 2001 From: dgoodger@atsautomation.com (Goodger, David) Date: Mon, 16 Apr 2001 14:00:10 -0400 Subject: [Doc-SIG] lists & blank lines (was re: backslashing) Message-ID: [Edward D. 
Loper] > > > The way my markup language currently works, > > > we don't have to worry about how to detect when a new list item > > > starts, because list item contents are required to be indented:: > > > > > > - this is a valid list > > > item. > > > > > > - This is not a valid > > > list item. [me] > > Explicit wins the day. For list items, blank line & indentation > > ambiguity will bite us all someday, so removing ambiguity is good. [Edward] > Um.. I'm not sure whether that means you're agreeing with me or > disagreeing. Agreeing. Remove ambiguity. Require blank lines & intentation to make lists explicit. > But the basic reasoning here was that there are > a number of structural forms that are "ambiguous", in the sense > that people use them to convey different structures.. For lists, > the "ambiguous" structures that I thought of are: > > - xxxx x xxxx (one list item or a list item > xx xx x xxxxx followed by a paragraph?) Item followed by paragraph, with warning. Or error. > xxx xx xx xxx (one paragraph or a paragraph > - xx x xxxxxx followed by a list item?) One paragraph, with warning. > - xx x x xxxx (a list item with one para or with > - x xxxxx x one para and one sublist?) One para, no sublist, with warning. > - xx xx x xxx (one list item with a dash in its > - x xx xx x x para or two list items?) Two list items (assuming the list was started properly, of course). > The problem is that people will decide which choice to read > something as based on the text. > The solution? Make all "ambigous" structures give either errors > or warnings, and ask people to write them in unambiguous ways. Yes. > This also makes it clear whether: > > - xx xx x xxx > > xx x xx xxx x > > is a list item with one para followed by a paragraph, or a list > item with two paragraphs. The former. > The only ambiguity that this dosn't deal with is the last one I > listed. Which one was that? Unclear. > If you disagree, we could require blank lines between list items. Unnecessary (ie, I agree; I *don't* disagree ;). We don't need blank lines between items if the rules for list item indentation are explicit. Blank lines were required by StructuredText because it makes parsing easy, but there were many complaints about wasted vertical space. These rules make an unambiguous solution. /DG From dgoodger@atsautomation.com Mon Apr 16 19:12:33 2001 From: dgoodger@atsautomation.com (Goodger, David) Date: Mon, 16 Apr 2001 14:12:33 -0400 Subject: [Doc-SIG] field syntax (was re: lists & blank lines) Message-ID: [Edward Loper] > @param x: ... I'm not a big fan of the JavaDoc @ syntax, but I don't know of a better inline syntax for keyword-tagged values. (I did propose a [directive-based syntax]_; search for "keyword".) I propose that until a clearly superior syntax is discovered/revealed, we leave these out of the discussion (unnecessary complication). /DG .. _directive-based syntax: http://mail.python.org/pipermail/doc-sig/2000-November/001241.html From edloper@gradient.cis.upenn.edu Mon Apr 16 19:28:57 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Mon, 16 Apr 2001 14:28:57 EDT Subject: [Doc-SIG] lists & blank lines (was re: backslashing) In-Reply-To: Your message of "Mon, 16 Apr 2001 14:00:10 EDT." Message-ID: <200104161828.f3GISwp24578@gradient.cis.upenn.edu> > > - xxxx x xxxx (one list item or a list item > > xx xx x xxxxx followed by a paragraph?) > > Item followed by paragraph, with warning. Or error. Yes. 
(currently a warning in my parser -- asks you to add a blank line) > > xxx xx xx xxx (one paragraph or a paragraph > > - xx x xxxxxx followed by a list item?) > > One paragraph, with warning. Yes. (currently a warning in my parser -- asks you to re-word wrap the paragraph, or to separate & indent if you intended to start a list) > > - xx x x xxxx (a list item with one para or with > > - x xxxxx x one para and one sublist?) > > One para, no sublist, with warning. Yes. (currently a warning in my parser -- asks you to re-word wrap the paragraph, or to add a blank line if you intended to start a sublist) > > - xx xx x xxx (one list item with a dash in its > > - x xx xx x x para or two list items?) > > Two list items (assuming the list was started properly, of course). Yes. And if the 2 bullets are of different types, it's a warning, because lists should be separated by blank lines. > > This also makes it clear whether: > > > > - xx xx x xxx > > > > xx x xx xxx x > > > > is a list item with one para followed by a paragraph, or a list > > item with two paragraphs. > > The former. Yes. > > The only ambiguity that this dosn't deal with is the last one I > > listed. > > Which one was that? Unclear. In theory, someone could read the following as a single list item, even though our rules say its two:: 1. I like the number e. This number is approximately equal to 2.71828182846. But it's irrational, so that's an approximation. Similarly with something like:: - To find the result, simply take C{x - y}. or even:: - I like numbers that are prime, like 2. I also like odd numbers. But I would argue that these are so hard to read, that we can basically ignore them.. Note that when I say "ambiguous," I don't mean ambiguous according to the markup language rules.. I mean that it seems possible that someone would read it one way or the other, given that they don't know the rules of the markup language. It's also related to the question of whether it's possible to make 's word-wrapping work properly with the formatted documentation strings. > (ie, I agree; I *don't* disagree ;). Good. Does anyone else disagree, or can we tentatively move on? :) -Edward From edloper@gradient.cis.upenn.edu Mon Apr 16 19:45:35 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Mon, 16 Apr 2001 14:45:35 EDT Subject: [Doc-SIG] field syntax (was re: lists & blank lines) In-Reply-To: Your message of "Mon, 16 Apr 2001 14:12:33 EDT." Message-ID: <200104161845.f3GIjZp26051@gradient.cis.upenn.edu> > [Edward Loper] > > @param x: ... > > I'm not a big fan of the JavaDoc @ syntax, but I don't know of a better > inline syntax for keyword-tagged values. (I did propose a [directive-based > syntax]_; search for "keyword".) As I've said before, I'm not terribly attatched to the JavaDoc syntax. But it seems to me to make sense to handle field lists as follows: 1. Each field begins with a "bullet", like "@returns" or "returns:" or ".. Keywords::" or whatever we agree on. These should be recognizable using a regexp that doesn't depend on the actual words (i.e., *not* "@returns|@param|...", but "@\w+\b"). 2. Fields act just like list items. 3. The field list must be the last thing in a docstring, and it must be separated by a blank line (unless it's the only thing in the docstring). If you want, we could require that it be indented -- then, its syntax would be essentially identical to list syntax. As I understand it, your "directive-based syntax" would mainly fit this model.. 
Except that I require the contents of each directive to be indented. Note that you are not required to start a paragraph on the line that a list bullet is on.. You can write list items like this if you want:: 1. Paragraph one for list item (1). Paragraph two for list item (1). The only other difference would be that, under my scheme, the contents of a directive have to be properly formatted formatted text; where under your scheme it seems like they can be anything. As a side note, you called this "inline syntax," but I think of "inline" as being things that occur within a paragraph.. This is "structural syntax" in my mind. The reason I'd support JavaDoc rather than something like ".." is because it's no less readable, and it's already a somewhat established conventions (there are a fair number of javadoc-clones out there for other languages). On the other hand, we might not want people to get confused, and think that our markup language is the same as javadoc's... :) Also, "@\w+" occurs very rarely under natural circumstances (although perhaps the same can be said of ".. \w+::". > I propose that until a clearly superior > syntax is discovered/revealed, we leave these out of the discussion > (unnecessary complication). This is a feature that I'm very interested in making sure that the markup language includes. As such, I'd like to keep it on the table, even if it's off to the side. :) (I see this feature as being more important than the ability to use lists or colorizing..) -Edward From dgoodger@atsautomation.com Mon Apr 16 21:14:22 2001 From: dgoodger@atsautomation.com (Goodger, David) Date: Mon, 16 Apr 2001 16:14:22 -0400 Subject: [Doc-SIG] field syntax (was re: lists & blank lines) Message-ID: [Edward Loper] > 3. The field list must be the last thing in a docstring Why? What about PEPs? > I require the contents of each directive > to be indented. Why? And what do you mean by "directive" here? (Note that my proposed directive syntax was for arbitrary language extension. The keyword-tagged values directive was just an example.) > Note that you are not required to start a paragraph > on the line that a list bullet is on.. You can write list items > like this if you want:: > 1. > Paragraph one for list item (1). > > Paragraph two for list item (1). I'm confused. So what? And why would we want this? > The only other difference would be that, under my scheme, the contents > of a directive have to be properly formatted formatted text; where > under your scheme it seems like they can be anything. Not "anything", but directive-dependent. In other words, for the keywords example, given the directive:: .. keywords:: The next lines are expected to be of the form "keyword: value". (Beyond that, I didn't specify; it was only an example of what could be done.) > As a side note, you called this "inline syntax," but I think > of "inline" > as being things that occur within a paragraph.. This is "structural > syntax" in my mind. Sorry, my bad. I meant character-based syntax (in this case "@"-based), as opposed to explicit directive-based. > This is a feature that I'm very interested in making sure that the > markup language includes. Keyword-tagged values have been discussed in the past on Doc-SIG. If they're that important to you, I'd suggest you go through the archives, list up all proposed alternatives, analyze & summarize. Otherwise, history repeats. > I see this feature as being more > important than the ability to use lists or colorizing..) I don't. 
Everyone has their own agenda, their own priorities. Beware that yours don't become a stumbling block for others' acceptance. :) One problem with getting a Setext/StructuredText derivative to satisfy everyone's needs is that the more characters we use as markup, the more complex it becomes. Another is that the available characters are limited. Are keyword-tagged values important enough to warrant the use of another character for their syntax? Edward's answer is obviously "yes". Mine was "no" (also because "@" isn't obvious/intuitive), and so I proposed a general explicit solution to future extension. /DG From edloper@gradient.cis.upenn.edu Mon Apr 16 21:35:33 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Mon, 16 Apr 2001 16:35:33 EDT Subject: [Doc-SIG] field syntax (was re: lists & blank lines) In-Reply-To: Your message of "Mon, 16 Apr 2001 16:14:22 EDT." Message-ID: <200104162035.f3GKZYp06669@gradient.cis.upenn.edu> > [Edward Loper] > > 3. The field list must be the last thing in a docstring > > Why? What about PEPs? I've been trying to design a ML for formatted docstrings; if it works for other domains, great, if not, too bad. I don't want to impede progress because we're trying to solve a more general problem than we really need to. > > I require the contents of each directive > > to be indented. > > Why? And what do you mean by "directive" here? (Note that my proposed > directive syntax was for arbitrary language extension. The keyword-tagged > values directive was just an example.) Why: so we know when they end. I think we may be talking about 2 different things here, though, and that may be a stumbling block. The functionality that I want is basically what JavaDoc implements with "@tags". I've used JavaDoc, and other similar systems for other programming languages, and it is *very* useful. E.g., it lets you have a little "section" to describe each parameter, the return value, etc. I'm not sure that this is the same thing that you're calling "keyword-tagged values." Maybe it is.. But I'd like to be able to say something semantically equivalant to:: @param elt The initial element. @param n The size of the list. in my docstrings for functions/methods. Also, I'd like to be able to include multiple paragraphs, lists, etc., in the description of a parameter. > > Note that you are not required to start a paragraph > > on the line that a list bullet is on.. You can write list items > > like this if you want:: > > 1. > > Paragraph one for list item (1). > > > > Paragraph two for list item (1). > > I'm confused. So what? And why would we want this? If it confuses you, ignore it; it's not really important. > > The only other difference would be that, under my scheme, the contents > > of a directive have to be properly formatted formatted text; where > > under your scheme it seems like they can be anything. > > Not "anything", but directive-dependent. Yes. But from the parser's point of view, it can be anything, because it doesn't know what extensions you'll be using. Some later stage (after the parser) will put restrictions on it.. > > This is a feature that I'm very interested in making sure that the > > markup language includes. > > Keyword-tagged values have been discussed in the past on Doc-SIG. If they're > that important to you, I'd suggest you go through the archives, list up all > proposed alternatives, analyze & summarize. Otherwise, history repeats. 
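The earlier claim that "@\w+" very rarely starts a line of ordinary prose is easy to spot-check with a throwaway scan along these lines (the file pattern below is only an example)::

    import re, glob

    AT_WORD = re.compile(r'^\s*@\w+\b')

    def count_at_lines(pattern):
        """Count lines that begin with "@word" in the matching files --
        a rough proxy for how often the proposed field bullet would
        collide with ordinary text."""
        hits = 0
        for filename in glob.glob(pattern):
            for line in open(filename).readlines():
                if AT_WORD.match(line):
                    hits = hits + 1
        return hits

    # e.g.: count_at_lines('/usr/lib/python2.1/*.py')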
I've been going through the archives, on and off, and haven't seen that many *different* proposals that deal with what I'm trying to do.. But I guess I'll keep looking. > > I see this feature as being more > > important than the ability to use lists or colorizing..) > > I don't. Everyone has their own agenda, their own priorities. Beware that > yours don't become a stumbling block for others' acceptance. :) Fine, as long as we all agree that our main agenda is to develop a markup language for use with docstrings. > One problem with getting a Setext/StructuredText derivative to satisfy > everyone's needs is that the more characters we use as markup, the more > complex it becomes. Which is one of the reasons I'm trying to get as much mileage out of indentation as I can.. :) > Another is that the available characters are limited. True. Although, it's not a character we're taking away. It's the ability to start a paragraph with "@\w+\b". Just like bullets, the @ will be treated as @ in any other circumstance. > Are keyword-tagged values important enough to warrant the use of another > character for their syntax? Edward's answer is obviously "yes". I believe that having keyword-tagged values, or whatever we want to call them, is worth removing the ability to start paragraphs with "@\w+\b". > Mine was "no" (also because "@" isn't obvious/intuitive) The obvious/intuitive reason seems better to me, although I don't see starting paragraphs with ".." as being any more intuitive.. The problem is that if you use something intuitive, like:: author: Edward Loper param size: The radius of the planet, in miles Then you're much more likely to prevent people from saying things they want to say, like:: However: ... > and so I proposed a general explicit solution to future extension. Which may be a good thing (although I would argue that directives should end on a return to the indentation that introduced them, esp. since this is consistent with the other use of "::").. But I think that these "keyword-tagged values" are central enough to the task of writing docstrings (especially for functions and methods; but also for describing class variables, etc) that they can be given their own syntax.. (Well, not quite their own -- they're really just lists with funny looking bullets that must appear at the top level and at the end of the docstring) -Edward From dgoodger@atsautomation.com Mon Apr 16 21:48:18 2001 From: dgoodger@atsautomation.com (Goodger, David) Date: Mon, 16 Apr 2001 16:48:18 -0400 Subject: [Doc-SIG] field syntax (was re: lists & blank lines) Message-ID: [Edward Loper, referring to fields] > ... and at the end of the docstring) Again, why? Why restrict fields to the end of a docstring? Seems artificial to me. /DG From dgoodger@atsautomation.com Mon Apr 16 22:00:32 2001 From: dgoodger@atsautomation.com (Goodger, David) Date: Mon, 16 Apr 2001 17:00:32 -0400 Subject: [Doc-SIG] lists & blank lines (was re: backslashing) Message-ID: [Edward D. Loper] > In theory, someone could read the following as a single list item, > even though our rules say its two:: > > 1. I like the number e. This number is approximately equal to > 2.71828182846. But it's irrational, so that's an approximation. I'd say this is just another example of: > > > - xxxx x xxxx (one list item or a list item > > > xx xx x xxxxx followed by a paragraph?) > > > > Item followed by paragraph, with warning. Or error. > > Yes. (currently a warning in my parser -- asks you to add a > blank line) As is: > - I like numbers that are prime, like > 2.
I also like odd numbers. This one is two bulleted items: > - To find the result, simply take C{x > - y}. (Unless the C{} syntax is used, in which case it's a single malformed item [second line should be indented] or an item followed by a paragraph [should be a blank line, and "C{x" should trigger an error]. In any case, it warrants a warning.) > Note that when I say "ambiguous," I don't mean ambiguous according to > the markup language rules.. I mean that it seems possible that someone > would read it one way or the other, given that they don't know the > rules of the markup language. Humans can parse text much more flexibly than software. Make the software (markup rules) quite strict, so that a text passing through the software without errors or warnings has no chance for ambiguity at the human-level. The best you can do is make the software say, "I don't understand what you mean here." Timbot's rule 12: "In the face of ambiguity, refuse the temptation to guess." /DG From edloper@gradient.cis.upenn.edu Mon Apr 16 22:23:19 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Mon, 16 Apr 2001 17:23:19 EDT Subject: [Doc-SIG] lists & blank lines (was re: backslashing) In-Reply-To: Your message of "Mon, 16 Apr 2001 17:00:32 EDT." Message-ID: <200104162123.f3GLNJp12024@gradient.cis.upenn.edu> > [Edward D. Loper] > > In theory, someone could read the following as a single list item, > > even though our rules say its two:: > > > > 1. I like the number e. This number is approximately equal to > > 2.71828182846. But it's irrational, so that's an approximation. > > I'd say this is just another example of: > > > > > - xxxx x xxxx (one list item or a list item > > > > xx xx x xxxxx followed by a paragraph?) > > > > > > Item followed by paragraph, with warning. Or error. > > As is: > > > > - I like numbers that are prime, like > > > 2. I also like odd numbers. But there's an important difference here. A parser will give a warning for the second and third examples, but won't for the first example. I would prefer to be able to say "if something might be ambiguous to people, then we either issue a warning or an error." But in the example about liking e, that rule doesn't hold. > > - To find the result, simply take C{x > > - y}. > > (Unless the C{} syntax is used, in which case it's a single > malformed item [second line should be indented] or an item followed > by a paragraph [should be a blank line, and "C{x" should trigger an > error]. In any case, it warrants a warning.) The intention was that C{..} was used to stand for whatever colorizing we decide we like. In that case, I agree that it should be 2 errors (mismatched delimiters) and possibly a warning. > Humans can parse text much more flexibly than software. Make the > software (markup rules) quite strict, so that a text passing through > the software without errors or warnings has no chance for ambiguity > at the human-level. The best you can do is make the software say, > "I don't understand what you mean here." Timbot's rule 12: "In the > face of ambiguity, refuse the temptation to guess." That's been my goal so far. But the problem is deciding what's ambiguous... -Edward From edloper@gradient.cis.upenn.edu Mon Apr 16 22:26:20 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Mon, 16 Apr 2001 17:26:20 EDT Subject: [Doc-SIG] field syntax (was re: lists & blank lines) In-Reply-To: Your message of "Mon, 16 Apr 2001 16:48:18 EDT." 
Message-ID: <200104162126.f3GLQKp12374@gradient.cis.upenn.edu> > [Edward Loper, referring to fields] > > ... and at the end of the docstring) > > Again, why? Why restrict fields to the end of a docstring? Seems > artificial to me. It is somewhat artificial. The reasoning was as follows: the position of the fields does not convey any semantic information; tools are likely to disregard the position when formatting their output. If we let people put them wherever they want in the docstring, then they may assume that they will appear in that position in the output of doc formatting tools (i.e., that their position *does* convey semantic information). This is dangerous, and should be stamped out. :) So, put all fields at the end, so no one will get confused. -Edward From dgoodger@atsautomation.com Mon Apr 16 23:23:35 2001 From: dgoodger@atsautomation.com (Goodger, David) Date: Mon, 16 Apr 2001 18:23:35 -0400 Subject: [Doc-SIG] lists & blank lines (was re: backslashing) Message-ID: [Edward D. Loper] > > > In theory, someone could read the following as a single list item, > > > even though our rules say its two:: > > > > > > 1. I like the number e. This number is approximately equal to > > > 2.71828182846. But it's irrational, so that's an > approximation. > > > > I'd say this is just another example of: > > > > > > > - xxxx x xxxx (one list item or a list item > > > > > xx xx x xxxxx followed by a paragraph?) > > > > > > > > Item followed by paragraph, with warning. Or error. > > But there's an important difference here. A parser will give a > warning for the second and third examples, but won't for the > first example. I would prefer to be able to say "if something > might be ambiguous to people, then we either issue a warning > or an error." But in the example about liking e, that rule > doesn't hold. Sure it does. It's an enumerated list item ("1.") followed by an unindented line, therefore another paragraph not part of the first item (this should trigger a warning unless it's another item in the same list). The second line is not an enumerated list item, since: (a) the label isn't of a standard pattern the same as the first item ("\d+\. "; no space after the "2."; I don't think we should allow floating-point enumerators, hm? :); (b) the label isn't sequential with the first item's label (1 + 1 != 2.718...); (c) if we permit nested lists through compound enumerators, sublists must start with "1" or equivalent, and this one doesn't. Convinced? If not, why *would* the parser pass the second line through unchallenged? Please show your work ;-) /DG From dgoodger@atsautomation.com Mon Apr 16 23:32:02 2001 From: dgoodger@atsautomation.com (Goodger, David) Date: Mon, 16 Apr 2001 18:32:02 -0400 Subject: [Doc-SIG] field syntax (was re: lists & blank lines) Message-ID: We've monopolized Doc-SIG all day, might as well continue... [Edward Loper, referring to fields] > > > ... and at the end of the docstring) > > > > Again, why? Why restrict fields to the end of a docstring? Seems > > artificial to me. > > It is somewhat artificial. The reasoning was as follows: the > position of the fields does not convey any semantic information; > tools are likely to disregard the position when formatting > their output. If we let people put them wherever they want in > the docstring, then they may assume that they will appear in > that position in the output of doc formatting tools (i.e., that > their position *does* convey semantic information). This is > dangerous, and should be stamped out. 
:) So, put all fields > at the end, so no one will get confused. I don't see why your first assumption should hold true. It is the foundation of the rest of your argument. I think you need to define your concept of "fields" better for us here on the SIG (note: assume no previous knowledge of JavaDoc). Give a detailed example. Why isn't position significant? What about field order? Sounds like you're describing a dictionary-like structure associated with each docstring. Can a field be used more than once, or must each field be unique per docstring? /DG From dgoodger@atsautomation.com Mon Apr 16 23:41:41 2001 From: dgoodger@atsautomation.com (Goodger, David) Date: Mon, 16 Apr 2001 18:41:41 -0400 Subject: [Doc-SIG] backslashing Message-ID: [Edward D. Loper] > >> If you used E<lb> and E<rb> or E{lb} and E{rb} or > something like that, > >> then regexps would generally look how they're supposed to (at least > >> when you print them). > > > > So would *any other* convention -- when you print them. The point > > is, what do they look like when you read them? > > Not when you print them with "print foo.__doc__"; only when you use > some tool to interpret them and print them.. Not following you. This argues in favour of plaintext-transparent markup like backquotes, not E<..> etc. Perhaps some examples of what you mean? > Once you introduce "\" as an escape character, though, all sorts of > "\"s now need to be escaped.. And I don't really like the convention > of keeping "\"s if they appear before something that doesn't require > escaping.. It taxes my brain too much. Please show (with examples requiring internal escaping) the alternatives. > I guess I'm trying to go on the principle of keeping the need to > escape characters to a minimum, because whatever escaping mechanism we > have, it'll be somewhat ugly/difficult to read. I think that's inevitable. Please prove me wrong! However, although I'm sure an escape mechanism is needed, I'm also sure it will only rarely be needed. /DG From edloper@gradient.cis.upenn.edu Mon Apr 16 23:50:25 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Mon, 16 Apr 2001 18:50:25 EDT Subject: [Doc-SIG] field syntax (was re: lists & blank lines) In-Reply-To: Your message of "Mon, 16 Apr 2001 18:32:02 EDT." Message-ID: <200104162250.f3GMoPp19516@gradient.cis.upenn.edu> > > > 1. I like the number e. This number is approximately equal to > > > 2.71828182846. But it's irrational, so that's an > (a) the label isn't of a standard pattern the same as the first item > ("\d+\. "; no space after the "2."; I don't think we should allow > floating-point enumerators, hm? :); > (b) the label isn't sequential with the first item's label > (1 + 1 != 2.718...); > (c) if we permit nested lists through compound enumerators, sublists must > start with "1" or equivalent, and this one doesn't. > > Convinced? If not, why *would* the parser pass the second line through > unchallenged? Please show your work ;-) Sorry, you're right, I wasn't being explicit enough. I was assuming that ordered list bullets were "(\d+\.)+", because that's what we decided last time around the loop.. The idea was that people might want to say "2.1." or something. But I don't have any problem with restricting ordered list bullets to "\d+\.". But the problem still exists, albeit in a more rare form:

    1. I like the number 3. It comes right after the number
    2. It comes right before the number 4.

But I think that really we agree.
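A minimal sketch, assuming Python, of the bullet-recognition rules just described: a line only continues an enumerated list if it carries a "\d+\. " label (digits, period, space) and its number follows the previous item's number. The helper name and its behaviour are illustrative only, not part of any proposed parser::

    import re

    _ENUM_BULLET = re.compile(r'(\d+)\. ')  # the "\d+\. " label: digits, period, space

    def continues_enumeration(line, prev_number):
        """Return (is_item, number) for a candidate enumerated-list line."""
        match = _ENUM_BULLET.match(line)
        if match is None:
            # Rule (a): no well-formed label (e.g. "2.71828182846. ..." has no
            # space after the period), so this reads as an ordinary paragraph line.
            return False, None
        number = int(match.group(1))
        if number != prev_number + 1:
            # Rule (b): label out of sequence with the previous item; a parser
            # would warn and treat the line as text.
            return False, None
        return True, number

    print(continues_enumeration("2.71828182846. But it's irrational...", 1))
    # -> (False, None): the "e" example is caught.
    print(continues_enumeration("2. It comes right before the number 4.", 1))
    # -> (True, 2): the rarer example above passes both checks, which is
    #    exactly the residual ambiguity under discussion.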
I'm just saying that *in principle* it's ambiguous to a reader, but that any sane reader would complain about it anyway.. so we can ignore that ambiguity. On a side note, I'm not sure whether we should enforce (b) and (c). I guess my gut instinct would be to generate a warning for them, but not an error.. They prevent people from having an enumerated list that's intersperced with text (e.g., that's normally done in math papers with the math formulas..). I guess that's not a great loss, though, in the context of writing docstrings. -Edward From edloper@gradient.cis.upenn.edu Tue Apr 17 17:27:28 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Tue, 17 Apr 2001 12:27:28 EDT Subject: [Doc-SIG] field syntax (was re: lists & blank lines) In-Reply-To: Your message of "Mon, 16 Apr 2001 18:32:02 EDT." Message-ID: <200104171627.f3HGRSp28953@gradient.cis.upenn.edu> > I think you need to define your concept of "fields" better for us > here on the SIG (note: assume no previous knowledge of > JavaDoc). Give a detailed example. Why isn't position significant? > What about field order? Sounds like you're describing a > dictionary-like structure associated with each docstring. Can a > field be used more than once, or must each field be unique per > docstring? Sorry, you're right. If you have time, you can look at the JavaDoc home page: or at a sample of the output of JavaDoc: >From the JavaDoc page: A doc comment is made up of two parts -- a description followed by zero or more tags, with a blank line (containing a single asterisk "*") between these two sections: /** * This is the description part of a doc comment * * @tag Comment for the tag */ The first part describes the object being documented; the second part essentially sets up a multi-map from keys to formatted doc strings. - Certain tags are paramatrized, such as "@param", which takes a parameter, and gives a description of it. - Some tags can be repeated (e.g., "@see"); others can't (e.g., you can't have 2 "@param"'s with the same parameter. - It is assumed that when these "fields" (=tag+value) are output, they will be put in special sections. (see thesample output of JavaDoc). An example of a formatted doc string with a field (from the formatted doc string parser I've been writing) is:: def _tokenize_literal(lines, start, block_indent, tokens, warnings): """ Construct a C{Token} containing the literal block starting at C{lines[start]}, and append it to C{tokens}. C{block_indent} should be the indentation of the literal block. Any warnings generated while tokenizing the literal block will be appended to C{warnings}. @param lines: The list of lines to be tokenized. @param start: The index into C{lines} of the first line of the literal block to be tokenized. @param block_indent: The indentation of C{lines[start]}. This is the indentation of the literal block. @param warnings: A list of the warnings generated by parsing. Any new warnings generated while tokenizing this literal block will be appended to this list. @return: The line number of the first line following the literal block. @type lines: C{list} of C{string} @type start: C{int} @type block_indent: C{int} @type warnings: C{list} of C{ParseError} @rtype: C{int} """ It doesn't matter to me what syntax we use. Another alternative that's been suggested is to do something like:: ... Arguments: lines -- The list of lines to be tokenized. start -- The index into C{lines} of the first line of the literal block to be tokenized. block_indent -- The indentation of C{lines[start]}. 
This is the indentation of the literal block. warnings -- A list of the warnings generated by parsing. Any new warnings generated while tokenizing this literal block will be appended to this list. return -- The line number of the first line following the literal block. ... But semantically, the idea is to associate a description with each of a number of pre-defined entities, such as the parameters of a method. Tags defined by Javadoc are: @see (a single see-also link; can repeat) @author (an author; can repeat) @version (the object's version) @param (a function/method param; takes an argument) @return (the return value of a function/method) @exception (a description of an exception that a function/method can raise; takes an argument (the exception)) @since (minimum version needed to use it) @deprecated (object is deprecated; description of why) I think there are a few more, but that's probably a representative sample.. I find that the output you can produce with fields is easier to read/use than the output you can produce without them. (See the HTML and LaTeX versions of the Java library API).. Of course, we don't really *need* them. In my mind, the only necessary features for a formatted docstring language are: - paragraphs - literal blocks - maybe doctest blocks But I'd like to see them included. Of course, you don't have to use them if you don't want to. But I think that most people will find them useful if they try using them.. -Edward From edloper@gradient.cis.upenn.edu Tue Apr 17 21:04:29 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Tue, 17 Apr 2001 16:04:29 EDT Subject: [Doc-SIG] backslashing In-Reply-To: Your message of "Mon, 16 Apr 2001 18:41:41 EDT." Message-ID: <200104172004.f3HK4Tp20629@gradient.cis.upenn.edu> Basically what I'm trying to avoid here is having the escaping mechanism itself be responsible for most of the cases where we need to use escaping. I would argue that with the following rules, you almost never need escaping. Of course, this is only relevant if we end up using colorizing like C{this}; if we decide on some other colorizing mechanism, then the following is moot.. 1. All curly braces ({}) *must* be properly nested 2. If an open curly brace is preceded by a capital letter, then it and its matching brace signify colorizing. 3. If an open curly brace is not preceded by a capital letter, then it and its matching brace should be rendered as braces. Given these rules, when do we need to do escaping? 1. If we want to use unmatched curly braces. Generally this is only true when we're talking about the braces themselves, not when we're using them to talk about something else (e.g., using them to write a Python dictionary). 2. If we want to precede an open brace by a capital letter. I can't think of any case where this would be necessary, other than when you're talking about the markup language itself, or something similar? How can we escape in these situations? 1. By putting the entity in a literal block. This seems to me more applicable to (2) above than to (1). 2. By using an escape like E{lb}. How do we evaluate whether this is a good solution? 1. How ugly is it to do escaping? 2. How often do we need to do escaping? I would argue that this solution has a relatively high value for (1) (higher than backslashing, anyway), but a very low value for (2). In particular, I was unable to find *any* occurrences in the docstrings in the standard library of characters that needed to be escaped..
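A minimal sketch, assuming Python, of how a checker might apply the three rules above: every brace must nest (rule 1), an open brace preceded by a capital letter opens colorizing (rule 2), and any other brace is a literal brace (rule 3). The function name and error messages are illustrative, not from any proposed implementation::

    def classify_braces(text):
        """Classify each '{' as colorizing or literal and check nesting.

        Returns a list of (index, kind) pairs; a '}' inherits the kind
        of the '{' it matches.  Raises ValueError on unmatched braces.
        """
        stack, kinds = [], {}
        for i, ch in enumerate(text):
            if ch == '{':
                # Rules 2 and 3: a capital letter right before '{' means
                # colorizing (as in C{...}); otherwise a literal brace.
                kinds[i] = 'colorize' if i and text[i - 1].isupper() else 'literal'
                stack.append(i)
            elif ch == '}':
                if not stack:
                    raise ValueError("unmatched '}' at column %d" % i)  # rule 1
                kinds[i] = kinds[stack.pop()]
        if stack:
            raise ValueError("unmatched '{' at column %d" % stack[0])   # rule 1
        return sorted(kinds.items())

    # "C{x}" is colorizing; the dictionary braces stay literal; an unmatched
    # "C{x" on its own would raise an error.
    print(classify_braces("Take C{x} and a dict like {1: 2}."))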
Now compare to backslashes. There are 2 possible ways to do backslashing: 1. Preceding anything by a backslash escapes it. 2. Preceding any escapable character by a backslash escapes it. Preceding anything else by a backslash gives a literal backslash (similar to Python's way of doing things). Note, however, that the stated reasons for Python doing it that way have to do with making it easier to see mistakes in your strings. My problem with (2) is that it taxes my brain to remember which characters I can put one backslash before, and which I have to put two backslashes before. But in any case, for both (1) and (2), "\\" translates to a single backslash. I believe (though I haven't yet had time to check) that "\\" does occur in docstrings. It certainly wouldn't be that uncommon if one wanted to talk about regexps, or to use regexps to talk about something else, for example. So now, we evaluate backslashing as an escaping solution, using the same criteria: 1. How ugly is it to do escaping? 2. How often do we need to do escaping? I believe that backslashing has a lower value for (1), but a higher value for (2). So how do we decide? Well, it's not objective, because we need to decide how much we care about each criterion, how *much* higher/lower we think backslashing scores, etc. But in my opinion, I'd rather use the {} solution.. But again, as I said, all this is moot if we're not using {} to do colorizing. > > I guess I'm trying to go on the principle of keeping the need to > > escape characters to a minimum, because whatever escaping mechanism we > > have, it'll be somewhat ugly/difficult to read. > > I think that's inevitable. Please prove me wrong! I don't know whether this was a convincing enough case, but I tried to show that escaping is needed *less* often with the E{} approach than with backslashing.. > However, although I'm sure an escape mechanism is needed, I'm also sure it > will only rarely be needed. I agree, which is why I want to be wary about the escape mechanism itself becoming the major reason for using the escape mechanism (backslashing backslashes, etc). -Edward From dgoodger@atsautomation.com Wed Apr 18 16:42:33 2001 From: dgoodger@atsautomation.com (Goodger, David) Date: Wed, 18 Apr 2001 11:42:33 -0400 Subject: [Doc-SIG] directives and fields Message-ID: [Edward D. Loper] > > > The only other difference would be that, under my scheme, > the contents > > > of a directive have to be properly formatted text; where > > > under your scheme it seems like they can be anything. > > > > Not "anything", but directive-dependent. > > Yes. But from the parser's point of view, it can be anything, because > it doesn't know what extensions you'll be using. Some later stage > (after the parser) will put restrictions on it.. Not true. I'd like to clear up this concept of directives. It's completely different from your proposed field concept, though not necessarily incompatible. Directives are a parser-control mechanism and can be used as an extension mechanism. The reStructuredText directive proposal is similar to extension modules in Python. Inevitably, someone will want to add a feature or some behaviour to the reStructuredText parser which cannot be easily added through character-construct syntax, because: 1. There's no natural or obvious candidate characters or constructs for syntax. 2. We've run out of characters to use as syntax. 3. The new feature or behaviour is too narrowly application- or domain-dependent. 4.
The new feature or behaviour cannot be added to the standard due to lack of consensus (basically the same as case 3). With one construct (regexp '^\.\. ', which comes from Setext) we have comments, internal hyperlink targets, external URL hyperlinks, footnotes, and directives. Directives were proposed as a mechanism for adding explicit syntax that the parser can recognize, triggering parser extension code. Say we add an 'SQL' extension to the parser, which performs a database query and inserts the results. The extension would consist of an entry in the directives dispatch table and support code to handle the query itself. This code would be run by the parser as it is parsing, not afterward. The semantics of the extension construct are up to the extension, but they could easily include the processing of properly formatted text. For example, we could add a set of admonition extensions:: .. warning:: Don't *ever* press the `Self-Destruct` button. If you do, you'll be sorry. The 'warning' extension would tell the parser to process its text block as usual, and simply wrap it in a new DOM object (hypothetically). The *emphasis* and `literals` would be processed as usual. Your field concept could be implemented using the '@' syntax as proposed, or using the extension mechanism. If it's important enough, *and* the syntax is natural enough, using the JavaDoc '@' syntax is no problem. The '@' syntax doesn't strike me as natural though. The cornerstone of the Setext/StructuredText-like approach is that the raw text should be as readable as possible, even to the uninitiated. To quote Jim Fulton's StructuredTextWiki, If you don't buy into this idea, you're probably wasting your time. I think that '@' and especially 'C<>' stray from this ideal. I don't think they belong in a Setext/StructuredText-like markup language. (Note: I'm not saying you're wasting your time, Edward; far from it, these discussions have been very helpful in many ways.) /DG From edloper@gradient.cis.upenn.edu Wed Apr 18 18:24:30 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Wed, 18 Apr 2001 13:24:30 EDT Subject: [Doc-SIG] Re: directives and fields In-Reply-To: Your message of "Wed, 18 Apr 2001 11:42:33 EDT." Message-ID: <200104181724.f3IHOUp28200@gradient.cis.upenn.edu> > > Yes. But from the parser's point of view, it can be anything, because > > it doesn't know what extensions you'll be using. Some later stage > > (after the parser) will put restrictions on it.. > > Not true. I'd like to clear up this concept of directives. It's completely > different from your proposed field concept, though not necessarily > incompatible. Well at least there should be rules in the "generic parser" that say when directives end, so that a parser can ignore a directive if it doesn't understand it. As I understood your original proposal, directives ended with blank lines. I think that they should end with a dedent back to the indent they started at, because then they can include blank lines.. And I think that it should be *possible* to handle directives in a second pass. I.e., I don't think we should have any directives that change the syntax of subsequent parts of the string, like:: This is *emph* .. switch-emph-and-literal This is *literal* Basically, it seems like you should be able to make a "generic" parser which outputs a DOM tree for the formatted docstring, with "directive" elements containing #CDATA (=character data, i.e., a string) like:: ... 
Then a specialized parser could run the generic parser, and then replace all the directive elements with some other elements.. > Inevitably, someone will want to add a feature or some behaviour to > the reStructuredText parser which cannot be easily added through > character-construct syntax, because: I think we should -try- to keep feature-adding to a minimum, because it tends to result in incompatibilities.. But that said, it does make sense to me to have a generic extention mechanism, as long as we keep in mind that we should be careful about not over-using it. Also, people adding new directives should keep in mind that "raw text should be as readable as possible." (or whatever variant of that we decide we like; see below). I saw fields as being an extension mechanism, but a *much* more constrained one than directives. I think it makes sense to put *some* constraints on directives (e.g., that they don't affect anything outside themselves). But maybe just using fields places too many constraints. > 1. There's no natural or obvious candidate characters or constructs > for syntax. > 2. We've run out of characters to use as syntax. > 3. The new feature or behaviour is too narrowly application- or > domain-dependent. The only domain I care about is formatted docstrings. Given, there are subdomains of formatted docstrings (some types of programs/programming style will make use of some features, others not). But I'm not sure that they vary enough that we want a nearly arbitrarily powerful extension mechanism.. As for running out of characters to use as syntax, that's one of the reasons I don't like *colorizing* `like this`... > With one construct (regexp '^\.\. ', which comes from Setext) we > have comments, internal hyperlink targets, external URL hyperlinks, > footnotes, and directives. Directives were proposed as a mechanism > for adding explicit syntax that the parser can recognize, triggering > parser extension code. I think that my target is a much more lightweight markup language than you're talking about.. or at least less powerful. I really don't see the need for most of those things in docstrings. > Say we add an 'SQL' extension to the parser, which performs a > database query and inserts the results. Wouldn't this totally violate making the docstring readable? And when would you ever want to use this when writing a docstring?? > .. warning:: > > Don't *ever* press the `Self-Destruct` button. > If you do, you'll be sorry. This could be implemented as a field. I think that external URL hyperlinks should be implemented with colorizing, if at all. I don't think that internal hyperlink targets make sense for docstrings. I don't think that comments are necessary for docstrings. If you really want, you can include a Python comment before or after the docstring. Alternatively, comments could be done via colorizing.. > Your field concept could be implemented using the '@' syntax as > proposed, or using the extension mechanism. If it's important > enough, *and* the syntax is natural enough, using the JavaDoc '@' > syntax is no problem. The '@' syntax doesn't strike me as natural > though. I agree that the "@" syntax isn't very natural (except for the extent to which it's natural simply because it's an established convention; similar to the way that "\" is a "natural" way to escape a character). I'd be just as happy writing fields like:: .. param size: The number of elements in the list. or:: .. parameters:: size: The number of elements in the list. 
Although that seems no less readable to me than "@". But I question whether we want/need something as powerful as directives... > The cornerstone of the Setext/StructuredText-like approach is that > the raw text should be as readable as possible, even to the > uninitiated. I don't see how directives win here. If anything, it seems like they will make it harder to read by the uninitiated, given the power of directives to use almost arbitrary syntax.. However, the idea that "raw text should be as readable as possible, even to the uninitiated" is a *goal* of mine, but not a cornerstone. Perhaps a cornerstone would be:: Raw text should be readable, even by the uninitiated. There are a lot of conflicting goals in designing a markup language, and making it as readable as possible is by no means my most fundamental goal. In the case of colorizing, I believe that colorizing should *never* be necessary to the understanding of a docstring.. i.e., you should be able to strip away all colorizing, and still understand what it says. I think that the uninitiated will be able to do that (and indeed I think it would be their first instinct). When I first read perldoc comments, I didn't know what the C<..>s meant, but I ignored them, and was able to read the comments with no trouble (well, the =.. directives were a bit confusing). I guess that perhaps what it comes down to is that I am *not* necessarily trying to design a Setext/StructuredText-like language. I'm trying to design a markup language that is optimal for writing Python docstrings. The problem with colorizing like *this* is that there are very few conventions about what such colorizing means. Indeed, I'd say that *emph*, _underline_, and "quoting" 'of' `some' `sort` are the only contentional ways of colorizing (well, maybe angle braces for ). And none of the quoting mechanisms have conventional "colors" associated with them. In my mind, the only advantage of using `quotes` over C{curly braces} is that quotes are easier to ignore.. In both cases, the uninitiated will (maybe) know that the region is "colorized" in some way, but not what way it's colorized in. -Edward From mwh21@cam.ac.uk Wed Apr 18 19:06:22 2001 From: mwh21@cam.ac.uk (Michael Hudson) Date: 18 Apr 2001 19:06:22 +0100 Subject: [Doc-SIG] Where to find the docs In-Reply-To: Ka-Ping Yee's message of "Sun, 15 Apr 2001 01:42:05 -0500 (CDT)" References: Message-ID: Since noone else has responded... Ka-Ping Yee writes: > Is there a standard place to look? http://www.python.org/doc/current ? Works for me, though those without permanent 'net connections may feel differently. More seriously: no. I haven't got built html docs anywhere on my system at the moment; if I did they'd be in /usr/local/src/python/dist/src/Doc/html/, which is hardly canonical. I doubt you can come up with sufficiently clever heuristics to get all cases - the one's you posted sounded reasonable. You could always fall back to the python.org URLs... Cheers, M. -- M-x psych[TAB][RETURN] -- try it From edloper@gradient.cis.upenn.edu Wed Apr 18 20:30:00 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Wed, 18 Apr 2001 15:30:00 EDT Subject: [Doc-SIG] Structuring: a summary; and an attempt at EBNF.. Message-ID: <200104181930.f3IJU0p10299@gradient.cis.upenn.edu> I figured I'd give a summary of all the structuring features that I think we've agreed on, so we can tentatively take those as a given. If anyone objects, please say so.. 1. Paragraphs are left-justified and separated by blank lines. 2. 
Literal blocks start with a paragraph that ends with "::" and continue to the next line whose indentation is equal to or less than that of the paragraph that started them. Literal blocks should be indented and separated by blank lines. 3. Doctest blocks start with ">>> " and continue to the next blank line. Doctest blocks should be indented and separated by blank lines. 4. Lists should be indented and separated by blank lines. List items within a list don't need to be separated by blank lines. List items start with bullets, which are either "-" or a single number followed by a period, like "1." or "12.". 5. The second and subsequent lines of a list item are indented. This includes list items with multiple paragraphs, sublists, etc. 6. Sections begin with headings, which are underlined with "=", "-", or "~" (for level 1, 2, or 3 headings, respectively). 7. Colorizing takes place entirely within paragraphs, and does not interact with structuring. In my mind, the major questions left to resolve are: 1. how to do colorizing? Two main proposals: like C{this} and like `this`/*this*. 2. how to do escaping? 3. do we need any other structuring constructs (e.g., fields, directives, footnotes, etc)? If so, which ones, and how should we add them? ===== Below is my first attempt at an EBNF-like formalism for these rules. You should probably pay more attention to the "one-minute summary" above than to the rules below -- I almost certainly didn't get the rules below quite right (although if you want to point out ways that I got it wrong, please do! :) ). IND and DED are indent and dedent (by a sinlge space); I use the notation IND[n] to mean n IND tokens. Note that the rule:: x = a IND[n] b DED[n] c is really just shorthand for:: x = a y c y = IND y DED | b However, I also use the foo[n] notation in one place where it can't be simplified. That's because in list items like: - this is a list item. Here's a second paragraph. there are crossing dependancies. In particular, the IND/DED need to match up, but assuming that we want "this is a list item" to be result of a "paragraph" production, they can't. Don't worry if you don't understand what I just said, I think it should still be relatively easy to understand the EBNF below. I assume that, as part of the preprocessing, all indents/dedents have been changed to IND/DED tokens. This process ignores blank lines, which are simply reduced to be empty. ================================================================ The top-level production:: # pytext = (BlankLine NL)* # IND[n] # (Para | List | Section | DocTestBlk) # ((COLON COLON NL LitBlk) | # (NL BlankLine NL (Para | List | Section | DocTestBlk)))* # DED[n] # (NL BlankLine)* (pytext is just a convenient name, we'll probably want another) This production assumes that the first-line-might-not-be-indented problem has already been taken care of. It says that a formatted docstring consists of any number of blank lines, followed by an indented section containing at least one paragraph, list, section, or doctest block, followed by zero or more literal blocks, paragraphs, lists, sections, or doctest blocks.. And there can be extra blank lines at the end. The productions "Para", "List", "Section", etc. generally do *not* include thier trailing NL, because that makes it easier to detect paragraphs that end with COLON COLON. Some useful types of lines are: - BlankLine: consists only of spaces. - TextLine: non-blank line. 
- StartLine: doesn't start with a Python prompt or a bullet - ContLine: anything - EndLine: doesn't end with "::"; doesn't include trailing spaces? - StartEndLine: doesn't start with PyPrompt or Bullet, and doesn't end with "::". We can define them as:: # BlankLine = (empty) # TextLine = [^ NL IND DED]+ # StartLine = (?! PyPrompt | Bullet) TextLine # EndLine = [^ NL IND DED]* [^ NL IND DED COLON] [^ NL IND DED] | # [^ NL IND DED]* [^ NL IND DED] [^ NL IND DED COLON] | # [^ NL IND DED] # StartEndLine = (?! PyPrompt | Bullet) EndLine As I said above, paragraphs don't include the trailing newline. Paragraphs ending in "::" don't include the "::".:: # SimplePara = StartLine (NL ContLine)* EndLine | # StartEndLine Lists are indented (n>1):: # List = IND[n] LI (BlankLine+ LI)* DED[n] We need special list-starting paragraphs. These don't include trailing newlines, either:: # LS_IndPara[n] = ContLine NL IND[n] ContLine (NL ContLine)* # LS_OneLinePara = EndLine There are 3 types of list item:: # LI = LI1 | LI2 | LI3 This production gives the contents of a list item, *after* its first paragraph:: # LI_Rest = ((COLON COLON NL LitBlk) | # (NL BlankLine NL (Para | List | DocTestBlk)))+ List Item, form 1: start with a one-line pagraph, then indentation, contents, and corresponding dedents. The indentation/contents/dedent is optional, so this also covers list items with just a one-line para (no indent):: # LI1 = Bullet LS_OneLinePara # (IND[n] # (BlankLine+ (Para | List | DocTestBlock | LitBlk))+ # DED[n])? List Item, form 2: start with a paragrpah containing indentation, then contents, then corresponding dedent:: # LI2 = Bullet IndPara[n] # (BlankLine+ (Para | List | DocTestBlock | LitBlk))+ # DED[n] List Item, form 3: this is used when the bullet's on a line by itself:: # LI3 = Bullet NL # (IND[n] # (BlankLine+ (Para | List | DocTestBlock | LitBlk))+ # DED[n])? Sections consist of a heading, followed by an indended section that can contain anything (i.e., epytext):: # Section = Heading NL epytext DocTestBlocks are terminated by blank lines. They must be indented:: # DocTestBlk = IND[n] PyPrompt (ContLine NL)+ DED[n] Literal blocks. Within the literal block, all indents/dedents must be matched:: # LitBlk = IND LitBlkContents DED # LitBlkContents = [^ IND DED]+ | IND LitBlkContents DED ================================================================ Anyway, I'm sure I didn't get that quite right, but it's a start, anyway. -Edward From tim.one@home.com Thu Apr 19 01:08:40 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 18 Apr 2001 20:08:40 -0400 Subject: [Doc-SIG] __all__, and how it relates to doc tools In-Reply-To: <200104151312.IAA08960@cj20424-a.reston1.va.home.com> Message-ID: [Edward] > Would it also be reasonable for a doc tool to look at this value, for > an indication of which objects to document? [Tim] > Absolutely. In fact, that's probably the best use. [Guido] > Hm. You may be right, but Ping told me that he had tried this in > pyoc, and was unhapy with the result: too much stuff didn't get > documented. So we should at least be willing to retract this idea. Well, every time you or I test pydoc under Windows, the first thing we do is type "random" at it. Because "_" appears early in the alphabet, the first four methods it displays are: _Random__whseed __getstate__ __init__ __setstate__ The first 8 functions: _acos _cos _exp _log _sin _sqrt _test _test_generator and then vrbls like _e, _inst and _pi. 
Almost none of that is of any interest to end users, while random.__all__ lists exactly what *is* interesting to users. However, random.__all__ is redundant, because random.py uses the underscore *convention* with care, and __all__ merely contains the names "import *" would import if __all__ didn't exist. Some old modules are much sloppier in their use of underscores, and Skip put a lot of work (when adding __all__ to them) into figuring out which names they *did* intend to export. pydoc can't do a better job of guessing *that* than Skip did by hand, and by ignoring both __all__ *and* the underscore conventions, pydoc shows too much irrelevant implementation detail. You eventually need an option to show "private" stuff too, but that's a poor default choice except for people working on a module's implementation. I'm happy to live with the underscore conventions alone to make the public-private distinction, but since history shows that few others are willing to live with that, something like __all__ does serve a purpose and should be respected. From ping@lfw.org Thu Apr 19 03:06:44 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Wed, 18 Apr 2001 21:06:44 -0500 (CDT) Subject: [Doc-SIG] __all__, and how it relates to doc tools In-Reply-To: Message-ID: > > [Edward] > > Would it also be reasonable for a doc tool to look at this value, for > > an indication of which objects to document? > > [Tim] > > Absolutely. In fact, that's probably the best use. > > [Guido] > > Hm. You may be right, but Ping told me that he had tried this in > > pyoc, and was unhapy with the result: too much stuff didn't get > > documented. So we should at least be willing to retract this idea. Tim Peters wrote: > Well, every time you or I test pydoc under Windows, the first thing we do is > type "random" at it. ...why, because "random" has those weird bound methods at the top-level that used to throw pydoc for a loop? :) > Because "_" appears early in the alphabet, the first > four methods it displays are: > > _Random__whseed > __getstate__ > __init__ > __setstate__ Well, you definitely want to know about __init__. I can see why you might not want to see private methods like _Random__whseed, though. As for __getstate__ and __setstate__, it's probably nice to know that they exist ("oh, it's possible to pickle this"). > and by ignoring both __all__ *and* the underscore > conventions, pydoc shows too much irrelevant implementation detail. I should note that pydoc *did* try both of those things already. In a previous incarnation, pydoc avoided top-level names beginning with _, but Guido was unhappy that it did this at the module level and not at the class level, so i changed it. In an even earlier incarnation, pydoc only displayed names listed in __all__, and so many things were missing from the output that it wasn't useful any more (e.g. errors in httplib, useful functions in cgi, constants like keyword.kwlist). Perhaps if the value of __all__ were different (or if it's changed in the past couple of weeks) it would be okay, but at the moment it just hides too much. -- ?!ng From pf@artcom-gmbh.de Thu Apr 19 08:11:33 2001 From: pf@artcom-gmbh.de (Peter Funk) Date: Thu, 19 Apr 2001 09:11:33 +0200 (MEST) Subject: Meta: EBNF notation (was Re: [Doc-SIG] Structuring: a summary; and an attempt at EBNF..) In-Reply-To: <200104181930.f3IJU0p10299@gradient.cis.upenn.edu> from "Edward D. Loper" at "Apr 18, 2001 3:30: 0 pm" Message-ID: Hi, Edward D. Loper: [...] > Below is my first attempt at an EBNF-like formalism for these rules. [...] 
> IND and DED are indent and dedent (by a single space); I use > the notation IND[n] to mean n IND tokens. Note that the rule:: [...] Why don't you simply use INDENT and DEDENT tokens, which may represent any arbitrary number of spaces as long as they match up? Don't forget: This is Python and anyone seriously interested in Python should be already familiar with this concept from the Python Grammar file and will probably understand this at the first glance. This might help to get rid of your `[n]' meta notation. In EBNF the square brackets `[' and `]' are normally used as meta symbols to enclose optional terms (see below). So the notation you invented here irritates because it suggests that `IND[n]' is an `IND' token followed by an optional term `n' ;-). For your entertainment I like to quote a small passage from science report No.36 written by Niklaus Wirth, ETH Eidgenössische Technische Hochschule Zürich, Institut für Informatik, introducing the programming language MODULA-2 in March 1980: """Notation for syntactic description ---------------------------------- To describe the syntax, an Extended Backus-Naur Formalism called EBNF is used. .. Each factor F is either a (terminal or non-terminal) symbol, or it is of the form [ E ] denoting the union of the set E and the empty sentence, or { E } denoting the union of the empty sequence and E, EE, EEE, ... . Parentheses may be used for grouping terms and factors. .. EBNF is capable of describing its own syntax. We use it here as an example:

    syntax = { production } .
    production = NTSym "=" expression "." .
    expression = term {"|" term} .
    term = factor {factor} .
    factor = TSym | NTSym | "(" expression ")" |
             "[" expression "]" | "{" expression "}"

""" As a student I was very impressed by this short and precise description of the EBNF formalism. The most common variations of this notation are to use `::=', `:=' or `<-' instead of `=' in productions or to use `(' expression `)+' instead of the square brackets to mark optional terms or to use `(' expression `)*' instead of the curly braces to mark [0..n] repetition. For example the Python Grammar file uses the asterisk notation for repetitions. IMO the {} notation as used by N.Wirth is easier to read. > Anyway, I'm sure I didn't get that quite right, but it's a > start, anyway. Yes. That's fine. I will try to have a deeper look into it later. Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen, Germany) From tim.one@home.com Thu Apr 19 08:41:53 2001 From: tim.one@home.com (Tim Peters) Date: Thu, 19 Apr 2001 03:41:53 -0400 Subject: [Doc-SIG] __all__, and how it relates to doc tools In-Reply-To: Message-ID: [Tim] >> Well, every time you or I test pydoc under Windows, the first >> thing we do is type "random" at it. [Ping] > ...why, because "random" has those weird bound methods at the > top-level that used to throw pydoc for a loop? :) Na, it's that when pydoc was busted completely on Windows, I shouted out to Guido "hey, bring up the pydoc GUI and search for a module". "Which module?" "Doesn't matter -- pick one at random." "OK, I pick random." Now it's a ritual. > ... > I should note that pydoc *did* try both of those things already. > In a previous incarnation, pydoc avoided top-level names beginning > with _, but Guido was unhappy that it did this at the module level > and not at the class level, so i changed it.
Changed it to what? To avoid them at both levels, or to avoid them at neither? I expect he intended the former, not the latter. Names that both begin and end with (at least) two underscores don't count as "beginning with '_'" for this purpose, though (as you said but I snipped, things like __init__ and __getstate__ are potentially interesting to end users). > In an even earlier incarnation, pydoc only displayed names listed > in __all__, and so many things were missing from the output that > it wasn't useful any more (e.g. errors in httplib, useful functions > in cgi, constants like keyword.kwlist). Perhaps if the value of > __all__ were different (or if it's changed in the past couple of > weeks) it would be okay, but at the moment it just hides too much. __all__ is supposed to list all and only the "public" names in the module. When it doesn't, that's a bug to be fixed in the module. I agree there are lots of bugs. In the meantime, it would be better to suppress names that pass name[:1] == "_" and not name[:2] == "__" == name[-2:] That would, e.g., expose httplib's error classes, but suppress its internal state-machine constants (_CS_IDLE etc) and non-user-callable methods (like HTTPConnection._set_hostport). Longer term, we should fix __all__ or get rid of it; the former is better, because the latter leaves us documenting accidental exports (like httplib.mimetools) forever; but the former is also real work. From ping@lfw.org Thu Apr 19 09:36:46 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Thu, 19 Apr 2001 03:36:46 -0500 (CDT) Subject: [Doc-SIG] __all__, and how it relates to doc tools In-Reply-To: Message-ID: On Thu, 19 Apr 2001, Tim Peters wrote: > > In a previous incarnation, pydoc avoided top-level names beginning > > with _, but Guido was unhappy that it did this at the module level > > and not at the class level, so i changed it. > > Changed it to what? To avoid them at both levels, or to avoid them at > neither? Neither, as you can see now. I didn't think we had the time to debate the starts-with-one-underscore-but-not-two rule then. > name[:1] == "_" and not name[:2] == "__" == name[-2:] Yup, looks like a good rule to me. -- ?!ng From edloper@gradient.cis.upenn.edu Thu Apr 19 09:15:30 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Thu, 19 Apr 2001 04:15:30 EDT Subject: Meta: EBNF notation (was Re: [Doc-SIG] Structuring: a summary; and an attempt at EBNF..) In-Reply-To: Your message of "Thu, 19 Apr 2001 09:11:33 +0200." Message-ID: <200104190815.f3J8FUp19153@gradient.cis.upenn.edu> > Why don't you simply use INDENT and DEDENT tokens, which may > represent any arbitrary number of spaces as long as they match up? > Don't forget: This is Python and anyone seriously interested in > Python should be already familar with this concept from the Python > Grammar file and will probably understand this at the first glance. Because these assume that there is no single indent that corresponds to multiple dedents. Which is true in Python, but not necessarily in the markup language we're talking about. In particular, consider:: - This is a list item. - This is a sublist item. This is another paragraph in the main list item. According to python's rules for generating INDENT and DEDENT tokens, the dedent before "this is another..." would be illegal because it doesn't line up with anything. But according to my EBNF (assuming that I got it right), it comes out correctly:: IND IND - this is a list item IND IND - this is a sublist item. 
DED DED - This is another paragraph in the main list item. DED DED Also, I should apologize for being very fast and loose with notation. I'll clean that up before I make anything formal (e.g., before putting anything in a PEP). There are indeed several variations on EBNF. The basic one I was using uses the kleene star (x*) to mean 0 or more repetitions of x, and the kleene cross (x+) to mean 1 or more repetitions of x; I think I may have also used x? to mean 0 or 1 x's.. Basically the productions I wrote should read roughly as regexps (with the VERBOSE flag). I agree that x[n] isn't the best choice of notation, especially given that I think I may have used things like "[^ NL S]" to mean "any character that's not a newline or a space.. Perhaps x? One thing to note here is that the language I'm using is strictly more powerful than EBNF. The reason, as I said before, is because I have crossing dependancies. It would be possible to express the same *string* language without crossing dependancies, but only if we allow the first paragraph of a list item to be split across two different nonterminals. Also, incidentally, I used "(?! ..)", too, which is also strictly more powerful than EBNFs (it's not context free; you can generate a^n b^n c^n with it)... But I used it just as a matter of convenience -- everything I wrote with it could be re-written without it. -Edward From hernan@orgmf.com.ar Thu Apr 19 10:24:10 2001 From: hernan@orgmf.com.ar (Hernan Martinez Foffani) Date: Thu, 19 Apr 2001 11:24:10 +0200 Subject: [Doc-SIG] got a Mac and 20 minutes? In-Reply-To: <200104190815.f3J8FUp19153@gradient.cis.upenn.edu> Message-ID: If you found this a bit off topic, please apologize (me, not you :-) and just ignore it. If anybody around there got a Mac with Internet Explorer (version 4.x or 5.x) can I ask you to download the "Python Shelf" at http://www.orgmf.com.ar/condor/pytstuff.html (it's a zip file that's almost 5MB) and see if the Microsoft HTML Help files (the ones with extension .chm) work? I didn't found any official reference that it should work. Apparently the format is platform independent (but coming from Microsoft...) In case it does work, any suggestion about installing those files on a Mac are welcome. (mmm... why i'm feeling pessimistic?) Thanks in advance, -Hernán -- Hernán Martínez Foffani hernan@orgmf.com.ar http://www.orgmf.com.ar/condor/ From tony@lsl.co.uk Thu Apr 19 11:27:27 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Thu, 19 Apr 2001 11:27:27 +0100 Subject: [Doc-SIG] Ho hum - back to work... In-Reply-To: <200104181930.f3IJU0p10299@gradient.cis.upenn.edu> Message-ID: <004b01c0c8bb$54ad0a80$f05aa8c0@lslp7o.int.lsl.co.uk> Well, I recovered from my flu (eventually) and am now back to "normal". One of the interesting side-effects of the flu, though, was its ability to purge the mind. I'm afraid I've come out of the illness with much less interest in the Doc-SIG than I went in with - it's very difficult to see, from the standpoint of now, why I was insane enough to devote so much time to something that, perhaps, not so many people really care about, when I could instead have been reading, ironing, making my Debian system work, talking to Joan - oh, all sorts of things. This means I am unlikely to be as active as I was, particularly since I'm expecting to be quite busy with some interesting things at work as well. It's also why I've been refraining from comment on the "structure" discussion - I just don't have the time at the moment to spend an hour in the morning on Doc-SIG. 
Anyway, to the point. I'm taking tomorrow (and maybe a day next week) off to do *some* work for the effort. It's a bit short notice to ask this, but given all the work that Edward and David are doing (I don't necessarily *agree* with them, but that's another matter), I figured I'd seek an opinion on how my time might best be spent. There are two main options: 1. My original promise - get a version of docutils/fat.py working as a testbed. It would come with lost of command line switches to try out various ideas, and would try to incorporate some of Edward's structuring options (although note that *I* am not going to code *anything* that supports C{something} or C markup, as I consider this an abomination). It would be suitable for running over the standard library to see how well *that* renders when passed through a markup engine (this seems like a very important point to me!). 2. Work on the Doc-SIG archives, to try to produce summaries of the arguments from its lifetime. Note that (technically) we may need this for any PEPs we produce! (and it would clearly be useful to be able to *point* to who said what and why, given the history of the group). Option 1 (a) probably needs doing anyway, and (b) fat.py is probably likely to be the only tool that supports multiple ways of doing things, to allow users to *compare* them (which seems valuable to me). Option 2 is actually more tempting (I've done this sort of thing before, and it's a lot of work, but can be very worthwhile). I think this *needs* doing at some point - we don't want to lose useful wisdom from the past. Two separate questions, as well (if answering these, please start a separate thread for each?) A. Content markup pedagogy. I still don't understand why Edward (and Guido, although I think he's less likely to answer!) object to "simple" markup like ST and relatives use - why they consider it a Bad Thing to (a) use punctuation characters for markup, and (b) use them in a context dependent manner. The last, in particular, bugs me, as I *really* don't understand what the problem is (after all, I *read* text in a context dependent manner). An explanation of the object, in simple terms, would be a nice thing to have for me, and might be useful pedadgogy in the eventual PEP discussions. (As a subpoint, I don't *quite* understand why Edward wants to separate structuring and colourising so much - this seems to me to be implementation detail (for this purpose, I consider the EBNF to be "implementation" as well) - real people don't have trouble with fuzzy distinctions about such things.) B. Reasons to be doing this The Types SIG defines several different (possible) reasons for wanting to produce type annotation, etc. I think it might be useful to produce similar distinctions for Doc-SIG. So here is a tentative list of *why* we might do this work: NOT --- We might *not* do this work because we think that informal plain text, with pydoc *guessing* what to link to what, is sufficient. This is a not entirely unreasonable point, as pydoc does a reasonably decent job (I've been looking at the HTML it produces, and why it's too small to read, which is why I didn't say "excellent job"!) of presenting the plain text from doc strings. DOC --- I personally want to be able to markup the text to get across more meaning (e.g., I *do* want emphasis, but I also want to be able to annotate an argument list as such, and indicate what is literal text, etc.). This is tool independent. 
It is an advantage to standardise on one form of markup, even for DOC, because that makes it easier for other people to read my marked up text. REP --- It is nice to be able to present a DOC string with a little more intelligence than is possible if it is treated as just plain text. The main thing I want here is actually distinction of literal text (be it inline or not) from "plain" text. Given I like to have emphasis, it would be nice if that is recognised as well. Note Eddy's point that we are *not* after "professional" quality of presentation here - just something easier on the eye than plain text. STRUC --- One might imagine that there are uses for marked up text, since one could extract information from it. This relies on use of "Arguments:" and other tags, as well as (perhaps) using hints like `#..#` to indicate what one *does* want links generated for. Of course, it is only if the markup scheme is widely adopted (and used consistently) that one gets much benefit from this. Have I missed any options? To me, DOC is the most important, with REP following. I'm not sure I actually believe that we're going to get a lot from STRUC (*except* making it easier to guess that I *didn't* want this "London" to refer to a class, but instead just meant it as plain text). Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From hernan@orgmf.com.ar Thu Apr 19 12:08:23 2001 From: hernan@orgmf.com.ar (Hernan Martinez Foffani) Date: Thu, 19 Apr 2001 13:08:23 +0200 Subject: [Doc-SIG] pydoc small letters. was: Ho hum - back to work... In-Reply-To: <004b01c0c8bb$54ad0a80$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: >de Tony J Ibbs (Tibs): > >..., as pydoc does a >reasonably decent job >(I've been looking at the HTML it produces, and why it's too small to >read, which is why I didn't say "excellent job"!) of presenting the >plain text from doc strings. > It used to be "more" difficult to increase the font size on Ping's pydoc HTML output. (By "more" I mean that you have to look for the <small> tag around the code.) In the pydoc.py that's included in 2.1 it's only a one line change that's logically located: the "small()" function. On line 382 of Lib/pydoc.py change:

    def small(self, text): return '<small>%s</small>' % text

to:

    def small(self, text): return text

...and you'll see the difference. :-) Regards, -Hernán -- Hernán Martínez Foffani hernan@orgmf.com.ar http://www.orgmf.com.ar/condor/ From hernan@orgmf.com.ar Thu Apr 19 12:28:41 2001 From: hernan@orgmf.com.ar (Hernan Martinez Foffani) Date: Thu, 19 Apr 2001 13:28:41 +0200 Subject: [Doc-SIG] RE: pydoc small letters. was: Ho hum - back to work... In-Reply-To: Message-ID: I said: >... >the "small()" function. > It's a method obviously... -H. From tony@lsl.co.uk Thu Apr 19 13:17:29 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Thu, 19 Apr 2001 13:17:29 +0100 Subject: [Doc-SIG] pydoc small letters. was: Ho hum - back to work... In-Reply-To: Message-ID: <004c01c0c8ca$b3bf1f90$f05aa8c0@lslp7o.int.lsl.co.uk> Hernan Martinez Foffani wrote: > It used to be "more" difficult to increase the font size on Ping's > pydoc HTML output. (By "more" I mean that you have to look for the > <small> tag around the code.) It was the use of <small> tags that I disliked - they're a Bad Idea! > In the pydoc.py that's included in 2.1 it's only a one line change > that's logically located: > the "small()" method Ah - thanks.
I haven't got that version yet (still using 1.5.2 Python, and haven't updated pydoc for a little while). One day I'll "officially" grumble that using <small> is Bad, and should not be the default (but only when I've worked out why he wanted it, and what one can do to alleviate the "problem" that was trying to be solved!). Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From hernan@orgmf.com.ar Thu Apr 19 14:00:45 2001 From: hernan@orgmf.com.ar (Hernan Martinez Foffani) Date: Thu, 19 Apr 2001 15:00:45 +0200 Subject: [Doc-SIG] pydoc small letters. was: Ho hum - back to work... In-Reply-To: <004c01c0c8ca$b3bf1f90$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: >de Tony J Ibbs (Tibs) > >I haven't got that version yet (still using 1.5.2 Python, and haven't >updated pydoc for a little while). One day I'll "officially" grumble >that using <small> is Bad, and should not be the default >(but only when >I've worked out why he wanted it, and what one can do to >alleviate the >"problem" that was trying to be solved!). > Taken from a comment in pydoc.py: # Note: this module is designed to deploy instantly and run under any # version of Python from 1.5 and up. That's why it's a single file and # some 2.0 features (like string methods) are conspicuously absent. So it seems that you can download the 2.1 version from CVS and use it to browse your 1.5.2 Python. Since I'm leaving my office now, I'm emailing you the "unrequested" pydoc.py version 2.1 file. Regards, -Hernan From dgoodger@atsautomation.com Thu Apr 19 14:41:38 2001 From: dgoodger@atsautomation.com (Goodger, David) Date: Thu, 19 Apr 2001 09:41:38 -0400 Subject: [Doc-SIG] Structuring: a summary; and an attempt at EBNF.. Message-ID: [Edward D. Loper] > I figured I'd give a summary of all the structuring features that > I think we've agreed on, so we can tentatively take those as > a given. If anyone objects, please say so.. OK, I'll bite! > 3. Doctest blocks start with ">>> " and continue to the next blank > line. Doctest blocks should be indented and separated by blank > lines. Do Doctest blocks have to be preceded by "::"? I.e., are Doctest blocks simply a special case of literal blocks, or are they detected by indentation & ">>> " alone? > 4. Lists should be indented and separated by blank lines. Why should lists be indented? What's wrong with

- a list
- like this?

No indentation is necessary. I suggest that if there *is* indentation, an alternate interpretation is possible. > 7. Colorizing takes place entirely within paragraphs, and does not > interact with structuring. (As an aside: where does this term "colourizing" come from? It was first used on Doc-SIG by Tibs last November. I've otherwise never seen it used in this sense wrt markup. I have seen it used in the sense of syntax colouring (i.e. IDEs changing the colour of text in code). I believe the correct term here would be something like "inline markup" or "mixed markup".) /DG From tony@lsl.co.uk Thu Apr 19 15:12:12 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Thu, 19 Apr 2001 15:12:12 +0100 Subject: [Doc-SIG] Structuring: a summary; and an attempt at EBNF.. In-Reply-To: Message-ID: <005401c0c8da$ba5963a0$f05aa8c0@lslp7o.int.lsl.co.uk> Goodger, David wrote: > (As an aside: where does this term "colourizing" come from?
I think it may have been mind-pollution from my initial looking at STNG code (it's probably the only thing that remained! - certainly none of the algorithms). If it *does* come from ST, it *may* have been around in the Doc-SIG for a while (I think the initial few messages of Doc-SIG went something like: 1. I'm here - who else is? 2. What are we doing? 3. Here's setext 4. Here's StructuredText (that was Jim Fulton) 5. Yuck - why not use (I think it was TeX) - that one was me... There were doubtless a few other messages mixed in there, too...) I use the term interchangably with "markup", and although the latter is probably more standard a term, I quite like it (as to whether one is "colourising" in the IDE sense, well, that would be one use of the resultant information). I'm sure there's probably some half-assed pun in the back of things, but I can't see it for the nonce. I probably hadn't come across its use in that manner before, either (although colour analogies in data structures are not new things). Edward has certainly tended to use "inline markup" when he's being formal, I believe. (of course, it's also nice to use a word whose spelling is unlikely to be agreed on - but that's an incidental benefit...) [nb: my personal vote is that *obviously* doctest blocks don't need a "::" in front of them. Their detection should be *identical* to the means used by doctest.py - otherwise people really *will* get confused... - hmm, of course, that actually doesn't work already, as doctest.py will happily "see" a ">>>" inside a literal block. Ho hum.] Whilst I'm here... > 2. Literal blocks start with a paragraph that ends with "::" Pedantry - they start with the first non-blank line *after* the "::" paragraph, *if* it is indented more than that paragraph (and presumably in Edward's terms, a relatively unindented paragraph after a "::" paragraph would be an error - unless he wants to allow indentation 0). So:: This here:: Is clearly OK but what about:: This here:: Is this literal? and:: Some text. This here:: Is this literal? In the first case, we're OK. In the second, it's either non-literal (and for Edward an error?), or literal with indentation 0. In the third, it's clearly non-literal - but does Edward want an error or not? > and continue to the next line whose indentation is equal > to or less than that of the paragraph that started them. Surely that should (for a start) be "next non-blank line" (and possibly even "next non-blank line following a blank line", for pedantry). And terms like "the paragraph that started them" is why I like terms like "parent paragraph" - it's a lot easier to work with. > Literal blocks should be indented and separated by blank lines. So that answers the "indentation by 0" question. But they can't be separated by blank lines, 'cos those are part of the literal block (this is *quite* important - as is preservation of the correct *number* of (internal) blank lines). Damn - I was trying not to get involved... Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From dgoodger@atsautomation.com Thu Apr 19 16:22:52 2001 From: dgoodger@atsautomation.com (Goodger, David) Date: Thu, 19 Apr 2001 11:22:52 -0400 Subject: [Doc-SIG] RE: directives and fields Message-ID: [Edward D. 
Loper] > Well at least there should be rules in the "generic parser" that say > when directives end, so that a parser can ignore a directive if it > doesn't understand it. As I understood your original proposal, > directives ended with blank lines. I think that they should end with > a dedent back to the indent they started at, because then they can > include blank lines.. >From the reStructuredText spec, first draft: """ A comment/directive block is a text block: - whose first line begins with '.. ' in column 1, - whose second and subsequent lines are indented relative to the first, and - which ends with a blank or unindented line. ... Actions taken in response to directives and the interpretation of data in the directive block or subsequent text block(s) are directive- and implementation-dependent. """ I would only change the third list item to 'which ends with an unindented line'. > And I think that it should be *possible* to handle directives in a > second pass. Sure, if that's what the extension wants to do. The extension itself is called during parsing. If it's tied to a post-parse process, that's its own business. There are essentially two types of directives: extensions, which apply to their blocks only; and plugins, which may change the behaviour of the parser for some defined part of the input (may be for the adjacent text block, may be globally). Justification for plugins: it would be useful to modify the parser's behaviour on the fly, without having to subclass. For example, a 'fields' plugin could add support for the '@' syntax, allowing experimentation & testing. Kind of like the 'from __future__ import' hack. ;-> > I.e., I don't think we should have any directives that > change the syntax of subsequent parts of the string, like:: > > This is *emph* > > .. switch-emph-and-literal > > This is *literal* Of course, no such directive would be part of the standard package. Only a lunatic would play games like this. But it would be a great way for people to play with alternate syntax. > Basically, it seems like you should be able to make a "generic" parser > which outputs a DOM tree for the formatted docstring, with "directive" > elements containing #CDATA (=character data, i.e., a string) like:: > > ... > > Then a specialized parser could run the generic parser, and then > replace all the directive elements with some other elements.. If the extension/directive wants to do this, fine. But what if it just wants to wrap the normal behaviour of the parser with a new tag? > The only domain I care about is formatted docstrings. That's a big enough domain with enough controversy to make the feature necessary. See the archives. See this discussion! :-) It's been going on for years, you know. > As for running out of characters to use as syntax, that's one of the > reasons I don't like *colorizing* `like this`... Then implement a POD-like language or a JavaDoc-like language or whatever. This is clearly the dividing line: do you "buy in" to the Setext/StructuredText concept or not? > I think that my target is a much more lightweight markup language than > you're talking about.. or at least less powerful. I really don't see > the need for most of those things in docstrings. Again, read through the archives. Everyone has different opinions, everyone wants different levels of control. If you don't want to use a particular feature, don't. But someone else does. Please don't limit *me*. 
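(Returning to the block definition quoted from the spec draft: the three rules are mechanical enough that a rough sketch shows how a parser could skip a directive it does not understand. This is purely illustrative -- the function name is invented and this is not the actual reStructuredText code -- and it uses the amended rule that only an unindented line ends the block, so blank lines stay inside it::

    def find_explicit_blocks(text):
        """Return (first_line, body_lines) pairs for each '.. ' block."""
        blocks = []
        lines = text.split('\n')
        i = 0
        while i < len(lines):
            if lines[i][:3] == '.. ':             # rule 1: '.. ' in column 1
                first, body, i = lines[i], [], i + 1
                # rules 2 and 3: body lines are indented (blank lines are
                # kept); the first unindented non-blank line ends the block
                while i < len(lines) and (not lines[i].strip()
                                          or lines[i][0] in ' \t'):
                    body.append(lines[i])
                    i = i + 1
                blocks.append((first, body))
            else:
                i = i + 1
        return blocks

A tool that doesn't recognize the name on the first line can simply throw the whole (first_line, body_lines) pair away, which is exactly the "ignore what you don't understand" behaviour asked for above.)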
It is my opinion that incomplete, minimal markup schemes are doomed to failure, because *your* minimal set of features doesn't match *my* set or *anybody else's*. At least at the discussion level. ;-) > > Say we add an 'SQL' extension to the parser, which performs a > > database query and inserts the results. > > Wouldn't this totally violate making the docstring readable? And when > would you ever want to use this when writing a docstring?? Just an example, not a serious proposal. C'mon, lighten up! > > .. warning:: > > > > Don't *ever* press the `Self-Destruct` button. > > If you do, you'll be sorry. > > This could be implemented as a field. Then fields can't be restricted to the ends of docstrings -- I want a warning in the middle! And what do fields *do*? Seems to me they're simply descriptive, not functional. Maybe they are all we need, but please come up with a more complete description! > I think that external URL > hyperlinks should be implemented with colorizing, if at all. They're definitely required. I used readability as the overriding criterion in making that decision. Which is more readable? 1. A hyperlink in StructuredText, inline:: I love using the "Python":http//www.python.org programming language! (The URL has to be stuck next to the reference, whether it flows or not. The raw text looks very different from the processed!) 2. A hyperlink in reStructuredText (based on the Setext style), indirect:: I love using the Python_ programming language! (Note that the URL can be anywhere: next to the reference, at the end of the section, or at the end of the document. And the URL can be referred to multiple times: Python_.) .. _Python: http://www.python.org > I don't > think that internal hyperlink targets make sense for docstrings. This comes back to the semantics or usage of docstrings, something that I'm trying to avoid. How long can a docstring be? > I don't think that comments are necessary for docstrings. If you really > want, you can include a Python comment before or after the docstring. Comments are a freebie from the '.. ' syntax. Not necessary, but useful. > Alternatively, comments could be done via colorizing.. Please, no. > > The cornerstone of the Setext/StructuredText-like approach is that > > the raw text should be as readable as possible, even to the > > uninitiated. > > I don't see how directives win here. > > If anything, it seems like they > will make it harder to read by the uninitiated, given the power of > directives to use almost arbitrary syntax.. You seem to think that typing '.. some-directive::' will magically make something happen. Not so. You'd have to first *implement* the directive, not a trivial task. I was referring to '@' and (especially) 'X<>', about the readability cornerstone. OTOH, directives are readable by way of being explicit. If we want a digibloofer construct, we say '.. digibloofer::' (having paid the price for such impertinence by implementing the digibloofer-parsing extension first, of course ;-). > However, the idea that "raw text should be as readable as possible, > even to the uninitiated" is a *goal* of mine, but not a cornerstone. > Perhaps a cornerstone would be:: > > Raw text should be readable, even by the uninitiated. I don't see the distinction. > There are a lot of conflicting goals in designing a markup language, > and making it as readable as possible is by no means my most > fundamental goal. I'd say, for the Setext/StructuredText approach, it *is* the most fundamental goal. 
If it's not yours, you'll save yourself a lot of grief by using XML or TeX. > In the case of colorizing, I believe that > colorizing should *never* be necessary to the understanding of a > docstring.. i.e., you should be able to strip away all colorizing, and > still understand what it says. In the Setext/StructuredText approach, you shouldn't *have* to strip away anything. It should just be obvious, or at least unobtrusive. > I guess that perhaps what it comes down to is that I am *not* > necessarily trying to design a Setext/StructuredText-like language. Aha! :-) > I'm trying to design a markup language that is optimal for writing > Python docstrings. A noble goal. Please use a different name for what you're doing and let's be done with it. Lots of room for competition (the field's wide open right now! ;-). The more the merrier. > In my mind, the only advantage of using > `quotes` over C{curly braces} is that quotes are easier to ignore.. Precisely. Also, `quotes` have the connotation of, well, quoting. ... And a vigorous debate was had by all. Me and Edward, anyway. Thank you, sir. /DG From edloper@gradient.cis.upenn.edu Fri Apr 20 04:06:15 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Thu, 19 Apr 2001 23:06:15 EDT Subject: [Doc-SIG] Re: Ho hum - back to work... In-Reply-To: Your message of "Thu, 19 Apr 2001 11:27:27 BST." <004b01c0c8bb$54ad0a80$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200104200306.f3K36Fp19203@gradient.cis.upenn.edu> > Well, I recovered from my flu (eventually) and am now back to > "normal". That's good to hear. I was beginning to worry that you didn't like us anymore. :) > Anyway, to the point. I'm taking tomorrow (and maybe a day next > week) off to do *some* work for the effort. It's a bit short notice > to ask this, but given all the work that Edward and David are doing > (I don't necessarily *agree* with them, but that's another matter), > I figured I'd seek an opinion on how my time might best be spent. It does seem like it would be nice to have a parser with which we can try a number of different rules.. And since you've already spent a fair amount of time on that, that seems like a reasonable thing to work on. > 2. Work on the Doc-SIG archives, to try to produce summaries of the > arguments from its lifetime. Note that (technically) we may need > this for any PEPs we produce! (and it would clearly be useful to be > able to *point* to who said what and why, given the history of the > group). I tried to do this a few weeks back, (including copius pointers to individual articles), but gave up because I don't have *that* much free time. :) But it would be *really* useful to have, I think, and it you're more familiar with the archives, then maybe it wouldn't take as long.. At least getting a start on it would be nice. Overall, I'd say to work on docutils/fat.py, but mainly because you've already invested a fair amount of work in it. Maybe we can convince someone else to do the doc-sig summary stuff? :) > I still don't understand why Edward (and Guido, although I think > he's less likely to answer!) object to "simple" markup like ST and > relatives use - why they consider it a Bad Thing to (a) use > punctuation characters for markup, and (b) use them in a context > dependent manner. I actually don't object to either (a) or (b), strictly speaking. What I object to is markup that I think will be "unsafe." 
For example, I have no problem with using *one* *word* *emph*, or saying that backticks around any valid Python identifier mark it as a Python object. My biggest pet peeve about ST-like markup is having a markup be context-dependent, with a basically unbounded context.. For example, if "*" starts an emph region only if there's another "*" later in the string somewhere; and otherwise is an asterisk. This seems very dangerous to me. I want to be able to tell (under most circumstances), by looking at a character and its immediate context, whether it's markup or not.. So, as long as we keep our contexts relatively small, I don't object to context-dependent markup. (In fact, both bullets and "::" are definitely context-sensitive markup, and I think they're very intuitive.) As for using punctuation characters, that's fine (what else would you use??), but if possible we should try to keep the need for escaping to a minimum, because escaping will be ugly and non-intuitive, no matter how we do it. So we should try to keep the number of punctuation characters we use to a minimum. > (As a subpoint, I don't *quite* understand why Edward wants to > separate structuring and colourising so much - this seems to me to > be implementation detail (for this purpose, I consider the EBNF to > be "implementation" as well) - real people don't have trouble with > fuzzy distinctions about such things.) There are really three reasons: 1. A general divide-and-conquer approach to the problem of coming up with a markup language. I'm more confident that we'll be able to come to consensus on smaller issues/domains than larger ones. This reason has nothing to do with the final markup language, and everything to do with how we get there. 2. A side-effect of dividing structuring and colorizing is eliminating a number of issues, such as how to tell whether a line in a paragraph starting with "1." is a bullet or a continuation of the previous line. 3. I think that the markup language will be easier to understand if colorizing and structuring don't interact much. My original reasons were (1) and (3). (2) was something that happily fell out. > B. Reasons to be doing this [Summarized:] - NOT: we don't need to invent a markup language - DOC: we want to be more expressive in our docstrings - REP: we want to be smarter about displaying docstrings - STRUC: we want to be able to do smart things with our docstrings (other than displaying them). I would say that for me, your REP and DOC would be my most important reasons for this work, probably in that order. The reason that I put REP above DOC is because I think that the need for standardization is much less for DOC than it is for REP. > I'm not sure I actually believe that we're going to get a lot from > STRUC One thing we get is the ability to check for certain completeness criteria in our documentation.. e.g., did I specify a return value/type for everything that returns something? did I describe every parameter? -Edward From edloper@gradient.cis.upenn.edu Fri Apr 20 04:25:20 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Thu, 19 Apr 2001 23:25:20 EDT Subject: [Doc-SIG] Structuring: a summary; and an attempt at EBNF.. In-Reply-To: Your message of "Thu, 19 Apr 2001 09:41:38 EDT." Message-ID: <200104200325.f3K3PKp21027@gradient.cis.upenn.edu> > Do Doctest blocks have to be preceded by "::"? I.e., are Doctest > blocks simply a special case of literal blocks, or are they detected > by indentation & ">>> " alone?
I would say that they're detected by indentation and ">>>" alone (and should be separated by leading and trailing blank lines). That's to be consistant with the doctest module. We could also say that they have to appear in literal blocks, but then I'd want the parser to generate a warning whenever it sees a paragraph that starts with ">>>". > Why should lists be indented? What's wrong with > > - a list > - like this? In theory, *either* indentation *or* separation by a blank line would suffice. In fact, right now my parser would accept your example with a warning. I think that we should enforce both because it makes the docstrings easier to read, and is more consistant. The one problem with this is if you want to include a list directly after a literal block. There's no way to do it if lists are required to be indented. Maybe we could allow it in that case. :) > No indentation is necessary. I suggest that if there *is* > indentation, an alternate interpretation is possible. When I read them, *I* don't interpret them differently (as an uninitiated reader). So I don't think we should be encoding any semantic content in the difference, if we *do* allow unindented lists. Doing so seems to me to go against the principle of making sure that the uninitiated can understand the docstring.. Well, actually, there is one obvious interpretation: actually indent the list in the output. But I don't think that we should be giving people that much control over the output. If they *need* that much control, they should be using LaTeX or something like it.. I like to *think* that the markup language we're talking about is mainly a semantic one.. > As an aside: where does this term "colourizing" come from? I picked it up here on doc-sig (or maybe on some page that I was pointed to from here). I'll try to remember to talk about "inline markup" or "local markup" or "intraparagraph markup" or some such when writing up a PEP (if we ever get there...). From edloper@gradient.cis.upenn.edu Fri Apr 20 05:17:09 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Fri, 20 Apr 2001 00:17:09 EDT Subject: [Doc-SIG] Re: directives and fields In-Reply-To: Your message of "Thu, 19 Apr 2001 11:22:52 EDT." Message-ID: <200104200417.f3K4H9p25988@gradient.cis.upenn.edu> [These responses are a bit out-of-order.. They're in the order that I felt like responding to them in.] [David said:] > Please use a different name for what you're doing and let's be done > with it. Lots of room for competition (the field's wide open right > now! ;-). The more the merrier. I don't think I've used any name related to ST to refer to the markup language I'm talking about for two or three weeks (ever since we decided that we didn't need to try to maintain compatibility with ST). If you like, we can call "my" language "epytext," because that's what I called the parser module I've been writing (edloper's version of pytext). But really, I'm not just trying to design a markup language that *I* like. If that were my goal, I'd write a parser and be done with it. My goal is to produce a markup language for docstrings that the Python community can embrace as a whole. You may say that it's not going to happen, but at least 2 languages (Perl and Java) have managed it, and I don't see why we can't come up with a good standard ML for Python. Of course, not everyone would use the same features of the markup langauge as everyone else. Some people might use emph inline markup, some might not; some might use fields, and some might not. 
But they would all be using the same markup language.. Just like I can write in LaTeX and decide not to use \emph{}. And as a result, people can write tools for the markup language. >>> "raw text should be as readable as possible, even to the >>> uninitiated" > > I'd say, for the Setext/StructuredText approach, it *is* the most > fundamental goal. If it's not yours, you'll save yourself a lot of > grief by using XML or TeX. In designing a good markup language for docstrings, I think we really need to balance a number of goals. XML and TeX do well with some goals (e.g., they're formal, and XML is simple).. But not so well with other goals (they're not very easy to write, and not easy for the uninitiated to read). I think it's dangerous to concentrate too much on any one goal. > Then implement a POD-like language or a JavaDoc-like language or > whatever. This is clearly the dividing line: do you "buy in" to the > Setext/StructuredText concept or not? Do I have to "buy in" to all of it? For example, to things like saying that "*" is an asterisk if it appears once in a paragraph, but an emph delimiter if it appears twice? I appreciate many of the features of ST-like languages. I think that there's great potential for clean/simple structuring, using them. I think there's good potential for simple colorizing, as long as we restrict it so that it's "safe." I think that we could potentially use one of those without using the other. > Again, read through the archives. Everyone has different opinions, > everyone wants different levels of control. If you don't want to use > a particular feature, don't. But someone else does. Please don't > limit *me*. But constructing a standard embraced by the community is really all about limiting *you* (the user). Without limitations on the user, we can't write compatible tools. One option, if you like it, is to say that any paragraph starting with ".. " will generate an error unless it starts a directive that a parser knows about... And anyone who uses directives should know that they are making their docstrings less standard and less portable across tools.. And, perhaps, "standard" directives can be added to the language as time goes on, which will *not* result in less standard/portable docstrings. > That's a big enough domain with enough controversy to make the > feature necessary. See the archives. See this discussion! :-) It's > been going on for years, you know. I know it has. I thought maybe we could end it. :) But if you manage to convince me otherwise, I guess I *will* go off and write my own parser/docstring tools. ;) > > I think that external URL > > hyperlinks should be implemented with colorizing, if at all. > > They're definitely required. I used readability as the overriding > criterion in making that decision. Which is more readable? I would argue for either:: I love using the Python programming language (http://www.python.org). or:: I love using the Python programming language (U{http://www.python.org}). ... But I know you'll disagree. :) > It is my opinion that incomplete, minimal markup schemes are doomed > to failure, because *your* minimal set of features doesn't match > *my* set or *anybody else's*. At least at the discussion level. ;-) I was trying to base my minimal set on previous successful docstring markup languages (POD and JavaDoc).. If you think we need to add more features, then we should talk about what features to add.
But only if we're still trying to work towards coming up with a "community standard" markup language (i.e., something we can put in a PEP). Otherwise, we might as well just go off and implement our own little markup languages. :) But at the end of the day, (perhaps I should say end of the year? ;), I would like to have a simple, streight-forward, *bounded* markup language. > - whose first line begins with '.. ' in column 1, > - whose second and subsequent lines are indented relative to the first, and > - which ends with a blank or unindented line. Hm.. My bad. I skipped past the "Comments and Directives" section to the "Directives" subsection. From the example in that subsection (which was presumably not correctly formatted), I assumed that the second and subsequent lines didn't need to be indented:: .. keywords:: Author: Anne Elk (Miss) Revision: 1 So I guess we basically agree. Is it ok with you to change that to "and ends with an unindented line" for now (in our discussion of directives)? > There are essentially two types of directives: extensions, which > apply to their blocks only; and plugins, which may change the > behaviour of the parser for some defined part of the input. I would like to *only* allow "extensions." If we allow "plugins," then a parser that doesn't recognize a directive really has no choice but to fail. I really don't see the need for plugins.. One advantage of just using fields is that we can deal with unknown fields in a reasonable way: put thier contents in a section labeled with the name of the field. -Edward From edloper@gradient.cis.upenn.edu Fri Apr 20 05:32:38 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Fri, 20 Apr 2001 00:32:38 EDT Subject: [Doc-SIG] Structuring: a summary; and an attempt at EBNF.. In-Reply-To: Your message of "Thu, 19 Apr 2001 15:12:12 BST." <005401c0c8da$ba5963a0$f05aa8c0@lslp7o.int.lsl.co.uk> Message-ID: <200104200432.f3K4Wcp27309@gradient.cis.upenn.edu> > [nb: my personal vote is that *obviously* doctest blocks don't need > a "::" in front of them. Their detection should be *identical* to > the means used by doctest.py - otherwise people really *will* get > confused... - hmm, of course, that actually doesn't work already, as > doctest.py will happily "see" a ">>>" inside a literal block. Ho > hum.] I agree that the detection of ">>>" should be identical to doctest's algorithms. But I think that if we ever *do* manage to come up with a standard markup language, and get a PEP accepted, etc, we could probably get the doctest module changed so it ignores any ">>>" that's within a literal block (it should be pretty easy to scan for that). So I wouldn't worry about the "doctest in literal block" problem for now. > > 2. Literal blocks start with a paragraph that ends with "::" > > Pedantry - they start with the first non-blank line *after* the "::" Um, yeah, that's what I meant. And actually I think we should strip leading and trailing blank lines (but not internal blank lines) from literal blocks. > So:: > > This here:: > > Is clearly OK > Literal block. > but what about:: > > This here:: > > Is this literal? Maybe a warning, more likely an error. > and:: > > Some text. > > This here:: > > Is this literal? Maybe a warning, more likely an error. In my EBNF, the 2nd and 3rd would be errors. >> and continue to the next line whose indentation is equal >> to or less than that of the paragraph that started them. > > Surely that should (for a start) be "next non-blank line" Yes.. let's change it to:: 2. 
Literal blocks start after a paragraph that ends with "::", and end before the next (non-blank) line whose indentation is less than or equal to that of the paragraph that introduced them. or something like that, anyway.. The language could still use some cleaning-up. If anyone on the group doesn't understand, say so, and I'll try to explain it better. > And terms like "the paragraph that started them" is why I like terms > like "parent paragraph" - it's a lot easier to work with. But it's not great for a 1-minute overview that's supposed to be accessible to anyone. :) (And, actually, I would say that that's the previous sister paragraph, not the parent, at least in the resultant DOM tree). > So that answers the "indentation by 0" question. But they can't be > separated by blank lines, 'cos those are part of the literal block > (this is *quite* important - as is preservation of the correct > *number* of (internal) blank lines). As I said, I think that leading and trailing blank lines should be stripped.. (but not any internal blank lines) Do you disagree? What do other people think? > Damn - I was trying not to get involved... Don't let us drag you into this too much.. Feel free not to respond to anything I send... Sanity can be a nice thing. (I've been close to going sane, myself, a few times over the last month). -Edward From edloper@gradient.cis.upenn.edu Sat Apr 21 21:19:29 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Sat, 21 Apr 2001 16:19:29 EDT Subject: [Doc-SIG] Finding cannonical names for objects Message-ID: <200104212019.f3LKJTp10196@gradient.cis.upenn.edu> When writing a documentation tool, it would be nice to be able to figure out what the "parent" of an object is, where by parent I mean: - for a module in a package, its package - for a function or class, the module it was originally defined in - for a member function, its class Among other things, this is useful for trying to establish a unique "canonical" name for something that we're documenting, so we can make sure that inter-documentation pointers are correct (e.g., if we're converting docs to HTML). However, it's not clear how to do this in several cases. The cases where it *is* straightforward to do it are: - for non-builtin modules, extract the package information from the __name__ field. Will this work for built-in packages, too? What's an example of a built-in package? - for non-builtin classes, consult the __module__ field - for non-builtin member functions, consult the im_class field - built-in classes all seem to have a __module__ field (e.g., exceptions.Exception or sys.last_type). Is this always true? In the case of non-builtin functions, I can think of two ways to do it::

    import sys, inspect

    def find_function_module_1(func):
        # scan sys.modules for the module whose namespace this function
        # uses (entries can be None, so guard against that)
        for module in sys.modules.values():
            if module is not None and func.func_globals is module.__dict__:
                return module.__name__
        raise ValueError("Couldn't find the module for this function")

    def find_function_module_2(func):
        # derive the module name from the file the function was defined in
        from os.path import basename, splitext
        try:
            return splitext(basename(inspect.getabsfile(func)))[0]
        except:
            raise ValueError("Couldn't find the module for this func")

Is one of these approaches preferable? Will they ever give different results? Is there a reason that non-builtin functions don't have a __module__ field, like classes do? (Or a reason that built-in methods *do* have the __module__ field?) The other difficult cases are built-in objects. In general, I don't see any way to get parents for built-in objects.
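(For the straightforward cases in the list above, a hypothetical helper -- the name find_parent_name and its shape are invented here, and it only uses the attributes already mentioned (__name__, __module__, im_class, func_globals) -- might look roughly like this::

    import sys, types

    def find_parent_name(obj):
        # module in a package: the package is everything before the last '.'
        if type(obj) is types.ModuleType:
            return '.'.join(obj.__name__.split('.')[:-1]) or None
        # (old-style) class: the module it was defined in
        if type(obj) is types.ClassType:
            return obj.__module__
        # bound or unbound method: the class it was defined in
        if type(obj) is types.MethodType:
            return obj.im_class.__name__
        # plain function: scan sys.modules, as in find_function_module_1
        if type(obj) is types.FunctionType:
            for module in sys.modules.values():
                if module is not None and vars(module) is obj.func_globals:
                    return module.__name__
        return None    # built-in functions and methods: no obvious answer

None of this helps with the built-in objects, though, which is the open question.)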
The relevant built-in objects that I know of are: - built-in functions (e.g., len, min, sys.settrace) - built-in methods (e.g., [].append, file(...).read) - non-builtin methods with underlying builtin functions (e.g., Exception.__str__) Is there any way to get the "parents" for these objects? (It would be *nice* if doctools could process built-in objects as well as non-builtin ones.) Another possible approach to finding cannonical names for objects is to use their ids (as returned by the builtin function id()). This wouldn't be as nice, since it would result in basically arbitrary names, but at least everything we document could be given a unique, cannonical name (within a given session). But I'm somewhat confused about id(). In particular, it seems to return a value for integers.. But since the returned value is an integer, it seems like that implies that at least 2 *different* values will have the same id.. Am I missing something? Is there somewhere I can read about what guarantees are given about whether two values' ids will be different? (e.g., if a value is GC'ed, can its id be recycled? I assume yes..) -Edward p.s., Is there a reason that __builtins__.__name__ == '__builtin__' instead of '__builtins__'? From fdrake@acm.org Sun Apr 22 03:11:52 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Sat, 21 Apr 2001 22:11:52 -0400 (EDT) Subject: [Doc-SIG] Finding cannonical names for objects In-Reply-To: <200104212019.f3LKJTp10196@gradient.cis.upenn.edu> References: <200104212019.f3LKJTp10196@gradient.cis.upenn.edu> Message-ID: <15074.15848.187550.147390@cj42289-a.reston1.va.home.com> Edward D. Loper writes: > p.s., Is there a reason that __builtins__.__name__ == '__builtin__' > instead of '__builtins__'? Yes; the same reason that "import __builtin__" works but "import __builtins__" does not. ;-) -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From edloper@gradient.cis.upenn.edu Sun Apr 22 03:59:58 2001 From: edloper@gradient.cis.upenn.edu (Edward D. Loper) Date: Sat, 21 Apr 2001 22:59:58 EDT Subject: [Doc-SIG] Finding cannonical names for objects In-Reply-To: Your message of "Sat, 21 Apr 2001 22:11:52 EDT." <15074.15848.187550.147390@cj42289-a.reston1.va.home.com> Message-ID: <200104220259.f3M2xwp15626@gradient.cis.upenn.edu> > Edward D. Loper writes: > > p.s., Is there a reason that __builtins__.__name__ == '__builtin__' > > instead of '__builtins__'? > > Yes; the same reason that "import __builtin__" works but "import > __builtins__" does not. ;-) Ok. I guess maybe my question should have been why the default global (?) variable to access them is called "__builtins__" rather than "__builtin__":: Python 2.1 (#1, Apr 21 2001, 20:23:34) [GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2 Type "copyright", "credits" or "license" for more information. >>> dir() ['__builtins__', '__doc__', '__name__'] >>> type(__builtins__), __builtins__.__name__ (, '__builtin__') -Edward From fdrake@acm.org Sun Apr 22 06:21:42 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Sun, 22 Apr 2001 01:21:42 -0400 (EDT) Subject: [Doc-SIG] Finding cannonical names for objects In-Reply-To: <200104220259.f3M2xwp15626@gradient.cis.upenn.edu> References: <15074.15848.187550.147390@cj42289-a.reston1.va.home.com> <200104220259.f3M2xwp15626@gradient.cis.upenn.edu> Message-ID: <15074.27238.314564.581602@cj42289-a.reston1.va.home.com> Edward D. Loper writes: > Ok. I guess maybe my question should have been why the default global > (?) 
variable to access them is called "__builtins__" rather than > "__builtin__":: __builtins__ is an implementation detail, nothing more. It is used to obtain the built-in functions; for all namespaces other than __main__, __builtins__ is a dictionary rather than a module. The identity of the __builtins__ dict is also used to determine if code is running in restricted execution mode; if the name bound to __builtins__ is not the __builtin__ module or the dictionary for that module, the restricted execution rules are in place. The name is different so it doesn't clash, and allows a minor performance improvement over using the __builtin__ module when accessing the built-in namespace. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From fdrake@beowolf.digicool.com Sun Apr 22 07:08:22 2001 From: fdrake@beowolf.digicool.com (Fred Drake) Date: Sun, 22 Apr 2001 02:08:22 -0400 (EDT) Subject: [Doc-SIG] [maintenance doc updates] Message-ID: <20010422060822.A3E4428A0B@beowolf.digicool.com> The development version of the documentation has been updated: http://python.sourceforge.net/maint-docs/ First attempt to push maintenance docs to the SourceForge site. From fdrake@beowolf.digicool.com Sun Apr 22 07:12:15 2001 From: fdrake@beowolf.digicool.com (Fred Drake) Date: Sun, 22 Apr 2001 02:12:15 -0400 (EDT) Subject: [Doc-SIG] [maintenance doc updates] Message-ID: <20010422061215.5C87D28A0B@beowolf.digicool.com> The development version of the documentation has been updated: http://python.sourceforge.net/maint-docs/ Second attempt to push maintenance docs to the SourceForge site. From fdrake@beowolf.digicool.com Sun Apr 22 07:15:52 2001 From: fdrake@beowolf.digicool.com (Fred Drake) Date: Sun, 22 Apr 2001 02:15:52 -0400 (EDT) Subject: [Doc-SIG] [maintenance doc updates] Message-ID: <20010422061552.5A99628A0B@beowolf.digicool.com> The development version of the documentation has been updated: http://python.sourceforge.net/maint-docs/ Third attempt to push maintenance docs to the SourceForge site. Sheesh! From tony@lsl.co.uk Mon Apr 23 11:54:29 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Mon, 23 Apr 2001 11:54:29 +0100 Subject: [Doc-SIG] Structuring: a summary; and an attempt at EBNF.. In-Reply-To: <200104200432.f3K4Wcp27309@gradient.cis.upenn.edu> Message-ID: <006d01c0cbe3$c54349f0$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: > 2. Literal blocks start after a paragraph that ends with "::", > and end before the next (non-blank) line whose indentation is > less than or equal to that of the paragraph that introduced them. > > or something like that, anyway.. The language could still use some > cleaning-up. If anyone on the group doesn't understand, say so, and > I'll try to explain it better. I think it still needs work (!) but it's getting there... > > And terms like "the paragraph that started them" is why I like terms > > like "parent paragraph" - it's a lot easier to work with. > > But it's not great for a 1-minute overview that's supposed to be > accessible to anyone. :) (And, actually, I would say that that's the > previous sister paragraph, not the parent, at least in the resultant > DOM tree). Well, in the DOM tree, yes (I forgot that). The term "preceding non-literal paragraph" is indeed a bit cumbersome... - perhaps we're best off with "the '""' paragraph", which is (sort of) fairly obvious. > As I said, I think that leading and trailing blank lines should be > stripped.. (but not any internal blank lines) Do you disagree? 
No - it is clearly correct to strip preceding and trailing blank lines, mea culpa for not being pedantic about that! I also think that we agree on the "awkward examples" (which is why I didn't copy them again). > > Damn - I was trying not to get involved... > > Don't let us drag you into this too much.. Feel free not to respond to > anything I send... Sanity can be a nice thing. (I've been close to > going sane, myself, a few times over the last month). I'm trying to maintain the attitude that I can be quieter on the grounds that you and David are working on things - with TWO people arguing (erm, discussing) their way towards something, I'm sort-of happy. Not that I'm necessarily happy with the final result *in total*, but if I don't have time to argue and/or implement, that's just tough luck. Besides, I think the *structuring* is getting there (although I am a little worried that an implicit goal of being able to cope with "text as it were wrote" that already exists may be being lost - not a problem if it's not an aim, but grist for an email at another time, I think). Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Mon Apr 23 12:16:49 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Mon, 23 Apr 2001 12:16:49 +0100 Subject: Context dependent markup (was [Doc-SIG] Re: Ho hum - back to work...) In-Reply-To: <200104200306.f3K36Fp19203@gradient.cis.upenn.edu> Message-ID: <006e01c0cbe6$e42aa310$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote (well, in response to me): > > I still don't understand why Edward (and Guido, although I think > > he's less likely to answer!) object to "simple" markup like ST and > > relatives use - why they consider it a Bad Thing to (a) use > > punctuation characters for markup, and (b) use them in a context > > dependent manner. > > I actually don't object to either (a) or (b), strictly speaking. What > I object to is markup that I think will be "unsafe." For example, I > have no problem with using *one* *word* *emph*, or saying that > backticks around any valid Python identifier mark it as a Python > object. My biggest pet peve about ST-like markup is having a markup > be context-dependant, with a basically unbounded context.. For > example, if "*" starts an emph region only if there's another "*" > later in the string somewhere; and otherwise is an asterisk. Which I don't think I've ever proposed... > This > seems very dangerous to me. I want to be able to tell (under most > circumstances), by looking at a character and its immediate context, > whether it's markup or not.. But my problem is that I think this has always been possible (and I think you disagree) so there is clearly some leeway on this clarity, which is what I'm trying to track down. > So, as long as we keep our contexts > relatively small, I don't object to context-dependant markup. (In > fact, both bullets and "::" are definitely context-sensitive markup, > and I think they're very intuitive.) > > As for using punctuation characters, that's fine (what else would you > use??), but if possible we should try to keep the need for escaping to > a minimum, because escaping will be ugly and non-intuitive, no matter > how we do it. So we should try to keep the number of punctuation > characters we use to a minimum. 
Ooh - there's that nasty POD-clone (hmm - bad puns about pod-people narrowly averted) > > (As a subpoint, I don't *quite* understand why Edward wants to > > separate structuring and colourising so much - this seems to me to > > be implementation detail (for this purpose, I consider the EBNF to > > be "implementation" as well) - real people don't have trouble with > > fuzzy distinctions about such things.) > > There are really 2 reasons: > > 1. A general divide-and-conquor approach to the problem of coming > up with a markup language. I'm more confident that we'll be able > to come to consensus on smaller issues/domains than larger ones. > This reason has nothing to do with the final markup language, and > everything to do with how we get there. > > 2. A side-effect of dividing structuring and colorizing is > eliminating a number of issues, such as how to tell whether > a line in a paragraph starting with "1." is a bullet or a > continuation of the previous line. > > 3. I think that the markup language will be easier to understand > if colorizing and structuring don't interact much. > > My original reasons were (1) and (3). (2) was something that happily > fell out. I like 1. I suspect that 3 doesn't always split *quite* that way, but point taken. I think 2 addresses exactly that issue about how it doesn't always split that way (and I tend to agree with Tim Peters' point some while back that "if it looks like markup..." (to paraphrase aggressively)). OK. Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Mon Apr 23 12:16:51 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Mon, 23 Apr 2001 12:16:51 +0100 Subject: Reasons to do this (was [Doc-SIG] Re: Ho hum - back to work...) In-Reply-To: <200104200306.f3K36Fp19203@gradient.cis.upenn.edu> Message-ID: <006f01c0cbe6$e5484810$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: > > B. Reasons to be doing this > [Summarized:] > - NOT: we don't need to invent a markup language > - DOC: we want to be more expressive in our docstrings > - REP: we want to be smarter about displaying docstrings > - STRUC: we want to be able to do smart things with our > docstrings (other than displaying them). > > I would say that for me, your REP and DOC would be my most important > reasons for this work, probably in that order. The reason that I put > REP above DOC is because I think that the need for standardization is > much less for DOC than it is for REP. Interesting. For me, the order is quite close anyway, so I think this is agreement. > > I'm not sure I actually believe that we're going to get a lot from > > STRUC > > One thing we get is the ability to check for certain completeness > criteria in our documentation.. e.g., did I specify a return > value/type for everything that returns something? did I describe > every parameter? Which is *not* the classic thing people claim to want from STRUC - they normally seem to be asking for the ability to extract information for querying. *If* we are primarily interested in DOC and REP (survey of 2 people, so terribly significant) then I think that has repercussions. I need to develop my ideas on this a bit more. 
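(The completeness check is easy to picture even before any field syntax is settled. A hypothetical sketch (the name undocumented_args is invented, and "mentioned anywhere in the docstring" is only a crude stand-in for whatever field markup gets adopted) might be::

    import inspect

    def undocumented_args(func):
        """Return argument names that func's docstring never mentions."""
        doc = func.__doc__ or ''
        args, varargs, varkw, defaults = inspect.getargspec(func)
        missing = []
        for name in args:
            # nested tuple arguments show up as sub-lists; skip them here
            if type(name) is type('') and name not in doc:
                missing.append(name)
        return missing

Anything smarter -- checking that a return value is described, or extracting typed field values for querying -- is where the real STRUC machinery would come in.)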
(It seems worth noting, to me, that if STRUC were the main reason, then the use of __version__, __author__ (and potential others), and the way that the Types SIG looks like allowing string annotation of typed argument values, would seem to make its case a LOT less strong. And if REP/DOC is our main aim, we need to consider what it is that we object to about what pydoc does (which is a brutal rendering of the text as written, with URIs guessed at) - an extreme position would say that structuring was unnecessary and only the markup/colourising was needed... Hmm - I'll think about trying to expand on these things, since they seem to be useful potential ammo for (a) convincing ourselves, (b) convincing others...) Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From tony@lsl.co.uk Mon Apr 23 12:16:55 2001 From: tony@lsl.co.uk (Tony J Ibbs (Tibs)) Date: Mon, 23 Apr 2001 12:16:55 +0100 Subject: [Doc-SIG] Re: Ho hum - back to work... In-Reply-To: <200104200306.f3K36Fp19203@gradient.cis.upenn.edu> Message-ID: <007001c0cbe6$e7749df0$f05aa8c0@lslp7o.int.lsl.co.uk> Edward D. Loper wrote: > > Well, I recovered from my flu (eventually) and am now back to > > "normal". > > That's good to hear. I was beginning to worry that you didn't like us > anymore. :) Oh no - silence is golden, and all that. Seriously, I have lost a lot of the drive to steal time away from other things (and we had a *major* backlog of ironing, too), so I suspect that I am going to do much less implementation work. Which should not be a problem if you and David are willing to code. > It does seem like it would be nice to have a parser with which we can > try a number of different rules.. And since you've already spent a > fair amount of time on that, that seems like a reasonable thing to > work on. Unfortunately, I used some of Friday up on non-Python things (isn't that just the way), and it's clear that there is more than a couple of days work needed on fat.py. I may get round to work on it, but it's likely to be slow... (if someone else wants the current code, I'll update the web stuff - but of course working on someone else's code at this stage in development isn't necessarily unalloyed joy) > > 2. Work on the Doc-SIG archives, to try to produce summaries of the > > arguments from its lifetime. Note that (technically) we may need > > this for any PEPs we produce! (and it would clearly be useful to be > > able to *point* to who said what and why, given the history of the > > group). > > I tried to do this a few weeks back, (including copius pointers to > individual articles), but gave up because I don't have *that* much > free time. :) But it would be *really* useful to have, I think, and it > you're more familiar with the archives, then maybe it wouldn't take as > long.. At least getting a start on it would be nice. In the end that's what I started work on. It would be easier if my modem/internet connection were reliable to download the whole archive, but I've got up to beginning of December 2000 as one file, and have got *most* of the way through removing non-relevant messages (damn - why do so many of them have to be interesting) and starting to populate the inside of my head with what previous arguments have said. 
I think there will be some serious issues (particularly about the grand scope/applicability of what Doc-SIG is trying to do for docstrings) that emerge, so it does seem important to do. > Overall, I'd say to work on docutils/fat.py, but mainly because you've > already invested a fair amount of work in it. Maybe we can convince > someone else to do the doc-sig summary stuff? :) Oh well, I chose the other one for now. The issues to be resolved include: * strip out the support for the "initial one line summary" stuff that Guido *doesn't* need, after all * remove them from docstrings (both of these are trivial, of course) * change the literal quote character to be backtick (trivial) * add support for underlined headers (hmm) * add (optional) support for your "lists must be indented after the first line" * add (optional) support for "blank lines between list items" (trivial) * add (optional) support for "blank lines needed before and after lists" (more complex, and I'm not convinced useful) * add support for your "markup significant regardless of placement" (strikes me as hard given the way the program currently works - needs more thought) * add (optional) support for requiring URIs to be in "<" and ">" I toyed with doing everything except the markup one (since I'm still not convinced on that issue), but it looked like more work than I'd obviously have time for, so I put it off. Other issues in other threads... Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.) From gherman@darwin.in-berlin.de Wed Apr 25 08:36:49 2001 From: gherman@darwin.in-berlin.de (Dinu Gherman) Date: Wed, 25 Apr 2001 09:36:49 +0200 Subject: [Doc-SIG] Issues with 2.1 doc PDF files Message-ID: <3AE67E91.2DADD89D@darwin.in-berlin.de> Hello, I've just noticed that there are considerable differences between the PDF files documenting Python 2.1 and those for 2.0. Basically, these are: - file sizes are much bigger for 2.1 (50-100%) - fonts are approximated with pixelized bitmaps As a result you get: - much longer page building times in PDF readers - considerable longer search times - much longer print times (probably - haven't checked) I've not verified this for each PDF file (it's prominently obvious for the tutorial, though), but I assume the same effects can be observed for all of them, when comparing with corresponding files from the previous release. So, I'm really curious what the reason for this phenomenon could be? If there isn't any, I suggest reproducing the files to no longer show the described effects as they will definitely distract people from reading the files online and simply lead to a bad overall impression about their generation process, if not even their content. Regards, Dinu -- Dinu C. Gherman ReportLab Consultant - http://www.reportlab.com ................................................................ "The only possible values [for quality] are 'excellent' and 'in- sanely excellent', depending on whether lives are at stake or not. Otherwise you don't enjoy your work, you don't work well, and the project goes down the drain." 
(Kent Beck, "Extreme Programming Explained") From gherman@darwin.in-berlin.de Wed Apr 25 11:08:11 2001 From: gherman@darwin.in-berlin.de (Dinu Gherman) Date: Wed, 25 Apr 2001 12:08:11 +0200 Subject: [Doc-SIG] Re: Issues with 2.1 doc PDF files References: <3AE67E91.2DADD89D@darwin.in-berlin.de> Message-ID: <3AE6A20B.D44D745F@darwin.in-berlin.de> I wrote: > > I've just noticed that there are considerable differences > between the PDF files documenting Python 2.1 and those for > 2.0. [...] After running a diff over tut.tex for both versions I get an even more interesting difference concerning the *content* (left 2.0, right 2.1, excerpts only): 2603c2620 < '[31.4, 40000]' --- > '[31.400000000000002, 40000]' 2611c2628 < "(31.4, 40000, ('spam', 'eggs'))" --- > "(31.400000000000002, 40000, ('spam', 'eggs'))" Being curious, I typed the following into Pythonwin and am quite baffled by the results: PythonWin 2.0 (#8, Oct 16 2000, 17:27:58) [MSC 32 bit (Intel)] [...] >>> x = 10 * 3.14 >>> x 31.400000000000002 >>> x == 31.4 0 >>> 3.14 3.1400000000000001 >>> Aparently the doc for 2.0 might have been generated with 1.5.2, but I don't have a running 1.5.2 to cross-check this quickly. I guess for many this must look like something unexpected, isn't it? But is it a reason to panic? I don't find anything explaining this behaviour in the FAQ. If it has to do with internal floating point representation limits it might be an issue of general interest and worth being documented somewhere. If this observation was discussed long ago on comp.lang.python please forgive me (and point me to it), but I haven't had the time to follow this group very much recently... Regards, Dinu -- Dinu C. Gherman ReportLab Consultant - http://www.reportlab.com ................................................................ "The only possible values [for quality] are 'excellent' and 'in- sanely excellent', depending on whether lives are at stake or not. Otherwise you don't enjoy your work, you don't work well, and the project goes down the drain." (Kent Beck, "Extreme Programming Explained") From gherman@darwin.in-berlin.de Wed Apr 25 20:36:36 2001 From: gherman@darwin.in-berlin.de (Dinu Gherman) Date: Wed, 25 Apr 2001 21:36:36 +0200 Subject: [Doc-SIG] Re: Issues with 2.1 doc PDF files References: <3AE67E91.2DADD89D@darwin.in-berlin.de> <3AE6A20B.D44D745F@darwin.in-berlin.de> Message-ID: <3AE72744.8FF44DD7@darwin.in-berlin.de> I wrote: > > If this observation was discussed long ago on comp.lang.python > please forgive me (and point me to it), but I haven't had the > time to follow this group very much recently... Ok, I learned that this is said to be expected behaviour in 2.x. Still, I think it should be documented in the main FAQ and not only in this one: http://www.python.org/cgi-bin/moinmoin/FrequentlyAskedQuestions#line24 Dinu From guido@digicool.com Wed Apr 25 21:41:03 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 25 Apr 2001 15:41:03 -0500 Subject: [Doc-SIG] Re: Issues with 2.1 doc PDF files In-Reply-To: Your message of "Wed, 25 Apr 2001 12:08:11 +0200." <3AE6A20B.D44D745F@darwin.in-berlin.de> References: <3AE67E91.2DADD89D@darwin.in-berlin.de> <3AE6A20B.D44D745F@darwin.in-berlin.de> Message-ID: <200104252041.PAA15107@cj20424-a.reston1.va.home.com> > Being curious, I typed the following into Pythonwin and am > quite baffled by the results: > > PythonWin 2.0 (#8, Oct 16 2000, 17:27:58) [MSC 32 bit (Intel)] > [...] 
> >>> x = 10 * 3.14 > >>> x > 31.400000000000002 > >>> x == 31.4 > 0 > >>> 3.14 > 3.1400000000000001 > >>> Yes, this is baffling at first, and a FAQ, if you knwo where to look: http://www.python.org/cgi-bin/moinmoin/FrequentlyAskedQuestions#line24 --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Wed Apr 25 21:48:55 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 25 Apr 2001 15:48:55 -0500 Subject: [Doc-SIG] Re: Issues with 2.1 doc PDF files In-Reply-To: Your message of "Wed, 25 Apr 2001 21:36:36 +0200." <3AE72744.8FF44DD7@darwin.in-berlin.de> References: <3AE67E91.2DADD89D@darwin.in-berlin.de> <3AE6A20B.D44D745F@darwin.in-berlin.de> <3AE72744.8FF44DD7@darwin.in-berlin.de> Message-ID: <200104252048.PAA15230@cj20424-a.reston1.va.home.com> > Ok, I learned that this is said to be expected behaviour > in 2.x. Still, I think it should be documented in the > main FAQ and not only in this one: > > http://www.python.org/cgi-bin/moinmoin/FrequentlyAskedQuestions#line24 Rather than complaining, you can do it yourself. The main FAQ's password is "Spam". --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake@acm.org Wed Apr 25 21:10:34 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 25 Apr 2001 16:10:34 -0400 (EDT) Subject: [Doc-SIG] Issues with 2.1 doc PDF files In-Reply-To: <3AE67E91.2DADD89D@darwin.in-berlin.de> References: <3AE67E91.2DADD89D@darwin.in-berlin.de> Message-ID: <15079.12090.738565.710150@cj42289-a.reston1.va.home.com> Dinu Gherman writes: > I've just noticed that there are considerable differences > between the PDF files documenting Python 2.1 and those for > 2.0. Basically, these are: > > - file sizes are much bigger for 2.1 (50-100%) Yeah, I thought they looked a little large, but wasn't sure why. I've also noticed (and this isn't new) that the A4 versions are quite a bit larger than the US-Letter versions: about 50% for the PDF, not so much for the PostScript. I have no idea why this would be the case. I presume you were looking at the A4 version. > - fonts are approximated with pixelized bitmaps That's not good. > As a result you get: > > - much longer page building times in PDF readers > - considerable longer search times > - much longer print times (probably - haven't checked) > > I've not verified this for each PDF file (it's prominently > obvious for the tutorial, though), but I assume the same > effects can be observed for all of them, when comparing with > corresponding files from the previous release. This would be something to look at -- recall that we added some magic to the tutorial to control interpretation of the document encoding, so that some Latin-1 characters would be typeset correctly. (This was at your prodding, as I recall! ;-) Could you please look at at least one of the other documents to see if they exhibit the same symptoms? > So, I'm really curious what the reason for this phenomenon > could be? If there isn't any, I suggest reproducing the > files to no longer show the described effects as they will If you can tell me how to control these things, I'm sure we can build another distribution. I have no idea how to control this -- this goes deeper into the LaTeX/pdfLaTeX magic than I'm familiar with. > definitely distract people from reading the files online > and simply lead to a bad overall impression about their > generation process, if not even their content. Do people really use the PDF onscreen? 
I've always imagined Windows users use them to print from, since PostScript printers are less common under Windows than under Linux & Unix. I'd be curious as to whether onscreen display or printing is widespread for the PDF version -- for onscreen viewing I'd expect a very different layout. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From fdrake@acm.org Wed Apr 25 21:15:14 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 25 Apr 2001 16:15:14 -0400 (EDT) Subject: [Doc-SIG] Re: Issues with 2.1 doc PDF files In-Reply-To: <3AE6A20B.D44D745F@darwin.in-berlin.de> References: <3AE67E91.2DADD89D@darwin.in-berlin.de> <3AE6A20B.D44D745F@darwin.in-berlin.de> Message-ID: <15079.12370.524256.761513@cj42289-a.reston1.va.home.com> Dinu Gherman writes: > After running a diff over tut.tex for both versions I get an > even more interesting difference concerning the *content* > (left 2.0, right 2.1, excerpts only): The Python 2.0 documentation was not properly updated. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From gherman@darwin.in-berlin.de Wed Apr 25 22:05:55 2001 From: gherman@darwin.in-berlin.de (Dinu Gherman) Date: Wed, 25 Apr 2001 23:05:55 +0200 Subject: [Doc-SIG] Re: Issues with 2.1 doc PDF files References: <3AE67E91.2DADD89D@darwin.in-berlin.de> <3AE6A20B.D44D745F@darwin.in-berlin.de> <3AE72744.8FF44DD7@darwin.in-berlin.de> <200104252048.PAA15230@cj20424-a.reston1.va.home.com> Message-ID: <3AE73C33.98DFD478@darwin.in-berlin.de> Guido van Rossum wrote: > > > Ok, I learned that this is said to be expected behaviour > > in 2.x. Still, I think it should be documented in the > > main FAQ and not only in this one: > > > > http://www.python.org/cgi-bin/moinmoin/FrequentlyAskedQuestions#line24 > > Rather than complaining, you can do it yourself. The main FAQ's > password is "Spam". Great, I didn't know that! What is still unclear to me is if there is a real need for two FAQs (and maybe more in the future)? Especially, as the MoinMoin one seems to be unreachable from python.org and python.org/search. To me right now it looks like leading people astray with- out a good reason. Also, given more frequent releases, would it make sense, perhaps, to indicate the Python version that a specific feature/module/whatever is available from in some of the standard Python documentation files? Dinu From dfan@harmonixmusic.com Wed Apr 25 22:08:29 2001 From: dfan@harmonixmusic.com (Dan Schmidt) Date: 25 Apr 2001 17:08:29 -0400 Subject: [Doc-SIG] Issues with 2.1 doc PDF files In-Reply-To: <15079.12090.738565.710150@cj42289-a.reston1.va.home.com> References: <3AE67E91.2DADD89D@darwin.in-berlin.de> <15079.12090.738565.710150@cj42289-a.reston1.va.home.com> Message-ID: "Fred L. Drake, Jr." writes: | Dinu Gherman writes: | | > definitely distract people from reading the files online | > and simply lead to a bad overall impression about their | > generation process, if not even their content. | | Do people really use the PDF onscreen? I've always imagined | Windows users use them to print from, since PostScript printers are | less common under Windows than under Linux & Unix. I'd be curious | as to whether onscreen display or printing is widespread for the PDF | version -- for onscreen viewing I'd expect a very different layout. I don't view the Python docs with PDF, but I read many other PDF files 'onscreen' rather than printing them out. -- http://www.dfan.org From fdrake@acm.org Wed Apr 25 22:25:44 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) 
Date: Wed, 25 Apr 2001 17:25:44 -0400 (EDT) Subject: [Doc-SIG] Issues with 2.1 doc PDF files In-Reply-To: References: <3AE67E91.2DADD89D@darwin.in-berlin.de> <15079.12090.738565.710150@cj42289-a.reston1.va.home.com> Message-ID: <15079.16600.483240.541905@cj42289-a.reston1.va.home.com> Dan Schmidt writes: > I don't view the Python docs with PDF, but I read many other PDF files > 'onscreen' rather than printing them out. Would you *like* to be able to read the Python PDF version onscreen, or is one of the other versions preferable for you? I guess what I'd like to figure out is how many people would like to use a version that they're not using because there's some impediment (files are too large to d/l, lacks functionality, has formatting problems, etc.). -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From gherman@darwin.in-berlin.de Wed Apr 25 22:30:32 2001 From: gherman@darwin.in-berlin.de (Dinu Gherman) Date: Wed, 25 Apr 2001 23:30:32 +0200 Subject: [Doc-SIG] Issues with 2.1 doc PDF files References: <3AE67E91.2DADD89D@darwin.in-berlin.de> <15079.12090.738565.710150@cj42289-a.reston1.va.home.com> <15079.16600.483240.541905@cj42289-a.reston1.va.home.com> Message-ID: <3AE741F8.2C4DF4FF@darwin.in-berlin.de> "Fred L. Drake, Jr." wrote: > > Dan Schmidt writes: > > I don't view the Python docs with PDF, but I read many other PDF files > > 'onscreen' rather than printing them out. > > Would you *like* to be able to read the Python PDF version onscreen, > or is one of the other versions preferable for you? > I guess what I'd like to figure out is how many people would like to > use a version that they're not using because there's some impediment > (files are too large to d/l, lacks functionality, has formatting > problems, etc.). I'll add only that much here: I like viewing PDFs onscreen because unlike HTML they allow to search stuff and nicely print parts quickly. I'm less bothered by the paper layout which is not ideal for the screen. I'll come back to your previous comments tomorrow... Dinu From fdrake@acm.org Wed Apr 25 22:32:18 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 25 Apr 2001 17:32:18 -0400 (EDT) Subject: [Doc-SIG] Re: Issues with 2.1 doc PDF files In-Reply-To: <3AE73C33.98DFD478@darwin.in-berlin.de> References: <3AE67E91.2DADD89D@darwin.in-berlin.de> <3AE6A20B.D44D745F@darwin.in-berlin.de> <3AE72744.8FF44DD7@darwin.in-berlin.de> <200104252048.PAA15230@cj20424-a.reston1.va.home.com> <3AE73C33.98DFD478@darwin.in-berlin.de> Message-ID: <15079.16994.274877.42370@cj42289-a.reston1.va.home.com> Dinu Gherman writes: > Also, given more frequent releases, would it make sense, > perhaps, to indicate the Python version that a specific > feature/module/whatever is available from in some of the > standard Python documentation files? There is an increasing number of annotations providing versioning information in the documentation. If you find anything specific that lacks information that would have been helpful, please let me know. This is sufficient reason to file a documentation bug report: http://sourceforge.net/tracker/?func=add&group_id=5470&atid=105470 Be sure to set the "Category" field to "Documentation"; the bug report will be automatically assigned to me. -Fred -- Fred L. Drake, Jr. 
PythonLabs at Digital Creations

From dfan@harmonixmusic.com Wed Apr 25 22:37:49 2001
From: dfan@harmonixmusic.com (Dan Schmidt)
Date: 25 Apr 2001 17:37:49 -0400
Subject: [Doc-SIG] Issues with 2.1 doc PDF files
In-Reply-To: <15079.16600.483240.541905@cj42289-a.reston1.va.home.com>
References: <3AE67E91.2DADD89D@darwin.in-berlin.de> <15079.12090.738565.710150@cj42289-a.reston1.va.home.com> <15079.16600.483240.541905@cj42289-a.reston1.va.home.com>
Message-ID:

"Fred L. Drake, Jr." writes:
| Dan Schmidt writes:
| > I don't view the Python docs with PDF, but I read many other PDF files
| > 'onscreen' rather than printing them out.
|
| Would you *like* to be able to read the Python PDF version onscreen,
| or is one of the other versions preferable for you?
|
| I guess what I'd like to figure out is how many people would like to
| use a version that they're not using because there's some impediment
| (files are too large to d/l, lacks functionality, has formatting
| problems, etc.).

I guess the HTML version is the most useful for me right now. The Info version would be, if it were up to date. So in the specific case of Python, I don't really need the PDF. However, as a general principle, when I do download a .pdf file, I expect to be able to read it on-screen. That may not be relevant to the question you're asking, though.

-- http://www.dfan.org

From guido@digicool.com Thu Apr 26 00:19:01 2001
From: guido@digicool.com (Guido van Rossum)
Date: Wed, 25 Apr 2001 18:19:01 -0500
Subject: [Doc-SIG] Re: Issues with 2.1 doc PDF files
In-Reply-To: Your message of "Wed, 25 Apr 2001 23:05:55 +0200." <3AE73C33.98DFD478@darwin.in-berlin.de>
References: <3AE67E91.2DADD89D@darwin.in-berlin.de> <3AE6A20B.D44D745F@darwin.in-berlin.de> <3AE72744.8FF44DD7@darwin.in-berlin.de> <200104252048.PAA15230@cj20424-a.reston1.va.home.com> <3AE73C33.98DFD478@darwin.in-berlin.de>
Message-ID: <200104252319.SAA15920@cj20424-a.reston1.va.home.com>

> Guido van Rossum wrote:
> >
> > > Ok, I learned that this is said to be expected behaviour
> > > in 2.x. Still, I think it should be documented in the
> > > main FAQ and not only in this one:
> > >
> > > http://www.python.org/cgi-bin/moinmoin/FrequentlyAskedQuestions#line24
> >
> > Rather than complaining, you can do it yourself. The main FAQ's
> > password is "Spam".
>
> Great, I didn't know that! What is still unclear to me
> is if there is a real need for two FAQs (and maybe more
> in the future)? Especially, as the MoinMoin one seems to
> be unreachable from python.org and python.org/search.
> To me right now it looks like leading people astray without
> a good reason.

The MoinMoin FAQ was an experiment because the main FAQ appeared cumbersome to maintain. I'm not convinced that it worked. The problem is, somebody needs to own the FAQ and it ain't gonna be me, so until someone picks it up, it's like my backyard -- a wasteland with great potential but mostly collecting piles of dead leaves...

> Also, given more frequent releases, would it make sense,
> perhaps, to indicate the Python version that a specific
> feature/module/whatever is available from in some of the
> standard Python documentation files?

Carefully study the official Python docs -- they already indicate the version where something is introduced or changed.
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Thu Apr 26 00:26:56 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 25 Apr 2001 18:26:56 -0500 Subject: [Doc-SIG] Issues with 2.1 doc PDF files In-Reply-To: Your message of "Wed, 25 Apr 2001 17:25:44 -0400." <15079.16600.483240.541905@cj42289-a.reston1.va.home.com> References: <3AE67E91.2DADD89D@darwin.in-berlin.de> <15079.12090.738565.710150@cj42289-a.reston1.va.home.com> <15079.16600.483240.541905@cj42289-a.reston1.va.home.com> Message-ID: <200104252326.SAA15953@cj20424-a.reston1.va.home.com> > Would you *like* to be able to read the Python PDF version onscreen, > or is one of the other versions preferable for you? I would surmise that for almost everybody, HTML wins big over PDF onscreen, and PDF wins big over HTML for printing. Soon, I bet we won't have to distribute PostScript any more, because everyone can use PDF for printing. But for on-screen viewing, the pagination of PDF quickly gets annoying. This is quite independent from the content, and applies to any kind of documentation, not just Python's. Now, some *producers* of information prefer to only give you PDF even for on-screen viewing, because it gives them more control over fonts and lay-out. Occasionally (e.g. with detailed drawings where zooming in is actually useful) I see the point, but usually browsing PDF just annoys me. --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake@acm.org Wed Apr 25 23:31:35 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 25 Apr 2001 18:31:35 -0400 (EDT) Subject: [Doc-SIG] Issues with 2.1 doc PDF files In-Reply-To: <200104252326.SAA15953@cj20424-a.reston1.va.home.com> References: <3AE67E91.2DADD89D@darwin.in-berlin.de> <15079.12090.738565.710150@cj42289-a.reston1.va.home.com> <15079.16600.483240.541905@cj42289-a.reston1.va.home.com> <200104252326.SAA15953@cj20424-a.reston1.va.home.com> Message-ID: <15079.20551.752990.597538@cj42289-a.reston1.va.home.com> Guido van Rossum writes: > I would surmise that for almost everybody, HTML wins big over PDF One reason for asking questions like these is to determine how useful what we surmise is, compared with other peoples' expectations. I have a pretty good idea what you & I surmise on this topic, but that's different from knowing what others are looking for. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From mwh21@cam.ac.uk Wed Apr 25 23:54:31 2001 From: mwh21@cam.ac.uk (Michael Hudson) Date: 25 Apr 2001 23:54:31 +0100 Subject: [Doc-SIG] Issues with 2.1 doc PDF files In-Reply-To: Guido van Rossum's message of "Wed, 25 Apr 2001 18:26:56 -0500" References: <3AE67E91.2DADD89D@darwin.in-berlin.de> <15079.12090.738565.710150@cj42289-a.reston1.va.home.com> <15079.16600.483240.541905@cj42289-a.reston1.va.home.com> <200104252326.SAA15953@cj20424-a.reston1.va.home.com> Message-ID: Guido van Rossum writes: > I would surmise that for almost everybody, HTML wins big over PDF > onscreen, and PDF wins big over HTML for printing. Soon, I bet we > won't have to distribute PostScript any more, because everyone can use > PDF for printing. Well, if I were to print out the python docs anytime soon (I'm not) I'd definitely reach for the postscript. OTOH, I'd also probably build it locally. But as long as it's not much burden to have both, it's not that much of an issue. Cheers, M. -- I have a feeling that any simple problem can be made arbitrarily difficult by imposing a suitably heavy administrative process around the development. 
-- Joe Armstrong, comp.lang.functional From guido@digicool.com Thu Apr 26 01:13:06 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 25 Apr 2001 19:13:06 -0500 Subject: [Doc-SIG] Issues with 2.1 doc PDF files In-Reply-To: Your message of "25 Apr 2001 23:54:31 +0100." References: <3AE67E91.2DADD89D@darwin.in-berlin.de> <15079.12090.738565.710150@cj42289-a.reston1.va.home.com> <15079.16600.483240.541905@cj42289-a.reston1.va.home.com> <200104252326.SAA15953@cj20424-a.reston1.va.home.com> Message-ID: <200104260013.TAA16112@cj20424-a.reston1.va.home.com> [me] > > I would surmise that for almost everybody, HTML wins big over PDF > > onscreen, and PDF wins big over HTML for printing. Soon, I bet we > > won't have to distribute PostScript any more, because everyone can use > > PDF for printing. [MH] > Well, if I were to print out the python docs anytime soon (I'm not) > I'd definitely reach for the postscript. Yeah, but you're lucky to have a PS capable printer. Heck, you're probably on Linux. Most Windows and even many Mac users don't! PDF can be printed from anywhere. Even if your printer talks PostScript, on Windows it's a pain to figure out how to print a PS file! AcroRead does it for you with PDF. > OTOH, I'd also probably build it locally. Lucky you. This may be news for you, but most Python users don't know how to use those tools any more, even if they have access. Python is a success -- meaning it has lots of unsophisticated users! (Unsophisticated in their hacking abilities, not in their intelligence, for sure -- but people who don't want to waste time figuring out how to do something that the computer should be able to do without their help.) > But as long as it's not much burden to have both, > it's not that much of an issue. Agreed. --Guido van Rossum (home page: http://www.python.org/~guido/) From mwh21@cam.ac.uk Thu Apr 26 00:31:45 2001 From: mwh21@cam.ac.uk (Michael Hudson) Date: 26 Apr 2001 00:31:45 +0100 Subject: [Doc-SIG] Issues with 2.1 doc PDF files In-Reply-To: Guido van Rossum's message of "Wed, 25 Apr 2001 19:13:06 -0500" References: <3AE67E91.2DADD89D@darwin.in-berlin.de> <15079.12090.738565.710150@cj42289-a.reston1.va.home.com> <15079.16600.483240.541905@cj42289-a.reston1.va.home.com> <200104252326.SAA15953@cj20424-a.reston1.va.home.com> <200104260013.TAA16112@cj20424-a.reston1.va.home.com> Message-ID: Guido van Rossum writes: > [me] > > > I would surmise that for almost everybody, HTML wins big over PDF > > > onscreen, and PDF wins big over HTML for printing. Soon, I bet we > > > won't have to distribute PostScript any more, because everyone can use > > > PDF for printing. > > [MH] > > Well, if I were to print out the python docs anytime soon (I'm not) > > I'd definitely reach for the postscript. > > Yeah, but you're lucky to have a PS capable printer. Heck, you're > probably on Linux. Most Windows and even many Mac users don't! PDF > can be printed from anywhere. Even if your printer talks PostScript, > on Windows it's a pain to figure out how to print a PS file! AcroRead > does it for you with PDF. Oh, I know, I know. But your comment I was replying to said "Soon, I bet we won't have to distribute PostScript any more...". > > OTOH, I'd also probably build it locally. > > Lucky you. This may be news for you, but most Python users don't know > how to use those tools any more, even if they have access. Wot, typing "make ps"? I'm aware I'm hardly a typical Python user. Cheers, M. 
-- Those who have deviant punctuation desires should take care of their own perverted needs. -- Erik Naggum, comp.lang.lisp

From tim.one@home.com Thu Apr 26 00:43:20 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 25 Apr 2001 19:43:20 -0400
Subject: [Doc-SIG] Issues with 2.1 doc PDF files
In-Reply-To: <200104252326.SAA15953@cj20424-a.reston1.va.home.com>
Message-ID:

[Guido]
> ...
> But for on-screen viewing, the pagination of PDF quickly gets
> annoying. This is quite independent from the content, and applies
> to any kind of documentation, not just Python's.

I confess PDF grows on me over time, especially since I figured out how to tell Acrobat Reader to view stuff in "continuous mode" (== the page boundaries are still there, but scrolling pays no attention to them). The one great advantage of PDF over HTML is whole-document searching, although (like it or not) Microsoft .chm format adds a form of that to HTML-based docs too.

From gherman@darwin.in-berlin.de Thu Apr 26 09:19:34 2001
From: gherman@darwin.in-berlin.de (Dinu Gherman)
Date: Thu, 26 Apr 2001 10:19:34 +0200
Subject: [Doc-SIG] Issues with 2.1 doc PDF files
References: <3AE67E91.2DADD89D@darwin.in-berlin.de> <15079.12090.738565.710150@cj42289-a.reston1.va.home.com>
Message-ID: <3AE7DA16.8387BB44@darwin.in-berlin.de>

Hi Fred,

ok, I did some investigation into that topic, basically some comparisons of PDF files across different releases. The preliminary result is, maybe, quite interesting.

"Fred L. Drake, Jr." wrote:
>
> Dinu Gherman writes:
> >
> > - file sizes are much bigger for 2.1 (50-100%)
>
> Yeah, I thought they looked a little large, but wasn't sure why.
> I've also noticed (and this isn't new) that the A4 versions are quite
> a bit larger than the US-Letter versions: about 50% for the PDF, not
> so much for the PostScript. I have no idea why this would be the
> case. I presume you were looking at the A4 version.

I think there shouldn't be any reason why this has to be like that! In fact, if you compare a file like ref.pdf for 2.0 and 2.1 by listing the used fonts in Acrobat Reader via the menu File -> Document Info -> Fonts, you'll see that in the A4 versions the 2.1 file lists Helvetica, Helvetica-Oblique and Times-Roman. But the 2.0 version lists a whole bunch of CM* fonts, which are TeX's ancient Computer Modern family. You can actually *see* the difference if you sufficiently magnify, say, the document title on the front page and the version number below it.

For me the reason why this 2.1 A4 version of ref.pdf is about twice the size of the corresponding 2.0 file is that these CM fonts are embedded in the PDF! The reason why this effect cannot be observed for the corresponding PDF letter versions of the same document is that they both do not use embedded CM fonts! I haven't verified this for all other documents, but it seems to me like a good explanation for the general difference in size between A4 and letter PDFs. Now, *why* the A4 files do contain embedded fonts is an entirely different question! ;-)

> > - fonts are approximated with pixelized bitmaps
> That's not good.

This is also an entirely different issue, as the tut.pdf in A4 for 2.1 doesn't contain any normal fonts at all, but only bitmaps, as you can also find out doing the same research in Acrobat Reader!

> > As a result you get:
> >
> > - much longer page building times in PDF readers
> > - considerably longer search times
> > - much longer print times (probably - haven't checked)
> >
> > I've not verified this for each PDF file (it's prominently
> > obvious for the tutorial, though), but I assume the same
> > effects can be observed for all of them, when comparing with
> > corresponding files from the previous release.
>
> This would be something to look at -- recall that we added some
> magic to the tutorial to control interpretation of the document
> encoding, so that some Latin-1 characters would be typeset correctly.
> (This was at your prodding, as I recall! ;-) Could you please look
> at at least one of the other documents to see if they exhibit the same
> symptoms?

Apparently, the only line you added to do this is this one:

  \usepackage[T1]{fontenc}

which should do the job. I also use this:

  \usepackage[latin1]{inputenc}

but I just found out I can do without. In general I use this line when running pdf(La)TeX over the sources:

  \usepackage[pdftex,
              plainpages=false,
              colorlinks=true,
              bookmarks=true,
              bookmarksnumbered=true,
              linkcolor=blue]{hyperref}

I haven't seen anything equivalent in the official LaTeX sources, though, so I'm not quite sure how these are built...

> > So, I'm really curious what the reason for this phenomenon
> > could be? If there isn't any, I suggest reproducing the
> > files to no longer show the described effects as they will
>
> If you can tell me how to control these things, I'm sure we can
> build another distribution. I have no idea how to control this --
> this goes deeper into the LaTeX/pdfLaTeX magic than I'm familiar
> with.

Well, I can tell you what I do to create these documents on my box. I'm using vanilla MiKTeX 2 on Win2K, which gives me the following version number for pdftex:

  C:\>pdftex
  This is pdfTeX, Version 3.14159-14f-released-20000525 (MiKTeX 2)
  **

Using this I get PDFs without any pixelized or embedded fonts, and without applying any additional magic. For all the official PDF documentation files I've checked I get via File -> Document Info -> General this: pdfTeX-0.13d. Those that I'm producing myself say pdfTeX-0.14f. This might be a reason and it might not be - I don't know, probably not.

In any case something strange was going on when the A4 PDFs were produced (leading to embedded CM fonts), and something very strange happened for the 2.1 A4 PDF tutorial (bitmaps throughout). BTW, the former (including the file size differences) can also be observed in the official PS files.

This is about all I can do in a reasonable time frame to provide a good starting point for further research. I'm not considering myself a TeX guru or something like that, so I'll need to pass this on to somebody else here...

Regards,

Dinu
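A minimal pdfLaTeX preamble sketch, assembled from the packages Dinu mentions above, may help anyone who wants to reproduce the experiment. The \usepackage{ae} line is an assumption on the editor's side -- one commonly suggested way at the time to keep T1 output encoding while still getting scalable Type 1 fonts rather than bitmapped Computer Modern -- and is not necessarily what the official documentation build uses; the hyperref options are the ones quoted above.

  % Sketch only: "ae" is an assumption, not part of the official doc sources.
  \documentclass[a4paper]{article}
  \usepackage[T1]{fontenc}      % T1 output encoding, so Latin-1 glyphs are typeset natively
  \usepackage[latin1]{inputenc} % accept Latin-1 characters in the .tex source
  \usepackage{ae}               % Type 1 substitutes for Computer Modern (avoids bitmap EC fonts)
  \usepackage[pdftex,
              plainpages=false,
              colorlinks=true,
              bookmarks=true,
              bookmarksnumbered=true,
              linkcolor=blue]{hyperref}
  \begin{document}
  Accented-text check: na\"{\i}ve, fa\c{c}ade, Stra\ss e.
  \end{document}

Running pdflatex over a file like this and then checking File -> Document Info -> Fonts in Acrobat Reader, as Dinu describes, shows whether the embedded fonts came out as Type 1 outlines or as bitmaps.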
From hernan@orgmf.com.ar Fri Apr 27 10:32:58 2001
From: hernan@orgmf.com.ar (Hernan Martinez Foffani)
Date: Fri, 27 Apr 2001 11:32:58 +0200
Subject: [Doc-SIG] Issues with 2.1 doc PDF files
In-Reply-To:
Message-ID:

From Tim Peters:
> .... The one great advantage of PDF over HTML is whole-document
> searching, although
> (like it or not) Microsoft .chm format adds a form of that to
> HTML-based docs too.

That's why I built them. The other great advantage of PDF is that it is portable. It's a pity that the Microsoft .chm format isn't portable even to the platforms that can run their browser. (Or at least, that's what I've been told.) I tried JavaHelp and it is too slow for practical use on big packages. Maybe in the near future Mozilla can drive a suitable online Help engine.

-H.