One useful thing we can accomplish, as others have pointed out, is to come up with clear and simple rules for referring to other objects in documentation. To start things off, let me try proposing a set of lookup rules for people to evaluate and shoot down.
I list the rules to be applied in order of decreasing priority. When a rule applies but the referent object does not exist, we proceed to test any lower-priority rules that apply.
Note that simple bare words are never interpreted as references. To refer to something, an identifier must be capitalized and match a class name, or be followed by an open-parenthesis and match a function or method name, or be preceded by "self.".
1. Text: "self." <identifier> "("
   In: class or method docstring
   Example: "self.foo(" mentioned in class "Zot"
   a. Refers to: method Zot.foo()
   b. Refers to: method foo() inherited by Zot
2. Text: "self." <identifier>
   In: class or method docstring
   Example: "self.foo" mentioned in class "Zot"
   a. Refers to: attribute "foo" of Zot instances
3. Text: [ <identifier> "." ]+ <capitalized-identifier>
   In: any docstring
   Example: "pkg.bar.Zot"
   a. Refers to: class pkg.bar.Zot
4. Text: [ <identifier> "." ]+ <identifier> "("
   In: any docstring
   Example: "pkg.bar.foo("
   a. Refers to: function pkg.bar.foo()
5. Text: <capitalized-identifier not preceded by ".">
   In: any docstring
   Example: "Zot" mentioned in module "bar"
   a. Refers to: class bar.Zot
6. Text: <identifier not preceded by "."> "("
   In: module or function docstring
   Example: "foo(" mentioned in module "bar"
   a. Refers to: function bar.foo()
7. Text: <identifier not preceded by "."> "("
   In: class or method docstring
   Example: "foo(" mentioned in class "Zot" in module "bar"
   a. Refers to: method Zot.foo()
   b. Refers to: method foo() inherited by Zot
   c. Refers to: function bar.foo()
I have attempted to make this set of rules complete while having the "obvious" behaviour. Is the suggested behaviour sufficiently obvious?
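The rule table lends itself to a fallback chain. What follows is a minimal Python sketch, not Ka-Ping's implementation: it assumes each candidate token has already been cut out of the docstring (so "not preceded by '.'" is the caller's job), and it models "the referent exists" as membership in a hypothetical set `known` of fully qualified names, with a hypothetical `bases` map standing in for inheritance.

```python
import re

ID = r"[A-Za-z_]\w*"   # a Python identifier

def find_method(qcls, name, known, bases):
    """Rules 1a/1b and 7a/7b: a method defined on qcls, else one it inherits."""
    if f"{qcls}.{name}" in known:
        return f"{qcls}.{name}"
    for base in bases.get(qcls, ()):
        hit = find_method(base, name, known, bases)
        if hit:
            return hit
    return None

def resolve(token, module, cls, known, bases):
    """Apply rules 1-7 in priority order; None means 'plain prose'.

    token  -- candidate text, e.g. "self.foo(", "pkg.bar.Zot", "Zot"
    module -- module whose docstrings are being processed, e.g. "bar"
    cls    -- enclosing class for class/method docstrings, else None
    known  -- fully qualified names assumed to exist (hypothetical input)
    bases  -- qualified class name -> list of qualified base classes
    """
    qcls = f"{module}.{cls}" if cls else None
    if qcls and (m := re.fullmatch(rf"self\.({ID})\(", token)):          # rule 1
        if hit := find_method(qcls, m[1], known, bases):
            return hit
    if qcls and (m := re.fullmatch(rf"self\.({ID})", token)):            # rule 2
        if f"{qcls}.{m[1]}" in known:
            return f"{qcls}.{m[1]}"
    if re.fullmatch(rf"(?:{ID}\.)+[A-Z]\w*", token) and token in known:  # rule 3
        return token
    if re.fullmatch(rf"(?:{ID}\.)+{ID}\(", token) and token[:-1] in known:  # rule 4
        return token[:-1]
    if re.fullmatch(r"[A-Z]\w*", token) and f"{module}.{token}" in known:  # rule 5
        return f"{module}.{token}"
    if m := re.fullmatch(rf"({ID})\(", token):                           # rules 6, 7
        if qcls and (hit := find_method(qcls, m[1], known, bases)):      # 7a, 7b
            return hit
        if f"{module}.{m[1]}" in known:                                  # 6, 7c
            return f"{module}.{m[1]}"
    return None
```

For example, with `known = {"bar.Zot", "bar.Base.foo", "bar.foo"}` and `bases = {"bar.Zot": ["bar.Base"]}`, the token "foo(" in class Zot resolves to the inherited `bar.Base.foo` (rule 7b) in preference to the module function `bar.foo` (rule 7c), while a bare lower-case word such as "capitalise" resolves to nothing.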
I am aiming to eliminate the possibility that these rules ever fail in an unexpected way. #7 may be a bit much, but i would like to hear your opinions. I think i am comfortable with #7a,b,c. Have a look at each rule and see if you can imagine a case where it will do the wrong thing. I hope to be able to claim that each rule "just has to be right". Thanks for your time and input! -- ?!ng
Hi! Ka-Ping Yee wrote:
One useful thing we can accomplish, as others have pointed out, is to come up with clear and simple rules for referring to other objects in documentation.
Agreed. [...]
Note that simple bare words are never interpreted as references.
Why not? Of course this may go wrong if you use very common words like 'a', 'is', 'so' ... as identifiers. But I don't believe that this will happen very often. [...seven rules deleted...] Your rules sound a little bit complicated. I think we should model the reference rules after the Python namespace visibility rules, which apply to all identifiers in the Python object being documented, with the exceptional handling of 'self.' within the doc-strings of class objects: when processing the doc-string of class 'Zot', simply imagine a virtual object instance of class 'Zot' named 'self'. This behaviour seems "obvious" to me. The outcome may not be so very different from what you described in your ruleset. Regards, Peter -- Peter Funk, Oldenburger Str.86, 27777 Ganderkesee, Tel: 04222 9502 70, Fax: -60
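Peter's alternative can be prototyped with Python's own attribute machinery. The sketch below is an illustration under stated assumptions: `lookup` and `NOT_FOUND` are hypothetical names, the scope is simply the module's globals layered over the builtins, and, because instantiating a class merely to document it could have side effects, the class object itself stands in for the virtual `self` instance. Attributes that are only created inside `__init__` are therefore not found.

```python
import builtins

NOT_FOUND = object()  # sentinel: distinguishes "missing" from a name bound to None

def lookup(dotted, module_globals, cls=None):
    """Resolve a dotted name roughly as code inside the module would see it.

    module_globals -- the documented module's namespace (e.g. vars(module))
    cls            -- the class whose doc-string is being processed, if any;
                      'self' is then bound to the class as a stand-in instance
    """
    scope = dict(vars(builtins))   # builtins are visible everywhere...
    scope.update(module_globals)   # ...but module globals shadow them
    if cls is not None:
        scope["self"] = cls
    head, *rest = dotted.split(".")
    obj = scope.get(head, NOT_FOUND)
    for part in rest:
        if obj is NOT_FOUND:
            break
        obj = getattr(obj, part, NOT_FOUND)
    return obj
```

Anything other than the sentinel is the object whose doc-string a generated reference would point to; for a class `Zot` with a method `foo`, `lookup("self.foo", {"Zot": Zot}, cls=Zot)` returns `Zot.foo`.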
Note that simple bare words are never interpreted as references.
Why not? Of course this may go wrong, if you use very common words like 'a', 'is', 'so' ... as identifiers. But I don't believe, that this will happen too much.
umm ... I believe the reverse.
 * A string-processing function with an argument which decides whether to capitalise the string is almost certain to use the verb capitalise (in its Natural Language sense) in the course of its account of the argument which it will, naturally, call `capitalise'.
 * Many I/O routines use buffers whose size the caller can control via a parameter, usually called buffer; an account of whose meaning will routinely involve referring (in the NL sense) to the buffer whose size is controlled by the argument.
 * ...
It is *in general* Good Practice to name parameters (especially those which the caller is expected to give in name=value form) using plain words which fit close to the NL words relating to the purpose of the parameter. Any account of the purpose of the parameter is, consequently, liable to use the NL sense of the word as well as using the word to refer to the parameter.
IMO, this forces us to have some equivalent of the *emphasis* markup. As to *what* should serve in this role: well, it's going to enclose things which are to be read as lumps of python code in the doc string, within which any identifiers are to be hyperlinked to the definition of the object referenced by that identifier; so we can't go using any character which can appear in an inline-in-the-doc code fragment (e.g. regarding 'quoted text' as a code fragment is unacceptable, because my code fragment may wish to include a 'quoted string'). I don't think $, @ and ! are good candidates, but they could work; however, how do folk feel about # as a marker ? Since we're in a doc-string, it doesn't have any special meaning; nor do I think it sensible to include an end-of-line-comment in an inline-in-the-doc code fragment (as opposed to a code fragment displayed using a Code: block or a >>> block). Thus #any.valid(python + ' code') is guaranteed(to, work)#.
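Eddy's delimiter is simple to scan for. A minimal sketch, assuming (as the proposal implies) that an in-text fragment never itself contains '#' and, as an extra assumption of this sketch, never spans a line break; `FRAGMENT` and `code_fragments` are hypothetical names:

```python
import re

# '#' opens and closes an in-text code fragment; the body may contain
# anything except another '#' or a line break.
FRAGMENT = re.compile(r"#([^#\n]+)#")

def code_fragments(text):
    """Return the in-text code fragments delimited by #...#."""
    return FRAGMENT.findall(text)
```

Because quotes are not special to the scanner, a fragment such as `#'r'#` or the whole of `#any.valid(python + ' code') is guaranteed(to, work)#` comes back intact as a single piece of code.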
[Ka Ping's] rules sound a little bit complicated.
Then we get:
    Arguments:
      file ! string -- the name of the file to be opened.
      mode ! string -- the file access mode string ...
          If #mode#'s first letter is #'r'#, #file# must exist.
      buffer ! int or None [None] -- size of the I/O buffer to use.
          A size #< 0# indicates that no buffering should be done (raw I/O).
          If #buffer# is given as #None#, a `sensible' buffer size will be chosen.
and only treat words in the description (after --) as special if explicitly marked as such. Of course, since this is the Arguments: block, treating the key (before !) as special comes naturally without needing to mark it using #.
    Exceptions:
      IOError -- raised if #file# does not exist, #mode# is unrecognised or if #buffer#'s value is inappropriate for the given #file# and #mode#.
This fits the `easy to type' requirement; how do folk feel about `easy to read' ? Note that I, at least, find it has one bonus under `easy to read'; I know when a word is used in its NL sense and when it is used to refer to something in the python code; that makes it easier to understand what I read.
As to creating cross-references: identifiers appearing inside #...# or in the body of a Code: block can be sought using the namespace lookup machinery python would use from within the body of the thing whose doc-string we're parsing; where the lookup finds something with a doc-string, it is natural to generate an href to that doc-string (or, rather, what it will become when parsed). This should only need to happen for things marked as code. Does this fit the requirement for clear and simple rules ?
Summary:
 * Within text (not within Code: &c. blocks), # is used as delimiter for (start and end of) in-text code fragments.
 * Code:, Example: and >>> blocks and in-text code fragments are parsed; identifiers within them are looked up as if by the interpreter when executing the suite which the doc-string begins; if these lookups yield something to which the generated docs can make an HREF, they do so.
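The `name ! type [default] -- description` shape of these entries is regular enough to split mechanically. A sketch, assuming one entry per line and treating the bracketed default as optional; continuation lines simply fail to match. `ENTRY` and `parse_argument` are hypothetical names, not part of any existing tool:

```python
import re

# Assumed entry layout, following the example above:
#   name ! type [default] -- description
ENTRY = re.compile(
    r"^\s*(?P<name>[A-Za-z_]\w*)\s*!\s*(?P<type>.*?)\s*"
    r"(?:\[(?P<default>[^\]]*)\])?\s*--\s*(?P<desc>.*)$"
)

def parse_argument(line):
    """Split one Arguments: entry into its parts, or return None."""
    m = ENTRY.match(line)
    return m.groupdict() if m else None
```

The key before `!` comes out as `name`, ready to be treated as special without any `#` marking, while the description is left for the `#...#` scan.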
Note that,
 * in #open('peter', 'r')#, peter does not appear as an identifier (it appears as a string) so no attempt is made to look peter up in any namespaces.
 * use of #code(fragments)# is viable within comments, either in real python code (outside the doc-string) or in Code: &c. blocks. Given the latter, it would make sense for the parse-and-href idiom to ignore end-of-line comments in Code: &c. blocks, but to recognise code fragments in such comments and subject these to parse-and-href.
they are amenable to some simplification, though.
Eddy.
-- Actually, starting /* and ending */ would do fine ... the funny thing is, in C, that */ would be the obvious way to *start* comments, since it cannot appear in valid code. As ever, K&R didn't do the obvious.
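The point that 'peter' is a string rather than an identifier falls out naturally if fragments are run through Python's own tokenizer instead of being scanned with a regular expression. A sketch under that assumption, using the standard `tokenize` module; `identifiers_in` is a hypothetical name, keywords are dropped, and dotted names are kept together so they can be looked up as a unit:

```python
import io
import keyword
import tokenize

def identifiers_in(fragment):
    """Dotted identifiers in a code fragment, skipping strings and keywords."""
    names = []
    current = []          # parts of the dotted name being collected
    after_dot = False     # True just after a '.' that follows a name
    try:
        tokens = tokenize.generate_tokens(io.StringIO(fragment).readline)
        for tok in tokens:
            if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
                if not after_dot and current:       # previous name has ended
                    names.append(".".join(current))
                    current = []
                current.append(tok.string)
                after_dot = False
            elif tok.type == tokenize.OP and tok.string == "." and current and not after_dot:
                after_dot = True
            else:                                   # anything else ends a name
                if current:
                    names.append(".".join(current))
                current, after_dot = [], False
    except (tokenize.TokenError, SyntaxError):
        pass   # fragment cut short (e.g. unbalanced brackets): keep what we have
    if current:
        names.append(".".join(current))
    return names
```

Fragments that are cut short, such as an unbalanced `open(`, make the tokenizer stop with an error; the sketch keeps whatever names it had collected by then.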
Tony J Ibbs wrote:
that sure sounds to me like it could be a "simple bare word" - if I have a class called London that handles data relating to the city of London, then I'll certainly use the noun and the class name in the same text.
[...]
The company I work for have a customer who are the Ordnance Survey for Great Britain (as opposed to the Ordnance Survey for Northern Ireland, who are *not* the same people). This is commonly abbreviated OS(GB) (strangely
Edward Welbourne wrote:
umm ... I believe the reverse. * A string-processing function with an argument which decides whether to capitalise the string is almost certain to use the verb capitalise (in its Natural Language sense) in the course of its account of the argument which it will, naturally, call `capitalise'.
So here are three examples of what Tony and Edward might consider "unintended" references. Let me ask a radical question: if you had a class named London in your module, and you happened to mention the city of London in your docstring somewhere -- what would be so wrong with linking that mention to the class named London? Or if you have an argument named "capitalise" and somewhere in the documentation you use the word "capitalise" -- is it really a problem that that word is interpreted as referring to the argument? Surely if you "just happen" to use exactly the name of a class in your module documentation somewhere, it's somehow related... ? -- ?!ng
Ka-Ping Yee wrote, on 08 February 2000 21:45:
Let me ask a radical question: if you had a class named London in your module, and you happened to mention the city of London in your docstring somewhere -- what would be so wrong with linking that mention to the class named London?
Or if you have an argument named "capitalise" and somewhere in the documentation you use the word "capitalise" -- is it really a problem that that word is interpreted as referring to the argument?
Surely if you "just happen" to use exactly the name of a class in your module documentation somewhere, it's somehow related... ?
My immediate (and thus emotional) response to that is "NO!". This seems to be for two reasons:
 1. I don't want spurious extra references that I don't intend (this is important to me, but I can see from the above you might not care).
 2. It's quite possible to use common words in ways which are *not* applicable for the cross-reference.
Hmm - thinks a bit... OK, here we go with an example:
    def BLOCKS(text, delimiter):
        """Given text and a paragraph delimiter, return paragraph BLOCKS.

        burble burble burble on a flowline model burble. burble think of it
        as pieces of text flowing along a pipeline burble if something goes
        wrong such that the pipeline BLOCKS up burble burble.
        """
Not a *very* contrived example. I've "highlighted" the word "blocks" as best I can in email. Note that the second usage of the word in the doc string is *not* a candidate for cross referencing. Given the nature of the English language (I can't speak for others) such multiple binding of meaning for a single word is common.
trying-to-give-constructive-criticism,honest-guv
Tibs
-- Tony J Ibbs (Tibs) http://www.tibsnjoan.demon.co.uk/ 'Tim happens. Get used to it'. (David Ascher, on the Doc-SIG) My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)
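Tibs's example is easy to reproduce. The deliberately naive matcher below is a hypothetical strawman, not anyone's actual proposal: it links every whole-word, case-insensitive occurrence of a known name. Run on an abbreviated copy of the BLOCKS docstring, it finds two matches, only the first of which is intended.

```python
import re

def naive_links(doc, names):
    """Positions of every whole-word, case-insensitive occurrence of a name."""
    hits = []
    for name in names:
        pattern = rf"\b{re.escape(name)}\b"
        hits.extend(m.start() for m in re.finditer(pattern, doc, re.IGNORECASE))
    return sorted(hits)

# Abbreviated copy of the BLOCKS docstring.
DOC = ("Given text and a paragraph delimiter, return paragraph BLOCKS. "
       "... if something goes wrong such that the pipeline BLOCKS up ...")
```

Both hits would become links, although the second is a verb describing the pipeline rather than a mention of the function.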
"Tony J Ibbs (Tibs)" wrote:
...
Not a *very* contrived example. I've "highlighted" the word "blocks" as best I can in email. Note that the second usage of the word in the doc string is *not* a candidate for cross referencing. Given the nature of the english language (I can't speak for others) such multiple binding of meaning for a single word is common.
Okay, but what is the real cost of the mis-identification? -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself The great era of mathematical physics is now over; the 300-year effort to represent the material world in mathematical terms has exhausted itself. The understanding it was to provide is infinitely closer than it was when Isaac Newton wrote in the late seventeenth century, but it is still infinitely far away. - The Advent of the Algorithm (pending), by David Berlinski
Paul Prescod wrote, on 09 February 2000 16:34:
"Tony J Ibbs (Tibs)" wrote:
Note that the second usage of the word in the doc string is *not* a candidate for cross referencing. Given the nature of the english language (I can't speak for others) such multiple binding of meaning for a single word is common.
Okay, but what is the real cost of the mis-identification?
Erm - it's wrong?[1] But seriously, if I'm reading a document and come across a cross reference, is it *really* too much to ask that it be relevant? I know I'm a pedant (well, of some sort), but this is text that's meant to be *helpful* to people, and if it consistently contains (or may contain) misleading cross references, then that engenders distrust of the text - one never knows whether it will be worth *following* a reference (I can hear it now - "drat, that *!@X&! has cocked up their referencing again - I thought I'd find something *useful* there!"). Also, it's pride - if I'm writing documentation in a doc string, then it's MY text, and I don't want it to be mucked up by an (otherwise) useful tool. Putting in *misleading* references would be such mucking up. As for misleadingness - there *are* words in the English language that can have entirely opposite, and certainly antagonistic, meanings in different contexts. If the tool misreferences in such a circumstance, life could get truly confusing. Tibs [1] The pedant in me wants to leave it at that, on the grounds that the rest is trivially derived from this, but since Paul asked I guess it *does* need explanation... -- Tony J Ibbs (Tibs) http://www.tibsnjoan.demon.co.uk/ .. Haskell is the most Pythonic of all the languages that are entirely .. unlike Python <0.9 wink> (Tim Peters) My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)
On Thu, 10 Feb 2000, Tony J Ibbs (Tibs) wrote:
Paul Prescod wrote, on 09 February 2000 16:34:
Okay, but what is the real cost of the mis-identification?
Erm - it's wrong?[1]
But seriously, if I'm reading a document and come across a cross reference, is it *really* too much to ask that it be relevant?
Mmm. Okay. Well, first of all, let's carefully define and reduce the issue. Here is the list of kinds of auto-referencing again:
 1. dotted.references to classes or functions in other modules
 2. dotted.references to class methods or attributes
 3. references() to class methods or functions
 4. references to class names in the local module
The question to ask, then, is: Of these cases, for which are there likely to exist situations where the cross-reference is misleading?
So, first, i posit that #1, #2, and #3 are unambiguous enough not to bother anyone. That is, i hypothesize that no one would write <identifier>.<identifier> that happens to match the name of a module member or class method and *not* be referring to that thing. Does anyone have a problem with that much?
#4 might be considered a little dangerous; more dangerous is
 5. references to argument names in function and method docstrings
since these are likely to be just ordinary unmarked words. It's #4 and #5 that concern you, right?
Well, there is a spectrum of comfort zones for this kind of automatic interpretation. I think i have placed these categories in order, from #1 which i think is very solid, to #5 which i think is the most shaky. And #5 i would agree is approaching the limit of my comfort level. I didn't think that #4 would be too unreasonable to propose. Does it make you uncomfortable? (survey for all, not just Tony)
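The five categories form an ordered scale of confidence, which suggests making the cut-off a setting rather than a fixed rule. A sketch, assuming tokens have already been extracted and checked to name something that exists; `Kind`, `classify` and `auto_link` are hypothetical names, and the default cut-off at category 4 simply mirrors Ka-Ping's stated comfort level:

```python
from enum import IntEnum

class Kind(IntEnum):
    """Ka-Ping's categories, from most to least reliable."""
    OTHER_MODULE = 1   # dotted.references to classes or functions in other modules
    CLASS_MEMBER = 2   # dotted.references to class methods or attributes
    CALL = 3           # references() to class methods or functions
    LOCAL_CLASS = 4    # references to class names in the local module
    ARGUMENT = 5       # references to argument names

def classify(token, local_classes, arg_names):
    """Return the Kind a token falls under, or None for plain prose."""
    name = token[:-1] if token.endswith("(") else token
    if "." in name:
        head = name.split(".", 1)[0]
        if head == "self" or head in local_classes:
            return Kind.CLASS_MEMBER
        return Kind.OTHER_MODULE
    if token.endswith("("):
        return Kind.CALL
    if name in local_classes:
        return Kind.LOCAL_CLASS
    if name in arg_names:
        return Kind.ARGUMENT
    return None

def auto_link(token, local_classes, arg_names, comfort=Kind.LOCAL_CLASS):
    """Link a token only if its category is no less reliable than `comfort`."""
    kind = classify(token, local_classes, arg_names)
    return kind is not None and kind <= comfort
```

Raising `comfort` to `Kind.ARGUMENT` opts in to the riskiest category; lowering it to `Kind.CALL` would satisfy someone who shares the worry about #4.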
references, then that engenders distrust of the text - one never knows whether it will be worth *following* a reference (I can hear it now - "drat, that *!@X&! has cocked up their referencing again - I thought I'd find something *useful* there!").
Right, well at least i don't think there ought to be a situation where the reader doesn't know whether it will be worth following a reference -- because i think it is a basic requirement that it always be transparently obvious what the reference will lead to. And i think that for #1 through #4 above this is true. (Even in cases where the introduced reference is spurious, it would still be quite obvious where the hyperlink will take you.)
Also, it's pride - if I'm writing documentation in a doc string, then it's MY text, and I don't want it to be mucked up by an (otherwise) useful tool. Putting in *misleading* references would be such mucking up.
Yup. This is an entirely justified reason even if it may sound emotionally driven at first blush. I don't want anyone messing up *my* text either. :)
[1] The pedant in me wants to leave it at that, on the grounds that the rest is trivially derived from this, but since Paul asked I guess it *does* need explanation...
I'm glad that you broke it down into specific reasons. Thanks. -- ?!ng "Je n'aime pas les stupides garçons, même quand ils sont intelligents." [I don't like stupid boys, even when they are intelligent.] -- Roople Unia
Ka-Ping Yee reiterated a list of possible things to autodetect:
 1. dotted.references to classes or functions in other modules
 2. dotted.references to class methods or attributes
 3. references() to class methods or functions
 4. references to class names in the local module
 5. references to argument names in function and method docstrings
The question to ask, then, is: Of these cases, for which are there likely to exist situations where the cross-reference is misleading?
He then guesses my answers, but I'm going to ignore that... Hmm. There are two answers:
T1. In "marked up" docstrings, none of the above should be autodetected. I'll get into why below, but basically mark it down as "I'm paranoid" for the moment.
T2. In docstrings with no markup, I think you should be able to do what you like, and although 3..5 may be risky, the benefit probably outweighs the problem (i.e., I'll put up with spurious references IN THIS CASE so I can get the correct ones as well). It might be polite to put a note at the top of the page saying that the markup is autogenerated, though, so that people don't blame the original author of the doc string for any mistakes (that's politeness to those of us with ego!).
Expanding on T1 above. As you posit, 5 is generally dangerous. I hope I demonstrated (elsewhere, "London") that 4 is equally dangerous, and I reckon I can come up with other cases (than "OS(GB)") why 3 is dangerous too. But note that 1 and 2 are dangerous as well - English allows acronyms with "." as well as without (so NASA and BBC don't have dots, but N.B. does, and there are better examples which unfortunately I can't call to mind - try looking up a good style guide to publishing and I'm sure it will have examples).
My *general* point is that, given the nature of the English language, it is *impossible* to *guarantee* that you will get it right, however complex the rules you produce. Now, for case T2 that's OK - we're trying to generate something from nothing, and I personally am willing to accept mistakes. But for T1 you're messing with someone's (hopefully) lovingly crafted text, and I refer you back to my previous points (and Eddy's too). [In fact, given that this is *programming* we're documenting, it is almost certain that someone will want to do *very* odd things in documentation that we haven't and probably *can't* think of, which render non-markup schemes untenable if one wants a decent result.]
Hmm - and it's just occurred to me that if a programmer is marking up their text, they might also have a *reason* for suppressing a cross reference - one example is if they're talking about how one *might* have implemented fred.spam (definitely don't want a reference here) rather than how they actually *have* implemented #fred.spam# (to use Eddy's annotation, which I still hate). Summary: if it's autogenerated from no markup, the more guessing one can do the better, and your rules are well thought out, but if it's written by an author with intentional markup, then that markup is *all* one is *allowed* to infer. Tibs (And yes, I'm bringing in experience as a reader and writer of English, of being a pedant, of being a standards writer and user (very different experiences!), and also of producing a fanzine/magazine (using TeX) containing contributions from English and American speakers, where I needed to keep the original voice, spelling, etc., intact - how's that for spurious appeals to authority!) (oh - and I'm a programmer as well) -- Tony J Ibbs (Tibs) http://www.tibsnjoan.demon.co.uk/ Feet first with 5 wheels... (although not enough in the past few weeks) My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)
Hi! Tony J Ibbs (Tibs):
Ka-Ping Yee reiterated a list of possible things to autodetect: [...] Hmm. There are two answers:
T1. In "marked up" docstrings, none of the above should be autodetected. I'll get into why below, but basically mark it down as "I'm paranoid" for the moment.
Question: What is the difference between a "marked up" docstring and a doc string which is not "marked up"? This must be easy to detect. Regards, Peter
Peter Funk wrote, on 10 February 2000 12:55:
Question: What is difference between a "marked up" docstring and a doc string, which is not "marked up"? This must be easy to detect.
"must"? hmm...
Answer 1: A marked up doc string is one which contains markup. This is the easy answer, but see:
Answer 2: You can't tell, because you can't distinguish a doc string that *required* no markup (or so the writer decided) from one which (for instance) predates the markup scheme.
In fact, I don't think it's a problem in practice. Some person decides to run the appropriate tool over a particular Python file, and I think it is reasonable to assume (for our purposes) that either all of a file (maybe even a module) shall be marked up, or none of it shall be (note the careful ISO-speak there). In that case, the person can make the decision. If Ka-Ping Yee wants to be *really* nice (and I don't regard this as a requirement!) then he could have his script warn the user that there actually *is* something that looks like markup in the text, and even (maybe) attempt to use it - but that's a whole other discussion about trade-offs (e.g., what if it *looked* like our markup but wasn't?, and should the user be able to aver "ignore any characters that look like markup"?). Etc.
Tibs
-- Tony J Ibbs (Tibs) http://www.tibsnjoan.demon.co.uk/ .. "equal" really means "in some sense the same, but maybe not .. the sense you were hoping for", or, more succinctly, "is .. confused with". (Gordon McMillan, Python list, Apr 1998) My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)
Ka-Ping Yee can be thought to have categorised, in part:
1. dotted.references to classes or functions in other modules 2. dotted.references to class methods or attributes
Strangely enough, cases 1 and 2 would be the places I would consider using a "don't cross reference this" markup. But given the problems of producing a markup for the case where we *do* want to markup items (pace Eddy and me arguing about "#"), I doubt we'll ever be able to agree on the idea of having "anti-markup" (gosh, there's a concept - obviously it should be used extensively in designing the documentation scheme for INTERCAL). Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.demon.co.uk/ 2 wheels good + 2 wheels good = 4 wheels good? 3 wheels good + 2 wheels good = 5 wheels better? My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)
Hi! [Tony J Ibbs (Tibs)]:
Strangely enough, cases 1 and 2 would be the places I would consider using a "don't cross reference this" markup. But given the problems of producing a markup for the case where we *do* want to markup items (pace Eddy and me arguing about "#"), I doubt we'll ever be able to agree on the idea of having "anti-markup" (gosh, there's a concept - obviously it should be used extensively in designing the documentation scheme for INTERCAL).
Several possibilities to markup (tag) identifiers in doc strings:
 1. #identifier# (recent proposal here, looks ugly to me)
 2. [ident identifier] (nobody here but M.Z. seems to like this kind of markup)
 3. *identifier* (also used for emphasizing in StructuredText, used by the package Python Mega Widget)
 4. 'identifier' (used in the gendoc sources)
 5. "identifier" (I have seen this occasionally, misleading)
 6. `identifier' (TeX look-alike quoting)
 7. ^identifier (my personal idea, while thinking about alternatives: since identifiers don't contain white space, a single markup character in front of the identifier should be enough)
Are there other possibilities which I may have missed? If not, we simply have to choose one.
Regards, Peter
-- Peter Funk, Oldenburger Str.86, 27777 Ganderkesee, Tel: 04222 9502 70, Fax: -60
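Option 7 rests on the observation that an identifier cannot contain white space, so its own syntax can mark where it ends. A sketch of that idea for plain (possibly dotted) names, with `CARET_REF` and `caret_refs` as hypothetical names. Requiring an identifier after every '.' keeps a sentence-ending full stop out of the match. Call expressions with arguments are not covered, which is where a terminator would still be needed.

```python
import re

# A caret, then an identifier, then any number of ".identifier" parts.
CARET_REF = re.compile(r"\^([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)")

def caret_refs(text):
    """Return the names marked with a leading '^'."""
    return CARET_REF.findall(text)
```

In "Call ^fred.spam. Then see ^Zot", the match for the first reference stops at `fred.spam`, leaving the full stop to the sentence.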
Peter Funk wrote, on 10 February 2000 13:14
Several possibilities to markup (tag) identifiers in doc strings:
 1. #identifier#
 2. [ident identifier]
 3. *identifier*
 4. 'identifier'
 5. "identifier"
 6. `identifier'
 7. ^identifier
I've omitted the comments.
1 is Eddy's idea - I've commented elsewhere - it works but I don't like the look either (but it might still be the best solution).
2 is too verbose (where one of the sister threads to this started!). The simpler [identifier] variant was proposed in the earlier round of talks, and is my favourite, but suffers from the "Eddy objection" - that "[" and "]" are too valuable/used too much in Python to be reserved for this purpose (I *think* that's the problem).
3 can't be used because it already is (emphasis) - firstly someone might want to emphasise a cross reference (for example, me), and secondly you can't actually *distinguish* a cross reference from plain text automatically anyway (see the other thread).
4 and 5 both use quotes, which must suffer the "Eddy objection" (see 2), but also I don't think you can force people to choose quote styles in doc strings.
6 won't fly because "`" (that first quote char) is not in ISO 646 (or ASCII if you prefer) and I don't think it's a good idea to *require* non-646 text in doc strings (also, it's paired with "'", which is a valid character for other purposes, so that doesn't work well - you'd get confusion over what the "'" was doing, I think).
7 would need a terminator, and can suffer the "Eddy problem".
Personally I like [reference], but I'm comfortably expecting this to be solved by someone else in the implementation we eventually see (which I'm not writing, you note) and I'll use their choice...
Tibs
-- Tony J Ibbs (Tibs) http://www.tibsnjoan.demon.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)
I hope I demonstrated (elsewhere, "London") that 4 is equally dangerous
The first word of each sentence is capitalised. This can lead to further mis-fires of rule 4.
english allows acronyms with "." as well ... N.B. I.mech.eng., Ph.D., ... which all end in a dangling ., which might make them recognisable, except that legit name.attr may appear at end of sentence.
Further (c.f. why I don't advocate @ for the in-text code delimiter), text may legitimately contain domain names ... which we *do* want to make subjects of hyperlinks, but the href is invented differently than if it were a dotted python identifier ... indeed, chaos.org.uk wants to be linked as http://www.chaos.org.uk/ while eddy@chaos.org.uk, naturally, should be either a mailto or (which I'd prefer, but others might not) http://www.chaos.org.uk/~eddy/.
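Eddy's distinction can be sketched as a small dispatcher. Everything here is an assumption for illustration: `href_for` is a hypothetical name, known Python names take priority, an e-mail address becomes a `mailto:` link (Eddy's other option, a home-page URL, would need per-site knowledge), and a dotted token counts as a domain only if its last label is in a short, deliberately incomplete list of top-level domains.

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
DOTTED = re.compile(r"[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)+")
TLDS = {"uk", "org", "com", "net", "edu", "de"}   # illustrative, far from complete

def href_for(token, python_names):
    """Return a link target for a dotted token, or None to leave it as text.

    python_names -- dotted names known to exist in the documented code
    """
    if EMAIL.fullmatch(token):
        return "mailto:" + token
    if token in python_names:
        return "#" + token                 # hypothetical in-page anchor
    if DOTTED.fullmatch(token) and token.rsplit(".", 1)[1].lower() in TLDS:
        host = token if token.startswith("www.") else "www." + token
        return f"http://{host}/"
    return None
```

With this ordering `chaos.org.uk` becomes `http://www.chaos.org.uk/`, while an unknown `fred.spam` is left alone rather than being turned into a bogus web address.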
... #fred.spam# (to use Eddy's annotation, which I still hate) Can someone come up with a viable alternative that's less ugly ? Will North Americans have problems with using # in text as the number indicator ? [what is standard N.A. usage of # ?]
Would ` . ' (a dot surrounded by space) be acceptable as a delimiter for code embedded in text ? (I don't like it, but I guess it could work ...)
Summary: ... no markup ... guess..., but ... with ... markup is *all*
So, back to my earlier question: Ping, how much palaver would it take to have two ways of processing a doc-string, for the phase which decides what's code (and possibly hyperlink) and what's not:
  marked up -- if the doc-string appears to be using #...# for in-text code fragments, take those fragments as code (no guessing)
  unmarked -- if the doc-string doesn't use #...#, make Ping's educated guesses at what is code and what isn't
then generate hrefs from identifiers in the code fragments thus identified ? Other hrefs may be generated other ways -- e.g. to URLs in the text -- but hrefs to python objects only get auto-generated if in a code fragment, however it has been recognised.
Eddy.
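Eddy's two-way split looks cheap to implement, because the mode can be chosen once per doc-string. A sketch under stated assumptions: the presence of any #...# fragment selects the marked-up mode, and `guess` is a hypothetical callable standing in for Ping's heuristics, which this sketch does not reproduce.

```python
import re

FRAGMENT = re.compile(r"#([^#\n]+)#")   # in-text code fragment, one line at most

def code_fragments(doc, guess):
    """Decide what counts as code in a doc-string.

    marked up -- the doc-string uses #...#, so exactly those fragments
                 are code and nothing is guessed;
    unmarked  -- no #...# at all, so defer to `guess(doc)`.
    """
    marked = FRAGMENT.findall(doc)
    if marked:
        return "marked", marked
    return "unmarked", guess(doc)
```

Either way the result is a list of code fragments, so the later step that turns identifiers into hrefs does not need to know which mode produced them.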
Edward Welbourne wrote (I'm giving up including the date, since Outlook <fs:spit> ain't helping me and I'm not sure anyone cares):
Would ` . ' (a dot surrounded by space) be acceptable as a delimiter for code embedded in text ? (I don't like it, but I guess it could work ...)
No. (do I need explanations? do I *have* explanations? well, *I* wouldn't be able to visually parse it - I prefer Eddy's "#"!)
So, back to my earlier question: Ping, how much palaver would it take to have two ways of processing a doc-string, for the phase which decides what's code (and possibly hyperlink) and what's not:
marked up -- if the doc-string appears to be using #...# for in-text code fragments, take those fragments as code (no guessing)
unmarked -- if the doc-string doesn't use #...#, make Ping's educated guesses at what is code and what isn't
then generate hrefs from identifiers in the code fragments thus identified ? Other hrefs may be generated other ways -- e.g. to URLs in the text -- but hrefs to python objects only get auto-generated if in a code fragment, however it has been recognised.
As I've said in a reply to Peter Funk elsewhere, I think you *can't* guess if a doc string is not marked up because it (deliberately) wasn't, or because it (for instance) predates markup. So I think the choices have to be:
 i. doc string is clearly marked up (I'll concede detecting that for practical purposes, for some value of "clearly"), and there are #...# (or whatever).
 ii. doc string is clearly marked up, but there are no #...#? How do you tell if they just didn't *want* any cross references?
 iii. doc string clearly isn't marked up (for values of "clearly" as defined above, or perhaps their inverse!).
In case i you can obviously use the cross references, and must not generate new ones. In case ii, I think there should be an option to the processor (e.g., -force_xref) which forces generation of cross references if the user believes they have "obviously" been omitted. But it needs a human to tell. In case iii, I'd prefer an option to tell the processor about the (probably infrequent) cases where the absence of markup was deliberate (e.g., -absent_xref). But in practice, I might have to live with the markup getting generated regardless - so could I have a pragma to say "this doc string contains no markup" please? (ouch)
Tibs, getting tangled in conditional clauses
-- Tony J Ibbs (Tibs) http://www.tibsnjoan.demon.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)
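Tibs's three cases reduce to a small decision table once a human has said whether the doc string counts as marked up. A sketch, assuming that judgement arrives as a boolean; `xref_policy` and `parse_options` are hypothetical names, and the option spellings follow Tibs's suggestions rather than any existing tool.

```python
import argparse

def xref_policy(marked_up, has_fragments, force_xref=False, absent_xref=False):
    """Return 'markup', 'guess' or 'none' for one doc string.

    marked_up     -- whether the doc string counts as marked up (a human
                     decision, e.g. made once per file)
    has_fragments -- whether it actually contains #...# fragments
    """
    if marked_up and has_fragments:            # case i: trust the author
        return "markup"
    if marked_up:                              # case ii: only a human can say
        return "guess" if force_xref else "none"
    return "none" if absent_xref else "guess"  # case iii

def parse_options(argv):
    """Command-line switches for the two overrides (hypothetical names)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-force_xref", action="store_true")
    parser.add_argument("-absent_xref", action="store_true")
    return parser.parse_args(argv)
```

A per-doc-string pragma, as Tibs asks for at the end, would simply override `absent_xref` for that one string.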
Okay, but what is the real cost of the mis-identification?
Clutter. A document, which should have five words picked out neatly as hrefs to significant text elsewhere, has twenty hrefs, almost all the extras being spurious. Now look at the result and tell me which five were the pertinent hrefs ... ? Eddy.
Talking with Eddy yesterday, he pointed out something that I hadn't thought about, which is the general usefulness of Ka-Ping Yee's "documentation from un-annotated doc strings" tool when one actually *wants* to get documentation from the vast number of doc strings which don't have annotation. I think that's a tool we *need*, and it looks like Ka-Ping Yee's approach is excellent for that purpose. So please don't take my grumbles about not wanting to go that route as saying I don't think the tool is very clever, and (I now realise) also very useful. I also rather like the appearance of the web page produced! (although I haven't read the HTML). Oh - and has anyone else said how nice it is to see Ka-Ping back on the Python lists again? Tibs -- Tony J Ibbs (Tibs) http://www.tibsnjoan.demon.co.uk/ "How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night" My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)
On Thu, 10 Feb 2000, Tony J Ibbs (Tibs) wrote:
I also rather like the appearance of the web page produced! (although I haven't read the HTML).
There is one particular thing that makes the HTML rather ugly: all the spaces in the docstrings are replaced with "&nbsp;"s (non-breaking spaces). This is to prevent word-wrapping, so that any formatting in the docstrings (like that diagram at the top of SocketServer.py) is preserved.
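The trade-off Ka-Ping describes shows up in a few lines of Python. This is only a sketch assuming Python 3's `html.escape`; the function names and the use of `<br>` between lines are assumptions, and this is not the tool's actual code. The first function replaces every space with `&nbsp;`, which preserves layout but bloats the markup; the second shows the common alternative of wrapping the escaped text in `<pre>`, which keeps the HTML small but renders in a fixed-width font by default.

```python
import html


def docstring_to_html(text):
    """Escape a docstring and keep its spacing by using non-breaking spaces."""
    lines = []
    for line in text.splitlines():
        escaped = html.escape(line)            # &, <, > and quotes become entities
        lines.append(escaped.replace(" ", "&nbsp;"))
    return "<br>\n".join(lines)                # keep the original line breaks


def docstring_to_pre(text):
    """Alternative: let <pre> preserve the whitespace instead."""
    return "<pre>" + html.escape(text) + "</pre>"
```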
Oh - and has anyone else said how nice it is to see Ka-Ping back on the Python lists again?
Wow, Tony, that's so nice of you! I have to say i'm really glad to be back too. -- ?!ng "Je n'aime pas les stupides garçons, même quand ils sont intelligents." ("I don't like stupid boys, even when they are intelligent.") -- Roople Unia
Looking at how one would delimit references to code items, Edward Welbourne wrote, on 08 February 2000 16:00:
I don't think $, @ and ! are good candidates, but they could work; however, how do folk feel about # as a marker ? Since we're in a doc-string, it doesn't have any special meaning; nor do I think it sensible to include an end-of-line-comment in an inline-in-the-doc code fragment (as opposed to a code fragment displayed using a Code: block or
). Thus #any.valid(python + ' code') is guaranteed(to, work)#.
I dislike "#" for two reasons:

1. I dislike it (i.e., it looks nasty to me). I find it visually distracting to try to parse the Python comment character as something else, in a Python file, and anyway, it just looks wrong (damn - that last isn't a very convincing bit of the argument). Eddy and I have been known to disagree about such things, of course.

2. I *do* use comments in doc strings! I do! As a strong believer in comments, if I include code in a doc string, it's damn well likely to be commented! - not least because it's probably copied from some actual code, which probably has comments in it.

(Sorry - this gets me in a ticklish spot, as it feels to me akin to the tendency of people teaching programming languages to first say "comments are good" and then present scads of examples where they don't use any, normally with the excuse "but it's only short, and an example, so I haven't bothered". Been there, seen that.)

Tibs # no, I wouldn't say I overcomment my code...
2. I *do* use comments in doc strings!
Not an issue. As I specified it, the use of #...# is for samples of *code when they appear in text* within a doc string. Thus, for example:

    One can call #open('/tmp/junk', 'w')# to obtain a scratch file
    or one can use:

    Code:
        i = 0
        while 1:
            try:
                fd = open('/tmp/junk%d' % i, 'r')  # ith template file
            except IOError:
                return open('/tmp/junk%d' % i, 'w')  # nonexistent file
            fd.close()
            i = 1 + i

The parser knows when it's in an embedded Code: block, so it regards # as starting a comment in one of those; but where it meets # in the text of a paragraph, it knows that it's the start of an in-text code fragment. Furthermore, in a Code: block but after the first # on a line, we can use #...# to delimit fragments of code appearing in a comment (just as we can use *emphasis* in comments without the * being interpreted as `multiply') and have the doc-string parser understand it as such.

If you're putting comments *against the text* in a doc string, # thus, as opposed to *within an example block*, you're being gratuitously perverse and you deserve everything you get ;^>

As to # not being pretty ... tough. Being unambiguous is important, being easy to spot is important, being easy to type is important; being pretty is what the down-stream tools support. Fontification and colouring might be able to help in the meantime ...

Eddy.
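Eddy's rule distinguishes two contexts: in paragraph text `#...#` marks an inline code fragment, while inside a `Code:` block the first `#` starts a comment, and `#...#` inside that comment again marks code. A minimal Python sketch of the two contexts follows. The function names are hypothetical, knowing whether a line belongs to a `Code:` block is assumed to be handled elsewhere, and the string-literal scan is deliberately simple (it does not understand triple-quoted strings spanning lines).

```python
import re

# Text between a pair of #s on one line is an in-text code fragment.
INLINE_CODE = re.compile(r"#([^#\n]+)#")


def inline_fragments(text):
    """Return the #...# code fragments found in paragraph (or comment) text."""
    return INLINE_CODE.findall(text)


def split_code_line(line):
    """Split a line of a Code: block into (code, comment) at the first #
    that is not inside a string literal."""
    quote = None
    i = 0
    while i < len(line):
        ch = line[i]
        if quote:
            if ch == "\\":
                i += 1                  # skip the escaped character
            elif ch == quote:
                quote = None            # end of the string literal
        elif ch in "'\"":
            quote = ch                  # start of a string literal
        elif ch == "#":
            return line[:i], line[i + 1:]
        i += 1
    return line, ""


def comment_fragments(line):
    """In a Code: block, #...# only delimits code after the comment starts."""
    _, comment = split_code_line(line)
    return inline_fragments(comment)
```

Applied to the example above, `inline_fragments` picks `open('/tmp/junk', 'w')` out of the paragraph, while `split_code_line` treats `# ith template file` as an ordinary comment.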
your rules sound a little bit complicated. I think we should model the reference rules after the Python namespace visibility rules,
Actually, i did try to model the rules after namespace rules (note the rules with respect to dotted references to things in other modules).
This behaviour seems "obvious" to me. The outcome may not be so very different from what you described in your ruleset.
It's kind of too bad that it came out appearing complicated. What i tried to do there was to give an exact specification of an algorithm that would yield the kind of "simple" or "obvious" behaviour you expect. A more succinct description of the intended behaviour would be:

    1. dotted.references to classes or functions in other modules
    2. dotted.references to class methods or attributes
    3. references() to class methods or functions
    4. references to class names in the local module

Unfortunately, the attempt to be very precise about how to achieve this behaviour yielded a long-winded and pedantic description. I just wanted to provide a very clearly stated framework for us to pick and poke at.

-- ?!ng <http://www.lfw.org/ping>
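One way to read this summary, together with Ka-Ping's seven detailed rules, is as a small resolver over Python namespaces. This is a sketch under stated assumptions, not Ka-Ping's implementation: `resolve`, `_find_method` and the example modules are hypothetical, results are `(kind, qualified name)` pairs rather than real links, the only dotted class references handled are `self.` ones, and `self.<name>` attributes are accepted unchecked because instance attributes can't be seen without an instance.

```python
import inspect
import types


def _find_method(cls, name):
    """Return ("method", "Owner.name") for a plain method on cls or a base."""
    for klass in inspect.getmro(cls):       # cls first, then bases (rules 7a, 7b)
        if inspect.isfunction(vars(klass).get(name)):
            return ("method", klass.__name__ + "." + name)
    return None


def resolve(ref, module, cls=None, modules=None):
    """Resolve 'pkg.bar.Zot', 'self.foo(', 'foo(' or 'Zot'; None if no match."""
    modules = modules if modules is not None else {}
    is_call = ref.endswith("(")
    name = ref[:-1] if is_call else ref
    prefix, _, last = name.rpartition(".")

    if prefix == "self":                    # self.foo( and self.foo
        if cls is None:
            return None
        if is_call:
            return _find_method(cls, last)
        # Assumption: instance attributes are accepted without checking.
        return ("attribute", cls.__name__ + "." + last)

    if prefix:                              # pkg.bar.Zot and pkg.bar.foo(
        target = modules.get(prefix)
        obj = getattr(target, last, None) if target is not None else None
        if is_call and inspect.isfunction(obj):
            return ("function", prefix + "." + last)
        if not is_call and last[:1].isupper() and inspect.isclass(obj):
            return ("class", prefix + "." + last)
        return None

    if is_call:                             # foo(: method first, then function
        if cls is not None:
            found = _find_method(cls, name)
            if found:
                return found
        if inspect.isfunction(getattr(module, name, None)):
            return ("function", module.__name__ + "." + name)
        return None

    if name[:1].isupper() and inspect.isclass(getattr(module, name, None)):
        return ("class", module.__name__ + "." + name)   # Zot
    return None                             # bare words are never references


# Hypothetical example namespaces standing in for real modules.
class Base:
    def inherited(self):
        pass


class Zot(Base):
    def foo(self):
        pass


def helper():
    pass


bar = types.ModuleType("bar")
bar.Zot, bar.helper = Zot, helper
pkg_bar = types.ModuleType("pkg.bar")
pkg_bar.Zot, pkg_bar.helper = Zot, helper
MODULES = {"pkg.bar": pkg_bar}
```

Because a trailing "(" and a capitalized bare name are mutually exclusive, the order of the last two checks does not matter; the fall-through from method to module function mirrors rule 7c.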
On 08 February 2000, Ka-Ping Yee said:
Note that simple bare words are never interpreted as references. To refer to something, an identifier must be capitalized and match a class name, or be followed with an open-parenthesis and match a function or method name, or be preceded by "self.".
Two things I'm uncomfortable with here:

  * implicit enforcement of case conventions (modules, attributes, and methods lower-case; classes in StudlyCaps). While I think case conventions are a good thing and I like to see them followed, I'm not sure that people who *don't* follow them should be effectively barred from writing "standard" docstrings. (But my bondage 'n discipline side is cheering, "Yeah! Go! Stick it to those non-conformist punks!")

  * I think I'd prefer that identifiers be distinguished in the text itself before the docstring processor looks at them. And I'm really not sure if raising the word "self" so high in the pantheon is a good idea; it *is* an excellent convention that everyone should follow (much more so than the case convention), but the *user* of a class doesn't refer to it as 'self' -- that's something you do only when you're inside the class. E.g., in a docstring I prefer to type and to read:

        Don't frob the 'foo' attribute; use 'set_foo()' instead.

    rather than

        Don't frob self.foo; use self.set_foo() instead.

    I think I agree with at least that much from StructuredText.

Greg
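Greg's preferred style, quoting names as 'foo' and 'set_foo()', is already a form of explicit delimiting. A rough sketch of picking such references out; the regular expression, the function name and the optional `known` filter are assumptions for illustration, not taken from StructuredText or any existing tool.

```python
import re

# A single-quoted Python identifier, optionally dotted, optionally ending in ().
QUOTED_REF = re.compile(r"'([A-Za-z_][\w.]*(?:\(\))?)'")


def quoted_references(docstring, known=None):
    """Return the 'quoted' names in a docstring.

    If a set of known names is given, keep only quoted words that name
    something real, since ordinary quoted English words match too.
    """
    names = QUOTED_REF.findall(docstring)
    if known is None:
        return names
    return [n for n in names if (n[:-2] if n.endswith("()") else n) in known]
```

Apostrophes inside words such as "Don't" are not picked up, because the pattern needs a closing quote directly after the name.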
<awkward> Ka-Ping Yee wrote, on 08 February 2000 11:51:
Note that simple bare words are never interpreted as references.
Urm - but...
To refer to something, an identifier must be capitalized and match a class name,
that sure sounds to me like it could be a "simple bare word" - if I have a class called London that handles data relating to the city of London, then I'll certainly use the noun and the class name in the same text.
or be followed with an open-parenthesis and match a function or method name,
The company I work for have a customer who are the Ordnance Survey for Great Britain (as opposed to the Ordnance Survey for Northern Ireland, who are *not* the same people). This is commonly abbreviated OS(GB) (strangely enough, OSNI is not OS(NI)). It is not inconceivable that code written to handle OS(GB) data might have a function somewhere called OS()... (and I would not be prepared NOT to write such code just to make the documentation processing easier!).
or be preceded by "self."
OK - I find it harder to come up with an awkward example for this one, *but I bet I could given enough time to think* (heh, I've been *paid* to be a pedant on occasion!).
</awkward>

The point? I think that we MUST require some delimitation for words/phrases that are to become links into the code, because *in practice* it is not possible to guess correctly often enough (and I don't regard my awkward examples of the guessing failing as "odd" cases at all, thank you very much). The alternative suggestion (flag the awkward cases to say "don't guess wrong here") seems to me truly horrid. (I've used a documentation preprocessing system which, whilst dumber - it doesn't discriminate *nearly* as well as your algorithm - still illustrates the sheer frustration of typing perfectly normal English text and having odd words highlighted when one doesn't want them to be.)

despite-this-the-output-of-your-algorithm-is-very-pretty,guv
Tibs

-- 
Tony J Ibbs (Tibs)      http://www.tibsnjoan.demon.co.uk/
.. "equal" really means "in some sense the same, but maybe not
.. the sense you were hoping for", or, more succinctly, "is
.. confused with". (Gordon McMillan, Python list, Apr 1998)
My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)
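Tibs' objection can be made concrete with a deliberately simplified guesser that approximates only rules 5 and 6 of Ka-Ping's proposal (no dotted names, no `self.`). The names here are hypothetical. Given a code base with a class `London` and a function `OS()`, ordinary prose gets both words linked.

```python
import re

# An identifier, optionally followed immediately by "(".
WORD = re.compile(r"\b([A-Za-z_]\w*)(\()?")


def guessed_links(text, class_names, function_names):
    """Link capitalized words that name classes and words followed by '('
    that name functions, roughly as rules 5 and 6 do for undelimited text."""
    links = []
    for match in WORD.finditer(text):
        word, paren = match.group(1), match.group(2)
        if paren and word in function_names:
            links.append(word + "(")
        elif not paren and word[0].isupper() and word in class_names:
            links.append(word)
    return links


print(guessed_links("Data for London is supplied by OS(GB).", {"London"}, {"OS"}))
```

Both matches follow the rules exactly, yet neither was meant as a reference; that is the clutter Eddy describes.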
participants (7)

- David Ascher
- Edward Welbourne
- Greg Ward
- Ka-Ping Yee
- Paul Prescod
- pf@artcom-gmbh.de
- Tony J Ibbs (Tibs)