[Doc-SIG] URLs

Edward Welbourne <eddy@chaos.org.uk>
Fri, 23 Mar 2001 20:44:49 +0000 (GMT)

OK, lots of stuff here and I'm a bit lost so I'm going to think out loud
at you, so you have a good chance of spotting where my confusion
diverges from what you thought you were saying.  If I'm confused, how
confused are the lurkers ?

It makes sense to provide for a bibliographic definition mechanism for
defining short names for use in xrefs in terms of full URLs (ideally
with some form of commentary).  As I understand it, this is what the

>   there is a directive of the form:
>      ..[ref] url

bit is about.  How about providing for the url to be followed by
arbitrary text to be presented, in the `See also' or biblio section, as a
description of the relevant xref ?  This then makes it possible, as
discussed, to use [ref] in the body of paragraphs as a link.  This is
the `in the style of bibliographic citation' idiom that I gather STNG
folk are wedded to, and I see every reason to honour their choice.

I don't understand the use of "some text":[ref] with the above reading
of [ref], since the citation idiom calls for [ref] to be the text that
appears in the output, so aren't you throwing away the "some text" ?  So
what's it for ?  However,

I can see a use for "anchor text":<scheme://site.domain/path> as a good
way to say, inline, that you want the given anchor text to appear (with
its "quotes" stripped) as the text of a link to the given URL.  I can
see how it might be desirable to use "anchor text":[ref] as a short-hand
for the above but requesting [ref]'s URL as the URL, provided ref is the
subject of a '..[ref] url' directive.  In which case inline [ref] is
implicitly a short-hand for "[ref]":[ref] - i.e. use '[ref]' as the text
of a link to the url specified in the '..[ref] url' directive.

Indeed, I'd be tempted to at least allow the '..[ref] url' directive to
enclose url in <...> for the sake of similarity.

The only difference from Edward's
>   Use "name":[ref] for in-line hrefs.  If ref is a single token, and
>   there is a directive of the form:
>      ..[ref] url
>   Then use url as the URL; otherwise, use ref as the URL.

is then that the `otherwise use ref as the URL' fuzziness gets blown
away: we get

   Inline use of "text":[ref] is then a link, with text "text", to the
   url specified elsewhere by a '..[ref] url optional comments'
   directive; inline use of simply [ref] is equivalent to "[ref]":[ref]

   Inline use of "text":<url> is a link, with text "text", to the
   specified url, without recourse to an '..' directive.

anything else vaguely resembling these is just a lump of text with some
surprising uses of punctuation.  This gets us the asked-for win in terms
of letting URLs end in . or appear at the end of a sentence (or both)
without ambiguity, while also gaining the asked-for parsability win
*and* saving the `if that happens this otherwise the other' gumbo Edward
was giving.  Furthermore, use of # in a URL will now be within <...>, so
we get spared various parser uglies.
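
A minimal Python sketch of the two rules above, offered as an
illustration rather than as anyone's actual parser.  The names
DIRECTIVE, INLINE, read_directives and find_links are hypothetical, the
name spec [\w-]+ is an assumption, and directives are assumed to fit on
one line (continuation lines for the comment are not handled).

```python
import re

# '..[ref] url optional comments', with the url optionally inside <...>.
DIRECTIVE = re.compile(r'^\.\.\[([\w-]+)\]\s+<?([^\s>]+)>?\s*(.*)$')

# "text":[ref]  or  "text":<url>  or a bare [ref].
INLINE = re.compile(
    r'"(?P<text>[^"]+)":(?:\[(?P<ref>[\w-]+)\]|<(?P<url>[^>\s]+)>)'
    r'|\[(?P<bare>[\w-]+)\]')


def read_directives(lines):
    """Map each ref name to (url, comment) from one-line directives."""
    table = {}
    for line in lines:
        m = DIRECTIVE.match(line.strip())
        if m:
            table[m.group(1)] = (m.group(2), m.group(3))
    return table


def find_links(text, table):
    """Return (anchor text, url) pairs for the inline links in text."""
    links = []
    for m in INLINE.finditer(text):
        if m.group('url'):
            # "text":<url> needs no directive at all.
            links.append((m.group('text'), m.group('url')))
        elif m.group('ref'):
            # "text":[ref] is a link only if ref has a directive.
            if m.group('ref') in table:
                links.append((m.group('text'), table[m.group('ref')][0]))
        elif m.group('bare') in table:
            # Bare [ref] behaves like "[ref]":[ref].
            links.append(('[%s]' % m.group('bare'),
                          table[m.group('bare')][0]))
    return links
```

Anything that matches neither form, or names a ref with no directive,
is simply left alone as text.  Because the url sits inside <...>, a
trailing . or an embedded # needs no special casing.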

If I've understood what STNG does (which is a big if, as it's all by
inference from what I think you guys are saying), this either removes or
simplifies the problem of persuading the STNG folk, since it no longer
clashes with the [ref] forms they're used to, and probably makes their
lives a lot easier when it comes to parsing the "text":url idioms Tibs
lists.  And the above is manifestly simpler and more intuitive IMO ;^>

> It would indeed make life a lot simpler.

> Inline refs were introduced deliberately to look like footnotes 
aside: [blah] is surely what *bibliographic citations* look like, not
*footnotes* in any typesetting idiom I've ever met.
But you meant that, I presume.

(not sure who):
> 3. Local references (which look like '[this]' or '[1]') are now
> 	..[this]

ah, so a paragraph starting (or preceded by a line of form ?) '..[this]'
is implicitly an <A Name="something random"></A> accompanied by a::

   ..[this] <#something random>

directive somewhere in the docstring ?  Thus enabling xrefs to that
para from within the document using [this] or "anchor text":[this].
And "something random" is putatively "this", I suppose, in which
case we've also enabled "anchor text":<#this>
Sounds good.

> Clarification on the syntax..  is *anything* that looks like [this] a
> local reference, or does it have to be preceded by "a parenthetical
> like"[this] or "a parenthetical and a colon like":[this]?  

erm ... any use of [ref] is either just some text with funny punctuation
or using the same name, 'ref', as some particular '..' directive.  What
problem is there in distinguishing ?  Is it the fact that the generated
page, in which the <#this> anchor is defined, may be made of several doc
strings, so that you don't *know* whether there's a ..[this] in one of
the other doc strings making up the page ?

> If one of the latter, does [this] get rendered with brackets?  Flagged
> as a warning when validating (in principle, not in current
> implementation)?
either way, [this] gets rendered with brackets: either it's being made
to look like a citation, to the URL specified for '..[this]' to refer
to, or it's a lump of random text (about which a doc tool may wish to
generate a warning, at least if 'this' matches the label-spec).
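
As a rough sketch of that behaviour (an assumption about how a doc tool
might do it, not a description of any existing one): the hypothetical
render function below keeps the brackets in both cases, emits an HTML
link when a url is known for the name, and otherwise records a warning.
Escaping of the surrounding text is left out to keep it short.

```python
import re
from html import escape

# Assumed label spec; the thread has not settled on one.
BRACKETED = re.compile(r'\[([\w-]+)\]')


def render(text, urls):
    """Return (html, warnings) given a dict mapping ref name to url."""
    warnings = []

    def replace(match):
        name = match.group(1)
        shown = escape(match.group(0))      # brackets are always kept
        if name in urls:
            href = escape(urls[name], quote=True)
            return '<a href="%s">%s</a>' % (href, shown)
        # No directive known: plain text, but worth a warning.
        warnings.append('no ..[%s] directive found' % name)
        return shown

    return BRACKETED.sub(replace, text), warnings
```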

> What is acceptable content for [this]?  '[\w_-]+'?

Hmm.  Well, ideally we'd support standard citation forms, which would
include '[this, that, other]', to be treated like '[this], [that],
[other]' but with the excess punctuation ditched (this *is* a standard
usage of the citation idiom being mimicked, after all; used when what
was said just before it is backed up by three separate texts elsewhere).
This can only sensibly be applied to '[refs]' forms, not to
'"text":[refs]' forms, for obvious reasons.  We'd still be using
'[\w_-]+' for the names specified in a '..[ref]' definition, but using
'[\w_-]+(, [\w_-]+)*' as the contents of a [...] used as an inline link.
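
To make the two patterns concrete, here is a small Python sketch.  The
names NAME, DEFINITION, CITATION and cited_names are hypothetical, and
since Python's \w already includes _ the pattern is written [\w-]+.

```python
import re

NAME = r'[\w-]+'

# A definition names exactly one ref: '..[ref] ...'
DEFINITION = re.compile(r'^\.\.\[(%s)\]' % NAME)

# An inline citation may list several names: '[this, that, other]'
CITATION = re.compile(r'\[(%s(?:, %s)*)\]' % (NAME, NAME))


def cited_names(text):
    """Return every ref name cited inline, in order of appearance."""
    names = []
    for m in CITATION.finditer(text):
        # '[this, that, other]' is treated as [this], [that], [other].
        names.extend(m.group(1).split(', '))
    return names
```

Note that the separator is exactly ', ' here, so '[this,that]' would be
left as plain text; a real tool might want to be more forgiving.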

But, that aside, and allowing we might insist on the `excess punctuation'
being given explicitly (for simplicity/unambiguity), [\w_-]+ sounds like
a reasonable deal, albeit I might ditch _ and, in any case, really just
ask for the same regex as we use for Labels ...

One might plausibly want to allow '&' in ref names (within [...], as
opposed to within <urls> where, obviously, they're allowed) because of
all those papers and books by two authors whose names are the standard
way to refer to the book, e.g.

  ..[K&L] <ISBN:0-582-46354-8> Kaye and Laby, Tables of Physical and
  Chemical Constants, pub. Longman Scientific and Technical

(ignoring the questions of whether the scheme ISBN is implemented yet;
pretend the fake ISBN URL were replaced with a suitable URL on Longman's
(or some online bookstore's) web site.)  But, again, we could demand
simplicity and insist on [KandL] without doing anyone any real harm.

>> I think we should just go with the English definition of a word,
>> which means [-A-Za-z], and leave it at that. It is *meant* to look
>> like a word.

> Is that too anglo-centric?  

(modulo inclusion/exclusion of _ which I don't care about)

No, it's ASCII-centric and we're really working inside ASCII, so it's
appropriate; except that I'd want to include digits, at least for
[citations] and I'd argue that we should anticipate folk wanting to use
python identifiers here (when, e.g., the relevant python object is
defined in some other module and the author doesn't want to rely on
vagaries of the doc-tool's relationship with include directives), hence
requiring _ and digits; i.e. I agree with Edward's

> ... underlines and digits are more applicable for endnotes.
> Some people might like this [1] or this [noam_chomsky97].

I'd go for either:

   * citation names are [\d\w_-]+ read case-sensitively
   * doc-string labels are [a-z\d-]+ once passed through string.lower

or

   * both kinds are [\d\w_-]+ read case-insensitively

(here using \w_ purely to keep out of arguments about whether \w
includes _ already) without noticeable preference, and accept that all
ST-generic doc-string labels are expected to be Anglic words, hence not
to *exercise* the \d allowed in the label spec, but to *allow* \d in
labels for the sake of ST-specific dialects which may well want, e.g.,
to use a number in a label.
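
The two options can be sketched in Python as lookup-key functions.
This is only an illustration under stated assumptions: the function
names are hypothetical, str.lower stands in for the lowercasing step,
and re.ASCII is used so that \w stays within ASCII, as discussed above.

```python
import re

# Python's \w already covers letters, digits and _.
CITATION_NAME = re.compile(r'[\w-]+', re.ASCII)
LABEL = re.compile(r'[a-z\d-]+')


def citation_key(name):
    """Option 1: citation names are compared case-sensitively."""
    if not CITATION_NAME.fullmatch(name):
        raise ValueError('bad citation name: %r' % name)
    return name


def label_key(label):
    """Option 1: doc-string labels are lowercased, then checked."""
    lowered = label.lower()
    if not LABEL.fullmatch(lowered):
        raise ValueError('bad label: %r' % label)
    return lowered


def shared_key(name):
    """Option 2: one spec for both kinds, compared case-insensitively."""
    if not CITATION_NAME.fullmatch(name):
        raise ValueError('bad name: %r' % name)
    return name.lower()
```

Under the first option [K97] and [k97] are different citations while
labels See-Also and see-also coincide; under the second, everything
coincides up to case.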

(By the way - Edward, some of your sentences end .. others end in a
single . - why ? i.e. is there a reason other than bouncy fingers ?)