References in the same line as the target text
Hi everybody. I am currently poking around in the reStructuredText stuff of the docutils. I like it. However, there is one thing that really bugs me a bit. It seems impossible to get a Reference (Link) without having to type the name twice, which is a point where errors occur easily. Also sometimes it is not feasible to have the link targets stand out after a paragraph (where they interrupt the read flow) or at the end of the text (hard to keep in sync in case of changes to the target text). I would like to have some kind of inline link markup. I think a very logical way would be something like the following reference to Python_(http://www.python.org) or this Reference to `The GIMP`_(http://www.gimp.org). IMHO this is a logical and unintrusive extension to the Link Syntax (If a ')' is in the Link itself it should be escaped, but maybe somebody has a better idea for the delimiters). I am currently not sure if it is feasible to have these Links added to the target name list, because this might result in inconsistencies like here_(http://foo.bar) and here_(http://bar.baz). Or leave this problem to the user and allow also anonymous inline links like here__(http://blubb.bla). I think this addition would make it way easier to write structured text with links. But it is perfectly possible that I missed something obvious, since I am pretty new to this field. Please tell me what you think... :-) Bye, Simon -- Simon.Budig@unix-ag.org http://www.home.unix-ag.org/simon/
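For illustration only, here is a rough sketch of how the proposed markup could be tried out as a preprocessing step, without touching the reStructuredText parser. It rewrites each ``name_(url)`` into a standard anonymous reference plus an anonymous target placed after the paragraph, which also sidesteps the ``here_(http://foo.bar)`` / ``here_(http://bar.baz)`` name clash. The names ``INLINE_LINK`` and ``expand_inline_links`` are made up for this sketch and are not part of Docutils; the regular expression is an assumption about what the syntax would accept.

```python
import re

# Hypothetical pattern for the proposed markup: a simple name or a
# `backquoted phrase`, one or two underscores, then a URL in parentheses.
# A ")" inside the URL has to be written as "\)".
INLINE_LINK = re.compile(
    r"(?P<ref>`[^`]+`|[A-Za-z0-9]+(?:[-.:+][A-Za-z0-9]+)*)"
    r"__?"
    r"\((?P<url>(?:\\.|[^()\s\\])+)\)"
)


def expand_inline_links(paragraph):
    """Turn name_(url) into name__ and append matching '__ url' targets."""
    targets = []

    def replace(match):
        # Undo backslash escapes such as "\)" inside the URL.
        url = re.sub(r"\\(.)", r"\1", match.group("url"))
        targets.append("__ " + url)
        return match.group("ref") + "__"

    body = INLINE_LINK.sub(replace, paragraph)
    if not targets:
        return body
    # Anonymous targets are matched to references by order, so they are
    # emitted in the same order as the references appear.
    return body + "\n\n" + "\n".join(targets)


if __name__ == "__main__":
    print(expand_inline_links(
        "A reference to Python_(http://www.python.org) or "
        "to `The GIMP`_(http://www.gimp.org)."
    ))
```

Plain parenthesized URLs without the underscore, as in "Python (http://www.python.org)", are left untouched by this sketch.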
Simon Budig wrote:
It seems impossible to get a Reference (Link) without having to type the name twice, which is a point where errors occur easily.
You can avoid typing the name twice if you use anonymous links::

    This is an `example reference`__.

    __ http://www.example.org

But I take it you know this. :-)
Also sometimes it is not feasible to have the link targets stand out after a paragraph (where they interrupt the read flow) or at the end of the text (hard to keep in sync in case of changes to the target text).
I would like to have some kind of inline link markup. I think a very logical way would be something like the following reference to Python_(http://www.python.org) or this Reference to `The GIMP`_(http://www.gimp.org).
I can see why you might think link targets after a paragraph break the flow. I often group targets at the end of sections, where a break in the flow is not so noticeable. I agree that keeping references and targets in sync can be difficult when using anonymous links.

However, I can't reconcile your notion that targets *after* a paragraph break the flow, but targets *within* a paragraph don't. To me, having the URL inside the text breaks the flow much more severely, *especially* when combined with syntax. If it's just an issue of keeping references and targets in sync, and you were only interested in the processed output, I could understand. But then, you wouldn't have any objection to putting targets after paragraphs.

One of the major reasons for the current link syntax is a reaction against the inline syntax in StructuredText (one of reStructuredText's predecessors and sources), similar to what you propose. See http://docutils.sf.net/spec/rst/problems.html#hyperlinks for details.

If you do want the references in the text, why not just put them in directly? For example:

    Here's a reference to Python (http://www.python.org) and one to The GIMP (http://www.gimp.org).

One of the goals of reStructuredText is to be equally readable both before and after processing. It's for documents that are meant to be read in source (plaintext) form as well as processed form. If you're willing to have URLs in the middle of sentences in your source files, why not in the HTML? It's a reasonable compromise.
IMHO this is a logical and unintrusive extension to the Link Syntax
IMHO, it's a step backward. (Gotta be brutally honest; life's too short ;-) But thanks for the feedback! With any new syntax, be it a text markup or a programming language, it takes time to get used to unfamiliar corners. Perhaps you just need to give it a chance? -- David Goodger <goodger@users.sourceforge.net> Open-source projects: - Python Docutils: http://docutils.sourceforge.net/ (includes reStructuredText: http://docutils.sf.net/rst.html) - The Go Tools Project: http://gotools.sourceforge.net/
Hi all, hi David.

First some words why I want to use reStructuredText and what motivates me to bring this up. I made the experience with my personal homepage, that there are two major problems for the maintenance of the (small) site.

1) keeping the structure consistent and have the menu refer to all available pages. Especially creating new graphical buttons for a menu is a pain I tend to avoid.

2) Making content creation easy. Even though I have lots of experience with HTML I still hate writing it, because it is so easy to break stuff.

The latter is my reason why I am interested in reStructuredText. I basically want to have a system where I could drop in a .rst and some Apache magic will serve this as HTML with my customized design.

David Goodger (goodger@users.sourceforge.net) wrote:
Simon Budig wrote: [...]
I would like to have some kind of inline link markup. I think a very logical way would be something like the following reference to Python_(http://www.python.org) or this Reference to `The GIMP`_(http://www.gimp.org).
I can see why you might think link targets after a paragraph break the flow. I often group targets at the end of sections, where a break in the flow is not so noticeable. I agree that keeping references and targets in sync can be difficult when using anonymous links.
It is also difficult if you use named links. It is even more if you group the targets at the end of a larger section and have a big distance between the two points of interest. The main reason for this is, that the text identifying the link appears twice. This is one of the few points where you can easily introduce serious errors in reST-text and make maintaining the text harder: If you want to change the text you have to do this at two places. Since my main motivation is to make the maintenance of stuff easier this bugs me.
However, I can't reconcile your notion that targets *after* a paragraph break the flow, but targets *within* a paragraph don't. To me, having the URL inside the text breaks the flow much more severely, *especially* when combined with syntax.
Oh - maybe it is just me, but skipping an URL in braces is easy for me. However, when there are a bunch of links at the end of a section I have to search for the start of the next "real" text... On the other hand it might even be useful to have the links directly ready for cut'n'paste/bookmarking while reading a text (see below). Note that you have to know something about reST to 1) understand that an '_' indicates a link (not trivial for a newbie!) and 2) find the proper link for the target you are currently interested in (also not trivial). [...]
One of the major reasons for the current link syntax is a reaction against the inline syntax in StructuredText (one of reStructuredText's predecessors and sources), similar to what you propose. See http://docutils.sf.net/spec/rst/problems.html#hyperlinks for details.
Btw - I remember having seen the design goals mentioned there, but I am unable to find them again. Maybe you should add a link there.

You mention that the forms mentioned at that link are neither intuitive nor unobtrusive and I agree to that. However when I test my proposed syntax against this (ok, this has to be subjective...) I think, that at least the intuitivity is given. I have seen lots of texts where an URL is mentioned in braces inline with the text and it seems natural to me. The addition of an underscore is barely noticeable then.

The "unobstrusitivity" is a bit harder to judge (it starts with interpreting the word, because I am not a native speaker... :-) A problem surely is that there is text in the reST source that does not get rendered to the final output - maybe a bit surprising. However, using Python_(http://www.python.org) versus using Python_ or Python (http://www.python.org) would be the choice of the author.

.. _Python: http://www.python.org

I also think of my proposal as unobstrusive, because if you read the reST it does not say "HELLO! THIS IS SYNTAX!" (versus e.g. Python_{http://www.python.org} - Ugh!). As mentioned above I have seen URLs in braces quite often and I assume that the average reader is used to it or able to interpret it correctly, because the meaning of braces (for remarks like this) are familiar to everybody and the URL in this context is exactly this: A remark to the topic just mentioned.
If you do want the references in the text, why not just put them in directly? For example:
Here's a reference to Python (http://www.python.org) and one to The GIMP (http://www.gimp.org).
One of the goals of reStructuredText is to be equally readable both before and after processing. It's for documents that are meant to be read in source (plaintext) form as well as processed form. If you're willing to have URLs in the middle of sentences in your source files, why not in the HTML? It's a reasonable compromise.
This does not work for relative links. Also for my personal homepage as mentioned at the beginning of the Mail I am pretty picky, since graphical stuff has a pretty high priority in my life... So while this is acceptable in plaintext (this is one of my preferred ways to mention URLs in plaintext) it is not in HTML, since HTML has a better (HTML-specific) way to handle links. BTW: You mention the goal "equally readable" for reST. I personally would replace this with "equally useable for the reader". IMO this is not the same (the latter is a broader goal) and Links are a good way to demonstrate this. Links are meant to point to an external reference. So if a user reaches a Link he usually wants to a) follow this link immediately, b) create a bookmark or c) ignore it. In HTML all three tasks are easily done, provided there is a slight hint in the text formatting, that there actually is a link. In reST the goals a) and b) are seriously hampered, because the URL necessary to do the action maybe everywhere in the text and the user has no choice but to search the whole text for the matching identifier (which might be in a totally "wrong" place in case of multiple links to the same target) and then perform his action. So the links in reST are less "useable"... My proposal would bring the URL close to the point where it is relevant and make a) and b) way easier. Of course it is a bit harder to ignore the URL, but since braces are very common to indicate that something has less importance I think that this is not too hard to ignore the link.
IMHO this is a logical and unintrusive extension to the Link Syntax
IMHO, it's a step backward. (Gotta be brutally honest; life's too short ;-)
Backward? I do not want to remove functionality from reST... :-) Sorry, I fail to see the badness of my proposal. I neither see any real bad impacts on the readability of the reST nor something that would wreak havoc with the Syntax of reST (or do I miss something here?).
But thanks for the feedback!
With any new syntax, be it a text markup or a programming language, it takes time to get used to unfamiliar corners. Perhaps you just need to give it a chance?
I definitely will give it a shot. Writing reST is so easy compared to HTML that I am definitely attracted towards doing my Homepage with this stuff. However, there is this point that bugs me and I have a real chance to (IMO) improve this point - it would be dumb to not to try it... :-) Phew, this is one of my longer mails, thanks for reading it to the end... Bye, Simon -- Simon.Budig@unix-ag.org http://www.home.unix-ag.org/simon/
Simon Budig wrote:
First some words why I want to use reStructuredText and what motivates me to bring this up.
I made the experience with my personal homepage,
I took a look at your homepage (http://www.home.unix-ag.org/simon/). I hope you do realize that Docutils/reStructuredText may never be capable of producing that kind of visually rich output. I hope to improve its capabilities, but there is a limit beyond which plaintext cannot (and should not) go.

However, such functionality is certainly within the realm of possibility, and I'd encourage anyone to tackle the challenge posed in the To Do list:

    Construct a _`templating system`, as in ht2html/yaptu, using directives and substitutions for dynamic stuff.

[Simon]
I would like to have some kind of inline link markup. I think a very logical way would be something like the following reference to Python_(http://www.python.org) or this Reference to `The GIMP`_(http://www.gimp.org).
[David]
I can see why you might think link targets after a paragraph break the flow. I often group targets at the end of sections, where a break in the flow is not so noticeable. I agree that keeping references and targets in sync can be difficult when using anonymous links.
[Simon]
It is also difficult if you use named links.
The difficulty I refer to is that of keeping the order of anonymous references in sync with the order of anonymous targets. With named links, the order doesn't matter and that difficulty evaporates.
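To make the ordering point concrete, here is a small sketch that pairs anonymous references with anonymous targets purely by position, the way the text above describes. It is not how Docutils itself is implemented; ``ANON_REF``, ``ANON_TARGET`` and ``pair_anonymous_links`` are hypothetical names, and the patterns only cover the simple cases used in this thread.

```python
import re

# Simplified patterns (assumptions for this sketch, not the real parser):
# an anonymous reference is a name or `phrase` followed by "__";
# an anonymous target is a line of the form "__ url" or ".. __: url".
ANON_REF = re.compile(
    r"(`[^`]+`|[A-Za-z0-9]+(?:[-.:+][A-Za-z0-9]+)*)__(?![A-Za-z0-9_])"
)
ANON_TARGET = re.compile(
    r"^[ \t]*(?:__|\.\. __:)[ \t]+(\S+)[ \t]*$", re.MULTILINE
)


def pair_anonymous_links(text):
    """Return (reference text, url) pairs, matched by order of appearance."""
    refs = [m.group(1).strip("`") for m in ANON_REF.finditer(text)]
    urls = ANON_TARGET.findall(text)
    if len(refs) != len(urls):
        raise ValueError(
            "%d anonymous references but %d anonymous targets"
            % (len(refs), len(urls))
        )
    return list(zip(refs, urls))


SOURCE = """\
See `the Python site`__ and GIMP__ for details.

__ http://www.python.org
__ http://www.gimp.org
"""

if __name__ == "__main__":
    for ref, url in pair_anonymous_links(SOURCE):
        print(ref, "->", url)
```

A count mismatch can be caught this way, but two targets swapped by mistake cannot: the pairing simply changes without any error, which is the maintenance risk with anonymous links.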
It is even more if you group the targets at the end of a larger section and have a big distance between the two points of interest. The main reason for this is, that the text identifying the link appears twice. This is one of the few points where you can easily introduce serious errors in reST-text and make maintaining the text harder: If you want to change the text you have to do this at two places.
That's exactly why anonymous links were introduced.
Since my main motivation is to make the maintenance of stuff easier this bugs me.
There are conflicting goals here:

1. Keep the plaintext as readable as possible.
2. Keep the URLs as close to the references as possible.
3. Keep the inter-paragraph space clear of targets.

I find the suggested syntax, ``Python_(http://www.python.org)``, conflicts with goal 1. Goals 2 and 3 conflict. And I consider goal 1 more important than goal 3 (since goal 1 is the only one of the three goals which is also a reStructuredText goal).

We cannot satisfy all three goals in plaintext, because it is two-dimensional. HTML has a third dimension, that of links "underneath" the text (in <a href=...> tags), which we can only simulate in reStructuredText.
Note that you have to know something about reST to 1) understand that an '_' indicates a link (not trivial for a newbie!) and 2) find the proper link for the target you are currently interested in (also not trivial).
Keep the targets close to the references and it becomes trivial. A newbie need see the construct just once to understand it. (Unless you're using anonymous links exclusively, in which case all bets are off.)
Btw - I remember having seen the design goals mentioned there, but I am unable to find them again. Maybe you should add a link there.
OK, will do. BTW, it's http://docutils.sf.net/spec/rst/introduction.html#goals.
You mention that the forms mentioned at that link [http://docutils.sf.net/spec/rst/problems.html#hyperlinks] are neither intuitive nor unobtrusive and I agree to that. However when I test my proposed syntax against this (ok, this has to be subjective...) I think, that at least the intuitivity is given.
But it's just as obtrusive.
I have seen lots of texts where an URL is mentioned in braces inline with the text and it seems natural to me.
So, as I said before, include URLs inline with the text in braces/parentheses/whatever, as a first-class part of the text.
The addition of an underscore is barely noticeable then.
If you leave out the underscore altogether, it won't be noticeable at all! ;-)
The "unobstrusitivity" is a bit harder to judge (it starts with interpreting the word, because I am not a native speaker... :-)
Hadn't noticed. Ah, now I see the ".de". I wish I could speak German as well!
A problem surely is that there is text in the reST source that does not get rendered to the final output - maybe a bit surprising.
But unavoidable -- something has to give way.
However, using Python_(http://www.python.org) versus using Python_ or Python (http://www.python.org) would be the choice of the author.
.. _Python: http://www.python.org
I also think of my proposal as unobstrusive, because if you read the reST it does not say "HELLO! THIS IS SYNTAX!" (versus e.g. Python_{http://www.python.org} - Ugh!).
Come again? What's the difference between ``Python_(http://www.python.org)`` and ``Python_{http://www.python.org}``? Four pixels, by my count. Hardly enough to warrant an "ugh!".
As mentioned above I have seen URLs in braces quite often and I assume that the average reader is used to it or able to interpret it correctly, because the meaning of braces (for remarks like this) are familiar to everybody
I think it's because the meaning of *URLs* is familiar to everybody. The braces/parentheses are insignificant line noise. Put a relative URL (no "http://") in that syntax and I'd expect it to be quite confusing in plaintext.
If you do want the references in the text, why not just put them in directly? For example:
Here's a reference to Python (http://www.python.org) and one to The GIMP (http://www.gimp.org). ... This does not work for relative links.
True. That needs explicit, unambiguous syntax.
Also for my personal homepage as mentioned at the beginning of the Mail I am pretty picky, since graphical stuff has a pretty high priority in my life...
But there are no graphics in plaintext. You're asking for too much. Either the plaintext is at least equally as important as the HTML (in which case they ought to look as similar as possible, precluding inline URLs that aren't displayed in the HTML), or the HTML is more important (in which case you're the only one who will ever read the plaintext). I suspect the latter.
BTW: You mention the goal "equally readable" for reST. I personally would replace this with "equally useable for the reader".
But I wouldn't. Otherwise, why bother converting to HTML? Answer: because it *increases usability*! (And visual appeal, of course.)
IMO this is not the same (the latter is a broader goal) and Links are a good way to demonstrate this.
Links are meant to point to an external reference. So if a user reaches a Link he usually wants to a) follow this link immediately, b) create a bookmark or c) ignore it. In HTML all three tasks are easily done, provided there is a slight hint in the text formatting, that there actually is a link. In reST the goals a) and b) are seriously hampered,
In reST the goal b) is a non-starter: you need a browser for bookmarks, and if you'll be using a browser, you should be reading the HTML-processed version of the doc.
because the URL necessary to do the action maybe everywhere in the text and the user has no choice but to search the whole text for the matching identifier (which might be in a totally "wrong" place in case of multiple links to the same target) and then perform his action. So the links in reST are less "useable"...
Granted. It's a trade-off between usability and readability. I chose readability, and I stick to that choice because those who read the plaintext will typically not be using a browser, but a text editor (which presumably has a "search" feature -- good enough).
My proposal would bring the URL close to the point where it is relevant and make a) and b) way easier. Of course it is a bit harder to ignore the URL, but since braces are very common to indicate that something has less importance I think that this is not too hard to ignore the link.
Hard enough. -1, sorry.
IMHO this is a logical and unintrusive extension to the Link Syntax
IMHO, it's a step backward. (Gotta be brutally honest; life's too short ;-)
Backward? I do not want to remove functionality from reST... :-)
Backward to StructuredText, I meant.
Sorry, I fail to see the badness of my proposal. I neither see any real bad impacts on the readability of the reST nor something that would wreak havoc with the Syntax of reST (or do I miss something here?).
It's not really "bad". Deciding these things is a subtle act of judging with conflicting goals. This proposal simply comes up short.
Phew, this is one of my longer mails, thanks for reading it to the end...
I appreciate the effort. Just about every proposal, idea, or criticism improves the project in some way, and this was no exception. -- David Goodger <goodger@users.sourceforge.net> Open-source projects: - Python Docutils: http://docutils.sourceforge.net/ (includes reStructuredText: http://docutils.sf.net/rst.html) - The Go Tools Project: http://gotools.sourceforge.net/
David Goodger wrote:
Simon Budig wrote:
First some words why I want to use reStructuredText and what motivates me to bring this up.
I made the experience with my personal homepage,
I took a look at your homepage (http://www.home.unix-ag.org/simon/). I hope you do realize that Docutils/reStructuredText may never be capable of producing that kind of visually rich output. I hope to improve its capabilities, but there is a limit beyond which plaintext cannot (and should not) go.
Alternatively you might want to take a look at XIST (http://www.livinglogic.de/Python/xist/) The XIST pages themselves were made with XIST. (Click on the "Page Source" link to see the source for each page, or goto http://www.livinglogic.de/viewcvs/index.cgi/LivingLogic/WWW-Python/site/ for ViewCVS access.) Hope that helps, Walter Dörwald
On Mon, Jul 01, 2002, David Goodger wrote:
Simon Budig wrote:
Also for my personal homepage as mentioned at the beginning of the Mail I am pretty picky, since graphical stuff has a pretty high priority in my life...
But there are no graphics in plaintext. You're asking for too much. Either the plaintext is at least equally as important as the HTML (in which case they ought to look as similar as possible, precluding inline URLs that aren't displayed in the HTML), or the HTML is more important (in which case you're the only one who will ever read the plaintext). I suspect the latter.
Enh. I disagree with this line of reasoning. Where I'm coming from is wanting to use reST to save writing time for structured documents; nobody but me will use the plaintext. I do agree that graphics are beside the point for reST, but that's because of the emphasis on *structure*. To the extent that graphics are part of a structured design, I think reST should support them (and already does, I think). -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/
[Simon]
Also for my personal homepage as mentioned at the beginning of the Mail I am pretty picky, since graphical stuff has a pretty high priority in my life...
[David]
But there are no graphics in plaintext. You're asking for too much. Either the plaintext is at least equally as important as the HTML (in which case they ought to look as similar as possible, precluding inline URLs that aren't displayed in the HTML), or the HTML is more important (in which case you're the only one who will ever read the plaintext). I suspect the latter.
[Aahz]
Enh. I disagree with this line of reasoning. Where I'm coming from is wanting to use reST to save writing time for structured documents; nobody but me will use the plaintext. I do agree that graphics are beside the point for reST, but that's because of the emphasis on *structure*. To the extent that graphics are part of a structured design, I think reST should support them (and already does, I think).
I think there's some misunderstanding here (perhaps on my part). Yes, reStructuredText already supports graphics, using an explicit directive mechanism. All I'm saying is that there are no graphics *embedded* in the plaintext file itself, *when you're editing it*. In other words, you don't see the images within the text in Emacs; no cut & paste from GIMP. reStructuredText is *not* about graphic layout. I understood Simon to be saying that he wants the reader convenience/usability of HTML in reStructuredText with regards to external references (URLs), and images are simply one form of external reference. It comes down to this: the top goal of reStructuredText is to be as readable in plaintext (source) form as in processed form. An important market for this is (will be) Python docstrings. You and Simon seem not so interested in the plaintext readability issue; it's the processed output which is most important. If reStructuredText works for that, great, but we're not going to make significant alterations for the output-centric market if those changes adversely affect the "plaintext as readable as processed" market. I believe that the "reference_(url)" proposed syntax would adversely affect the plaintext readability of reStructuredText. -- David Goodger <goodger@users.sourceforge.net> Open-source projects: - Python Docutils: http://docutils.sourceforge.net/ (includes reStructuredText: http://docutils.sf.net/rst.html) - The Go Tools Project: http://gotools.sourceforge.net/
David Goodger (goodger@users.sourceforge.net) wrote:
It comes down to this: the top goal of reStructuredText is to be as readable in plaintext (source) form as in processed form. An important market for this is (will be) Python docstrings. You and Simon seem not so interested in the plaintext readability issue; it's the processed output which is most important.
Yes, this is an important point, but easy edit-ablity (?) is also very important. There are enough complicated markup languages out there, and reST is the easiest I have come across (with the exception of some other textual markup languages that fail to have some kind of sane specification).

It is in my eyes a great pity that you seem to have a tendency to limit reST to the docstring-scope. I in your place would not have merged reST and the DPS, because I think of them as two different things (I have to admit that I did not look too deeply in the DPS stuff, but extracting docstrings seems unrelated to process reST text to me).

I think that there really is a need for a simple markup language that can be used by - for example - a secretary to maintain a simple Website. HTML fails the "understandable to non-geek-guys"-test, GUI-tools are known to produce crappy HTML code when used by non-experts. reST really could fill a gap here.

Bye, Simon
-- 
Simon.Budig@unix-ag.org http://www.home.unix-ag.org/simon/
My comments on the comments inline...
David Goodger (goodger@users.sourceforge.net) wrote:
It comes down to this: the top goal of reStructuredText is to be as readable in plaintext (source) form as in processed form. An important market for this is (will be) Python docstrings. You and Simon seem not so interested in the plaintext readability issue; it's the processed output which is most important.
On Wed, 3 Jul 2002, Simon Budig wrote:
Yes, this is an important point, but easy edit-ablity (?) is also very important. There are enough complicated markup languages out there, and reST is the easiest I have come across (with the exception of some other textual markup languages that fail to have some kind of sane specification).
I have an open source project which is all coded in Python. Currently we use StructuredText for all our docstrings (which I border on loathing for various reasons) and we used to use AFT for our manuals. I switched all the manual stuff from AFT to reStructuredText a week or two ago in the space of two hours without much effort.

I include the actual reST source of the manual text with our source code--it all comes in one big bundle. So it's important that users can read the manual without having to convert it to some readable format. reST is perfect in this case. My developers are happy, my users are happy, and I'm happy. No one seems to have a problem understanding the text even though they don't know the specifics of reST markup. Then I convert the manuals to HTML for the web-site along with the doc-string docs from HappyDoc and various other things we maintain on the web-site.

My experiences with reST so far lead me to believe that it's very well thought out, it's very readable, it's easy to maintain (with the exception of tables, but like David said, tables are going to be difficult to maintain whichever way you slice it), and I can convert it to other formats for other mediums.

I've been on this list for some time. It has never been a goal for reST to **only** be used for doc-strings. In fact, it's been suggested several times that we use reST as the unifying markup for all of Python's documentation. So that's specifically what reST was built around. There will be a series of things for which reST markup is a good tool and a series of things where it's going to be terrible. I don't think any of us disagree on this point.

Personally, I think you're crazy to use reST with SSI and whatever else to build your web-site. I think it'd be much easier for you, especially if you hate HTML, to get a WYSIWYG editor and build it with that. The Mozilla composer (or whatever it's called) is pretty good and the HTML it produces isn't that crappy. You could always run the composer HTML through tidy which will fix up a lot of things.

Does that mean you shouldn't do it? No. But you don't see people writing RDBMS database servers in Befunge either.
On Wed, 3 Jul 2002, Simon Budig wrote:
I think that there really is a need for a simple markup language that can be used by - for example - a secretary to maintain a simple Website. HTML fails the "understandable to non-geek-guys"-test, GUI-tools are known to produce crappy HTML code when used by non-experts.
reST really could fill a gap here.
reST can't be all things to all people. It's simply not possible. Just as there's no single spoken language in all the world. From that we derive that we have to pick the applications that we want reST to be really good at and if people find other applications for reST, then that's way cool, but it shouldn't change our mission.

In that vein, reST is definitely not the syntax I would use for building web-sites where the user doesn't understand what they're doing. You're just replacing one verbose formatting markup language (HTML) with another non-formatting markup language which is not verbose and uses a lot of punctuation because we're interested in the source being readable. In either case, the secretary is going to have to learn a markup language.

So my bottom line is that unless you really have something that's a clincher for an argument one way or another about the links, I vote we table this discussion possibly ad infinitum. Though it would be interesting to get some input from other people using reST.

/will
-- 
whatever it is, you can find it at http://www.bluesock.org/~willg/ except Will--you can only see him in real life.
will (willg@bluesock.org) wrote:
Personally, I think you're crazy to use reST with SSI and whatever else to build your web-site. I think it'd be much easier for you, especially if you hate HTML, to get a WYSIWYG editor and build it with that.
Nah. Just to show you what is possible with SSIs and why I think using reST for webpages is not crazy: Have a look at http://www.home.unix-ag.org/simon/bsdaemon/ . The raw source of it is::

    <!--#include virtual="$SCRIPT_NAME/../../include/head_start.shtml" -->
    <title>A vector version of the BSD Daemon</title>
    <!--#include virtual="$DOCROOT/include/body_start.shtml"-->
    <h2>
    A vector version of the BSD Daemon
    </h2>
    <p>
    <img src="bsdaemon.png" alt="Preview of the BSD Daemon" align="right" width="326" height="352"><p>
    </p>
    <p>
    The BSD daemon originally was done by John Lasseter. The copyright holder for these images is Marshall Kirk McKusick <<a href="mailto:mckusick@mckusick.com">mckusick@mckusick.com</a>>.
    </p>
    [some more simple paragraphs omitted]
    <p>
    Grab the tarball <a href="bsdaemon-1.0.tar.gz">here</a>.
    </p>
    <p>
    Have fun!<br>
    Simon Budig <<a href="mailto:simon@budig.de">simon@budig.de</a>>
    </p>
    <!--#include virtual="$DOCROOT/include/separator.shtml"-->
    <!--#include virtual="$DOCROOT/include/navi.shtml"-->
    <!--#include virtual="$DOCROOT/include/body_end.shtml"-->

I think it should be fairly trivial to create such an output with reST (just need to change some small things for the header/footer). And there is no CSS used yet.

Bye, Simon
-- 
Simon.Budig@unix-ag.org http://www.home.unix-ag.org/simon/
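As a hedged illustration of the header/footer step mentioned above: assuming some converter hands back only the HTML body fragment for a reST file, wrapping it in the same SSI includes is plain string assembly. The function name ``wrap_with_ssi`` and its parameters are invented for this sketch; the include paths are copied from the page source quoted above, and how the fragment is produced is deliberately left open.

```python
from html import escape


def wrap_with_ssi(title, body_html):
    """Surround an HTML body fragment with the SSI includes quoted above.

    `body_html` stands in for whatever fragment a reST-to-HTML converter
    would produce; this sketch does not generate it.
    """
    lines = [
        '<!--#include virtual="$SCRIPT_NAME/../../include/head_start.shtml" -->',
        "<title>%s</title>" % escape(title),
        '<!--#include virtual="$DOCROOT/include/body_start.shtml"-->',
        body_html.strip(),
        '<!--#include virtual="$DOCROOT/include/separator.shtml"-->',
        '<!--#include virtual="$DOCROOT/include/navi.shtml"-->',
        '<!--#include virtual="$DOCROOT/include/body_end.shtml"-->',
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(wrap_with_ssi(
        "A vector version of the BSD Daemon",
        "<h2>A vector version of the BSD Daemon</h2>\n<p>Have fun!</p>",
    ))
```

The server still has to be configured to process the result as an SSI document; that part is outside this sketch.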
David Goodger (goodger@users.sourceforge.net) wrote:
Simon Budig wrote:
First some words why I want to use reStructuredText and what motivates me to bring this up.
I made the experience with my personal homepage,
I took a look at your homepage (http://www.home.unix-ag.org/simon/). I hope you do realize that Docutils/reStructuredText may never be capable of producing that kind of visually rich output. I hope to improve its capabilities, but there is a limit beyond which plaintext cannot (and should not) go.
I am aware that using reST would limit my abilities to do funky stuff with tables and images. However, the overhaul of my homepage would also reduce this kind of fiddling. For example the current graphical menu is a nightmare to maintain - in fact I never maintained it, so it only contains the items that were there from the very beginning. However, you can do very funky stuff with CSS. I think reST + the ability to do Server Side Includes + CSS can create quite interesting stuff.
However, such functionality is certainly within the realm of possibility, and I'd encourage anyone to tackle the challenge posed in the To Do list:
Construct a _`templating system`, as in ht2html/yaptu, using directives and substitutions for dynamic stuff.
I don't know yaptu. But for a start I'd like to have a somewhat more configurable HTML converter for reST. This means, for example, being able to switch off the <html>....<body> and </body>...</html> parts.
[Simon]
I would like to have some kind of inline link markup. I think a very logical way would be something like the following reference to Python_(http://www.python.org) or this Reference to `The GIMP`_(http://www.gimp.org).
[David]
I can see why you might think link targets after a paragraph break the flow. I often group targets at the end of sections, where a break in the flow is not so noticeable. I agree that keeping references and targets in sync can be difficult when using anonymous links.
[Simon]
It is also difficult if you use named links.
The difficulty I refer to is that of keeping the order of anonymous references in sync with the order of anonymous targets. With named links, the order doesn't matter and that difficulty evaporates.
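To make the ordering problem concrete, here is a minimal sketch of how anonymous references get paired with anonymous targets purely by position. It is only an illustration, not Docutils code: the function name, the simplified regular expressions (backquoted references only) and the example.org URLs are all assumptions.

```python
import re

# Simplified, hypothetical patterns: only backquoted anonymous references
# and "__ URL" target lines are recognized here.
ANON_REF = re.compile(r"`([^`]+)`__")
ANON_TARGET = re.compile(r"^__ (\S+)", re.MULTILINE)

def pair_anonymous(source):
    """Pair the n-th anonymous reference with the n-th anonymous target."""
    refs = ANON_REF.findall(source)
    targets = ANON_TARGET.findall(source)
    if len(refs) != len(targets):
        raise ValueError(f"{len(refs)} references but {len(targets)} targets")
    return list(zip(refs, targets))

source = """\
See the `released files`__ and the `bug reports`__.

__ http://example.org/files
__ http://example.org/bugs
"""

pairs = pair_anonymous(source)
for text, url in pairs:
    print(f"{text} -> {url}")
```

Because the pairing is positional, swapping the two target lines would still pair up without any error, only wrongly. A count mismatch is the one mistake this scheme can detect on its own.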
It is even more difficult if you group the targets at the end of a larger section and have a big distance between the two points of interest. The main reason for this is that the text identifying the link appears twice. This is one of the few points where you can easily introduce serious errors in reST-text and make maintaining the text harder: If you want to change the text you have to do this in two places.
That's exactly why anonymous links were introduced.
This essentially boils down to "Ok, named links are hard to use, so we introduced anonymous links. They are hard to maintain too, but a little less." So to use links in reST I have the choice between two evil things: 1) make sure that Link targets and Link text are absolutely the same 2) count "__"s. This is exactly why I proposed a third solution :-)
Since my main motivation is to make the maintenance of stuff easier this bugs me.
There are conflicting goals here:
1. Keep the plaintext as readable as possible. 2. Keep the URLs as close to the references as possible. 3. Keep the inter-paragraph space clear of targets.
I find the suggested syntax, ``Python_(http://www.python.org)``, conflicts with goal 1. Goals 2 and 3 conflict. And I consider goal 1 more important than goal 3 (since goal 1 is the only one of the three goals which is also a reStructuredText goal).
Ok, we seem to disagree here fundamentally. I personally think that the proposed syntax is readable, you think it isn't. Not sure how to solve this...
We cannot satisfy all three goals in plaintext, because it is two-dimensional. HTML has a third dimension, that of links "underneath" the text (in <a href=...> tags), which we can only simulate in reStructuredText.
Note that you have to know something about reST to 1) understand that an '_' indicates a link (not trivial for a newbie!) and 2) find the proper link for the target you are currently interested in (also not trivial).
Keep the targets close to the references and it becomes trivial. A newbie need see the construct just once to understand it. (Unless you're using anonymous links exclusively, in which case all bets are off.)
I think you overestimate the intuitiveness of the named link syntax. Especially when you use the same target multiple times you simply cannot keep targets close to references. I think your longer experience with reST tricks you into thinking that it is easy to understand. [...]
So, as I said before, include URLs inline with the text in braces/parentheses/whatever, as a first-class part of the text.
The addition of an underscore is barely noticeable then.
If you leave out the underscore altogether, it won't be noticeable at all! ;-)
But it doesn't solve my problem... :-)
The "unobtrusiveness" is a bit harder to judge (it starts with interpreting the word, because I am not a native speaker... :-)
Hadn't noticed. Ah, now I see the ".de". I wish I could speak German as well!
A problem surely is that there is text in the reST source that does not get rendered to the final output - maybe a bit surprising.
But unavoidable -- something has to give way.
However, using Python_(http://www.python.org) versus using Python_ or Python (http://www.python.org) would be the choice of the author.
.. _Python: http://www.python.org
I also think of my proposal as unobtrusive, because if you read the reST it does not say "HELLO! THIS IS SYNTAX!" (versus e.g. Python_{http://www.python.org} - Ugh!).
Come again? What's the difference between ``Python_(http://www.python.org)`` and ``Python_{http://www.python.org}``? Four pixels, by my count. Hardly enough to warrant an "ugh!".
Oh come on. I thought I elaborated on that. It seems my English still isn't good enough (thanks for the compliment though... :-) The whole point is that regular braces are used in regular text. In fact it will be hard to find a book that doesn't use them for remarks that are not too important. However, I have yet to find a book that uses curly braces in regular text. Round braces have a traditional meaning: "You can skip this and it will not hurt your understanding of the text!" Curly braces have no traditional meaning - everything inside them will stand out. It happens that the first one is exactly valid for URLs. [...] [reordered the text a bit]
My proposal would bring the URL close to the point where it is relevant and make a) and b) way easier. Of course it is a bit harder to ignore the URL, but since braces are very common to indicate that something has less importance I think that it is not too hard to ignore the link.
Hard enough. -1, sorry.
Sorry, I fail to see the badness of my proposal. I neither see any real bad impacts on the readability of the reST nor something that would wreak havoc with the Syntax of reST (or do I miss something here?).
It's not really "bad". Deciding these things is a subtle act of judging with conflicting goals. This proposal simply comes up short.
Ok, this might be a bit unfair, but let me bring up a different point. As mentioned already I judge the reference_(target) syntax as readable, for some reason you don't. However, I then fail to see how you could introduce something like substitution references into the spec. This |contradicts goals| and looks very much like some intrusive syntax. .. |contradicts goals| replace:: really makes a big difference between the plaintext "source" and the html output Of course I can understand why such a construct is feasible, it makes writing the text way easier and also enables to do other funky stuff, but it is a severe impact to the readability of the text and thus really contradicts the goals you mention against my proposal. Naturally it is the choice of the author to use such constructs. But it'd be the choice of the author to use my constructs also. [end of reordered text]
Either the plaintext is at least equally as important as the HTML (in which case they ought to look as similar as possible, precluding inline URLs that aren't displayed in the HTML), or the HTML is more important (in which case you're the only one who will ever read the plaintext). I suspect the latter.
BTW: You mention the goal "equally readable" for reST. I personally would replace this with "equally usable for the reader".
But I wouldn't. Otherwise, why bother converting to HTML? Answer: because it *increases usability*! (And visual appeal, of course.)
I clearly see a "market" for using reST as a simple markup language with low learning effort. For this kind of use the plaintext is less important than the processed output, because it will most likely be hidden from the target audience. I think ignoring this "market" and just focusing on Python docstrings would be stupid. If used as a way to easily publish HTML pages we should be aware that linking is very important - in fact this is the whole point about "Hypertext". In my opinion the current reST linking stuff is hard to use and tends to scare away authors from linking. The current reST requires you to edit the text simultaneously in two places. This is a bad thing, because it interrupts your line of thought while authoring. My proposal would introduce a way to mark up lots of links in a paragraph (some people like to do this) fluently while typing the original text. I'd think this is a good thing... Bye, Simon -- Simon.Budig@unix-ag.org http://www.home.unix-ag.org/simon/
Simon Budig (Simon.Budig@unix-ag.uni-siegen.de) wrote: [named links are hard to maintain] [David:]
That's exactly why anonymous links were introduced.
This essentially boils down to "Ok, named links are hard to use, so we introduced anonymous links. They are hard to maintain too, but a little less."
So to use links in reST I have the choice between two evil things: 1) make sure that Link targets and Link text are absolutely the same 2) count "__"s.
This is exactly why I proposed a third solution :-)
Just to show the ugliness of anonymous and named links: This is from the webpage source (as in CVS): *snip* Project Links ============= - `Project Summary page`__: `released files`__, `bug reports`__, patches__, `mailing lists`__, and news__. - `Docutils CVS repository`__ - Project coordinator and architect: `David Goodger`_ - Please direct discussions to the `Python Documentation Special Interest Group (Doc-SIG)`__: doc-sig@python.org. - Powered by |Python|__ - Hosted by |SourceForge|__ __ http://sourceforge.net/projects/docutils/ __ `project files page`_ __ http://sourceforge.net/tracker/?group_id=38414&atid=422030 __ http://sourceforge.net/tracker/?group_id=38414&atid=422032 __ http://sourceforge.net/mail/?group_id=38414 __ http://sourceforge.net/news/?group_id=38414 __ CVS_ __ Doc-SIG_ __ http://www.python.org/ .. |Python| image:: PyBanner016.png .. :border: 0 __ http://sourceforge.net/ .. |SourceForge| image:: http://sourceforge.net/sflogo.php?group_id=38414 :alt: SourceForge Logo .. :border: 0 .. _project files page: http://sourceforge.net/project/showfiles.php?group_id=38414 .. _Anonymous CVS access: http://sourceforge.net/cvs/?group_id=38414 .. _CVS: .. _browse the CVS repository: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/docutils/ .. _To Do list: spec/notes.html#to-do .. _README: README.html .. _HISTORY: HISTORY.html .. _master repository: http://www.python.org/peps/ .. _mailto: .. _David Goodger: mailto:goodger@users.sourceforge.net .. _Doc-SIG: http://www.python.org/sigs/doc-sig/ *snip* Please don't try to tell me that a newbie easily finds the URL for `Docutils CVS repository`__ Bye, Simon -- Simon.Budig@unix-ag.org http://www.home.unix-ag.org/simon/
Simon, I don't have time to answer all of your points right away, but I can quickly respond to some of them. Simon Budig wrote:
It is in my eyes a great pity that you seem to have a tendency to limit reST to the docstring-scope.
Nothing could be further from the truth! Although a major goal is to get docstring processing working, and it will be a major part of the project, take a look at the project right now: docstring processing is not yet part of the core. Tony Ibbs has a preliminary prototype in the sandbox, but that's it. The only working part of the project now is geared toward producing web pages! And there's plenty of room for improvement, which would be quite welcome. Just because I'm not accepting the ``reference_(URL)`` syntax, doesn't mean reStructuredText is anti-HTML. Simply put, that syntax just doesn't fit with the rest of reStructuredText. Please don't confuse unrelated issues. [Referring to http://docutils.sf.net/index.txt, source of http://docutils.sf.net/index.html:]
Just to show the ugliness of anonymous and named links: This is from the webpage source (as in CVS):
Yes, the source to that web page is not pretty. It was written in a lazy manner, using many anonymous hyperlinks. But that's OK, because this is an example of a document where only the HTML is meant to be seen. The source is *not* intended to be read by anyone but me or other project developers. That file is not distributed with the project code or documentation. It is not given as a model for good reStructuredText usage. Also, being a home page, it is complex and full of external links, much more so than a typical document.
Please don't try to tell me that a newbie easily finds the URL for `Docutils CVS repository`__
Sorry to burst your bubble, but I think it would be *very easy* for a newbie to find the URL. They just click on the text "Docutils CVS repository" on the HTML page, and their browser takes them there. No newbie would ever be exposed to the source text. (This is a poor choice to single out as a poor example. ;-) will (willg@bluesock.org) wrote:
Personally, I think you're crazy to use reST with SSI and whatever else to build your web-site.
I don't think it's crazy; the Docutils web site is built with reStructuredText/Docutils and I intend for this functionality to become more sophisticated. I agree with Simon when he wrote: [Simon]
I think that there really is a need for a simple markup language that can be used by - for example - a secretary to maintain a simple Website. HTML fails the "understandable to non-geek-guys"-test, GUI-tools are known to produce crappy HTML code when used by non-experts.
reST really could fill a gap here.
And that's one of the things it's already doing, and pretty successfully I think. Of course there's room to grow.
Just to show you what is possible with SSIs and why I think using reST for webpages is not crazy:
Have a look at http://www.home.unix-ag.org/simon/bsdaemon/
Apart from the extras pulled in through server-side includes, the page is very simple, nothing that reStructuredText couldn't handle.
The raw source of it is::
<!--#include virtual="$SCRIPT_NAME/../../include/head_start.shtml" -->
A server-side include directive could be added to reStructuredText, no problem. Although I'm very careful about new *syntax*, new directives are much easier to let in, because they're explicit and typically don't require new syntax (and if they do, it's localized and *explicit*). ...
<img src="bsdaemon.png" alt="Preview of the BSD Daemon" align="right" width="326" height="352"><p>
I've thought of adding support for more image attributes, like "align". Care to try?
I think it should be fairly trivial to create such an output with reST
I agree, as long as the fancy graphical layout elements don't have to be reStructuredText. That's what stylesheets and server-side includes are for. -- David Goodger <goodger@users.sourceforge.net> Open-source projects: - Python Docutils: http://docutils.sourceforge.net/ (includes reStructuredText: http://docutils.sf.net/rst.html) - The Go Tools Project: http://gotools.sourceforge.net/
Hi all, Hi David. Sorry, I did not mean to step on somebody's toes with my reference to "docstring centric" design. I apologize for this. David Goodger (goodger@users.sourceforge.net) wrote:
Simply put, that syntax just doesn't fit with the rest of reStructuredText.
However, I'd like to point out a flaw in your argument. It seems to me that the design goals of reST could be clarified a bit. In another mail you wrote:
It comes down to this: the top goal of reStructuredText is to be as readable in plaintext (source) form as in processed form.
in this Mail however, you wrote:
[Referring to http://docutils.sf.net/index.txt, source of http://docutils.sf.net/index.html:]
Just to show the ugliness of anonymous and named links: This is from the webpage source (as in CVS):
Yes, the source to that web page is not pretty. It was written in a lazy manner, using many anonymous hyperlinks. But that's OK,
no, it is not, because this shows that the current link syntax easily leads to files where the topmost goal (readability in raw and processed form) is seriously hurt. In this case the badness of this file counts twice because...
because this is an example of a document where only the HTML is meant to be seen. The source is *not* intended to be read by anyone but me or other project developers.
... this is definitely wrong. index.txt is (intended to be) linked from the bottom of the docutils website. As a newbie searching for information on how this looks in practice (not the example texts with all features in one file) I certainly would look at the source of the page. And frankly: If I weren't so stubborn, the usage of links in this sample would scare me away from reST.
[...] Also, being a home page, it is complex and full of external links, much more so than a typical document.
What exactly is a typical document? I think that the usage of reST for a homepage with lots of links would be a good use for reST.
Please don't try to tell me that a newbie easily finds the URL for `Docutils CVS repository`__
Sorry to burst your bubble, but I think it would be *very easy* for a newbie to find the URL. They just click on the text "Docutils CVS repository" on the HTML page, and their browser takes them there. No newbie would ever be exposed to the source text. (This is a poor choice to single out as a poor example. ;-)
I think the above explains why I think it is a perfectly valid choice. I am wondering a bit about your reasoning. The rejection of my proposal is based on its "readability" in the source. When I point out that the current syntax has its flaws too, your argument basically is "It doesn't matter, because nobody will ever read it". Uhm. What was the argument against my proposal again? Let me rephrase the main goals of my proposal. My focus is not mainly the reST-source-readability, the processed output is currently more important to me. I want to improve the maintainability of links in reST. Anonymous Links IMO have the maintenance problem that inserting links or removing links in a paragraph always means monotonous and error-prone counting. You will always have to check the processed output to see whether the links indeed point to the correct target. Named Links IMO have the maintenance problem that the reference text appears twice in the document. This is prone to errors, since you can easily introduce mismatches. Both approaches have the problem that the reference and the target specification can be quite a bit apart. This means that a change to a reference or a link might need editing in two different places in the source file. From my personal experience this is always a problem, regardless of whether I am editing source code, reST, HTML or whatever. I tend to forget the editing in the second place. Maybe it is just me, but I doubt this. My reference_(target) proposal would solve these maintenance problems, since it keeps reference and target close together.
In my eyes - but this is a controversial point in this discussion - it does not look too weird to the unsuspecting reader of the reST-source, since the usage of parentheses suggests that their content is a remark related to the topic mentioned above (which a URL indeed is), the added underscore is the same as in the current use of references and makes the distinction between this construct and "topic (remark)"-usage possible. As a last remark: I promise that I will not occupy this mailing list forever with this stuff. I guess I will be tired of this discussion by the end of the week or so - but I still think that this is a good idea... ;-) Since the rest of your mail had a focus on more technical stuff I will change the subject a bit and reply to it in a separate mail. Bye, Simon -- Simon.Budig@unix-ag.org http://www.home.unix-ag.org/simon/
Simon Budig wrote:
This essentially boils down to "Ok, named links are hard to use, so we introduced anonymous links. They are hard to maintain too, but a little less."
I'd say that hyperlinks are hard to mark up, period. You can't have a markup that is simultaneously readable, unobtrusive, easy, maintainable, and URLs-close-to-the-reference-text. You have to choose some aspect as most important; and you have to give up something. reStructuredText chose "readable" and "unobtrusive" as the most important aspects.
So to use links in reST I have the choice between two evil things: 1) make sure that Link targets and Link text are absolutely the same 2) count "__"s.
This is exactly why I proposed a third solution :-) ... I'd think this is a good thing...
IMO, the solution is worse than the problem. But I'm not interested in debating this point *ad nauseam* (and it is). I've already written a bunch of replies to yesterday's posts, but I think it's better to deep-six them. This is getting old, real fast. Our positions can be summed up with:
Ok, we seem to disagree here fundamentally. I personally think that the proposed syntax is readable, you think it isn't. Not sure how to solve this...
You seem to feel inline URLs are important. Here's a chance to prove your position. I'll leave it up to you to implement them, on an experimental basis. There is a proposed mechanism for experimental syntax called "pragma directives": It may also be possible for directives to be used as pragmas, to modify the behavior of the parser, such as to experiment with alternate syntax. There is no parser support for this functionality at present; if a reasonable need for pragma directives is found, they may be supported. (http://docutils.sf.net/spec/rst/reStructuredText.html#directives) I will help with the infrastructure (any changes that need to be made to the parser to accept pragma directives), but I won't implement the parsing itself. Here's an example of how such a directive might work:: .. enable-inline-urls:: Ordinary text ... A paragraph containing an `inline hyperlink`_(http://www.example.org/). However, I really don't like that syntax; it doesn't make sense. Let's examine it and see if we can come up with something better. I have two objections: 1. The "`ref`_(URL)" syntax forces the last word of the reference text to be joined to the URL, making a potentially very long word that can't be wrapped (URLs can be very long). The reference and the URL should be separate. 2. The "inline hyperlink" text is *not* a named reference (there's no lookup by name), so it shouldn't look like one. Instead, use the anonymous double-underscore syntax. Perhaps a matching double-underscore "anonymous inline target" syntax for the URL as well? A space in-between would separate the reference from the target and allow words to wrap. For example:: A paragraph containing an `inline hyperlink`__ __`http://www.example.org/`. Yes, that's much better. A bit more verbose, but it fits better with the rest of the syntax. If you insist on parentheses, then some compromise may do. Perhaps:: A paragraph containing an `inline hyperlink`__ __(http://www.example.org/). 
However, looking at the URI-recognition code (based on the IETF standards RFC 2396 and RFC 2732), parentheses are legal URI characters. This would introduce ambiguity (a legal URI containing parentheses wouldn't be recognized properly). Curly braces and backquotes are not legal URI characters, but they *are* legal email characters (see RFC 822). It's not easy to come up with a completely unambiguous syntax! The only useful characters that are neither URI characters nor email characters are angle brackets, "<>". So the syntax becomes:: A paragraph containing an `inline hyperlink`__ __<http://www.example.org/>. Which actually doesn't look too bad. There's precedent in using angle brackets for URIs. Coming full circle, perhaps we can now drop the leading "__":: A paragraph containing an `inline hyperlink`__ <http://www.example.org/>. Ah, but then it would be difficult to write about HTML/XML/SGML tags ("img" in "the <img> tag" would be parsed as a relative URL). We *could* recognize inline URLs only immediately after anonymous references, but that would require keeping track of state. So the leading "__" *is* required. Once the pragma directive is implemented, we'll see how it fares in the real world. I may accept it into standard reStructuredText, leave it in as a pragma directive, or reject it outright. Your mission, Mr. Budig, if you choose to accept it, is to create an "enable-inline-urls" pragma directive that implements some variation of the above syntax (recommended: "__<URI>"). Except for necessary infrastructure support for pragmas, there should be no changes to the parser itself. I will work with you to add support for pragmas. -- David Goodger <goodger@users.sourceforge.net> Open-source projects: - Python Docutils: http://docutils.sourceforge.net/ (includes reStructuredText: http://docutils.sf.net/rst.html) - The Go Tools Project: http://gotools.sourceforge.net/
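As a rough illustration of the recommended "__<URI>" form, the following sketch recognizes a backquoted anonymous reference followed by an angle-bracketed inline target. It is not the Docutils parser and does not use its URI-recognition code; the pattern, the function name and the restriction to backquoted references are assumptions made for this example.

```python
import re

# Hypothetical pattern for the experimental "`text`__ __<URI>" form.
INLINE_LINK = re.compile(
    r"`(?P<text>[^`]+)`__"     # anonymous reference with backquoted text
    r"\s+"                     # whitespace (possibly a line break), so words can wrap
    r"__<(?P<uri>[^<>\s]+)>"   # inline target: leading "__", URI in angle brackets
)

def find_inline_links(source):
    """Return (reference text, URI) pairs found in source."""
    return [(m.group("text"), m.group("uri")) for m in INLINE_LINK.finditer(source)]

sample = "A paragraph containing an `inline hyperlink`__\n__<http://www.example.org/>."
links = find_inline_links(sample)
print(links)

# Requiring the leading "__" keeps ordinary angle brackets out of the picture.
print(find_inline_links("the <img> tag"))
```

The required whitespace between reference and target reflects the first objection above (the two should be separate so lines can wrap), and the mandatory "__" prefix is what keeps text such as "the <img> tag" from being read as a target.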
David Goodger (goodger@users.sourceforge.net) wrote:
You seem to feel inline URLs are important. Here's a chance to prove your position. I'll leave it up to you to implement them, on an experimental basis. There is a proposed mechanism for experimental syntax called "pragma directives":
It may also be possible for directives to be used as pragmas, to modify the behavior of the parser, such as to experiment with alternate syntax. There is no parser support for this functionality at present; if a reasonable need for pragma directives is found, they may be supported.
(http://docutils.sf.net/spec/rst/reStructuredText.html#directives)
I will help with the infrastructure (any changes that need to be made to the parser to accept pragma directives), but I won't implement the parsing itself. Here's an example of how such a directive might work::
.. enable-inline-urls::
[...]
A paragraph containing an `inline hyperlink`__ __<http://www.example.org/>.
Once the pragma directive is implemented, we'll see how it fares in the real world. I may accept it into standard reStructuredText, leave it in as a pragma directive, or reject it outright.
Your mission, Mr. Budig, if you choose to accept it, is to create an "enable-inline-urls" pragma directive that implements some variation of the above syntax (recommended: "__<URI>"). Except for necessary infrastructure support for pragmas, there should be no changes to the parser itself. I will work with you to add support for pragmas.
I'd like to raise two points. First I am not sure if the use of pragmas to change the behaviour is a good way to do this. There might be a need for lots of different local extensions to the syntax. You'd end up implementing lots of pragmas... It might be better to have either a pragma that looks like:: .. reST-options:: :inline-urls: true :math-markup: true :whatever-id: "blah" or something like this. So you could have a more generic framework for extensions to the parser. reST could provide a mechanism to derive for example class names from the first field and try to import and plug them into the parser. This would also make it easier to avoid having to type this pragma by creating customized document processors where you would do something like parser.add_plugin (InlineUrlPlugin (1)) The second point is closely connected to this. When looking at Inline markup the parsing work is done by a class "Inliner". This is dominated by a huge regular expression that matches to a lot of different constructs. In my eyes it would be better to break this apart in different regular expressions and test them in a sequence (it might be necessary to remember which match starts first). An extension could add a regular expression to that list instead of having to replace a complicated regular expression with an even more complicated regex. Of course this would mean that there *would* be changes to the parser itself, but it might result in a more flexible parsing framework. Do you think this is worth it? Bye, Simon -- Simon.Budig@unix-ag.org http://www.home.unix-ag.org/simon/
Simon Budig wrote:
First I am not sure if the use of pragmas to change the behaviour is a good way to do this. There might be a need for lots of different local extensions to the syntax. You'd end up implementing lots of pragmas...
It might be better to have either a pragma that looks like::
.. reST-options:: :inline-urls: true :math-markup: true :whatever-id: "blah"
Lots of individual directives, or one large pragma directive with subcommands. Either way would be fine. I'd drop the "true" though; just the presence of the field is enough.
reST could provide a mechanism to derive for example class names from the first field and try to import and plug them into the parser.
Too much magic; potentially dangerous. Better to have a registry.
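A registry along these lines might look like the sketch below. Everything in it is hypothetical (the registry dict, `register_extension`, `enable_options`, and the placeholder extension class); it only shows the idea that an option name can enable nothing except what was explicitly registered beforehand, with no import derived from document text.

```python
# Hypothetical registry: option names map to explicitly registered classes.
EXTENSION_REGISTRY = {}

def register_extension(option_name, extension_class):
    """Make an extension available under the given option name."""
    EXTENSION_REGISTRY[option_name] = extension_class

class InlineUrlExtension:
    """Placeholder; a real extension would add parsing behaviour."""
    name = "inline-urls"

def enable_options(option_names):
    """Instantiate the registered extension for each requested option.

    Unknown names raise an error instead of triggering an import.
    """
    enabled = []
    for name in option_names:
        if name not in EXTENSION_REGISTRY:
            raise KeyError(f"unknown reST option: {name!r}")
        enabled.append(EXTENSION_REGISTRY[name]())
    return enabled

register_extension("inline-urls", InlineUrlExtension)
extensions = enable_options(["inline-urls"])
print([ext.name for ext in extensions])
```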
This would also make it easier to avoid having to type this pragma by creating customized document processors where you would do something like
parser.add_plugin (InlineUrlPlugin (1))
Yes, something along those lines. But please don't worry about the mechanics; it's too early.
The second point is closely connected to this. When looking at Inline markup the parsing work is done by a class "Inliner". This is dominated by a huge regular expression that matches to a lot of different constructs. In my eyes it would be better to break this apart in different regular expressions and test them in a sequence (it might be necessary to remember which match starts first). An extension could add a regular expression to that list instead of having to replace a complicated regular expression with an even more complicated regex.
The "Inliner" class has to use one large regular expression. If we have some text like this:: Here is an ``inline **literal**``. If we check for "strong" (**) first, the result will be wrong. No ordering would get it right for all constructs. We have to check for each start-string simultaneously, because there are no precedence rules (almost); first occurrence from left to right in the text is the determinant. But that idea is close to the solution I'm thinking of. My idea is to break up the one huge regexp into several lists of individual regexps, one list per construct/regexp type (find start-string only, find the whole construct, etc.), and join them dynamically into compound OR-groups, building the large regexp from components at runtime. Dynamic syntax directives can install new regexps and rebuild the master regexp.
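The idea of assembling the master regexp from components at runtime can be sketched roughly as follows. The two component patterns are drastically simplified stand-ins, not the real Inliner expressions, and `build_master` is a hypothetical helper name.

```python
import re

# Hypothetical component table; the real patterns are far more involved.
components = [
    ("strong", r"\*\*.+?\*\*"),
    ("literal", r"``.+?``"),
]

def build_master(components):
    """Join the components into one alternation of named groups."""
    return re.compile(
        "|".join(f"(?P<{name}>{pattern})" for name, pattern in components)
    )

master = build_master(components)
match = master.search("Here is an ``inline **literal**``.")
# The leftmost start-string wins, even though "strong" is listed first.
print(match.lastgroup, match.group())

# A dynamic syntax directive could install a new component and rebuild.
components.append(("inline_target", r"__<[^<>\s]+>"))
master = build_master(components)
```

Since the regex engine scans the text from left to right and tries every alternative at each position, the construct that starts first is found regardless of the order in which the components were listed.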
Of course this would mean that there *would* be changes to the parser itself, but it might result in a more flexible parsing framework.
This is the infrastructure support I spoke of. For now, please just make a subclass of the "Inliner" class and pass it to the parser. See the PEP reader for an example. Don't try to be fancy, just brute-force copy & paste the code you need from docutils.parsers.rst.states.Inliner; we'll sort out what needs to be done afterward. Please put your code in the sandbox for now (see http://docutils.sf.net/spec/notes.html#additions-to-docutils). -- David Goodger <goodger@users.sourceforge.net> Open-source projects: - Python Docutils: http://docutils.sourceforge.net/ (includes reStructuredText: http://docutils.sf.net/rst.html) - The Go Tools Project: http://gotools.sourceforge.net/
David Goodger (goodger@users.sourceforge.net) wrote:
Simon Budig wrote:
The second point is closely connected to this. When looking at Inline markup the parsing work is done by a class "Inliner". This is dominated by a huge regular expression that matches a lot of different constructs. In my eyes it would be better to break this apart into separate regular expressions and test them in sequence (it might be necessary to remember which match starts first). An extension could add a regular expression to that list instead of having to replace a complicated regular expression with an even more complicated regex.
The "Inliner" class has to use one large regular expression. If we have some text like this::
Here is an ``inline **literal**``.
If we check for "strong" (**) first, the result will be wrong. No ordering would get it right for all constructs. We have to check for each start-string simultaneously, because there are no precedence rules (almost); first occurrence from left to right in the text is the determinant.
This is why I meant that it might be necessary to remember which match starts first. To emulate the behaviour of a big regex we have to match against all regexes, check which one starts closest to the beginning of the string and, if this is ambiguous, check which one is the longest match. Advantage: This would immediately give the matching construct.
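A minimal sketch of this "try every regex, keep the leftmost, prefer the longest on a tie" idea. The three patterns are simplified stand-ins chosen for illustration; they are not the expressions used by the real Inliner class, and the function names are hypothetical.

```python
import re

# Hypothetical, simplified patterns; the real Inliner patterns are more involved.
PATTERNS = {
    "literal": re.compile(r"``(.+?)``"),
    "strong": re.compile(r"\*\*(.+?)\*\*"),
    "emphasis": re.compile(r"\*(.+?)\*"),
}


def first_match(text, pos=0):
    """Return (name, match) for the leftmost match; the longest wins a tie."""
    best = None
    for name, pattern in PATTERNS.items():
        match = pattern.search(text, pos)
        if match is None:
            continue
        # Smaller start wins; for equal starts, the longer match wins.
        key = (match.start(), -(match.end() - match.start()))
        if best is None or key < best[0]:
            best = (key, name, match)
    if best is None:
        return None
    return best[1], best[2]


def tokenize(text):
    """Collect (construct name, inner text) pairs from left to right."""
    pos = 0
    found = []
    while True:
        result = first_match(text, pos)
        if result is None:
            break
        name, match = result
        found.append((name, match.group(1)))
        pos = match.end()
    return found


tokens = tokenize("Here is an ``inline **literal**``.")
```

The matching construct is known immediately, as claimed above, but every pattern is searched again after each match, which is the cost side of the trade-off.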
But that idea is close to the solution I'm thinking of. My idea is to break up the one huge regexp into several lists of individual regexps, one list per construct/regexp type (find start-string only, find the whole construct, etc.), and join them dynamically into compound OR-groups, building the large regexp from components at runtime. Dynamic syntax directives can install new regexps and rebuild the master regexp.
The advantage of this approach is that it might be a bit quicker since it is inside a single regular expression. It makes it a bit harder to detect which regex actually matched. Of course this is doable via ((?P<regex1>blablabla)|(?P<regex2>blu(?P<data>b*)lubb)) and then checking which of the named groups regex1 or regex2 matched. It might be a problem because you have to be careful with the naming of additional groups in the different regexes to avoid conflicts. Bye, Simon -- Simon.Budig@unix-ag.org http://www.home.unix-ag.org/simon/
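A rough sketch of the "build the master regexp from components at runtime" variant, using named groups to tell which component matched. The component list, the inline_link pattern and the function names are assumptions made for this example, not Docutils code.

```python
import re

# Hypothetical component patterns; an extension would append to this list.
components = [
    ("literal", r"``.+?``"),
    ("strong", r"\*\*.+?\*\*"),
    ("emphasis", r"\*.+?\*"),
]


def build_master(parts):
    """Join the named components into one compiled alternation."""
    return re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in parts))


def scan(text, master):
    """Return (component name, matched text) pairs from left to right."""
    # lastgroup names the outer group that matched, because the
    # components in this sketch contain no inner groups of their own.
    return [(m.lastgroup, m.group()) for m in master.finditer(text)]


master = build_master(components)

# "Installing" a new construct means extending the list and rebuilding.
extended = build_master(components + [("inline_link", r"\w+__ __<[^>]+>")])
```

Two details matter here. When several alternatives match at the same position, Python's re module takes the first one listed, so "strong" has to come before "emphasis". And if two components used the same inner group name, re.compile would reject the combined pattern, which is the naming conflict mentioned above.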
[David Goodger:]
The "Inliner" class has to use one large regular expression. If we have some text like this::
Here is an ``inline **literal**``.
If we check for "strong" (**) first, the result will be wrong. No ordering would get it right for all constructs. We have to check for each start-string simultaneously, because there are no precedence rules (almost); first occurrence from left to right in the text is the determinant.
[Simon Budig:]
This is why I meant that it might be necessary to remember which match starts first. To emulate the behaviour of a big regex we have to match against all regexes, check which one starts closest to the beginning of the string and, if this is ambiguous, check which one is the longest match.
Advantage: This would immediately give the matching construct.
But at what cost? Sounds very complex. It ain't broke. Why fix it? Let's just use the big regexp, and not try to emulate it.
But that idea is close to the solution I'm thinking of. My idea is to break up the one huge regexp into several lists of individual regexps, one list per construct/regexp type (find start-string only, find the whole construct, etc.), and join them dynamically into compound OR-groups, building the large regexp from components at runtime. Dynamic syntax directives can install new regexps and rebuild the master regexp.
The advantage of this approach is that it might be a bit quicker since it is inside a single regular expression. It makes it a bit harder to detect which regex actually matched. Of course this is doable via ((?P<regex1>blablabla)|(?P<regex2>blu(?P<data>b*)lubb)) and then checking which of the named groups regex1 or regex2 matched. It might be a problem because you have to be careful with the naming of additional groups in the different regexes to avoid conflicts.
If it ever does become a problem, we'll deal with it. Until then, I don't see the point of redesigning something that works well. I don't think we'll be adding much more to the regexp, so I don't anticipate running into name clashes any time soon. If you think it's worth doing though, please try it and show us. -- David Goodger <goodger@users.sourceforge.net>
David Goodger (goodger@users.sourceforge.net) wrote:
You seem to feel inline URLs are important. Here's a chance to prove your position. I'll leave it up to you to implement them, on an experimental basis.
In the sandbox there is a drop-in replacement for the states.py file. I have not yet gotten around to implementing this as a subclass of Inliner() but it should not be too hard - all changes to the file are inside this class... It is nearly 5 am now and I don't want to think about that now... :-) [...]
However, I really don't like that syntax; it doesn't make sense. Let's examine it and see if we can come up with something better. I have two objections:
1. The "`ref`_(URL)" syntax forces the last word of the reference text to be joined to the URL, making a potentially very long word that can't be wrapped (URLs can be very long). The reference and the URL should be separate.
2. The "inline hyperlink" text is *not* a named reference (there's no lookup by name), so it shouldn't look like one.
I have now implemented reference__ __<uri> and `refe rence`__ __<uri>. They are analogous to anonymous links. I also implemented reference_ _<uri> and `refe rence`_ _<uri>, analogous to named links; this is some kind of closure of the syntax (mathematically speaking... :-) I am currently not sure if the possibility to wrap before long URLs is worth the added line noise of doubling the underscores. I think reference_<uri> resp. reference__<uri> might be acceptable too. Comments? Bye, Simon -- Simon.Budig@unix-ag.org http://www.home.unix-ag.org/simon/
Simon Budig wrote:
In the sandbox there is a drop in replacement for the states.py file.
Great! I'll take a look.
I have not yet gotten around to implementing this as a subclass of Inliner() but it should not be too hard - all changes to the file are inside this class...
This way is fine; don't bother converting it into a subclass. We can use "diff".
It is nearly 5 am now and I don't want to think about that now...
You're keeping hacker's hours. ;-) It's 1 am here; time for bed.
I have now implemented reference__ __<uri> and `refe rence`__ __<uri>. They are analogous to anonymous links.
Did you allow for long URIs split over lines? This would have to be allowed::

    reference__ __<http://this.is.the.beginning
    .of.a.very.long.uri.com/index.html#and-here-is
    -even-more>
I also implemented reference_ _<uri> and `refe rence`_ _<uri>, analogous to named links; this is some kind of closure of the syntax (mathematically speaking... :-)
Where the target name is implied. Yes, I suppose it follows.
I am currently not sure if the possibility to wrap before long URLs is worth the added line noise by doubling the underscores. I think reference_<uri> resp. reference__<uri> might be acceptable too.
Without the spaces & matching underscores the syntax would be too subtle I think. And allowing for line-wrapping is important. -- David Goodger <goodger@users.sourceforge.net>
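To make the experimental syntax concrete, here is a sketch of a pattern that accepts reference__ __<uri> and `refe rence`_ _<uri>, requires the underscores on both sides to match, and tolerates a URI wrapped over several lines. The pattern, the function name and the tuple layout are assumptions for this example; this is not the code from the sandbox.

```python
import re

# Hypothetical pattern for the syntax under discussion (not the sandbox code).
INLINE_LINK = re.compile(
    r"""
    (?: `(?P<phrase>[^`]+)`                               # `phrase reference`
      | (?P<word>[A-Za-z0-9]+(?:[-_.][A-Za-z0-9]+)*) )    # or a single word
    (?P<marks>__?)         # one underscore: named; two: anonymous
    \s+                    # space or line break, so the URI can wrap
    (?P=marks)             # the same number of underscores again
    <(?P<uri>[^<>]+)>      # URI, possibly split over several lines
    """,
    re.VERBOSE,
)


def find_inline_links(text):
    """Return (reference text, uri, is_anonymous) for each inline link."""
    links = []
    for m in INLINE_LINK.finditer(text):
        label = m.group("phrase") or m.group("word")
        # Whitespace inside the brackets only comes from line wrapping.
        uri = "".join(m.group("uri").split())
        links.append((label, uri, len(m.group("marks")) == 2))
    return links


wrapped = (
    "reference__\n"
    "__<http://this.is.the.beginning\n"
    ".of.a.very.long.uri.com/index.html#and-here-is\n"
    "-even-more>"
)
links = find_inline_links(wrapped)
```

Because the URI is delimited by angle brackets, removing all whitespace inside them is enough to rejoin a wrapped URI in this sketch.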
(as promised in a separate mail, focus on technical stuff) David Goodger (goodger@users.sourceforge.net) wrote:
Simon Budig (simon.budig@unix-ag.org) wrote:
Just to show you what is possible with SSIs and why I think using reST for webpages is not crazy:
Have a look at http://www.home.unix-ag.org/simon/bsdaemon/
Apart from the extras pulled in through server-side includes, the page is very simple, nothing that reStructuredText couldn't handle.
The raw source of it is::
<!--#include virtual="$SCRIPT_NAME/../../include/head_start.shtml" -->
A server-side include directive could be added to reStructuredText, no problem. Although I'm very careful about new *syntax*, new directives are much easier to let in, because they're explicit and typically don't require new syntax (and if they do, it's localized and *explicit*).
I am not sure if this would be necessary. I would not want to have these SSI-directives in my sourcecode, since they add unnecessary complexity to the raw page source ("The secretary would have to know about SSIs"). I would prefer if the stuff could be embedded easily in a template system. Either make it easy to write a tool for creating the pages where the site administrator has full control over the HTML output of the html4css1 writer. This means customizable headers with the option to discard the headers/footers. I am not sure how the preferred use of the docutils would be for a random site administrator. I think I would try to write a small propriate application where I would try to derive the correct writer class and expand it with my personal preferences. Other people might prefer to have a simple template system where you can specify a simple template. Also the class names used in some <span>'s should be customizable, maybe a dictionary with a native <--> target mapping of the class names.
<img src="bsdaemon.png" alt="Preview of the BSD Daemon" align="right" width="326" height="352"><p>
I've thought of adding support for more image attributes, like "align". Care to try?
Maybe on the weekend, when I manage to get the CVS to sourceforge to work... Bye, Simon -- Simon.Budig@unix-ag.org http://www.home.unix-ag.org/simon/
Simon Budig wrote:
(as promised in a separate mail, focus on technical stuff)
Good idea. I'll merge in some technical answers from previous posts.
I would not want to have these SSI-directives in my sourcecode, since they add unnecessary complexity to the raw page source ("The secretary would have to know about SSIs").
That's fair.
I would prefer if the stuff could be embedded easily in a template system. Either make it easy to write a tool for creating the pages where the site administrator has full control over the HTML output of the html4css1 writer. This means customizable headers with the option to discard the headers/footers.
[Simon, from previous post]
But for a start I'd like to have a bit more configurable html converter for reST. This means for example to be able to switch off the <html>....<body> and </body>...</html> parts.
The HTMLTranslator class of the docutils/writers/html4css1.py module keeps the parts separately: head_prefix (<?xml ...><!DOCTYPE html ...><html ...> ... <link rel="stylesheet" ...>), head (<title> & <meta>), body_prefix (</head><body>), body (page contents), and body_suffix (</body></html>). These are all lists of strings. I'll expose these in the Writer class. Beyond that it's up to you. Please don't feel that you have to use html4css1.py. It's just one way of producing HTML. You can write your own, or subclass it and add in your customizations. Patches are gratefully accepted.
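A small sketch of what keeping the parts as separate lists of strings makes possible. Only the five part names come from the description above; the dictionary, its contents and the assemble() helper are invented for illustration and are not the Writer API.

```python
# Hypothetical stand-in for the writer's output. The five names are the ones
# listed above; the strings themselves are made up for this example.
parts = {
    "head_prefix": ["<html>\n", "<head>\n"],
    "head": ["<title>Example</title>\n"],
    "body_prefix": ["</head>\n", "<body>\n"],
    "body": ["<p>Page contents.</p>\n"],
    "body_suffix": ["</body>\n", "</html>\n"],
}

FULL_PAGE = ("head_prefix", "head", "body_prefix", "body", "body_suffix")


def assemble(parts, wanted=FULL_PAGE):
    """Concatenate the selected parts, in the given order, into one string."""
    return "".join("".join(parts[name]) for name in wanted)


# A template system would take only the page contents and wrap them itself.
fragment = assemble(parts, wanted=("body",))
page = assemble(parts)
```

Switching off the <html>...<body> wrapper then amounts to choosing which parts to join.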
I am not sure how the preferred use of the docutils would be for a random site administrator. I think I would try to write a small propriate application where I would try to derive the correct writer class and expand it with my personal preferences. Other people might prefer to have a simple template system where you can specify a simple template.
Sounds good.
Also the class names used in some <span>'s should be customizable, maybe a dictionary with a native <--> target mapping of the class names.
I don't follow. Examples? [David, from previous post]
However, such functionality is certainly within the realm of possibility, and I'd encourage anyone to tackle the challenge posed in the To Do list:
Construct a _`templating system`, as in ht2html/yaptu, using directives and substitutions for dynamic stuff.
[Simon]
I don't know yaptu.
It's a simple templating tool by Alex Martelli, a recipe in the Python Cookbook: http://aspn.activestate.com/ASPN/Python/Cookbook/Recipe/52305. I helped to extend it a bit, and used it for the old project pages. The extended version is here: http://structuredtext.sourceforge.net/yaptu.py. -- David Goodger <goodger@users.sourceforge.net>
David Goodger (goodger@users.sourceforge.net) wrote:
Simon Budig wrote: <meta>), body_prefix (</head><body>), body (page contents), and body_suffix (</body></html>). These are all lists of strings. I'll expose these in the Writer class. Beyond that it's up to you.
Please don't feel that you have to use html4css1.py. It's just one way of producing HTML. You can write your own, or subclass it and add in your customizations.
With my Python knowledge coming mainly from 1.5.x I am not sure if I got the idea behind the packages correctly. Could somebody point me to a resource explaining why it is a good idea to have different classes with the same name (there are writers.Writer, html4css1.Writer and docutils_xml.Writer in docutils, why aren't they named after what they actually write?) I am most probably missing something here since this technique is also used in the "encodings" package from the core Python.
Also the class names used in some <span>'s should be customizable, maybe a dictionary with a native <--> target mapping of the class names.
I don't follow. Examples?
In processed reST output you find class names like '<a class="reference"' or '<p class="field-name">'. If you want to include reST output as part of a larger website these class names might clash with the names in the system-wide CSS file. It might be useful to be able to replace the class names used by reST with the class names used in the rest of the site, so that reST does not output the stuff above but '<a class="link"' instead. I am not sure how important this is; alternatively you could also adjust the CSS file. But it seems nicer to me to be able to control the output of reST instead of having to adjust the rest of your framework to the needs of reST. Hmm. Bye, Simon -- Simon.Budig@unix-ag.org http://www.home.unix-ag.org/simon/
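The mapping idea could be sketched as a post-processing step on the generated HTML. The mapping contents, the "label" target name and the remap_classes() helper are hypothetical, and the function only handles double-quoted class attributes; nothing like this exists in the writer.

```python
import re

# Hypothetical native -> site-wide mapping of class names.
CLASS_MAP = {"reference": "link", "field-name": "label"}

# Matches double-quoted class attributes only, which is enough for this sketch.
_CLASS_ATTR = re.compile(r'class="([^"]*)"')


def remap_classes(html, mapping):
    """Replace each class name found in the mapping; leave the others alone."""
    def _substitute(match):
        names = match.group(1).split()
        return 'class="%s"' % " ".join(mapping.get(name, name) for name in names)
    return _CLASS_ATTR.sub(_substitute, html)


converted = remap_classes('<a class="reference" href="http://www.python.org">', CLASS_MAP)
```

Doing this outside the writer keeps the question of whether the writer itself should support such a mapping open.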
Simon Budig wrote:
With my Python knowledge coming mainly from 1.5.x I am not sure if I got the idea behind the packages correctly. Could somebody point me to a resource explaining why it is a good idea to have different classes with the same name (there are writers.Writer, html4css1.Writer and docutils_xml.Writer in docutils, why aren't they named after what they actually write?) I am most probably missing something here since this technique is also used in the "encodings" package from the core Python.
I did it that way out of practicality. The front-end tells Docutils which format it wants. Docutils looks up that format name in a mapping, to determine the actual module name ({'html': 'html4css1'}). The docutils.writers module (docutils/writers/__init__.py) imports the module, and returns the Writer class. If each Writer class had a different name, there would be one more level of indirection, one more variable. Of course, each writer could be given the same name as its module, but I prefer lowercase module names and StudlyCaps class names. If done that way, it wouldn't be possible to pass around the module; the class itself would have to be passed. It may be an arbitrary decision, but it works well for Docutils and hasn't presented any problems. Plus, I find the uniformity of API elegant.
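The lookup described here can be sketched as follows. To keep the example runnable without Docutils, it registers two stand-in modules under made-up names ("demo_writers_..."); the alias table, the helper names and the fake Writer classes are assumptions, and only the {'html': 'html4css1'} mapping and the shared class name "Writer" come from the message.

```python
import importlib
import sys
import types


def _make_fake_module(name, description):
    """Register a stand-in module that defines a class called Writer."""
    module = types.ModuleType(name)

    class Writer:
        def write(self, text):
            return f"[{description}] {text}"

    module.Writer = Writer
    sys.modules[name] = module


# Stand-ins so the sketch runs on its own; real writer modules live in Docutils.
_make_fake_module("demo_writers_html4css1", "HTML")
_make_fake_module("demo_writers_docutils_xml", "XML")

# Format name -> module name, as in the {'html': 'html4css1'} example.
_ALIASES = {"html": "html4css1", "xml": "docutils_xml"}


def get_writer_class(format_name):
    """Map the format name to a module, import it, and return its Writer."""
    module_name = _ALIASES.get(format_name.lower(), format_name.lower())
    module = importlib.import_module("demo_writers_" + module_name)
    # Every module uses the same class name, so no per-format name is needed.
    return module.Writer


html_output = get_writer_class("html")().write("hello")
```

Because the class is always called Writer, the lookup needs only the module name; a per-module class name would require a second mapping or a naming rule.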
Also the class names used in some <span>'s should be customizable, maybe a dictionary with a native <--> target mapping of the class names.
I don't follow. Examples?
In processed reST output you find class names like '<a class="reference"' or '<p class="field-name">'. If you want to include reST output as part of a larger website these class names might clash with the names in the system-wide CSS file.
I see. It's not our problem. If an application has this problem, it can deal with it; Docutils doesn't need to. If a real example of conflict ever appears, we can deal with it then. Until then, think XP: "always do the simplest thing that could possibly work" and "never add functionality before it's needed." -- David Goodger <goodger@users.sourceforge.net>
Participants (6): Aahz, David Goodger, Simon Budig, Simon Budig, Walter Dörwald, will