Given that there's no parser yet, I suppose now is a good time to consider Wiki markup requirements and how they might be added as extensions or incorporated into the specification?
I'm not a Wiki user, so I don't know what special Wiki requirements might look like. Can you elaborate?
See QuickWikiBackground_ below for a quick background on what the hell Wiki is. Skip to WikiModificationsToReStructuredText_ if you just want to find out what modifications might be required to the reStructuredText specification.

Just for a play, I'm going to try writing this mail in reStructuredText.

.. _QuickWikiBackground:

QuickWikiBackground
===================

Wiki fans, my apologies if any of this is a tad confrontational or bordering on flamebait [FlameFlame]_. I'm in a hurry. :/

A Wiki is a web-based content management system that makes building web sites as easy as tunnelling with a magic shovel. Three key reasons for this ease of use are:

* Wiki is browser-based,
* Most of what you type in will display properly, and
* You create links with WordsOfManyCapitalLetters [7]_.

If I'd typed that into a Wiki, what I'd get when I hit the Submit button would be some normal text, a bullet list, and a link called WordsOfManyCapitalLetters. If I clicked on it, I'd be fed the edit form for a newly created WordsOfManyCapitalLetters page.

.. _[7]: Properly called WikiNames.

.. _[8]: Purists might want blank lines between those, but I want this to be as close to copy-and-paste from email as I can get it.

.. _[9]: I first noticed the need for automatic footnote numbering in footnote [1]_. Now I've finally hit the tangle. I've also just noticed that if I were rendering this for printing I'd want a different format for the explicit link to the footnote -- specifically, not a subscript.

.. _[FlameFlame]: This document has so much markup and so many bloody footnotes [cough]_ it'll be a miracle if anyone manages to read far enough to find anything worth flaming.

If I referred to DocstringProcessingSystem, that'd become a link, and if I clicked on it I'd be editing the DocstringProcessingSystem page.

Creating a new Wiki has so little friction the process can become a non-stop brain dump. I created 150+ pages on a ZWiki in spare time between meetings in a week.
I'm sure others have done more.

To have a play with a Python-based Wiki, check out MoinMoin_. My biggest problem with MoinMoin is that the default markup drives me nuts (sorry, Jurgen). I have some other issues (see BlogBlog_ [LameLame]_), but my biggest source of angst is probably the parser issue. MoinMoin supports plug-in parsers, but none of them grab me yet. See below.

I'm slightly more comfortable with the markup used in ZWiki_, which is heavily based on StructuredText. StructuredText has its own problems, though. In short, it's bloody unpredictable. StructuredTextNG might fix it, and the BizarStructuredText plug-in contributed to MoinMoin by RichardJones [1]_

reStructuredText_ seems a lot cleaner to me than either StructuredText or WikiWiki markup. I have marginal concerns about how "normal" people will cope with the underscore suffix [2]_ for links, but my reading of the spec was pleasant enough.

What I'd like to do is develop a reStructuredText plug-in parser for MoinMoin, and also one for MoinWiki:BlogBlog [3]_ (which I've decided is going to pre-parse pages to XML for various reasons).

.. _MoinMoin: http://moin.sourceforge.net/cgi-bin/moin/moin/
.. _BlogBlog: http://moin.sourceforge.net/cgi-bin/moin/moin/BlogBlog
.. _ZWiki: http://zwiki.org/FrontPage

.. _[LameLame]: I've been promising to finish work on a Wiki clone for years now. I did some work on ZWiki modifications (NooZWiki), which was fine until I got sick of a) edit conflict error handling in Zope, and b) the incredible costs of Zope hosting at the time. I'm now caught between extending MoinMoin to suit my purposes -- it's really quite amazing -- and writing my own. I sympathise with anyone tempted to consider me a mere meddling troublemaker until I finally cough up some code.

.. _NooZWiki: http://zwiki.org/NooZWiki
.. _reStructuredText: http://structuredtext.sf.net

.. _[1]: As you can see, automatically dropping in WikiNames is an easy habit to fall into -- you'll find yourself doing it accidentally in email. Whilst I'm in a reStructuredText footnote, however, surely they should be automatically numbered? [9]_

.. _[2]: The problem is that the underscore becomes a prefix when finally pointed. If it's confusing me, it's definitely going to confuse my users. They're going to have enough trouble just with .. `identifier`: URL.

Is this another paragraph in the [2]_ footnote, or is this in the main text?

.. _[3]: This is an example of an InterWiki_ link, a streamlined way of pointing to a particular page at another known Wiki.

.. _InterWiki: http://moin.sourceforge.net/cgi-bin/moin/moin/InterWiki

.. _WikiModificationsToReStructuredText:

Pant. Wheeze. This is hard work. Especially the footnotes.

WikiModificationsToReStructuredText [4]_ [5]_
===================================

Wiki would need the following from reStructuredText:

* Suddenly, there's markup that doesn't look like punctuation. WikiName might well end up being a link.
* As written, a reStructuredText document will always parse the same way. Once you introduce WikiNature, it'll parse differently depending on which other WikiNames are defined. [BlogBlogXML]_ [SickOfRememberingNumbers]_ [InternalWikiNames]_
* Square brackets are extremely useful in ZWiki markup to force a word that doesn't look like a WikiName to be treated as a link to a page of that name. For example, [David]. [10]_
* I dimly remember square brackets also being useful for an inline linking representation I don't recall from the reStructuredText markup. It's something like [here is some text: URL]. A more reStructuredText way of doing it would be something like `here is some text`:URL. [11]_ [12]_
* MoinMoin further overloads square brackets for an EXTREMELY [6a]_ useful macro system.

.. _[BlogBlogXML]: ... which is why I want to preparse documents to XML in BlogBlog. <potentialWikiName>, here we come.
That brings us back to predictable output, which I can then reparse to produce appropriate HTML.

.. _[SickOfRememberingNumbers]: Bah! And now, I'm suddenly wondering whether underscored link destinations in reStructuredText specifically use square brackets to say "this is a footnote", or whether that's handled by the presence of non-URL text after the colon without an intervening newline.

.. _[InternalWikiNames]: Wouldn't it be nice for other pages in the Wiki to be able to refer to this footnote as WhateverThisDocumentIsNamed.InternalWikiNames?

.. _[4]: The `MoinMoin heading style`_ seems to have a lot to offer reStructuredText, which seems unable to do sub-headings. At least, having read the spec, I don't remember how to do it -- and the markup really has to be that simple, hence my problems with the underscores [1]_ [6]_.

.. _[5]: Another confusion: should the equals signs extend all the way, or not? I hope not, otherwise users editing with proportional fonts are going to have an awful time.

.. _[6]: Aha! Sometimes, you need [6a]_ to be able to explicitly target a previous footnote. Now I'm thinking "named, automatically numbered footnotes".

.. _[6a]: Hang on, how do I embolden again? Suddenly, the StructuredText *embolden this* format looks lovely. `embolden this`*? *`embolden this`*?

.. _`MoinMoin heading style`: http://moin.sourceforge.net/cgi-bin/moin/moin/HelpOnHeadlines

.. _[10]: Some Wikis scan any word with an initial capital to see whether or not there's a page of that name, in which case you only need the square brackets to force a link to the uncreated page when you're first creating it.

.. _[11]: I like the backquotes!

.. _[12]: We need to also consider `here is some text`:WikiName.

My brain hurts. This is harder than it looks. I should never [6a]_ have started with the footnotes. :)

Regards,
Garth.

.. _[cough]: You are in a twisty maze of little passages, all alike.
It just occurred to me:

The lowest friction implementation for a Wiki using reStructuredText is to implicitly assume that if a _ target doesn't exist in the current document, the link must be intended to refer to another document in the system. This would still require people to add the _ suffix for anything they want linked, but would eliminate the concerns about non-punctuation markup and ways of forcing links to pages whose names aren't that WikiLike.

I'm concerned, however, that people could end up with some truly weird names::

    See `the rest of my documentation on the fnarzle system`_ for more details.

... would imply a page named "the rest of my... system". I think it's a tad ugly, and that it might encourage slackness that WikiNamesOfTooManyWordsLookVerySilly tends to implicitly suppress. :) On the other hand, the system can always refuse to create pages of such names.

This brings me back to the potential need for a "short form" of the link syntax. Consider an attempt to have the above example link to a more reasonably named FnarzleDocs::

    See `the rest of my documentation on the fnarzle system`_ for more details.

    .. _`the rest of my documentation on the fnarzle system`: FnarzleDocs_

... with FnarzleDocs_ being unresolved within the current document, resolving to an in-system URL, and collapsed by the parser into a single link with the appropriate text. I think my users would prefer::

    See `the rest of my documentation on the fnarzle system`:FnarzleDocs for more details.

BTW, I just found the section in the spec that permits::

    I think you should `download Python 2.1`:http://www.python.org/2.1/ before you touch that ugly Perl code.

... which makes my suggestion look fairly reasonable.

Miscellaneous concerns:

- embedding directives like includes or macro calls mid-paragraph.
- being able to recognise indented paragraphs solely by their first line, so that people can lazily just keep typing (like I'm doing now) without having to manually terminate lines and indent the next one. The behaviour is arguably implied by the specification ("This is a paragraph continuation, not a sublist"), but I'd love explicit confirmation because it might be argued that an outdent implicitly terminates the previous paragraph but an indent doesn't. Needless to say, I'd argue against that.

- underlined style headings; I *really* like MoinMoin's use of a number of equals signs on each side of a paragraph: if the number of equals signs is the same, the paragraph is a heading of a level equal to the number of equals signs. For example::

      = this is a level 1 heading =

      == this is a level 2 heading ==

      == this is also a level 2 heading despite the fact
      that it has been wrapped for some reason ==

      === this is not a heading because the number of equals
      signs isn't equal <ahem>. ==

- I would hope that the reStructuredText parser is smart enough to figure out that the example text above is part of the <li> above?

- What's wrong with an automatically wrapped comment block like the following? ::

      .. This would be a comment block except
      this line hasn't been indented because some client re-wrapped the lines. Damn!

- I'm trying to figure out how to refer to a target within another document:

  - \`hyperlink targets`_ in the spec refers to the appropriate heading;
  - \`the specification`_ could refer to the spec if there was an appropriate target specification later in the document::

        .. `the specification`: reStructuredText

    (implicitly targeting the reStructuredText peer of the referrer);
  - \`the specification`:reStructuredText could do the same given the [mild] extension I suggested earlier;
  - Using a dot as a delimiter could be treated by the link resolver as an implied sub-target::

        See `hyperlinks`:reStructuredText.`hyperlink targets`.

    ... but that starts to get ugly. You can see why it's worrying me. :)

- I hope the parser is able to deal with paragraphs inconveniently wrapped in the middle of some emphasis::

      This is *emphasised
      text* with an inconvenient wrap point.

  ... otherwise I'd need an option to require delimiting the text::

      This is *`emphasised text`* with an...

  ... a shortcut for which could be::

      This is `emphasized text`* with an...

  ... which matches nicely with the link syntax, at least if you don't need to embolden the link, which sometimes I do::

      This is an `emphasized link`*_ with an...

  That's not as ugly as I thought it would be.

- Leading and trailing whitespace should be trimmed from inline literals so that someone can do this::

      Find the `` `interpreted text` `` in this paragraph!

If I should just make up my own mind on each and submit a diff per to be discussed|debated|mangled, someone let me know, okay? :)

Regards,
Garth.
on 2001-07-18 9:12 PM, Garth T Kidd (garth@deadlybloodyserious.com) wrote:
It just occurred to me:
The lowest friction implementation for a Wiki using reStructuredText is to implicitly assume that if a _ target doesn't exist in the current document, the link must be intended to refer to another document in the system.
The "create a new page if the link doesn't exist" mechanism is an application issue. If a Wiki uses reStructuredText, it's free to do whatever it likes. The markup doesn't need to know about it though.
This would still require people to add the _ suffix for anything they want linked, but would eliminate the concerns about non-punctuation markup
Wiki ImplicitLinksUsingCamelCase are so ambiguous, they must cause lots of problems. Like a discussion about "Old MacDonald"... Better to have some unambiguous syntax saying "this is a link".
and ways of forcing links to pages whose names aren't that WikiLike. I'm concerned, however, that people could end up with some truly weird names::

...

On the other hand, the system can always refuse to create pages of such names.
Again, an application issue.
This brings me back to the potential need for a "short form" of the link syntax. Consider an attempt to have the above example link to a more reasonably named FnarzleDocs::
    See `the rest of my documentation on the fnarzle system`_ for more details.

    .. _`the rest of my documentation on the fnarzle system`: FnarzleDocs_
Multiply-indirect hyperlinks? Interesting idea. I don't know if it's worth the trouble though. Of course, Wiki users could just write::

    See the rest of my documentation on the fnarzle system (FnarzleDocs_) for more details.
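Multiply-indirect targets combined with Garth's "unknown names are wiki pages" fallback can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not DPS code: ``resolve``, the ``targets`` dictionary and the ``http://wiki.example.org/`` base URL are all hypothetical, and the wiki fallback is the application policy David describes, not part of the markup.

```python
def resolve(name, targets, wiki_base="http://wiki.example.org/", max_hops=10):
    """Resolve a reference name to a URL (hypothetical sketch).

    `targets` maps target names to either a URL or another reference
    written with a trailing underscore, e.g. "FnarzleDocs_". Names that
    are not defined in the document fall back to a page in the wiki.
    """
    seen = set()
    while True:
        if name in seen or len(seen) >= max_hops:
            raise ValueError(f"circular or too deeply nested target: {name!r}")
        seen.add(name)
        if name not in targets:
            # Application policy, not markup: unknown names become wiki pages.
            return wiki_base + name
        value = targets[name]
        if value.endswith("_"):
            # Indirect target: follow the reference it points to.
            # (Assumes no real URL in `targets` ends with an underscore.)
            name = value[:-1].strip("`")
        else:
            return value


targets = {
    "the rest of my documentation on the fnarzle system": "FnarzleDocs_",
    "MoinMoin": "http://moin.sourceforge.net/cgi-bin/moin/moin/",
}
print(resolve("the rest of my documentation on the fnarzle system", targets))
print(resolve("MoinMoin", targets))
```

Following the chain keeps the long link text but ends at the ``FnarzleDocs`` wiki page, which is the collapsing behaviour Garth asks for; the ``seen`` set is there so that a target pointing back at itself raises an error instead of looping forever.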
I think my users would prefer::
    See `the rest of my documentation on the fnarzle system`:FnarzleDocs for more details.
BTW, I just found the section in the spec that permits::
    I think you should `download Python 2.1`:http://www.python.org/2.1/ before you touch that ugly Perl code.
Which section of which spec? That looks like StructuredText hyperlink markup, which I rejected for reStructuredText. I chose a modified Setext indirect hyperlink style because of WYSIWYG. In the processed page, we don't want the URL of the hyperlink to be visible. In the raw text, having the URL immediately after the link text is distracting; it breaks the flow of the text.
Miscellaneous concerns:
- embedding directives like includes or macro calls mid-paragraph.
Could be done with interpreted text. But is it really necessary? Could you provide some examples?
- being able to recognise indented paragraphs solely by their first line, so that people can lazily just keep typing (like I'm doing now) without having to manually terminate lines and indent the next one. The behaviour is arguably implied by the specification ("This is a paragraph continuation, not a sublist"), but I'd love explicit confirmation because it might be argued that an outdent implicitly terminates the previous paragraph but an indent doesn't. Needless to say, I'd argue against that.
Unfortunately, that syntax is ambiguous if blank lines between list items are optional, which reStructuredText allows. You can have one or the other, not both. For example, if a list item's paragraph containing the text "x = x - 1" were to word-wrap badly, you'd end up with::

    - This is list item 1.  Here's a formula: "x = x
    - 1".
    - Here's list item 2.  Sure looks like item 3 though.

And that's too dangerous to allow. I agree that the lazy typing style is convenient, but reStructuredText has to avoid ambiguity as much as possible. The Doc-SIG historical record shows that allowing intra-list-item blank lines to be optional is more in demand.

Opinions or counter-arguments anyone?
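To see why the badly wrapped formula is dangerous, here is a minimal Python sketch of a naive "lazy" list recognizer, assuming the simplest possible rule: any line beginning with ``- `` at column zero starts a new item, and any other non-blank line continues the current item. The function ``lazy_list_items`` is hypothetical and is not part of the DPS parser.

```python
def lazy_list_items(lines, bullet="- "):
    """Split lines into list items using a naive lazy-indentation rule.

    Hypothetical sketch: a line that starts with the bullet at column zero
    begins a new item; any other non-blank line is appended to the current
    item. Lines that appear before the first item are ignored.
    """
    items = []
    for line in lines:
        if line.startswith(bullet):
            items.append(line[len(bullet):].strip())
        elif line.strip() and items:
            items[-1] += " " + line.strip()
    return items


# The author meant two items, but the wrap put "- 1" at column zero.
badly_wrapped = [
    '- This is list item 1.  Here\'s a formula: "x = x',
    '- 1".',
    "- Here's list item 2.  Sure looks like item 3 though.",
]

for number, item in enumerate(lazy_list_items(badly_wrapped), start=1):
    print(number, item)
```

The recognizer has no way to tell the wrapped ``- 1".`` from a genuine bullet, so it silently produces three items; nothing in the input signals that anything went wrong.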
- underlined style headings; I *really* like MoinMoin's use of a number of equals signs on each side of a paragraph: if the number of equals signs is the same, the paragraph is a heading of a level equal to the number of equals signs.
It's a workable alternate syntax.
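The MoinMoin-style rule Garth describes (the same number of equals signs on each side, with the level equal to that count) is simple enough to state as code. A minimal Python sketch, assuming headings are whole paragraphs and that whitespace separates the equals signs from the title; ``HEADING`` and ``heading_level`` are hypothetical names, and this is not MoinMoin's actual implementation.

```python
import re

# Hypothetical pattern: one or more "=" on each side, separated from the
# title by whitespace. The equal-count check is done in heading_level().
HEADING = re.compile(r"^(=+)\s+(.*?)\s+(=+)$")


def heading_level(paragraph):
    """Return (level, title) for a MoinMoin-style heading paragraph, else None."""
    # Join wrapped lines so that a heading may span several lines.
    text = " ".join(line.strip() for line in paragraph.splitlines()).strip()
    match = HEADING.match(text)
    if match is None or len(match.group(1)) != len(match.group(3)):
        return None
    return len(match.group(1)), match.group(2)


examples = [
    "= this is a level 1 heading =",
    "== this is a level 2 heading ==",
    "== this is also a level 2 heading despite the fact\n"
    "that it has been wrapped for some reason ==",
    "=== this is not a heading because the number of equals\n"
    "signs isn't equal <ahem>. ==",
]
for paragraph in examples:
    print(heading_level(paragraph))
```

Because the count is checked on both sides, a wrapped heading is still recognised, while the mismatched ``=== ... ==`` case falls through to an ordinary paragraph.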
- I would hope that the reStructuredText parser is smart enough to figure out that the example text above is part of the <li> above?
Once you indent the list item's paragraph, and further indent the example text, yes. :-)
- What's wrong with an automatically wrapped comment block like the following? ::
    .. This would be a comment block except
    this line hasn't been indented because some client re-wrapped the lines. Damn!
Same as list items: ambiguity.
- I'm trying to figure out how to refer to a target within another document:
There's HTML's fragment syntax, which could be used::

    .. _link to 'refname' within another file: fileURL#refname
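In Python, the standard library's ``urllib.parse.urldefrag`` already separates such a URI into its document part and its fragment, which is most of what a link resolver would need for the ``fileURL#refname`` form. A minimal sketch; ``split_target`` and the file name ``reStructuredText.html`` are hypothetical.

```python
from urllib.parse import urldefrag


def split_target(uri):
    """Split 'fileURL#refname' into (fileURL, refname); refname is None if absent."""
    url, fragment = urldefrag(uri)
    return url, fragment or None


print(split_target("reStructuredText.html#hyperlink-targets"))
print(split_target("reStructuredText.html"))
```

A resolver would load the document named by the first part and then look up the second part among that document's targets.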
- I hope the parser is able to deal with paragraphs inconveniently wrapped in the middle of some emphasis::
    This is *emphasised
    text* with an inconvenient wrap point.
Yes, already implemented.
... otherwise I'd need an option to require delimiting the text::
Ugh. Thank goodness, unnecessary.
... which matches nicely with the link syntax, at least if you don't need to embolden the link, which sometimes I do::
    This is an `emphasized link`*_ with an...
reStructuredText doesn't support nested inline markup. That way lies madness...
That's not as ugly as I thought it would be.
Ugly enough ;-)
- Leading and trailing whitespace should be trimmed from inline literals so that someone can do this::
    Find the `` `interpreted text` `` in this paragraph!
Inline markup start-strings must be followed by non-whitespace, and end-strings preceded by non-whitespace, so that won't work. What will work, though, is::

    Find the ```interpreted text``` in this paragraph!

or::

    Find the \`interpreted text` in this paragraph!
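The start-string/end-string rule can be approximated with a regular expression. A minimal Python sketch, not the DPS parser's code: the lookahead ``(?=\S)`` and lookbehind ``(?<=\S)`` encode the rule David states, while the trailing ``(?!`)`` guard is an extra assumption of this sketch so that the triple-backquote form closes on its last pair of backquotes. ``INLINE_LITERAL`` and ``inline_literals`` are hypothetical names.

```python
import re

# Start-string "``" must be followed by non-whitespace; end-string "``" must be
# preceded by non-whitespace. The (?!`) guard is an assumption of this sketch.
INLINE_LITERAL = re.compile(r"``(?=\S)(.+?)(?<=\S)``(?!`)")


def inline_literals(text):
    """Return the contents of every inline literal found in text."""
    return INLINE_LITERAL.findall(text)


print(inline_literals("Find the ```interpreted text``` in this paragraph!"))
print(inline_literals("Find the `` `interpreted text` `` in this paragraph!"))
```

The first call finds one literal whose contents keep their backquotes; the second finds none, because both the start-string and the end-string sit next to spaces.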
If I should just make up my own mind on each and submit a diff per to be discussed|debated|mangled, someone let me know, okay? :)
Please discuss reStructuredText syntax issues here. If you'd like to start a variant syntax, or a completely unrelated syntax, you're free to do so; indeed I'd encourage it. You're welcome to use my codebase. Please make it compatible with the DPS (whose API is in its infancy, a blank slate).

As I mentioned in private email, I'll be posting version 0.3 of both reStructuredText and the DPS by the end of the weekend. Look for: several thousand lines of code; most constructs implemented; warning & error generation; many unittests (over 90 & counting, just for the parser); DOM generation; oodles of fun for all ages.

--
David Goodger    dgoodger@bigfoot.com
Open-source projects:
- Python Docstring Processing System: http://docstring.sf.net
- reStructuredText: http://structuredtext.sf.net
- The Go Tools Project: http://gotools.sf.net
David Goodger wrote:
... I'll be posting version 0.3 of both reStructuredText and the DPS by the end of the weekend. Look for: several thousand lines of code; most constructs implemented; warning & error generation; many unittests (over 90 & counting, just for the parser); DOM generation; oodles of fun for all ages.
Aagh!

OK. Congratulations, that's great, definitely a Good Thing, and I'm seriously envious. (shades of "heh, look folks, Python Doc-SIG has a *product*, all that wait and effort *was* worth doing".)

But... I'm off for a week of holiday (visiting relatives in Germany) on Sunday, and was planning to take the various spec documents with me to peruse and mark up with any comments (yes, I know, but there's more chance of doing it next week than during "normal" time, so far). And now there's going to be a new release whilst I'm away (possibly even whilst in the air). Curses, foiled again.

More seriously - David, is there any chance of getting a copy of the updated specs (both DPS and reStructuredText) before then, or is it worth my commenting on the documents "as is", assuming that changes in the "what it provides" will be minor? It would have to be before mid-Friday afternoon my time...

Tibs

--
Tony J Ibbs (Tibs)      http://www.tibsnjoan.co.uk/
"How fleeting are all human passions compared with the massive continuity of ducks." - Dorothy L. Sayers, "Gaudy Night"
My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)
.. The subject line used to be ``re: reStructuredText``, but we're getting so much point-bloat that I'm splitting it up.
Wiki ImplicitLinksUsingCamelCase are so ambiguous, they must cause lots of problems. Like a discussion about "Old MacDonald"...
You can escape the word with a bang (!), but, because the bang is also used to escape whole lines it turns out to be a mind-bender to escape just the first word in a line. Maybe you can break the paragraph, maybe not -- I forget, and that's the whole problem. It's something you have to *think* about. Ugh.
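The "Old MacDonald" problem is easy to reproduce with a CamelCase pattern. A rough Python sketch; the ``WIKI_NAME`` regex is a hypothetical approximation of WikiName detection, not MoinMoin's or ZWiki's actual rule, and it only handles a ``!`` placed directly before the word, not the line-level use of the bang Garth mentions.

```python
import re

# Hypothetical approximation of a WikiName: two or more capitalised runs of
# lowercase letters, not preceded by "!" (the escape) or a word character.
WIKI_NAME = re.compile(r"(?<![!\w])(?:[A-Z][a-z]+){2,}\b")


def wiki_names(text):
    """Return every word in text that would become an implicit wiki link."""
    return WIKI_NAME.findall(text)


print(wiki_names("Wiki ImplicitLinksUsingCamelCase are so ambiguous."))
print(wiki_names("A discussion about Old MacDonald and his farm."))
print(wiki_names("A discussion about Old !MacDonald and his farm."))
```

An ordinary surname trips the pattern just as a deliberate WikiName does, which is exactly the ambiguity an explicit trailing underscore avoids.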
Better to have some unambiguous syntax saying "this is a link".
The explicit but still simple reStructuredText link format is growing on me, for sure.

Regards,
Garth.

--
Garth T Kidd <garth@netapp.com>         Mobile: +61-411-596-593
Consulting Systems Engineer, Aust/NZ    Direct: +61-2-9779-5614
Network Appliance                       http://www.netapp.com/
.. This also used to be in the ``re: reStructuredText`` thread, but it's sufficiently contentious that I think it deserves its own subject line.

... being able to recognise indented paragraphs solely by their first line, so that people can lazily just keep typing (like I'm doing now) without having to manually terminate lines and indent the next one.

Unfortunately, that syntax is ambiguous if blank lines between list items are optional, which reStructuredText allows. You can have one or the other, not both.

.. The spec supports nested block quotes, right?

Consider:

* People who want to use reStructuredText in docstrings need the blank lines between list items to be optional, and will be using proper programming editors that can handle indentation for them. This group is quite obvious at the moment.

* People who want to write reStructuredText in mail clients and web browsers will be constantly frustrated if they are forced to manually indent everything, and won't mind at all if they're forced to put blank lines between list items. Nobody appears to have spent much time considering the requirements of this group yet.

Those are both sizable target groups, right? Now, I believe the following:

* We don't want to exclude either of those target groups.

* On the other hand, we don't want to make reStructuredText ambiguous.

It sure looks to me like a requirement for a switchable mode in the parser. Different applications can choose different defaults.

Or, the parser could attempt to automatically figure it out. If the very first bullet point or indented paragraph you see looks like this, you probably want to select for lazy paragraph indenting::

    * If the line after the first bullet point or indented paragraph
    starts at column zero and is not empty, lazy paragraph indenting can be assumed by applications that expect that some users might be using crummy editors.

Docstring processors would explicitly suppress such automatic selection.
You point out ambiguity in your example of a badly wrapped paragraph containing the bullet selector::

    - This is list item 1.  Here's a formula: "x = x
    - 1".
    - Here's list item 2.  Sure looks like item 3 though.

.. _abuse of the word "ambiguous":

To me, that's not ambiguous. The bad wrapping makes it explicitly a three-item list. It's not what the user intended [2]_, but there are so many ways for the user to unambiguously fix it that I don't think it's a problem:

* Manually wrap it closer to column zero::

      - This is list item 1.
      Here's a formula: "x = x - 1"
      - Here's list item 2...

* Use a different bullet::

      * This is list item 1.  Here's a formula: "x = x
      - 1".
      * Here's list item 2.

  Implication: a rule in the parser that says that blank lines are required between adjacent but different lists at the same indentation level, even if lazy paragraph formatting is turned on. That nicely matches the

* Use an inline literal::

      - This is list item 1.  Here's a formula: ``x = x
      - 1``.
      - Here's list item 2, as the parser considers the second line in this example part of the literal started in line 1.
The Doc-SIG historical record shows that allowing intra-list-item blank lines to be optional is more in demand.
I can *readily* imagine that intra-list-item blank lines being optional is more in demand at the moment. The majority of the people discussing this specification are probably Python programmers who want to use it for Python code (in docstrings) and the documentation for their Python code which they'll probably be editing in the same indentation-smart text editor they use for their code.
Opinions or counter-arguments anyone?
I'm not sure we should dig our heels in and assert that reStructuredText should *only* be useful for Python programmers with an indentation-smart text editor. There are hundreds of billions [1]_ of frustrated Wiki users out there pounding their heads against the Wiki markup syntax, and almost as many ZWiki users ripping their hair out because StructuredText is just as bad or worse. Telling them we're not going to throw them a line and rescue them from shark-infested water because they might get our precious rope wet seems a tad... stingy.

Getting into the mud on ambiguity:

.. _explicit discussion of ambiguity:

I'm going to come under some well-deserved flak for my `abuse of the word "ambiguous"` above, so I'm going to break it out a little. If the specification is changed as I suggest, *and* the parser is implemented as I'm saying, *and* the user tries to do what David suggests, *and* their text gets badly wrapped in the position David indicates, then:

* The *specification* is not ambiguous, and
* The *parser* won't find the input ambiguous, but
* The *user* might be a little confused for a moment.

The user is going to spend a lot of time confused regardless. Every time I try and represent a bullet list for which each item owns a literal block, for example, I forget to indent the literal block and have to go back and fix it. Users are going to spend a lot of time going back and fixing things that they got wrong. Going back and fixing the list won't be any additional hassle.

I'm wary of insisting upon serious inconvenience to a large segment of the user population for [3]_ to save inconvenience to the occasional user who stumbles across the edge case of a list item that happens to have a list delimiter just after the wrap column.

More glibly put: two out of three ain't bad. I think they'll cope. :)

.. _[1] I counted them. Really!

.. _[2] Before firing missiles on my use of the word "ambiguous", please see my `explicit discussion of ambiguity`, upon which you can unload entire batteries if you want. :)

.. _[3] e.g. having to manually indent every single list item as punishment for using an editor that doesn't handle indentation properly and that wraps long paragraphs with newlines.

Regards,
Garth.
One point I hadn't made explicitly was: if lazy paragraph indentation on list items is enabled, each list item may contain only one paragraph. Second elements of any kind (including sublists) are not possible without significantly reworking other aspects of the markup. For example, here's a list in 'strict' reStructuredText::

    - List 1, item 1, para 1.

      Item 1, para 2.

    - List 1, item 2, para 1.

      Item 2, para 2.

      * Sublist A of item 2.

With lazy indentation, that structure is impossible to represent::

    - List 1, item 1, para 1.

    Item 1, para 2.

    - List 1, item 2, para 1.

      Item 2, para 2.

      * Sublist A of item 2.

If we don't indent, like 'Item 1, para 2', we get a one-item list, followed by a paragraph, followed by a second, separate list. If we do indent, like 'Item 2, para 2', we have a block quote containing a paragraph followed by a list. But lazy indenters are loath to indent, so the point is moot.

Lazy indentation would only be useful for the simplest of documents: flat, limited to one paragraph per list item, no nested lists possible. This seriously limits the expressive power of the markup. Is the lazy variant sufficiently powerful to be useful to anyone?

If anyone can come up with or refer to a self-consistent scheme to combine lazy indentation with powerful expressivity, please do chime in.

--
David Goodger    dgoodger@bigfoot.com
I've had some time to respond to individual comments:

on 2001-07-19 10:24 PM, Garth T Kidd (garth@deadlybloodyserious.com) wrote:
.. The spec supports nested block quotes, right?
Yes. So? I don't get your point.
It sure looks to me like a requirement for a switchable mode in the parser. Different applications can choose different defaults.
This is workable. If you can come up with consistent, unambiguous, safe rules for lazy indentation, then Wikis and other apps could use the lazy variant.
Or, the parser could attempt to automatically figure it out.
That's a dangerous path. Explicit is better.
You point out ambiguity in your example of a badly wrapped paragraph containing the bullet selector::
    - This is list item 1.  Here's a formula: "x = x
    - 1".
    - Here's list item 2.  Sure looks like item 3 though.
I think intervening blank lines are an absolute requirement for lazy indentation. So the example would be like this::

    - This is list item 1.  Here's a formula: "x = x
    - 1".

    - Here's list item 2.  Sure looks like item 3 though.

If *only* lazy indentation is used, no problem. If the parser tries to infer the author's style, it would mistakenly infer strict indentation.

[Garth lists workarounds:]
* Manually wrap it closer to column zero::
Yes, but we are trying to avoid surprises when accidental bad wrapping takes place. The user doesn't always have control. My email client wraps my paragraphs, even if I don't want it to.
* Use a different bullet::
Change the example to "x = (x + 1) * 3 - 2" (all possible bullets included), and this workaround won't always work.
Implication: a rule in the parser that says that blank lines are required between adjacent but different lists at the same indentation level, even if lazy paragraph formatting is turned on.
My parser actually does this. I'll add mention of it to the spec.
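The rule can be sketched as a line scan that flags a change of bullet character with no blank line in between. This is a minimal Python sketch, not the DPS implementation; ``check_adjacent_lists`` is hypothetical and, for brevity, only considers lists at column zero using ``-``, ``*`` or ``+`` bullets.

```python
import re

BULLET = re.compile(r"^([-*+]) ")


def check_adjacent_lists(lines):
    """Return a warning for each bullet change not preceded by a blank line."""
    warnings = []
    previous_bullet = None  # bullet of the list we are currently inside
    for number, line in enumerate(lines, start=1):
        match = BULLET.match(line)
        if match:
            bullet = match.group(1)
            if previous_bullet is not None and bullet != previous_bullet:
                warnings.append(f"line {number}: new list starts without a blank line")
            previous_bullet = bullet
        elif not line.strip():
            previous_bullet = None  # a blank line separates the lists
    return warnings


badly_wrapped = [
    "* This is list item 1.  Here's a formula: \"x = x",
    "- 1\".",
    "* Here's list item 2.",
]
print(check_adjacent_lists(badly_wrapped))
```

With Garth's "use a different bullet" workaround, the stray ``- 1".`` line is reported instead of silently starting a new list, which fits David's preference for warnings over surprises.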
* Use an inline literal::
    - This is list item 1.  Here's a formula: ``x = x
    - 1``.
    - Here's list item 2, as the parser considers the second line in this example part of the literal started in line 1.
Although not explicitly stated in the spec (yet), the way I've implemented the parser is to do line/block parsing first, then inline markup parsing afterwards (standalone URI parsing last). So in the case above, the "- 1``." would be recognized as a new list item before being examined for inline literals. The "\``x = x" at the end of the first line would generate a warning, "Inline literal start-string without end-string."
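The two-pass order explains why the inline-literal workaround fails. A minimal Python sketch under simplifying assumptions (column-zero bullets only, one line-based pass followed by an inline pass); ``parse``, ``BULLET``, ``LITERAL`` and ``START`` are hypothetical names, not DPS internals, though the warning text is the one quoted above.

```python
import re

BULLET = re.compile(r"^[-*+] ")                   # a list item starts at column zero
LITERAL = re.compile(r"``(?=\S)(.+?)(?<=\S)``")   # a complete inline literal
START = re.compile(r"(?<!\S)``(?=\S)")            # an inline literal start-string


def parse(lines):
    """Two-pass sketch: block structure first, inline markup second."""
    # Pass 1 (blocks): every line beginning with a bullet opens a new item;
    # other non-blank lines continue the current item.
    items = []
    for line in lines:
        if BULLET.match(line):
            items.append(line[2:].strip())
        elif line.strip() and items:
            items[-1] += " " + line.strip()

    # Pass 2 (inline): each item is scanned on its own, so a literal cannot
    # span two items. A start-string left over after removing complete
    # literals is reported.
    warnings = []
    for number, text in enumerate(items, start=1):
        if START.search(LITERAL.sub("", text)):
            warnings.append(
                f"item {number}: Inline literal start-string without end-string."
            )
    return items, warnings


lines = [
    "- This is list item 1.  Here's a formula: ``x = x",
    "- 1``.",
    "- Here's list item 2, as the parser considers the second line in this",
    "example part of the literal started in line 1.",
]
items, warnings = parse(lines)
print(len(items))
print(warnings)
```

By the time inline markup is examined, the line split has already happened, so the literal's start-string is stranded in item 1 and the closing backquotes in item 2 are just text.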
There are hundreds of billions [1]_ of frustrated Wiki users out there pounding their heads against the Wiki markup syntax, and almost as many ZWiki users ripping their hair out because StructuredText is just as bad or worse. Telling them we're not going to throw them a line and rescue them from shark-infested water because they might get our precious rope wet seems a tad... stingy.
I'm all in favor of throwing them a line. But (to extend your analogy further) I want the line to be strong and well anchored, so they don't get tangled up in it and drown. :-)
* The *user* might be a little confused for a moment.
The user is going to spend a lot of time confused regardless.
Confusion is OK, as long as it stems from ignorance; education/experience fixes that. Confusion stemming from surprising (even if *very occasionally* surprising) side-effects of the markup, that's not acceptable.
I'm wary of insisting upon serious inconvenience to a large segment of the user population for [3]_ to save inconvenience to the occasional user who stumbles across the edge case of a list item that happens to have a list delimiter just after the wrap column.
In putting together these specs and the parser software, I've always kept this in mind: If it can go wrong, it will. Writing the spec and implementing the parser, I've tried to avoid surprises and ambiguity wherever possible. If avoidance is not possible, then the possible surprises have to be minimized, explicitly documented, and warned of by the parser. Also, there has to be an "out" or workaround (which is where backslash-escapes come in handy).
More glibly put: two out of three ain't bad. I think they'll cope. :)
You're a programmer. Imagine if Python had funny edge cases. Would you *cope*? Or would you scream bloody murder? Out of respect for the eventual users of reStructuredText, we can't allow *any* surprises. It will be great if you can come up with a consistent indentation-minimized syntax; I'm all for it. All you need to do is devise an alternative representation of hierarchical structures, one that doesn't use indentation or begin/end markers. If it *does* use begin/end markers, we'll call it something else ;-), and start another parser component project for it.
It sure looks to me like a requirement for a switchable mode in the parser. Different applications can choose different defaults.
This is workable. If you can come up with consistent, unambiguous, safe rules for lazy indentation, then Wikis and other apps could use the lazy variant.
Wicked.
Or, the parser could attempt to automatically figure it out.
That's a dangerous path. Explicit is better.
Fair enough. This quote is out of order because it's more important:
Although not explicitly stated in the spec (yet), the way I've implemented the parser is to do line/block parsing first, then inline markup parsing afterwards (standalone URI parsing last).
"Sorry, changing the parser order is just too hard at this stage to relax the requirement for blank lines between list entries in lazy mode" is a perfectly reasonable argument in favour of that requirement, and I'm entirely happy to accept it. [3]_ .. _[3] You have no idea how frustrated my fiancée gets when, after attempting to justify a decision with several sadly illogical [4]_ arguments in its favour and listening to me patiently dissect and dismiss each one, she discovers that "I feel like it, okay?" was all that she needed to say. Well, maybe you have a slight idea. :) One of these days, I'm going to clue up and ask right after the first one whether she just feels like it. It'll save a lot of time and angst. Similarly, I should have asked up front whether the implementation of my proposal was going to be difficult. .. _[4] No, this is not an attempt to slyly call your arguments illogical. Misguided, much Frowned upon by God, and if not abandoned sure to lead to your Eternal Damnation in Hell, but not illogical by any shake of the stick. :) The now sadly irrelevant argument in favour of a less strict lazy mode follows anyway. Summarizing the issue of badly wrapped lists and lazy mode: * We're either in lazy mode, or not. No automatic selection. Cool. * The following example is still contentious:: - This is list item 1. Here's a formula: "x = x - 1". - Here's list item 2. Sure looks like item 3 though. * The strict approach to the example: * A parser permitting lazy indentation without insisting upon blank lines would interpret the above as "three lists", and * A human reader strictly reading the specification would reach a similar conclusion, but * That's obviously not what the user intended when they wrote :: - This is list item 1. Here's a formula: "x = x - 1". before their editor badly wrapped the line. 
* We could call this disconnect "ambiguity", * In the parser world, "ambiguity" is a bad word, * Therefore blank lines **must** be insisted upon between list items in lazy mode. * Arguments in favour of being more forgiving: *Ambiguity ain't always that ambiguous*: The kind of ambiguity we're most worried about is *circumstances for which the parser's behaviour is undefined*. The parser needs to be able to consistently make a decision, and programmers implementing parsers need to be able to make a decision. This clearly isn't such a case. The user will be typing a list. When they see the results, they'll mutter dark words about the stupid editor their company insists they use, and they'll fix the markup somehow (see below). If asked "hey, do you consider what just happened ambiguous?", I don't imagine many users would reply in the affirmative. They explicitly typed something. Their editor explicitly stuffed it up. The parser explicitly interpreted the text, and the user explicitly said expletives and explicitly fixed the problem. Any confusion in the user's mind when seeing the output will disappear when the system sends them their text back for editing and they see what their text editor did. *Consider the user impact*: This kind of a strict "never suffer ambiguity to live" attitude imposes a heavy burden on the user every time they use a list (probably quite often) in order to save them from something untoward that might happen to them only once a year, if ever. A comparison might be made to money handling. If your current cash register techniques occasionally let minor mistakes be made, you could well lose hundreds of dollars per year. Insisting that all totals are manually verified by a supervisor will save those hundreds of dollars, but cost tens of thousands in additional salary. Moreover, all of your customers might abandon your store because they're sick of the hassle. 
*Users can avoid the problem very, very easily*: Any user aware that their editor wraps lines for them, and aware that a copy of the list delimiter unfortunately wrapped to the beginning of the line will cause the parser to start a new list item, will do one of the following: * Manually wrap such a long item well before the wrap point:: - This is list item 1. Here's a formula: "x = x - 1". - Here's list item 2. * Choose a different list delimiter. * Use literals (assuming the parser is changed so that literals bind harder than the beginning of list items). * Drop into strict mode temporarily: _[2] :: .. strict:: - This is list item 1, which contains a formula that I'm not sure will wrap appropriately, so I'm going to drop into strict mode and manually wrap each and every line well before the wrap point. Anyway, here's the formula: "x = x - 1". - Here's list item 2. .. lazy:: I suspect the first two will be slightly more popular. :) Any user waking up regularly dripping with sweat because of recurring nightmares about having to go back and fix their markup will, I think, go to the effort of finding an editor that will write their markup for them. *What would the user choose?*: Given a choice between the following: * a *strict* mode that insists that users manually wrap each and every line well before their editor's wrap point *and* manually indent those lines as well, * a *strictly lazy* mode that relaxes the requirements for manual wrapping and indentation but insists upon blank lines between all list items, and * a hypothetical *bloody lazy* [1]_ mode that doesn't insist upon those blank lines but that requires users to consider editor wrap points when putting list delimiters in the middle of list items, I somewhat suspect that many users would end up being bloody lazy. Certainly, if bloody laziness were the default, I sincerely doubt that many people would bother switching to a stricter mode, even if they got caught out once or twice. .. 
_[1] There's the `Queen's English` again. .. _[2] Well, there's an example of a parser directive, if we need one.
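To make the disagreement concrete, here is a sketch of the two lazy variants Garth names, applied to the wrapped example with the blank line that strictly lazy mode would demand between items. The mode names come from the message above; the function, the `- `-only bullets, and the assumption that the text contains nothing but the list are illustrative choices, not part of any spec.

```python
def split_items(lines, mode):
    """Split lazily wrapped '- ' list text into items.

    "strictly lazy": a '- ' line starts a new item only at the start of the
    text or right after a blank line; otherwise it continues the current item.
    "bloody lazy": every line that starts with '- ' begins a new item.
    Assumes the text contains nothing but the list.
    """
    if mode not in ("strictly lazy", "bloody lazy"):
        raise ValueError(f"unknown mode: {mode!r}")
    items = []
    after_blank = True
    for line in lines:
        if not line.strip():
            after_blank = True
            continue
        starts_item = line.startswith("- ") and (mode == "bloody lazy" or after_blank)
        if starts_item:
            items.append(line[2:])
        elif items:
            items[-1] += " " + line.strip()  # lazy continuation line
        after_blank = False
    return items

wrapped = [
    "- This is list item 1. Here's a formula: \"x = x",
    "- 1\".",
    "",
    "- Here's list item 2.",
]
print(split_items(wrapped, "strictly lazy"))     # 2 items, formula rejoined
print(len(split_items(wrapped, "bloody lazy")))  # 3
```

Strictly lazy mode rejoins the formula as `"x = x - 1"`; bloody lazy mode produces the three-item surprise David is worried about.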
Yes, but we are trying to avoid surprises when accidental bad wrapping takes place. The user doesn't always have control. My email client wraps my paragraphs, even if I don't want it to.
Well, exactly, but there's nothing wrong with surprises if the user can figure out how to respond to the surprise. Users are going to be stuffing up quite often, will be surprised to see that what they did didn't work, and will look at their markup again and maybe refer to the specification to figure out what happened and what to do about it. If we're not worried about that (leading to directives like: "users must never write their own markup, but must use an editor that doesn't let them make mistakes"), why are we worried about this wrapping and list items issue? The user has enough control over the wrapping to force a wrap earlier than the parser did, which is more than s/he needs to either dodge or fix the problem.
* The *user* might be a little confused for a moment.
The user is going to spend a lot of time confused regardless.
Confusion is OK, as long as it stems from ignorance; education/experience fixes that. Confusion stemming from surprising (even if *very occasionally* surprising) side-effects of the markup, that's not acceptable.
Call it a side-effect of the editor. If anyone gets particularly detail-oriented and angst ridden about the whole thing, direct them to the list archives (of which I'm sure I'm going to be sufficiently embarrassed), point out that it's all my fault, and give them my email address. :)
Writing the spec and implementing the parser, I've tried to avoid surprises and ambiguity wherever possible. If avoidance is not possible, then the possible surprises have to be minimized, explicitly documented, and warned of by the parser. Also, there has to be an "out" or workaround (which is where backslash-escapes come in handy).
Let's say that it were impossible to insist on the blank lines for non-technical reasons (the managing director hates them). I think the possible surprises are minimal, I'll write the documentation, I'll try and figure out a way to warn about the situation (spotting a broken literal is the easiest way until we climb into the ordered list rat-hole), and there's an easy out. Close enough?
More glibly put: two out of three ain't bad. I think they'll cope. :)
You're a programmer. Imagine if Python had funny edge cases. Would you *cope*? Or would you scream bloody murder?
Python surprises me every week. Then I figure out that my editor broke the indentation. I fix what my editor broke, and keep working. I cope. :)
Out of respect for the eventual users of reStructuredText, we can't allow *any* surprises.
We're doing it for your own good! Out of respect for people already suffering crummy editors, I'm trying to cut them as many breaks as I can. Users who absolutely cannot stand surprises can always turn on strictness or strict laziness, eh? It just occurred to me that I've spent more time discussing this than I could possibly have spent as a user swearing about needing to put blank lines in. Sorry about that. I'm mainly worried about people cutting and pasting mail into their web browser (it'll happen). Saving them the effort of breaking the bullet lists apart seems like a fair thing.
It will be great if you can come up with a consistent indentation-minimized syntax; I'm all for it.
Still working on it! Oh, the shame: a Python programmer trying to figure out how to avoid indenting...
All you need to do is devise an alternative representation of hierarchical structures, one that doesn't use indentation or begin/end markers. If it *does* use begin/end markers, we'll call it something else ;-), and start another parser component project for it.
If it had to use begin and end markers, we may as well write it in *Perl*. Ewwwww... Regards, Garth.
More time-wasting goodness. Thankfully, a short "yup" will straighten most of it out. - `subjects with momentum`_ - `multiply indirect links`_ - `general coding attitude`_ - `embedding directives mid-paragraph`_ - `alternative heading format`_ - `block quotes and literal blocks in list items`_ - `miscellaneous`_ - `tying up loose ends`_ I'm leading with the items that have a bit of momentum. I've bumped to the tail end of the message all of the knotted loose ends: the yeps, the uh-huhs, and the oopses. Paragraph indentation, I put in a completely different message; it looks like it could take a while. If reading my email in reStructuredText 0.2.2 is driving anyone nuts, please let me know. .. _subjects with momentum: Kicking off with the momentum: .. is that English? :) .. _multiply indirect links: Multiply-indirect hyperlinks? Interesting idea. I don't know if it's worth the trouble though. Collapsing multiply indirect links seems clean and trivial. Why not? :) That said, it's not necessarily *important*, so feel free to leave it in the pile of features labelled "Garth can code these if he wants them so bloody badly." .. _general coding attitude: In general, if it's easier to make the parser accommodate a user behaviour than to persuade the users to select another behaviour, I'll consistently be in favour of changing the parser. That doesn't override my insistence on clean code, especially if it's hanging out there in public. If I can't do it without tangling the parser, I'll hold off until I can figure out a way to refactor it cleanly. Finally, I don't mind putting my code where my mouth is. .. That almost ended up "... don't mind putting my keyboard where my mouse is." I clearly need more sleep than I'm getting. Summary: * If it's easier to change the code's behaviour than change the users' behaviour, change the code. * Writing dirty, ugly code is harder than changing user behaviour. 
* If it's not obvious whether the code is going to be clean or not, I'll find out by trying to write it. .. _embedding directives mid-paragraph: Embedding directives mid-paragraph... Could be done with interpreted text. But is it really necessary? Could you provide some examples? If I had an example in mind, I've since forgotten it. How about we nail down how to hand a block of text to a directive, and then leave messy stuff like doing something unusual mid-paragraph to application-specific directives. If it turns out to be mind-blowingly popular, it can be factored in later. .. _alternative heading format: I *really* like MoinMoin's use of a number of equals signs on each side of a paragraph [to indicate a heading]: if the number of equals signs is the same, the paragraph is a heading of indent level equal to the number of equals signs. It's a workable alternate syntax. Yep. .. _block quotes and literal blocks in list items: On block quotes and literal blocks in list items, which I still find mildly confusing (at least, when I'm typing -- my fingers aren't used to it yet, if you know what I mean): I would hope that the reStructuredText parser is smart enough to figure out that the example text above is part of the <li> above? Once you indent the list item's paragraph, and further indent the example text, yes. :-) I keep forgetting the extra indents. Just to confirm:: - list item new paragraph in list item:: literal block in list item .. outdented comment to force the end of the literal block block quote in list item block quote outside of list item (oops!) - another list item .. _miscellaneous: A few quick questions: * If I accidentally indent a bullet list, does that become a bullet list inside a block quote? :: I'm not sure whether the following is a bullet list at the same level as this paragraph, or a bullet list inside a block quote: * Anyone? Anyone? Bueller? I suspect the answer is: yes. 
* Does the specification implicitly or explicitly support the use of an outdented comment to force the end of an indented block, as above? I suspect the behaviour isn't yet defined. .. _tying up loose ends: tying up loose ends ------------------- We turn out to completely agree on the following: * Treating unresolved links as opportunities to create a new page is up to Wiki, not the parser. * Suppressing page creation for unresolved link destinations that would make weird page names is also up to Wiki. I've discovered the following blunders: * Inline URL specification for links is too ugly to bear:: `download Python 2.1`:http://www.python.org/2.1/ I must have missed the word "rejected" at the time. My blunder. * Interpreted literals and literal interpretation:: Find the `` `interpreted text` `` in this paragraph! What was I on? Regards, Garth.
on 2001-07-20 2:17 AM, Garth T Kidd (garth@deadlybloodyserious.com) wrote:
If reading my email in reStructuredText 0.2.2 is driving anyone nuts, please let me know.
Looks OK to me! :-) New in the spec and parser (which will be posted to the web site this weekend) is implicit hyperlink targets in titles, so you could use titles instead of ".. _internal hyperlink targets:".
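A minimal sketch of how a reference such as `tying up loose ends`_ could find its section title without a separate `.. _tying up loose ends:` line. The case- and whitespace-insensitive matching is an assumption for illustration, not a quote from the spec, and duplicate titles are not handled (the last one wins).

```python
def normalize_name(text):
    # Assumed matching rule: case-insensitive, and any run of whitespace
    # (including a line break) counts as a single space.
    return " ".join(text.lower().split())

def resolve_reference(ref_text, titles):
    """Return the section title that an `ref text`_ reference names, or None."""
    index = {normalize_name(title): title for title in titles}
    return index.get(normalize_name(ref_text))

titles = ["Tying Up Loose Ends", "Miscellaneous"]
print(resolve_reference("tying up\n  loose ends", titles))
```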
Multiply-indirect hyperlinks? Interesting idea. I don't know if it's worth the trouble though.
Collapsing multiply indirect links seems clean and trivial. Why not? :)
But is the gain worth the added complexity?
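For a sense of what collapsing would involve, here is a sketch that assumes a hypothetical table of targets, where each name maps either to a URI or to another target name. Following the chain is a short loop. The real extra work is detecting loops and dangling names.

```python
def resolve_indirect(targets, name):
    """Follow a chain of indirect hyperlink targets to its final URI.

    `targets` maps each target name to ("uri", "<address>") or to
    ("alias", "<other target name>").  Loops and dangling names raise.
    """
    chain = []
    while True:
        if name in chain:
            raise ValueError("circular indirect targets: "
                             + " -> ".join(chain + [name]))
        if name not in targets:
            raise KeyError(f"unknown hyperlink target: {name!r}")
        chain.append(name)
        kind, value = targets[name]
        if kind == "uri":
            return value
        name = value  # an alias: keep following the chain

targets = {
    "python 2.1": ("uri", "http://www.python.org/2.1/"),
    "download": ("alias", "python 2.1"),
    "get it": ("alias", "download"),
}
print(resolve_indirect(targets, "get it"))
```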
That said, it's not necessarily *important*, so feel free to leave it in the pile of features labelled "Garth can code these if he wants them so bloody badly."
Consider it so left. And it's good to hear the Queen's English. [Summary of Garth's coding attitude:]
* If it's easier to change the code's behaviour than change the users' behaviour, change the code.
* Writing dirty, ugly code is harder than changing user behaviour.
So write clean code! :>
* If it's not obvious whether the code is going to be clean or not, I'll find out by trying to write it.
Sounds reasonable.
How about we nail down how to hand a block of text to a directive, and then leave messy stuff like doing something unusual mid-paragraph to application-specific directives. If it turns out to be mind-blowingly popular, it can be factored in later.
Yes. The parser currently *parses* directives just fine, but doesn't actually *do* anything with them yet. I'll have to code up a directive or two to see what they'll need. Any suggestions?
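Here is one hypothetical way to hand a block of text to a directive: a registry keyed by directive name, where each handler receives the text after `::` and the dedented indented block. The `note` directive, its return value, and the helper names are all invented for illustration. They are not the DPS API.

```python
import re
import textwrap

# Matches a directive line such as ".. note:: Careful".
DIRECTIVE_RE = re.compile(r"^\.\. ([\w-]+)::\s*(.*)$")

HANDLERS = {}

def directive(name):
    """Register a handler under a directive name (hypothetical API)."""
    def register(func):
        HANDLERS[name] = func
        return func
    return register

@directive("note")
def note(argument, block):
    return {"type": "note", "title": argument, "text": block}

def run_directive(lines):
    """Parse a directive and its indented block, then call its handler."""
    m = DIRECTIVE_RE.match(lines[0])
    if m is None:
        raise ValueError("not a directive: " + lines[0])
    name, argument = m.groups()
    # The block is every following line that is blank or indented.
    body = []
    for line in lines[1:]:
        if line.strip() and not line[:1].isspace():
            break
        body.append(line)
    block = textwrap.dedent("\n".join(body)).strip()
    handler = HANDLERS.get(name)
    if handler is None:
        raise KeyError(f"unknown directive: {name!r}")
    return handler(argument, block)

print(run_directive([".. note:: Careful", "",
                     "   Directives get a block", "   of text.",
                     "Back to normal text."]))
```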
I keep forgetting the extra indents. Just to confirm::
- list item
new paragraph in list item::
literal block in list item
.. outdented comment to force the end of the literal block
block quote in list item
block quote outside of list item (oops!)
- another list item
Correct on all counts.
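Here is a sketch of why the outdented comment works, under an assumed rule: the literal block after a `::` paragraph lasts until the first non-blank line that is not indented past that paragraph. The archived example lost its indentation, so the two-space paragraph and six-space literal below are assumptions.

```python
def indent_of(line):
    return len(line) - len(line.lstrip(" "))

def take_literal_block(lines, para_indent):
    """Split the lines after a '::' paragraph into (literal, rest).

    Toy rule: the literal block runs until the first non-blank line that is
    not indented past the '::' paragraph (for example an outdented '..'
    comment).  Blank lines at either end of the block are dropped.
    """
    literal, rest = lines, []
    for i, line in enumerate(lines):
        if line.strip() and indent_of(line) <= para_indent:
            literal, rest = lines[:i], lines[i:]
            break
    while literal and not literal[0].strip():
        literal = literal[1:]
    while literal and not literal[-1].strip():
        literal = literal[:-1]
    return literal, rest

after_para = [
    "",
    "      literal block in list item",
    "",
    "  .. outdented comment to force the end of the literal block",
    "",
    "      block quote in list item",
]
literal, rest = take_literal_block(after_para, para_indent=2)
print(literal)
print(rest[0])
```

Without the comment line, the second indented chunk would simply continue the literal block.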
* If I accidentally indent a bullet list, does that become a bullet list inside a block quote? :: ... I suspect the answer is: yes.
Your suspicions are well-founded. (Yes.)
* Does the specification implicitly or explicitly support the use of an outdented comment to force the end of an indented block, as above?
I suspect the behaviour isn't yet defined.
Right again. The spec doesn't explicitly or implicitly say anything. The parser *does* support such (ab)use of an unindented comment. I'll modify the spec. Thanks for the feedback. -- David Goodger dgoodger@bigfoot.com Open-source projects: - Python Docstring Processing System: http://docstring.sf.net - reStructuredText: http://structuredtext.sf.net - The Go Tools Project: http://gotools.sf.net
On Thu, 19 Jul 2001 01:11:15 -0400, David Goodger wrote:
As I mentioned in private email, I'll be posting version 0.3 of both reStructuredText and the DPS by the end of the weekend. Look for: several thousand lines of code; most constructs implemented; warning & error generation; many unittests (over 90 & counting, just for the parser); DOM generation; oodles of fun for all ages.
You did not _explicitly_ mention documentation (docstrings & docs). If those are included, I'll try integration into MoinMoin.
[replying to me about the upcoming post of spec & code] on 2001-07-20 6:48 AM, Juergen Hermann (jh@web.de) wrote:
You did not _explicitly_ mention documentation (docstrings & docs). If those are included, I'll try integration into MoinMoin.
I didn't *explicitly* mention documentation because, outside of the specs, there is none yet. Apart from dps.statemachine, the docstrings are nowhere near complete or acceptable. Please beware, the code is not in any semblance of completion yet. There are many holes in it. Examples: I haven't tested the parser's external API at all; there's no output formatter (except for raw XML). The code should be considered experimental. Having said that, please do take a look. I'd like to hear what MoinMoin (or any other potential client application) would need from the DPS and/or reStructuredText. -- David Goodger dgoodger@bigfoot.com Open-source projects: - Python Docstring Processing System: http://docstring.sf.net - reStructuredText: http://structuredtext.sf.net - The Go Tools Project: http://gotools.sf.net
On Thu, 19 Jul 2001 11:12:56 +1000, Garth T Kidd wrote:
- underlined style headings; I *really* like MoinMoin's use of a number of equals signs on each side of a paragraph: if the number of equals signs is the same, the paragraph is a heading of indent level equal to the number of equals signs. For example::
= this is a level 1 heading =
== this is a level 2 heading ==
== this is also a level 2 heading despite the fact that it has been wrapped for some reason ==
=== this is not a heading because the number of equals signs isn't equal <ahem>. ==
These are not exactly the rules MoinMoin implements, but close. ;)
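For reference, here is a sketch of the rule as Garth states it above. Per Juergen, this is close to, but not exactly, what MoinMoin implements. Equal runs of `=` on both sides make a heading whose level is the run length, and wrapped paragraphs are unwrapped first.

```python
import re

# Garth's proposed rule: N '=' signs, whitespace, title, whitespace, N '=' signs.
HEADING_RE = re.compile(r"^(=+)\s+(.*?)\s+(=+)$")

def heading_level(paragraph):
    """Return (level, title) if the paragraph is a heading, else None."""
    text = " ".join(paragraph.split())  # undo any wrapping
    m = HEADING_RE.match(text)
    if m is None or len(m.group(1)) != len(m.group(3)):
        return None
    return len(m.group(1)), m.group(2)

print(heading_level("== this is also a level 2 heading despite\n"
                    "the fact that it has been wrapped =="))
```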
on 2001-07-20 6:44 AM, Juergen Hermann (jh@web.de) wrote:
These are not exactly the rules MoinMoin implements, but close. ;)
Could you post a link to the rules that MoinMoin *does* implement please? (I visited the site but couldn't find the markup rules.)
On Fri, Jul 20, 2001 at 06:29:55PM -0400, David Goodger wrote:
on 2001-07-20 6:44 AM, Juergen Hermann (jh@web.de) wrote:
These are not exactly the rules MoinMoin implements, but close. ;)
Could you post a link to the rules that MoinMoin *does* implement please? (I visited the site but couldn't find the markup rules.)
participants (5)

- David Goodger
- Garth T Kidd
- Jim Tittsler
- Juergen Hermann
- Tony J Ibbs (Tibs)