Hi David! David Goodger wrote:
Unfortunately, there's a final, showstopper problem with this syntax: RFC 2732 ("Format for Literal IPv6 Addresses in URL's") adds the "[" and "]" characters to the set of possible URI characters. This means we can't surround URIs with "[]" with the current parser, which is intentionally limited in its inline markup parsing ability (it uses regexps). Here's an example::
Wow. Oops. Ok, point well taken; I had missed that update to RFC 2396 until now, and had assumed that "[" and "]" were still excluded from URIs.
[in a follow-up:]
I've come up with a third variation that doesn't break the _ convention as much as the other two:
An `example hyperlink` <http://example.com>_.
From there it's a *very* short step back to::
An `example hyperlink <http://example.com>`_.
One underscore means "named", two means "anonymous", same as in the rest of the cases.
Well, yes, it can be argued that it is just one backquote moving a little forward. Ultimately, this is a question of taste, but I still find the first version quite a bit more readable: there, I'm able to parse the backquotes as a marker for the extent of the link (as in `example hyperlink`_), and the angle-bracketed text as an annotation to the link. My interpretation of the syntax is `example hyperlink`_ with an interspersed annotation that gives the URI inline.

With `example hyperlink <http://example.com>`_, on the other hand, I find it harder to ignore the URI when reading: my eyes search for the closing marker corresponding to the first backquote, which in the context of reST I interpret as an opening marker (like an opening bracket). What happens is that the URI jumps into the foreground (because it's immediately before the closing backquote my eyes are searching for) and no longer looks like the annotation I'm used to from plain text.

Now, I can understand that you don't want to implement backtracking in the parser for this, but I don't actually see why that's necessary (then again, I'm still trying to grasp how the parser works, so if I'm misinterpreting here, I'd be glad to be corrected). As far as I can see, in ``parsers/rst/states.py``, you already distinguish between inline literals and single-backquoted text; then at a later point I think you further distinguish between single-backquoted phrase refs (underscore at end) and single-backquoted domain-specific text (no underscore at end).

How about simply introducing another case, inline hyperlinks? The opening marker would be a single backquote (i.e., a backquote not preceded or followed by another backquote, as currently). The closing marker would be identified by the following regular expression::

    r'`\s*<' + uri + r'>_'

(This can be improved by allowing for a second underscore at the end and checking that whitespace or punctuation follows.)
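To make the proposal above concrete, here is a minimal Python sketch of recognizing an inline hyperlink in a single regular-expression pass. It is not taken from ``parsers/rst/states.py``: the ``URI`` pattern is a simplified stand-in for whatever the parser really uses, and the names ``INLINE_LINK`` and ``find_inline_hyperlinks`` are hypothetical. It folds in the two suggested improvements (an optional second underscore, and a check that whitespace or punctuation follows) and additionally assumes that the link text itself contains no backquotes.

```python
import re

# Assumption: a deliberately simplified URI pattern (scheme, colon, then any
# run of characters other than whitespace and angle brackets).
URI = r"[a-zA-Z][a-zA-Z0-9.+-]*:[^\s<>]+"

INLINE_LINK = re.compile(
    r"(?<!`)`(?!`)"                  # opening marker: a single, undoubled backquote
    r"(?P<text>[^`]+)"               # link text (assumed to contain no backquotes)
    r"`\s*<(?P<uri>" + URI + r")>"   # closing backquote, optional space, <URI>
    r"(?P<underscores>__?)"          # one underscore = named, two = anonymous
    r"(?=$|[\s.,:;!?)\]}'\"-])"      # end of text, whitespace, or punctuation follows
)


def find_inline_hyperlinks(text):
    """Return a list of (link_text, uri, is_anonymous) tuples found in text."""
    return [
        (m.group("text"), m.group("uri"), m.group("underscores") == "__")
        for m in INLINE_LINK.finditer(text)
    ]


if __name__ == "__main__":
    print(find_inline_hyperlinks("An `example hyperlink` <http://example.com>_."))
```

In this sketch, an ordinary phrase reference such as `example hyperlink`_ and inline literals in double backquotes are left alone, because neither is followed by an angle-bracketed URI. Since the URI is delimited by angle brackets rather than square brackets, an IPv6 literal like ``http://[::1]:8080/`` does not collide with the markup.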
Possibly we'd have to do a little more parsing to get the URI out of the angle brackets, but that won't be hard.

Ok, maybe this isn't extremely beautiful, but from what I understand now, it could work without implementing backtracking. Again, it's a matter of taste to decide whether this is worth the effort; because of the reasons above, in my humble opinion, it is ;-)

- Benja