[Doc-SIG] Re: docutils REs

Tony J Ibbs (Tibs) tony@lsl.co.uk
Fri, 23 Mar 2001 10:22:59 -0000


> 01> _descriptive = """\
> 02> (?P<item>		# start our *item*
> 03>   (?:		# an unnamed group
> 04>      [^\n]*		  # 0..n of anything but newline
> 05>      '[^'\n]+'	  # a literal string, containing 1 or more chars
> 06>      [^\n]*		  # 0..n of anything but newline
> 07>   )*		# end group
> 08>
> 09>   |			# or
> 10>
> 11>   [^\n]*		# 0..n of anything but newline
> 12>
> 13>   (?!		# negative lookahead for
> 14>     '		# a quote
> 15>     [^']*		# 0..n of anything but quote
> 16>
> 17>     [ ]+ -- [ ]+	# spaces -- spaces
> 18>
> 19>     [^']*		# 0..n of anything but quote
> 20>     '		# a quote
> 21>   )			# end of negative lookahead
> 22> )			# end of our *item*
> 23>
> 24> [ ]+ -- [ ]+	# spaces -- spaces
> 25>
> 26> (?P<text> .*)	# 0..n of any character
> 27> """
>
> What are lines 11-21 for?  The only cases I can think of that
> they capture (that 3-7 don't) are dubious cases like::
>
>   bad 'apostrophe nesting -- in the key

I can't offhand remember - the RE growed until it appeared to work, and
some of it seemed to rely on the "fuzzy" handling that REs appear (to
me) to do in balancing the greediness of different bits of the RE. It's
possible it's skeletal remains which should be excised, I suppose.
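
One way to settle what each branch contributes is to run the pattern against
a few sample lines. The following is only a sketch: it uses present-day
Python rather than 1.5.2, and it assumes the pattern is compiled in VERBOSE
mode and applied with match() from the start of a single line, neither of
which is stated above. The names DESCRIPTIVE and split_item are made up for
the illustration.

```python
import re

# The quoted pattern, retyped without the "NN>" prefixes.
# Assumption: it is meant to be compiled with re.VERBOSE.
DESCRIPTIVE = re.compile(r"""
(?P<item>               # start our *item*
  (?:                   # an unnamed group
     [^\n]*             # 0..n of anything but newline
     '[^'\n]+'          # a quoted string of 1 or more chars
     [^\n]*             # 0..n of anything but newline
  )*                    # end group
  |                     # or
  [^\n]*                # 0..n of anything but newline
  (?!                   # negative lookahead for
    '                   # a quote
    [^']*               # 0..n of anything but quote
    [ ]+ -- [ ]+        # spaces -- spaces
    [^']*               # 0..n of anything but quote
    '                   # a quote
  )                     # end of negative lookahead
)                       # end of our *item*
[ ]+ -- [ ]+            # spaces -- spaces
(?P<text> .*)           # 0..n of any character
""", re.VERBOSE)


def split_item(line):
    """Return (item, text) for a descriptive-list line, or None."""
    m = DESCRIPTIVE.match(line)  # assumption: anchored at start of line
    if m is None:
        return None
    return m.group("item"), m.group("text")


plain = split_item("key -- description")
quoted = split_item("'a -- b' -- text")
dubious = split_item("bad 'apostrophe nesting -- in the key")

print(plain)    # ('key', 'description')
print(quoted)   # ("'a -- b'", 'text')
print(dubious)  # ("bad 'apostrophe nesting", 'in the key')
```

Pointing the same helper at a copy of the pattern with the second branch
removed would show directly which of these lines depend on it.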

> Also, I wanted to make sure you're clear that '^' and '$' match
> beginning and end of LINE, not of STRING (although the latter is
> a subset of the former).

Not according to the RE documentation in the Python 1.5.2 reference
manual, they don't - that's quite clear in saying start and end of
STRING, and recognition of newlines is only in MULTILINE mode.
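
A quick check of the difference, written as a sketch against present-day
Python (the flags argument to findall() is an assumption about the reader's
version and is not in 1.5.2). The sample text and variable names are made up
for the illustration.

```python
import re

text = "first line\nsecond line\n"

# Default mode: '^' anchors only at the very start of the string.
default_starts = re.findall(r"^\w+", text)

# MULTILINE mode: '^' also anchors immediately after each newline.
multiline_starts = re.findall(r"^\w+", text, re.MULTILINE)

# Default mode: '$' anchors at the end of the string, or just before
# a newline that is the last character of the string.
default_ends = re.findall(r"\w+$", text)

# MULTILINE mode: '$' also anchors just before every newline.
multiline_ends = re.findall(r"\w+$", text, re.MULTILINE)

print(default_starts)    # ['first']
print(multiline_starts)  # ['first', 'second']
print(default_ends)      # ['line']
print(multiline_ends)    # ['line', 'line']
```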

> I don't think that STNG currently requires whitespace before
> *emph* or **strong** etc... that's why I coded it like I did.

I kept STNG REs around as comments for "of interest" reasons, but
personally found them less than useful, so basically have worked from
scratch and the ST "documentation". So it's quite possible they're
different.

> But I think that STpy's approach may be more reasonable..
> (we should start making a list of proposed changes to STNG,
> in order to make STpy and STNG more compatible.. Otherwise,
> STminus will just end up being a big useless mess :) )..

Well, no, I wouldn't say that.

> Hm.. I guess s/we/I/ in that last parenthetical. :-/

My preferred option!

> I haven't decided yet on whether I'm happy about having this
> concept of "acceptable ending punctuation.."  It sort of seems
> like *all* punctuation should be ok, or *none*..

I'm not *too* happy about it myself, and actually it's a single string
that's included (with '%') into the RE texts where it's needed - this
means that (a) it's easy to change, and (b) it's the same in all places -
I thought consistency was a Good Idea.
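
Roughly the idea, as a sketch only: the punctuation set and the two patterns
here are invented for the illustration and are not the ones in the actual
code. ENDING_PUNCT, EMPH and STRONG are hypothetical names.

```python
import re

# Hypothetical "acceptable ending punctuation"; the real string is not
# shown in this message.
ENDING_PUNCT = ".,;:!?"

# Each pattern takes the same character set via '%', so changing
# ENDING_PUNCT changes every RE that uses it.
EMPH = re.compile(
    r"\*(?P<body>[^*\s][^*]*)\*(?=[%s\s]|$)" % ENDING_PUNCT)
STRONG = re.compile(
    r"\*\*(?P<body>[^*\s][^*]*)\*\*(?=[%s\s]|$)" % ENDING_PUNCT)

followed_by_comma = EMPH.search("an *emph region*, which is fine")
followed_by_dash = EMPH.search("an *emph region*-like this")
strong_word = STRONG.search("a **strong** word")

print(followed_by_comma.group("body"))  # emph region
print(followed_by_dash)                 # None
print(strong_word.group("body"))        # strong
```

With this arrangement, allowing a dash after a closing marker would be a
one-character change to ENDING_PUNCT rather than an edit to each RE.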

> But I'll
> think on it some more. (e.g., should it be ok to have a dash after
> an *emph region*-like this?)

That looks wrong to me - but then you can see how I use dashes in plain
text!

There *are* some conventions on how one uses punctuation - for instance,
'this ,' looks wrong to almost everyone. ST<whatever> just enforces some
of them (this is, of course, yet another class of things to consider
warning people about).

Tibs

--
Tony J Ibbs (Tibs)      http://www.tibsnjoan.co.uk/
Well we're safe now....thank God we're in a bowling alley.
- Big Bob (J.T. Walsh) in "Pleasantville"
My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)