[Doc-SIG] Tokens for labels & endnotes
Edward D. Loper
edloper@gradient.cis.upenn.edu
Wed, 21 Mar 2001 14:03:24 EST
> I'm assuming we're talking about paragraph labels.
Actually, I think we were talking about [endnotes]. But the same
questions apply to labels.
> I think we should just go with the English definition of a word, which
> means [-A-Za-z], and leave it at that. It is *meant* to look like a
> word.
Is that too anglo-centric?
> I think "keep it simple" is required here - these labels are meant to be
> few and simple, so English words seems sensible to me. I would thus vote
> against underlines and against digits.
It might be that underlines and digits are more applicable for
endnotes. Some people might like this [1] or this [noam_chomsky97].
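
To make the difference concrete, here is a minimal sketch comparing the two candidate character sets. The pattern names and the sample sentence are made up for illustration; neither pattern is an agreed rule.

```python
import re

# Hypothetical patterns for bracketed endnote tokens (illustration only).
# Letters and hyphen only, as in the [-A-Za-z] proposal:
WORD_ONLY = re.compile(r"\[([-A-Za-z]+)\]")
# The same set extended with digits and underscore:
WORD_DIGIT_UNDERSCORE = re.compile(r"\[([-A-Za-z0-9_]+)\]")

text = "Some people might like this [1] or this [noam_chomsky97] or [note]."

word_only = WORD_ONLY.findall(text)
extended = WORD_DIGIT_UNDERSCORE.findall(text)

print(word_only)  # ['note']
print(extended)   # ['1', 'noam_chomsky97', 'note']
```

With letters only, [1] and [noam_chomsky97] are simply not recognised as endnotes. Neither pattern matches bracketed text that contains spaces.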
> Also, validation aside, I don't *use* a regular expression - I look for
> the right "shape" of paragraph (1 line, colon in it) and check what is
> to the left of the colon against the dictionary. From *my* point of view
> the legitimate characters idea only comes in with a validation phase (of
> course, it would be different for Edward).
This may be different if you want [this to not be an endnote].
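
For reference, the "shape" check described in the quoted text might look roughly like the sketch below. The label dictionary and the function name are hypothetical, invented only to illustrate the idea; this is not the actual implementation.

```python
# Hypothetical label dictionary; the real set of labels is not given here.
KNOWN_LABELS = {"arguments", "returns", "raises"}

def find_label(paragraph):
    """Return the label if the paragraph has the right shape, else None.

    Right shape: exactly one line, with a colon in it, and the text
    to the left of the colon is found in the dictionary.
    """
    lines = paragraph.strip().splitlines()
    if len(lines) != 1 or ":" not in lines[0]:
        return None
    candidate = lines[0].split(":", 1)[0].strip()
    if candidate.lower() in KNOWN_LABELS:
        return candidate
    return None

print(find_label("Arguments: x, y"))
print(find_label("Note: not in the dictionary"))
```

No set of legal characters is needed for this lookup. The endnote case is different: there is no dictionary to check a bracketed token against, so the legal characters have to be stated.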
> > Basically re defines '\w' = '[0-9a-zA-Z_]'
>
> Erm - basically it doesn't - it invokes "locales" which makes life more
> complex (and I have no idea what sre does about '\w').
If the LOCALE and UNICODE flags aren't used when compiling a regexp,
\w = [a-zA-Z0-9_] (at least according to "the Python library reference
manual for re":<http://www.python.org/doc/current/lib/re-syntax.html>).
Furthermore, it will always match '_', regardless of LOCALE and
UNICODE (again, according to the ref. manual).
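
A quick way to check this is sketched below. One assumption to flag: this uses a current Python 3 interpreter, where str patterns are Unicode-aware by default, so the re.ASCII flag is needed to get the plain [a-zA-Z0-9_] behaviour. The sample string is made up.

```python
import re

# Assumption: Python 3, where \w on str patterns is Unicode-aware
# unless re.ASCII is given.
ascii_word = re.compile(r"\w+", re.ASCII)  # \w = [a-zA-Z0-9_]
unicode_word = re.compile(r"\w+")          # \w also matches non-ASCII letters

sample = "noam_chomsky97 caf\u00e9"

ascii_tokens = ascii_word.findall(sample)
unicode_tokens = unicode_word.findall(sample)

print(ascii_tokens)    # ['noam_chomsky97', 'caf']
print(unicode_tokens)  # ['noam_chomsky97', 'caf\u00e9'] shown with the accented letter
```

In both cases '_' is matched as a word character.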
-Edward