[Doc-SIG] suggestions for a PEP

Wed, 21 Mar 2001 20:39:30 +0000 (GMT)

> I think "keep it simple" is required here
To me that needs to include:
   * case insensitive
   * digits

because authors of doc-strings are going to be shocked if it behaves
otherwise.  The former means your dictionary-based approach is not
satisfactory: apply string.lower to the apparent label, then check
whether the result appears in some list (or other implementation of
`collection') of known labels.  Otherwise, your builder.label_dict is
going to need further entries for, at least:

                          "Pep":"pep",
                          "Post-history":"post-history",
                          "Discussions-to":"discussions-to",

since some folk using the keys you gave *will* use them in the forms
shown; and you'll probably also need

                          "Discussions-TO":"discussions-to",

etc.
Simpler: use string.lower.

Have canonical forms generally be in Capitalised-Word form (like RFC 822
labels).  Indeed, a good way to implement the aforementioned
`collection' would be a mapping which is exactly the reverse of
the ones you showed us - mapping from the lower-cased form to the canonical
form for each key - so that one recognises a key using:

    try: canon = labels[text.lower()]  # look up the lower-cased label
    except KeyError: ... # it isn't a real label
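
To make the reverse mapping concrete, here is a minimal sketch.  The
names CANONICAL_LABELS, LABELS and canonical_label are made up for
illustration, and the canonical spellings are only my guess at what the
Capitalised-Word forms would be; nothing here is taken from the actual
builder code.

```python
# Assumed canonical spellings (illustrative only, not from the real builder).
CANONICAL_LABELS = ["PEP", "Post-History", "Discussions-To"]

# Reverse mapping: lower-cased form -> canonical form.
LABELS = {name.lower(): name for name in CANONICAL_LABELS}


def canonical_label(text):
    """Return the canonical form of a label, or None if it is not known."""
    try:
        return LABELS[text.lower()]
    except KeyError:
        return None  # it isn't a real label


if __name__ == "__main__":
    for candidate in ("pep", "Post-history", "Discussions-TO", "Nonsense"):
        print(candidate, "->", canonical_label(candidate))
```

With this shape, a new label costs one entry in the list, however many
capitalisations authors come up with.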

I am entirely happy to have the present *actual dialects* of ST use only
letters and dash; however, allow ST-generic to permit numbers, e.g. so
that ST variants *can* use
       "rfc2954-char-set": "RFC2954-Char-Set"
in their label dicts, or similar.
(No, I have no idea what RFC 2954 is, nor even whether it exists.)
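
A possible syntax check for such ST-generic labels, as a sketch only:
it assumes a label is one or more runs of ASCII letters and digits
joined by single dashes, with no underscore.  LABEL_RE and
is_label_syntax are hypothetical names, and the exact grammar is my
assumption rather than anything agreed in this thread.

```python
import re

# Assumed grammar: alphanumeric words separated by single dashes.
# \Z anchors at the very end, so a trailing newline is rejected too.
LABEL_RE = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*\Z")


def is_label_syntax(text):
    """Return True if text has the (assumed) shape of a label."""
    return LABEL_RE.match(text) is not None


if __name__ == "__main__":
    for candidate in ("rfc2954-char-set", "Discussions-To", "bad_label", "-x"):
        print(repr(candidate), is_label_syntax(candidate))
```

A dialect would still look the lower-cased text up in its own label
dict; this only decides whether the text is worth looking up at all.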

>> Basically re defines '\w' = '[0-9a-zA-Z_]'

> Erm - basically it doesn't - it invokes "locales" which makes life more
> complex (and I have no idea what sre does about '\w').

And I can't say I care much either way, once you're allowing `-' in the label.
The only need for _ is to separate words, and - is easier to type ;*>
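
For what it is worth, the point about '\w' can be checked directly.
The sketch below uses the re module of current Python 3, which
postdates this thread: there '\w' on text strings matches non-ASCII
letters unless re.ASCII is given, and never matches a dash.  The
helper name is_word is made up for the example.

```python
import re


def is_word(text, ascii_only=False):
    """Return True if the whole of text matches \\w+."""
    flags = re.ASCII if ascii_only else 0
    return re.fullmatch(r"\w+", text, flags) is not None


if __name__ == "__main__":
    print(is_word("caf\u00e9"))                   # non-ASCII letter, default flags
    print(is_word("caf\u00e9", ascii_only=True))  # same text under re.ASCII
    print(is_word("char-set", ascii_only=True))   # dash is not a \w character
```

So a label pattern that spells out its character class explicitly
avoids the question altogether.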

	Eddy.