[Doc-SIG] suggestions for a PEP
Edward Welbourne <eddy@chaos.org.uk>
Wed, 21 Mar 2001 20:39:30 +0000 (GMT)
> I think "keep it simple" is required here
to me that needs to include:
* case-insensitive matching of labels
* digits in labels
because authors of doc-strings are going to be shocked if it behaves
otherwise. The former means your dictionary-based approach is not
satisfactory as it stands: string.lower the apparent label, then check
whether the result appears in some list (or other implementation of
`collection') of known labels. Otherwise, your builder.label_dict is
going to need further entries for, at least:
"Pep":"pep",
"Post-history":"post-history",
"Discussions-to":"discussions-to",
since some folk using the keys you gave *will* use them in the forms
shown; and you'll probably also need
"Discussions-TO":"discussions-to",
etc.
Simpler: use string.lower.
Have canonical forms generally be in Capitalised-Word form (like RFC 822
labels). Indeed, a good way to implement the aforementioned
`collection' would be a mapping that is exactly the reverse of the ones
you showed us, mapping the lower-cased form of each key to its canonical
form, so that one recognises a key using:
    try:
        canon = labels[string.lower(text)]
    except KeyError:
        canon = None    # it isn't a real label
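For concreteness, here is a minimal sketch of that reverse mapping in present-day Python, where text.lower() does the job of the old string.lower. The table contents and the name canonical_label are made up for illustration; they are not part of any existing ST implementation.

```python
# Hypothetical label table: maps the lower-cased form of each known
# label to its canonical Capitalised-Word spelling (RFC 822 style).
LABELS = {
    "title": "Title",
    "post-history": "Post-History",
    "discussions-to": "Discussions-To",
}


def canonical_label(text):
    """Return the canonical spelling of a label, or None if unknown.

    The candidate is lower-cased once, so a single table entry per
    label covers "Discussions-to", "Discussions-TO", "DISCUSSIONS-TO", etc.
    """
    try:
        return LABELS[text.lower()]
    except KeyError:
        return None  # it isn't a real label


for candidate in ["Discussions-TO", "post-HISTORY", "TITLE", "Author"]:
    print(candidate, "->", canonical_label(candidate))
```

One entry per label replaces the ever-growing list of case variants, and the canonical spelling comes back for free.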
I am entirely happy for the present *actual dialects* of ST to use only
letters and dashes; however, ST-generic should permit digits too, e.g. so
that ST variants *can* use
"rfc2954-char-set": "RFC2954-Char-Set"
in their label dicts, or similar.
(No, I have no idea what RFC 2954 is, nor even whether it exists.)
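A sketch of how a dialect might check label syntax under those two rules. It assumes (my assumption, not settled above) that a label must begin with a letter; the names DIALECT_LABEL, GENERIC_LABEL and is_label are hypothetical.

```python
import re

# Assumed syntax: a label starts with an ASCII letter, then continues with
# letters and dashes (current dialects) or letters, digits and dashes
# (ST-generic). Character classes are spelled out, so no dependence on \w.
DIALECT_LABEL = re.compile(r"[A-Za-z][A-Za-z-]*")
GENERIC_LABEL = re.compile(r"[A-Za-z][A-Za-z0-9-]*")


def is_label(text, generic=False):
    """True if text is syntactically a label (not necessarily a known one)."""
    pattern = GENERIC_LABEL if generic else DIALECT_LABEL
    return pattern.fullmatch(text) is not None


print(is_label("rfc2954-char-set"))                # dialect rules: no digits
print(is_label("rfc2954-char-set", generic=True))  # ST-generic: digits allowed
```

Syntax checking and recognition stay separate: a dialect would first confirm the text looks like a label, then look it up in its lower-cased table.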
>> Basically re defines '\w' = '[0-9a-zA-Z_]'
> Erm - basically it doesn't - it invokes "locales" which makes life more
> complex (and I have no idea what sre does about '\w').
and I can't say I care much either way, once you're allowing '-' in labels
(which '\w' doesn't match anyway). The only need for '_' is to separate
words, and '-' is easier to type ;*>
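For reference, here is what present-day Python 3 does with '\w' (2001-era sre behaviour may differ): in a str pattern it matches any Unicode word character, the re.ASCII flag narrows it to [a-zA-Z0-9_], and neither form admits '-'. The helper name classify is illustrative only.

```python
import re


def classify(text):
    """Compare Unicode \\w, ASCII-only \\w and an explicit label class."""
    return (
        re.fullmatch(r"\w+", text) is not None,            # Unicode-aware \w
        re.fullmatch(r"\w+", text, re.ASCII) is not None,  # [a-zA-Z0-9_]
        re.fullmatch(r"[A-Za-z0-9-]+", text) is not None,  # explicit class
    )


for text in ["Post-History", "post_history", "na\u00efve"]:
    print(repr(text), classify(text))
```

An explicit character class gives the same answer regardless of flags or locale, which is the point of not relying on '\w' for labels.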
Eddy.