Suggestion for a new regular expression extension
nicolas.lehuen at thecrmcompany.com
Fri Nov 21 10:58:04 CET 2003
"Terry Reedy" <tjreedy at udel.edu> a écrit dans le message de
news:sa-dncv-SrE2vyCiRVn-gQ at comcast.com...
> > But in my case, it forces me to duplicate each alternative
> > in the big regexp in my normalisation function,
> > which causes quite tedious maintenance of the whole piece of code.
> I do not see that. I believe I would factor out and label the
> needed-twice pieces and use ''.join([list-of-pieces]) to make the big
I thought of that, and it would of course solve one part of the problem,
namely code duplication. However, it wouldn't spare me a double execution of
the regexps: once to check the whole address and parse it into high-level
tokens, and a second time to normalise each token according to various
normalisation rules. That's why using Scanner would be handy, as it could
help me do it in one pass. And the regular expression extension could help
me do it in just one (big, ugly, yet no-nonsense) regular expression, with
no extra code.
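To illustrate, the one-pass approach is possible today with the (undocumented) re.Scanner class shipped in the standard re module; the token patterns and normalisation callbacks below are hypothetical stand-ins for the real address-parsing rules:

```python
import re

# Each callback receives (scanner, matched_text) and returns whatever
# should appear in the token list -- here a (kind, normalised_text) pair.
def normalise_street(scanner, token):
    return ('STREET', 'Rue')              # e.g. map "rue"/"R." -> "Rue"

def normalise_number(scanner, token):
    return ('NUMBER', token.lstrip('0'))  # strip leading zeros

def keep_word(scanner, token):
    return ('WORD', token.upper())

scanner = re.Scanner([
    (r'[Rr]ue|R\.', normalise_street),
    (r'\d+',        normalise_number),
    (r'\w+',        keep_word),
    (r'\s+',        None),                # None = skip the match entirely
])

# scan() returns the token list plus any unmatched trailing text,
# so parsing and normalisation happen in a single pass.
tokens, remainder = scanner.scan('12 rue des Lilas')
```

Each alternative appears only once, paired with its normalisation rule, which is exactly the factoring the proposed extension would push down into the regular expression itself.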
Anyway, I'm not saying this extension is required to do the job, I'm doing
it already. The extension would greatly simplify the code, which is the
point of using Python, isn't it? ;) I guess it would also be useful to
anybody implementing transliterators or tokenizers. Plus, it would be a
feature of the RE engine that may not exist in Perl (just saying this as an
incentive to implement it...).
I'm having a look at _sre.c, but I have to confess that it is not the
easiest piece of code I've seen... Plus, it seems that matches (apparently
created in pattern_new_match) are built using indices into the tested
string, so returning arbitrary strings in matches would require quite a few
changes.
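This index-based design is visible from Python as well: a match object only records offsets into the subject string, so its groups are always exact substrings of the input, never newly built text. A quick experiment:

```python
import re

s = 'hello world'
m = re.search(r'w\w+', s)

# The match object stores (start, end) offsets into the subject string;
# group() is just a slice of the original input.
assert m.group() == s[m.start():m.end()] == 'world'
```

That is why an extension that substitutes arbitrary replacement strings into match results would have to change how _sre.c represents matches, not just add a new piece of syntax.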