[Python-ideas] thoughts on regular expression improvements
Bill Janssen
janssen at parc.com
Fri May 6 21:11:59 CEST 2011
I've been doing a lot of RE hacking lately, and some possible
improvements suggest themselves.
1. Multiple occurrences of a named group
Right now, you can compose REs with
x = re.compile("...")
y = re.compile("..." + x.pattern + "...")
But if x contains named groups, you run into trouble if you have
something like
z = re.compile("..." + x.pattern + "..." + x.pattern + "...")
which can easily happen if x could occur at various places in z. The
issue is that a named group is only allowed once, which isn't a bad
error-prevention mechanism, but it would be nice if it could occur more
than once (in alternative subexpressions), perhaps enabled by another
RE flag.
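
To make the problem concrete, here is a minimal sketch using the stdlib re module. The pattern and the helper with_suffix are hypothetical, made up for illustration. The helper is a naive textual workaround that renames groups per occurrence, and it would be fooled by an escaped "\(" in front of "?P<".

```python
import re

x = re.compile(r"(?P<num>\d+)")

# Reusing x in two alternatives is rejected: the group name "num" appears twice.
try:
    re.compile("a" + x.pattern + "|b" + x.pattern)
    duplicate_rejected = False
except re.error as exc:
    duplicate_rejected = True
    print("rejected:", exc)

def with_suffix(pattern, suffix):
    """Append suffix to every (?P<name>...) group and (?P=name) backreference.

    Hypothetical helper; purely textual, so it does not understand escapes.
    """
    return re.sub(
        r"\(\?P([<=])(\w+)",
        lambda m: "(?P" + m.group(1) + m.group(2) + suffix,
        pattern,
    )

# Each occurrence of x gets its own group name, so the composition compiles.
z = re.compile(
    "a" + with_suffix(x.pattern, "_1") + "|b" + with_suffix(x.pattern, "_2")
)
m = z.match("b42")
print(m.group("num_1"), m.group("num_2"))
```

The cost of the workaround is that the caller has to know which occurrence matched, which is exactly what allowing the same name in alternative subexpressions would avoid.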
2. Easier composition.
Writing
y = re.compile("..." + x.pattern + "...")
seems a tad grotty, to use a term from my childhood, and affords the RE
engine no purchase on the composition, which can be an issue if the
flags for x are different from the flags for y.
If the first argument to re.compile could be a tuple or list, you could write
y = re.compile(["...", x, "..."])
and the engine could see that "..." is a string, and that x is an RE, and
could inspect x as necessary.
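
A rough sketch of how such a composition could behave, written as an ordinary function rather than a change to re.compile. The name compose is hypothetical. It assumes Python 3.8 or later (for re.Pattern and scoped inline flags), and it only preserves the i, m, s, and x flags of each compiled part.

```python
import re

# Flags that the scoped inline syntax (?imsx-imsx:...) can switch on or off.
_INLINE = {
    re.IGNORECASE: "i",
    re.MULTILINE: "m",
    re.DOTALL: "s",
    re.VERBOSE: "x",
}

def compose(parts, flags=0):
    """Join strings and compiled patterns into one compiled pattern.

    Hypothetical helper: each compiled part is wrapped in a scoped-flag
    group so it keeps its own i/m/s/x settings inside the result.
    """
    pieces = []
    for part in parts:
        if isinstance(part, re.Pattern):
            on = "".join(c for f, c in _INLINE.items() if part.flags & f)
            off = "".join(c for f, c in _INLINE.items() if not part.flags & f)
            scope = on + ("-" + off if off else "")
            pieces.append("(?%s:%s)" % (scope, part.pattern))
        else:
            pieces.append(part)
    return re.compile("".join(pieces), flags)

x = re.compile("hello", re.IGNORECASE)
y = compose(["^", x, " world$"])

print(y.pattern)
print(bool(y.match("HeLLo world")))   # x stays case-insensitive
print(bool(y.match("hello WORLD")))   # the rest of y stays case-sensitive
```

Because the helper sees x as a pattern object and not just a string, it can inspect x.flags, which plain concatenation of x.pattern throws away.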
3. Edit distances.
The RE engine TRE (http://laurikari.net/tre/about/) supports fuzzy
matching of strings, using edit distances.
One can write an expression like "(total){~2}", which would match any string
that's "total" with no more than two edit errors.
You can also specify insertions, deletions, and substitution limits
separately with "+", "-", and "#".
That would be nice to have...
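
For anyone unfamiliar with edit distances, here is a small pure-Python sketch of the idea behind "{~2}". It is not TRE and not a regular expression engine: it only compares whole strings using the standard Levenshtein distance, and the function names are made up for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or free if equal
            ))
        prev = curr
    return prev[-1]

def fuzzy_equal(text, target, max_errors):
    """True if text is within max_errors edits of target (hypothetical helper)."""
    return edit_distance(text, target) <= max_errors

for candidate in ["total", "totl", "tootal", "totla", "xyz"]:
    print(candidate, edit_distance(candidate, "total"),
          fuzzy_equal(candidate, "total", 2))
```

Separate limits for insertions, deletions, and substitutions, as TRE allows with "+", "-", and "#", would need the three costs to be tracked individually instead of folded into one number.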
Bill