thoughts on regular expression improvements

I've been doing a lot of RE hacking lately, and some possible improvements suggest themselves.

1. Multiple occurrences of a named group

Right now, you can compose REs with

    x = re.compile("...")
    y = re.compile("..." + x.pattern + "...")

But if x contains named groups, you run into trouble if you have something like

    z = re.compile("..." + x.pattern + "..." + x.pattern + "...")

which can easily happen if x could occur at various places in z. The issue is that a named group is only allowed once, which isn't a bad error-prevention mechanism, but it would be nice if it could occur more than once (in alternative subexpressions), perhaps enabled by another RE flag.

2. Easier composition

Writing

    y = re.compile("..." + x.pattern + "...")

seems a tad grotty, to use a term from my childhood, and affords the RE engine no purchase on the composition, which can be an issue if the flags for x are different from the flags for y. If the first argument to re.compile could be a tuple or list, you could write

    y = re.compile(["...", x, "..."])

and the engine could see that "..." is a string, and that x is an RE, and could inspect x as necessary.

3. Edit distances

The RE engine TRE (http://laurikari.net/tre/about/) supports fuzzy matching of strings, using edit distances. One can write an expression like "(total){~2}", which would match any string that's "total" with no more than two edit errors. You can also specify insertion, deletion, and substitution limits separately with "+", "-", and "#".

That would be nice to have...

Bill
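For reference, points 1 and 2 can be sketched against the stdlib re module as it stands. This is only an illustration: compose() is a hypothetical helper, not an existing or proposed API, and it assumes that only the IGNORECASE, MULTILINE and DOTALL flags need carrying over, which it does with scoped inline flags (Python 3.6 and later). Sub-patterns compiled with re.VERBOSE, or containing their own global inline flags, are not handled.

```python
import re

# Type of a compiled pattern, obtained without relying on re.Pattern.
_PATTERN_TYPE = type(re.compile(""))

# Point 1: a group name defined twice is rejected at compile time,
# even when the two definitions sit in different alternatives.
x = re.compile(r"(?P<num>\d+)")
try:
    re.compile(x.pattern + "|x" + x.pattern)
    duplicate_rejected = False
except re.error:
    duplicate_rejected = True

# Flags this sketch knows how to scope to a sub-pattern.
_SCOPED = ((re.IGNORECASE, "i"), (re.MULTILINE, "m"), (re.DOTALL, "s"))


def compose(parts, flags=0):
    """Hypothetical helper: build one RE from strings and compiled REs.

    Strings are used verbatim. Compiled patterns are wrapped in a
    non-capturing group that switches on the flags they were compiled
    with and switches off outer flags they were compiled without.
    """
    pieces = []
    for part in parts:
        if isinstance(part, _PATTERN_TYPE):
            on = "".join(c for f, c in _SCOPED
                         if part.flags & f and not flags & f)
            off = "".join(c for f, c in _SCOPED
                          if flags & f and not part.flags & f)
            prefix = on + ("-" + off if off else "")
            pieces.append("(?" + prefix + ":" + part.pattern + ")")
        else:
            pieces.append(part)
    return re.compile("".join(pieces), flags)


# Usage: the case-insensitive sub-pattern keeps its own flag.
word = re.compile("total", re.IGNORECASE)
y = compose(["sub", word, r"\d+"])
```

Here y matches "subTOTAL42" but not "SUBtotal42", because the case-insensitivity stays confined to the embedded pattern. The duplicate-name problem from point 1 is untouched by this helper: passing the same named-group pattern twice still fails.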

On Fri, May 6, 2011 at 22:32, Bill Janssen <janssen@parc.com> wrote:
I might've been more specific: I think MRAB is working on regex as a playground for new regex-module things (and potentially a replacement for stdlib re), so it might be a good place to implement these kinds of things or discuss them. Cheers, Dirkjan

participants (2)
- Bill Janssen
- Dirkjan Ochtman