
On 03/02/2010 09:39 PM, Andrey Fedorov wrote:
So a couple of libraries (Django being the most popular that comes to mind) try to match a string against several regular expressions. I'm wondering if there exists a library to "merge" multiple compiled regular expressions into a single lookup. This could be exposed in an interface like the following example:
    rd = ReDict()

    rd['^foo$'] = 1
    rd['^bar*$'] = 2
    rd['^bar$'] = 3

    assert rd['foo'] == [1]
    assert rd['barrrr'] == [2]
    assert rd['bar'] == [2,3]
The naive implementation I linked is obviously inefficient. What would be the easiest way to go about compiling a set of regexes together, so that they can all be matched against a string at the same time? Are there any standard libraries that do this that I'm not aware of?
Cheers, Andrey
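
For reference, a naive version of the interface described above might look like the sketch below. The linked implementation is not reproduced in this message, so this is only a guess at its general shape; the class body and the attribute name _entries are assumptions made for illustration.

```python
import re


class ReDict:
    """Naive regex-keyed mapping: a lookup tries every pattern in turn."""

    def __init__(self):
        # (compiled pattern, value) pairs, kept in insertion order.
        self._entries = []

    def __setitem__(self, pattern, value):
        self._entries.append((re.compile(pattern), value))

    def __getitem__(self, text):
        # One separate match call per stored pattern, so the cost grows
        # linearly with the number of patterns.
        return [value for rx, value in self._entries if rx.match(text)]


rd = ReDict()
rd['^foo$'] = 1
rd['^bar*$'] = 2
rd['^bar$'] = 3
```

Every lookup runs each stored regex separately, which is the inefficiency the question is about.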
You can do something like this:

    >>> import re
    >>> r = re.compile('(?P<a>^foo$)|(?P<b>(?P<c>^bar)r*$)')
    >>> r.match('barrrr').groupdict()
    {'a': None, 'c': 'bar', 'b': 'barrrr'}
    >>> r.match('bar').groupdict()
    {'a': None, 'c': 'bar', 'b': 'bar'}
    >>> r.match('foo').groups()
    ('foo', None, None)
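OK, it's not 100% the same (it does not match 'ba'), but I think this should cover most cases where you want something like this.

Hmm, well. To get exactly the same behaviour, you should resolve the patterns into a form where the subexpressions do not overlap:

    (?P<a>^foo$)|(?P<b>^ba$)|(?P<c>^bar$)|(?P<d>^bar+$)

Here ^bar+$ still matches 'bar', but alternatives are tried from left to right, so 'bar' is always caught by group c first. Writing the last alternative as ^barr+$ would make the groups strictly disjoint.

-panzi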
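
A minimal sketch of how the merged pattern could back the lookup from the question, assuming the overlaps have already been resolved by hand as above. The names BRANCHES, merged, values and lookup, and the generated group names g0..g3, are made up for this example. Each alternative is paired with the values of all original patterns it stands for, and the match object's lastgroup attribute tells which alternative matched.

```python
import re

# Hand-resolved alternatives for ^foo$ (1), ^bar*$ (2) and ^bar$ (3).
# Each one lists the values of every original pattern it represents.
# Order matters: '^bar$' must come before '^bar+$' so 'bar' lands on it.
BRANCHES = [
    ('^foo$',  [1]),
    ('^ba$',   [2]),
    ('^bar$',  [2, 3]),
    ('^bar+$', [2]),
]

names = ['g%d' % i for i in range(len(BRANCHES))]

# One named group per alternative, joined into a single compiled regex.
merged = re.compile(
    '|'.join('(?P<%s>%s)' % (name, pattern)
             for name, (pattern, _) in zip(names, BRANCHES))
)
values = {name: vals for name, (_, vals) in zip(names, BRANCHES)}


def lookup(text):
    """Return the values for text using a single match call."""
    m = merged.match(text)
    # lastgroup is the name of the alternative that matched; this relies on
    # the alternatives containing no capturing groups of their own.
    return values[m.lastgroup] if m else []
```

With these definitions, lookup('bar') gives [2, 3] and lookup('barrrr') gives [2]. Resolving the overlaps is still manual here; the sketch only shows the lookup side.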