[Python-ideas] Matching multiple regex patterns simultaneously

Mathias Panzenböck grosser.meister.morti at gmx.net
Tue Mar 2 23:32:42 CET 2010


On 03/02/2010 09:39 PM, Andrey Fedorov wrote:
> So a couple of libraries (Django being the most popular that comes to
> mind) try to match a string against several regex expressions. I'm
> wondering if there exists a library to "merge" multiple compiled regex
> expressions into a single lookup. This could be exposed in an interface like:
>
>     http://gist.github.com/319905
>
>
> So for an example:
>
> rd = ReDict()
>
> rd['^foo$'] = 1
> rd['^bar*$'] = 2
> rd['^bar$'] = 3
>
> assert rd['foo'] == [1]
> assert rd['barrrr'] == [2]
> assert rd['bar'] == [2,3]
>
> The naive implementation I link is obviously inefficient. What would be
> the easiest way to go about compiling a set of regex-es together, so
> that they can be matched against a string at the same time? Are there
> any standard libraries that do this I'm not aware of?
>
> Cheers,
> Andrey
>

You can do something like this:
>>> import re
>>> r = re.compile('(?P<a>^foo$)|(?P<b>(?P<c>^bar)r*$)')
>>> r.match('barrrr').groupdict()
{'a': None, 'b': 'barrrr', 'c': 'bar'}
>>> r.match('bar').groupdict()
{'a': None, 'b': 'bar', 'c': 'bar'}
>>> r.match('foo').groups()
('foo', None, None)

OK, it's not 100% the same (it does not match 'ba', which '^bar*$' would), but I think this should
cover most cases where you want something like this. Hmm, well. You should rewrite it in a form
where the subexpressions do not overlap:
(?P<a>^foo$)|(?P<b>^ba$)|(?P<c>^bar$)|(?P<d>^bar+$)
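
To get from that pattern back to the values in the original example, something
like the following might work. It is only a rough sketch: the names COMBINED,
VALUES and lookup are made up for illustration, and the table that says which
of the original patterns each alternative stands for is written by hand here,
not derived automatically. It relies on the alternatives being tried left to
right, so 'bar' is claimed by group c before group d gets a chance, even though
'^bar+$' would also accept it.

```python
import re

# Combined pattern from above: one named group per disjoint case.
COMBINED = re.compile(r'(?P<a>^foo$)|(?P<b>^ba$)|(?P<c>^bar$)|(?P<d>^bar+$)')

# Hand-written (hypothetical) table: for each named alternative, the values
# of every original pattern that accepts the same strings.
#   '^foo$' -> 1, '^bar*$' -> 2, '^bar$' -> 3
VALUES = {
    'a': [1],     # 'foo'        matches only '^foo$'
    'b': [2],     # 'ba'         matches only '^bar*$'
    'c': [2, 3],  # 'bar'        matches '^bar*$' and '^bar$'
    'd': [2],     # 'barr', ...  matches only '^bar*$'
}

def lookup(s):
    """Return the values of all original patterns that match s."""
    m = COMBINED.match(s)
    if m is None:
        return []
    # There are no nested groups, so lastgroup names the alternative that matched.
    return list(VALUES[m.lastgroup])
```

With this, lookup('foo'), lookup('barrrr') and lookup('bar') give the same lists
as the three asserts in the original example, and a single match call is made
per lookup instead of one per pattern.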

	-panzi


