
Andrey Fedorov wrote:
So a couple of libraries (Django being the most popular that comes to mind) try to match a string against several regex expressions. I'm wondering if there exists a library to "merge" multiple compiled regex expressions into a single lookup. This could be exposed in a interface like:
So for an example:
rd = ReDict()
rd['^foo$'] = 1 rd['^bar*$'] = 2 rd['^bar$'] = 3
assert rd['foo'] == [1] assert rd['barrrr'] == [2] assert rd['bar'] == [2,3]
The naive implementation I link is obviously inefficient. What would be the easiest way to go about compiling a set of regex-es together, so that they can be matched against a string at the same time? Are there any standard libraries that do this I'm not aware of?
This is a rather difficult problem to solve in general. If you only need to search for a finite set of words, there are few good algorithms for this: http://en.wikipedia.org/wiki/Aho-Corasick_algorithm * used in Unix fgrep * Python implementation: http://hkn.eecs.berkeley.edu/~dyoo/python/ahocorasick/ http://en.wikipedia.org/wiki/Rabin-Karp_string_search_algorithm#Rabin.E2.80.... * uses hashing -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Mar 02 2010)
Python/Zope Consulting and Support ... http://www.egenix.com/ mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/