88k regex = RuntimeError
Peter Otten
__peter__ at web.de
Tue Feb 14 05:58:38 EST 2006
jodawi wrote:
> I need to find a bunch of C function declarations by searching
> thousands of source or html files for thousands of known function
> names. My initial simple approach was to do this:
>
> rxAllSupported = re.compile(r"\b(" + "|".join(gAllSupported) + r")\b")
> # giving a regex of \b(AAFoo|ABFoo| (uh... 88kb more...) |zFoo)\b
>
> for root, dirs, files in os.walk( ... ):
>     ...
>     for fileName in files:
>         ...
>         filePath = os.path.join(root, fileName)
>         file = open(filePath, "r")
>         contents = file.read()
>         ...
>         result = re.search(rxAllSupported, contents)
>
> but this happens:
>
>     result = re.search(rxAllSupported, contents)
>   File "C:\Python24\Lib\sre.py", line 134, in search
>     return _compile(pattern, flags).search(string)
> RuntimeError: internal error in regular expression engine
>
> I assume it's hitting some limit, but don't know where the limit is to
> remove it. I tried stepping into it repeatedly with Komodo, but didn't
> see the problem.
>
> Suggestions?
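For reference, here is a self-contained sketch of the quoted approach. A short hypothetical name list stands in for gAllSupported, and first_supported() is an illustrative helper, not part of the original code. Unlike the original, it uses re.escape() so that each name is matched literally, and it calls search() directly on the compiled pattern. With a handful of names this runs fine. Whether a pattern built from thousands of names compiles and runs depends on the Python version; under the Python 2.4 installation in the traceback, it raised the RuntimeError.

```python
import re

# Hypothetical stand-in for gAllSupported; the real list held thousands of names.
g_all_supported = ["AAFoo", "ABFoo", "zFoo"]

# One big alternation, as in the quoted code; re.escape makes each name literal.
rx_all_supported = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in g_all_supported) + r")\b"
)

def first_supported(contents):
    """Return the first known function name found in contents, or None."""
    match = rx_all_supported.search(contents)
    return match.group(1) if match else None

print(first_supported("int ABFoo(void);"))  # ABFoo
```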
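One workaround may be as easy as avoiding the huge alternation altogether: split each file into words with a trivial regex and check every word against a set of the wanted names:

    import re

    # the known function names (gAllSupported in your code)
    wanted = set(["foo", "bar", "baz"])
    file_content = "foo bar-baz ignored foo()"
    r = re.compile(r"\w+")  # matches each identifier-like word
    found = [name for name in r.findall(file_content) if name in wanted]
    print(found)  # ['foo', 'bar', 'baz', 'foo']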
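Plugged into the directory walk from the original question, the same idea might look like the following minimal sketch (Python 3). find_wanted() is a hypothetical helper. The UTF-8 encoding with errors="replace" is an assumption about the source and HTML files, so that odd bytes do not abort the walk. The regex stays tiny no matter how many names there are, and each word costs one set lookup (average constant time).

```python
import os
import re

WORD = re.compile(r"\w+")  # every identifier-like token

def find_wanted(root, wanted):
    """Map each file under root to the sorted wanted names it mentions."""
    hits = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for file_name in filenames:
            file_path = os.path.join(dirpath, file_name)
            # Assumption: files are mostly UTF-8; undecodable bytes are replaced.
            with open(file_path, encoding="utf-8", errors="replace") as f:
                contents = f.read()
            # Set membership test per word instead of one giant alternation.
            found = {name for name in WORD.findall(contents) if name in wanted}
            if found:
                hits[file_path] = sorted(found)
    return hits
```

Usage would be along the lines of `find_wanted("src", set(gAllSupported))`. Files that mention none of the names are left out of the result.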
Peter
More information about the Python-list mailing list