regexp: maximum recursion limit exceeded
F. GEIGER
fgeiger at datec.at
Sun Jan 6 12:18:53 EST 2002
I got the RuntimeError "maximum recursion limit exceeded" when I applied
re.findall() to an HTML file to extract form contents, so I suspected the
cause was a limit inside findall().
To test this, I replaced findall() with the following:
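def findall(regex, string):
    '''Find all matches of /regex/ (a compiled pattern) in /string/.
    Idea: Alex Martelli in a response to a Usenet post on 4th of January
    2001.
    '''
    mos = []
    pos = 0
    while 1:
        mo = regex.search(string, pos)
        if mo is None:
            return [m.group(0) for m in mos]
        mos.append(mo)
        # Continue after this match; step past empty matches so the
        # loop cannot get stuck at the same position forever.
        pos = mo.end()
        if mo.end() == mo.start():
            pos += 1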
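For comparison, Python 2.2 and later provide re.finditer(), which performs the same incremental search and yields one match object at a time. A minimal sketch follows; the name findall_iter and the sample string are illustrative only. It would not by itself avoid the error, since each step still goes through the same matching engine as re.search().

```python
import re


def findall_iter(regex, string):
    """Return the text of every non-overlapping match of a compiled regex.

    Same idea as the hand-written loop, but finditer() does the
    position bookkeeping (including stepping past empty matches).
    """
    return [mo.group(0) for mo in regex.finditer(string)]


form_re = re.compile(r"<form.*?/form>", re.DOTALL | re.IGNORECASE)
sample = "<form a>x</form> text <FORM b>y</FORM>"
print(findall_iter(form_re, sample))  # ['<form a>x</form>', '<FORM b>y</FORM>']
```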
But the problem remains. Now re.search() raises the same error, which
means that it is not findall() but some deeper mechanism that has trouble
with the string being searched.
If you want to reproduce the error, try this (apologies for any formatting
damage caused by my newsreader):
import re

def test():
    stringToBeSearchedIn = ("<form blablabla>%s</form>" %
                            ("<blablabla>blablabla</blablabla> " * 500)) * 100
    # print stringToBeSearchedIn
    formRe = re.compile(r"\<form.*?\/form\>",
                        re.DOTALL | re.MULTILINE | re.IGNORECASE)
    for stringFound in findall(formRe, stringToBeSearchedIn)[:10]:
        print stringFound
It seems very likely that a form causes this error when the content between
the form tags is large and, more importantly, contains many '<tag></tag>' pairs.
To work around this, I could
1) use find() to search for '<form' and '/form>',
2) use the SGML parser,
3) re.search() for the opening tag and discard everything before it, then
re.search() for the closing tag and discard everything after it.
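Options 1 and 3 might look roughly like the sketch below. It is an illustration under assumptions, not tested against real-world HTML: the function names are hypothetical, option 1 is case-sensitive like plain str.find(), and neither function handles nested or commented-out forms. For option 3, instead of cutting the string down after each match (which copies it), the start position is passed to the compiled pattern's search(). Each pattern only ever matches a short tag, so no single match has to span a large form body.

```python
import re


def forms_by_find(text, start_tag="<form", end_tag="/form>"):
    """Option 1: locate form blocks with str.find(), no regex at all.

    Case-sensitive: '<FORM' is not found when start_tag is '<form'.
    """
    found = []
    pos = 0
    while True:
        start = text.find(start_tag, pos)
        if start == -1:
            break
        end = text.find(end_tag, start)
        if end == -1:
            break  # unterminated form: stop searching
        end += len(end_tag)
        found.append(text[start:end])
        pos = end
    return found


# Option 3: patterns that match only the short opening/closing tags.
OPEN_RE = re.compile(r"<form\b", re.IGNORECASE)
CLOSE_RE = re.compile(r"</form\s*>", re.IGNORECASE)


def forms_by_tag_search(text):
    """Option 3: search for the opening tag, then for the closing tag
    starting right after it, and slice out what lies in between."""
    found = []
    pos = 0
    while True:
        mo_open = OPEN_RE.search(text, pos)
        if mo_open is None:
            break
        mo_close = CLOSE_RE.search(text, mo_open.end())
        if mo_close is None:
            break  # unterminated form: stop searching
        found.append(text[mo_open.start():mo_close.end()])
        pos = mo_close.end()
    return found


html = "<p>intro</p><form a><b>x</b></form><FORM b>y</FORM>"
print(forms_by_find(html))        # ['<form a><b>x</b></form>']
print(forms_by_tag_search(html))  # ['<form a><b>x</b></form>', '<FORM b>y</FORM>']
```

Both functions can be tried on the same test string used above to reproduce the error; each should return 100 forms.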
BTW, increasing the recursion limit doesn't solve the problem.
What other options do I have? How is this done "Pythonically"?
(Platform: Win2kPro/SP2, ActivePython 2.1).
Many thanks in advance and best regards
Franz GEIGER