regexp: maximum recursion limit exceeded
fgeiger at datec.at
Sun Jan 6 18:18:53 CET 2002
When I applied re.findall() to an HTML file to extract form contents, I got
the RuntimeError "maximum recursion limit exceeded". I thought the cause
might be a limit within findall() itself.
So I tried replacing findall() with the following:
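def findall(re, string):
    '''Find all /re/ in /string/.
    Idea: Alex Martelli in a response to a Usenet post on 4th of January
    '''
    mos = []                      # match objects found so far
    pos = 0                       # where the next search starts
    while 1:
        mo = re.search(string, pos)
        if mo is None:
            # no further match: return the matched text of each hit
            return [mo.group(0) for mo in mos]
        mos.append(mo)
        pos = mo.end()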
But the problem remains. Now re.search() reports the same error, which
means that the fault lies not in findall() but in some deeper mechanism
that has trouble with the string being searched.
If you want to reproduce the error, try this (sorry for any formatting
mangled by my newsreader):
import re

stringToBeSearchedIn = ("<form blablabla>%s</form>" %
                        ("<blablabla>blablabla</blablabla> " * 500)) * 100
# print stringToBeSearchedIn
for stringFound in findall(re.compile(r"\<form.*?\/form\>", re.DOTALL |
        re.MULTILINE | re.IGNORECASE), stringToBeSearchedIn)[:10]:
    # The loop body is missing from the archived post; printing each
    # match is an assumption.
    print(stringFound)
A form is very likely to trigger this error if the content between the
form tags is large and, more importantly, contains many '<tag></tag>' pairs.
To overcome this, I could
1) use find() to search for '<form' and '/form>',
2) use the SGML parser,
3) re.search for the opening tag and kill everything up to it, then
re.search for the closing tag and kill everything after it.
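Option 1 could be sketched roughly as follows. This is only a minimal illustration written for a current Python, not a tested solution for ActivePython 2.1. The name find_forms is hypothetical, and the sketch assumes that forms are not nested, that lower() does not change the length of the text (true for ASCII), and that no other tag name starts with "form".

```python
def find_forms(html):
    """Return each '<form ...>...</form>' span in html using str.find only."""
    lowered = html.lower()      # case-insensitive lookup, like re.IGNORECASE
    forms = []
    pos = 0
    while True:
        start = lowered.find("<form", pos)
        if start == -1:
            break               # no further opening tag
        end = lowered.find("</form>", start)
        if end == -1:
            break               # unclosed form: give up on the rest
        end += len("</form>")
        forms.append(html[start:end])
        pos = end               # continue after this form
    return forms


sample = ("<FORM blablabla>%s</form>" %
          ("<blablabla>blablabla</blablabla> " * 500)) * 100
forms = find_forms(sample)
print(len(forms))
```

Because str.find scans the string without any pattern matching, the number of tags between the form tags should have no effect on recursion depth.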
BTW, increasing the recursion depth doesn't solve the problem.
What other options do I have? How is this done "Pythonicly"?
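For the parser route (option 2), a rough sketch might look like the one below. It swaps in html.parser.HTMLParser from the standard library of current Python versions in place of the SGML parser mentioned above, so it does not run on Python 2.1 as written. The class name FormCollector and the choice to record only start tags and their attributes are assumptions made for illustration.

```python
from html.parser import HTMLParser


class FormCollector(HTMLParser):
    """Record the start tags (name, attributes) seen inside each form."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.forms = []         # one list of (tag, attrs) per form
        self._current = None    # list for the form being read, else None

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case
        if tag == "form":
            self._current = []
            self.forms.append(self._current)
        elif self._current is not None:
            self._current.append((tag, dict(attrs)))

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


collector = FormCollector()
collector.feed('<FORM action="/go"><input name="q"><b>hi</b></FORM>' * 3)
collector.close()
print(len(collector.forms))
```

The parser hands over one tag at a time, so no single pattern has to span an entire form.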
(Platform: Win2kPro/SP2, ActivePython 2.1).
Many thanks in advance and best regards