regexp: maximum recursion limit exceeded

F. GEIGER fgeiger at datec.at
Sun Jan 6 12:18:53 EST 2002


I got the RuntimeError "maximum recursion limit exceeded" when I applied
re.findall() to an HTML file to find form contents, so I thought the cause
could be a limit within findall().

So I tried replacing findall() with the following:

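def findall(pattern, string):
   '''Find all matches of the compiled /pattern/ in /string/.

   Idea: Alex Martelli in a response to a Usenet post on 4th of January 2001.
   '''
   mos = []
   pos = 0
   while pos <= len(string):
      mo = pattern.search(string, pos)
      if mo is None:
         break
      mos.append(mo)
      if mo.end() > mo.start():
         pos = mo.end()
      else:
         # Empty match: step one character ahead so the loop cannot stall.
         pos = mo.end() + 1
   return [mo.group(0) for mo in mos]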

But the problem remains. Now the pattern's search() method reports the same
error, which means that the cause is not findall() but some deeper mechanism
that cannot cope with the string being searched.

If you want to reproduce the error, try this (sorry for any formatting damage
caused by my newsreader):

import re

def test():
   stringToBeSearchedIn = ("<form blablabla>%s</form>" %
      ("<blablabla>blablabla</blablabla> " * 500)) * 100
#   print(stringToBeSearchedIn)

   # Non-greedy match from '<form' to the nearest '/form>'.
   formRe = re.compile(r"\<form.*?\/form\>",
      re.DOTALL | re.MULTILINE | re.IGNORECASE)
   for stringFound in findall(formRe, stringToBeSearchedIn)[:10]:
      print(stringFound)

A form is very likely to cause this error if the content between the form
tags is large and, more importantly, contains many '<tag></tag>' pairs.

To overcome this, I could

1) use find() to search for '<form' and '/form>',

2) use the SGML parser,

3) re.search for the opening tag and kill everything up to it, then
re.search for the closing tag and kill everything after it.
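
Option 1 might look roughly like the sketch below. It is only an
illustration under a few assumptions: find_forms is a name I made up, the
closing tag is always written exactly as '</form>' (no whitespace inside it),
forms are not nested, and lower() does not change the length of the text
(true for ASCII markup). It also treats any tag starting with '<form' as a
form, so something like '<format>' would be picked up too.

```python
def find_forms(text):
    """Return every '<form ...>...</form>' span in text, without using re."""
    lowered = text.lower()   # case-insensitive search, like re.IGNORECASE
    close_tag = "</form>"
    forms = []
    pos = 0
    while True:
        start = lowered.find("<form", pos)
        if start == -1:
            break            # no further opening tag
        end = lowered.find(close_tag, start)
        if end == -1:
            break            # unterminated form: stop rather than guess
        end += len(close_tag)
        forms.append(text[start:end])   # slice the original, not the lowered copy
        pos = end
    return forms
```

Since this is a plain loop over str.find(), there is no backtracking and
nothing that could run into a recursion limit, however many tags sit between
the form tags.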

BTW, increasing the recursion depth doesn't solve the problem.

What other options do I have? How is this done "Pythonicly"?

(Platform: Win2kPro/SP2, ActivePython 2.1).

Many thanks in advance and best regards
Franz GEIGER










