problem with (s)re in 1.6b1

Eddy De Greef degreef at imec.be
Mon Aug 7 15:41:53 EDT 2000


Hi,

When I tried to use version 1.6b1, I ran into a problem with the new
(s)re module. Code that worked with earlier versions has stopped
working. I've been able to reproduce the problem with a small example:

------------------------------------------------------------------------------
import re

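for LEN in [9996, 9997, 9998]:

   print "\nLENGTH:", LEN
   # "1", then LEN filler characters, then "2"
   text = "1" + LEN*"x" + "2"

   # Greedy repeat
   print "\n  Non-lazy match:"
   print "   ", re.search("1(.*)2", text)

   # Lazy repeat; catch the exception so the loop continues
   try:
      print "\n  Lazy match:"
      print "   ", re.search("1(.*?)2", text)
   except Exception, e:
      print "Exception:", e

   # Same lazy pattern, preceded by an optional space
   print "\n  Extra char + lazy match:"
   print "   ", re.search(" ?1(.*?)2", text)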

------------------------------------------------------------------------------

When I run this example, I get the following output:

------------------------------------------------------------------------------

LENGTH: 9996

  Non-lazy match:
    <SRE_Match object at 400f9298>

  Lazy match:
    <SRE_Match object at 400f9298>

  Extra char + lazy match:
    <SRE_Match object at 400f9298>

LENGTH: 9997

  Non-lazy match:
    <SRE_Match object at 400f9298>

  Lazy match:
    <SRE_Match object at 400f9298>

  Extra char + lazy match:
    None

LENGTH: 9998

  Non-lazy match:
    <SRE_Match object at 400f9298>

  Lazy match:
    Exception: maximum recursion limit exceeded

  Extra char + lazy match:
    None
    
------------------------------------------------------------------------------

According to the CVS logs, a recursion limit of 10000 was recently added,
so I guess that is the reason for this behaviour.
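
To make the suspected mechanism concrete, here is a toy model (not the
actual SRE code) of a matcher for "1(.*?)2" that spends one nesting level
per character consumed by the lazy repeat. The names toy_lazy_search and
ToyRecursionError are made up for illustration, the depth is tracked with
a counter instead of real recursion, and the code is written for a current
Python rather than 1.6b1.

```python
# Toy model only: this is NOT how SRE is implemented. It assumes that a
# lazy ".*?" costs one nesting level for every character it has to consume.

RECURSION_LIMIT = 10000  # the value mentioned in the CVS logs


class ToyRecursionError(Exception):
    """Hypothetical error raised when the modelled depth exceeds the limit."""


def toy_lazy_search(text, limit=RECURSION_LIMIT):
    """Return what "(.*?)" would capture for "1(.*?)2", or None.

    Only the first "1" in the text is tried, which is enough for the
    test strings in this report.
    """
    start = text.find("1")
    if start < 0:
        return None
    depth = 0
    pos = start + 1
    while pos < len(text):
        # Lazy matching: try to finish the match before consuming more.
        if text[pos] == "2":
            return text[start + 1:pos]
        # Otherwise consume one more character, one level deeper.
        depth += 1
        if depth > limit:
            raise ToyRecursionError("maximum recursion limit exceeded")
        pos += 1
    return None
```

With the default limit this toy fails once the filler is longer than 10000
characters. The thresholds observed above (9998, and 9997 with the extra
" ?") are slightly lower, presumably because other parts of the pattern
use up a few levels as well.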

I have two remarks:

  - From a user's point of view, it seems rather strange that lazy matches
    are limited in length while non-lazy matches aren't. I can understand
    the reasons for such a limit from a programmer's point of view, but to
    a user it seems rather artificial.

  - The matching failure for the 3rd pattern is apparently caused by
    the same limit. Shouldn't it also raise an exception instead of
    silently reporting a mismatch?
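
A possible workaround, sketched under assumptions: when the lazy repeat is
terminated by a single character, as in the example, it can be rewritten
as a greedy repeat over "anything but that character". Since the greedy
"1(.*)2" matched at every length above, this form may avoid the limit, but
that has not been checked against 1.6b1. The helper name inner is made up
for illustration, and the code is written for a current Python.

```python
import re

# Lazy pattern from the report.
LAZY = re.compile(r"1(.*?)2")

# Greedy rewrite: the repeat can never run past the first "2", so it
# captures the same text. "\n" is excluded because "." does not match a
# newline by default.
GREEDY_NEGATED = re.compile(r"1([^2\n]*)2")


def inner(pattern, text):
    """Return group 1 of the first match of pattern in text, or None."""
    match = pattern.search(text)
    if match is None:
        return None
    return match.group(1)


# Both forms capture the same text on a short string.
sample = "a1bc2d2"
assert inner(LAZY, sample) == inner(GREEDY_NEGATED, sample)
```

The rewrite only applies when the terminator is a single character; a
multi-character terminator needs a different approach.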
    
    
Regards,

Eddy



