[Python-Dev] 2.5: recently introduced sgmllib regexp bug hangs Python

John J Lee jjl at pobox.com
Thu Aug 17 03:58:22 CEST 2006


Looks like revision 47154 introduced a regexp that hangs Python (Ctrl-C 
won't kill the process, CPU usage sits near 100%) under some 
circumstances.  There's a test case here:

http://python.org/sf/1541697


The problem isn't seen if you read the whole file at once (or almost the 
whole file at once).  (But that doesn't make it a non-bug, AFAICS.)

I'm not sure what the problem is, but presumably the relevant part of the 
patch is this:

+starttag = re.compile(r'<[a-zA-Z][-_.:a-zA-Z0-9]*\s*('
+        r'\s*([a-zA-Z_][-:.a-zA-Z_0-9]*)(\s*=\s*'
+        r'(\'[^\']*\'|"[^"]*"|[-a-zA-Z0-9./,:;+*%?!&$\(\)_#=~@]'
+        r'[][\-a-zA-Z0-9./,:;+*%?!&$\(\)_#=~\'"@]*(?=[\s>/<])))?'
+    r')*\s*/?\s*(?=[<>])')
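
One possible explanation, offered as a reading of the pattern rather than anything confirmed in this message: the outer (...)* group can split a run of attribute-name characters into several "attributes" in many different ways, so when the closing lookahead (?=[<>]) cannot succeed (say, because a chunked read delivered a tag without its closing '>'), the matcher may try every split before giving up.  The sketch below times the quoted pattern on short, made-up inputs; the helper name time_match and the test strings are assumptions for illustration, and the lengths are kept small so it finishes quickly.

```python
import re
import time

# The start-tag pattern exactly as quoted from the patch above.
starttag = re.compile(r'<[a-zA-Z][-_.:a-zA-Z0-9]*\s*('
        r'\s*([a-zA-Z_][-:.a-zA-Z_0-9]*)(\s*=\s*'
        r'(\'[^\']*\'|"[^"]*"|[-a-zA-Z0-9./,:;+*%?!&$\(\)_#=~@]'
        r'[][\-a-zA-Z0-9./,:;+*%?!&$\(\)_#=~\'"@]*(?=[\s>/<])))?'
    r')*\s*/?\s*(?=[<>])')


def time_match(text):
    """Return (match object or None, elapsed seconds) for one match attempt."""
    start = time.perf_counter()
    m = starttag.match(text)
    return m, time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical input: a tag cut off mid-attribute, as a chunked read
    # might deliver it.  Lengths stay small so the demo finishes quickly.
    for n in range(8, 17, 2):
        truncated = "<a " + "b" * n
        m, secs = time_match(truncated)
        print(n, "truncated:", m is not None, "%.4fs" % secs)
        m, secs = time_match(truncated + ">")
        print(n, "complete: ", m is not None, "%.4fs" % secs)
```

If this reading is right, the time for the truncated case should grow quickly with the run length while the complete case stays fast, which would fit the observation that reading the whole file at once avoids the hang.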


The patch attached to bug 1515142 (also from Sam Ruby -- claims to fix a 
regression introduced by his recent sgmllib patches, and has not yet been 
applied) does NOT fix the problem.

If nobody has time to fix this, perhaps rev 47154 should be reverted?


commit message for -r47154:

"""
SF bug #1504333: sgmlib should allow angle brackets in quoted values
(modified patch by Sam Ruby; changed to use separate REs for start and end
  tags to reduce matching cost for end tags; extended tests; updated to avoid
  breaking previous changes to support IPv6 addresses in unquoted attribute
  values)
"""


John


