[Python-Dev] 2.5: recently introduced sgmllib regexp bug hangs Python
John J Lee
jjl at pobox.com
Thu Aug 17 03:58:22 CEST 2006
Looks like revision 47154 introduced a regexp that hangs Python (Ctrl-C
won't kill the process, CPU usage sits near 100%) under some
circumstances. There's a test case here:
http://python.org/sf/1541697
The problem isn't seen if you read the whole file at once (or almost the
whole file at once). (But that doesn't make it a non-bug, AFAICS.)
I'm not sure what the problem is, but presumably the relevant part of the
patch is this:
+starttag = re.compile(r'<[a-zA-Z][-_.:a-zA-Z0-9]*\s*('
+ r'\s*([a-zA-Z_][-:.a-zA-Z_0-9]*)(\s*=\s*'
+ r'(\'[^\']*\'|"[^"]*"|[-a-zA-Z0-9./,:;+*%?!&$\(\)_#=~@]'
+ r'[][\-a-zA-Z0-9./,:;+*%?!&$\(\)_#=~\'"@]*(?=[\s>/<])))?'
+ r')*\s*/?\s*(?=[<>])')
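One possible explanation, not confirmed anywhere in this report, is exponential backtracking. The attribute group is repeated with `*`, and every part of it other than the attribute name is optional, so a run of name characters can be split into "attributes" in many different ways. When the buffer ends before the tag's closing `>`, which may be what happens when the file is fed in small chunks, the final lookahead can never succeed and the engine may try every split. The sketch below copies the pattern from the patch and times it on a deliberately short, truncated tag; the input string and the helper name `time_truncated_match` are made up for illustration, and the sketch is written for a current Python 3 rather than 2.5.

```python
import re
import time

# Pattern copied verbatim from the patch quoted above.
starttag = re.compile(r'<[a-zA-Z][-_.:a-zA-Z0-9]*\s*('
    r'\s*([a-zA-Z_][-:.a-zA-Z_0-9]*)(\s*=\s*'
    r'(\'[^\']*\'|"[^"]*"|[-a-zA-Z0-9./,:;+*%?!&$\(\)_#=~@]'
    r'[][\-a-zA-Z0-9./,:;+*%?!&$\(\)_#=~\'"@]*(?=[\s>/<])))?'
    r')*\s*/?\s*(?=[<>])')


def time_truncated_match(n):
    """Time a match against a start tag cut off before its closing '>'.

    Hypothetical helper: the input is an assumed worst case, not the
    data from the bug report.
    """
    data = '<a ' + 'x' * n  # no '<' or '>' follows, so the lookahead fails
    start = time.perf_counter()
    result = starttag.match(data)
    return result, time.perf_counter() - start


if __name__ == '__main__':
    # Keep n small: if the explanation holds, the time grows very quickly.
    for n in range(8, 18, 2):
        result, elapsed = time_truncated_match(n)
        print(n, result, '%.4f s' % elapsed)
    # With the closing '>' present, the first greedy attempt succeeds.
    print(starttag.match('<a ' + 'x' * 200 + '>') is not None)
```

If the timings climb steeply as `n` grows, that would be consistent with the hang appearing only when a tag is split across reads. It would not by itself prove that this is what the test case in the tracker triggers.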
The patch attached to bug 1515142 (also from Sam Ruby; it claims to fix a
regression introduced by his recent sgmllib patches, and has not yet been
applied) does NOT fix the problem.
If nobody has time to fix this, perhaps rev 47154 should be reverted?
Commit message for -r47154:
"""
SF bug #1504333: sgmlib should allow angle brackets in quoted values
(modified patch by Sam Ruby; changed to use separate REs for start and end
tags to reduce matching cost for end tags; extended tests; updated to avoid
breaking previous changes to support IPv6 addresses in unquoted attribute
values)
"""
John