[ python-Bugs-1541697 ] Recently introduced sgmllib regexp bug hangs Python
SourceForge.net
noreply at sourceforge.net
Thu Aug 17 03:51:00 CEST 2006
Bugs item #1541697, was opened at 2006-08-17 02:51
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1541697&group_id=5470
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: Python 2.5
Status: Open
Resolution: None
Priority: 5
Submitted By: John J Lee (jjlee)
Assigned to: Nobody/Anonymous (nobody)
Summary: Recently introduced sgmllib regexp bug hangs Python
Initial Comment:
Looks like revision 47154 introduced a regexp that
hangs Python (Ctrl-C won't kill the process, CPU usage
sits near 100%) under some circumstances. A test case
is attached (sgmllib.html and hang_sgmllib.py).
The problem isn't seen if you read the whole file (or
nearly the whole file) at once. But that doesn't make
it a non-bug, AFAICS.
I'm not sure what the problem is, but presumably the
relevant part of the patch is this:
+starttag = re.compile(r'<[a-zA-Z][-_.:a-zA-Z0-9]*\s*('
+ r'\s*([a-zA-Z_][-:.a-zA-Z_0-9]*)(\s*=\s*'
+
r'(\'[^\']*\'|"[^"]*"|[-a-zA-Z0-9./,:;+*%?!&$\(\)_#=~@]'
+
r'[][\-a-zA-Z0-9./,:;+*%?!&$\(\)_#=~\'"@]*(?=[\s>/<])))?'
+ r')*\s*/?\s*(?=[<>])')
The patch attached to bug 1515142 (also from Sam Ruby
-- claims to fix a regression introduced by his recent
sgmllib patches, and has not yet been applied) does NOT
fix the problem.
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1541697&group_id=5470
More information about the Python-bugs-list
mailing list