[Python-bugs-list] [ python-Bugs-704996 ] HTMLParser fails on <![if ...]>
SourceForge.net
noreply@sourceforge.net
Sun, 30 Mar 2003 06:53:43 -0800
Bugs item #704996, was opened at 2003-03-17 15:40
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=704996&group_id=5470
Category: Python Library
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Submitted By: gary aviv (aaii1025)
Assigned to: Martin v. Löwis (loewis)
Summary: HTMLParser fails on <![if ...]>
Initial Comment:
HTMLParser can't handle constructs such as:
<![if !supportLists]>
...
<![endif]>
which are found in MS Word documents saved as HTML.
The problem is in markupbase.py
I suggest the following patch:
diff -u /usr/local/lib/python2.2/markupbase.py.orig /usr/local/lib/python2.2/markupbase.py
--- /usr/local/lib/python2.2/markupbase.py.orig Mon Mar 17 09:23:17 2003
+++ /usr/local/lib/python2.2/markupbase.py Mon Mar 17 09:26:45 2003
@@ -3,7 +3,8 @@
import re
import string
-_declname_match = re.compile(r'[a-zA-Z][-_.a-zA-Z0-9]*\s*').match
+#_declname_match = re.compile(r'[a-zA-Z][-_.a-zA-Z0-9]*\s*').match # was this
+_declname_match = re.compile(r'[a-zA-Z][-_.a-zA-Z0-9]*\s*|\[.*\]').match # replace by this
_declstringlit_match = re.compile(r'(\'[^\']*\'|"[^"]*")\s*').match
del re
@@ -73,7 +74,7 @@
if c == ">":
# end of declaration syntax
data = rawdata[i+2:j]
- if decltype == "doctype":
+ if decltype == "doctype" or decltype[0] == '[': # handle <![if !supportLists]>
self.handle_decl(data)
else:
self.unknown_decl(data)
@@ -310,7 +311,7 @@
return string.lower(name), m.end()
else:
self.updatepos(declstartpos, i)
- self.error("expected name token")
+ self.error("expected name token '%s'" % rawdata[i:i+10]) # improve error message
# To be overridden -- handlers for unknown objects
def unknown_decl(self, data):
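
For readers who want to see what the regex change in the patch above does, here is a minimal sketch that runs the two patterns outside of markupbase.py. The names original_match, proposed_match, sample and one_line are made up for this illustration and do not appear in the library; the sample strings are assumed to be the text that follows "<!" in the input.

```python
import re

# Pattern quoted in the patch as the original _declname_match.
original_match = re.compile(r'[a-zA-Z][-_.a-zA-Z0-9]*\s*').match

# Pattern proposed in the patch: also accepts a bracketed section.
proposed_match = re.compile(r'[a-zA-Z][-_.a-zA-Z0-9]*\s*|\[.*\]').match

# Assumed input: the text right after "<!" in a Word conditional.
sample = "[if !supportLists]>"

# The original pattern requires a leading letter, so "[" is rejected.
print(original_match(sample))            # None

# The proposed pattern matches the whole bracketed section.
print(proposed_match(sample).group())    # [if !supportLists]

# Caveat: ".*" is greedy, so when the opening and closing markers
# share a line, the match runs to the last "]" on that line.
one_line = "[if !supportLists]>text<![endif]>"
print(proposed_match(one_line).group())  # [if !supportLists]>text<![endif]
```

The last case suggests that a non-greedy or negated-class pattern such as \[[^\]]*\] might be safer than \[.*\], though that is only an observation about the regex in isolation and not about how the fix was eventually made.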
----------------------------------------------------------------------
>Comment By: Martin v. Löwis (loewis)
Date: 2003-03-30 16:53
Message:
Logged In: YES
user_id=21627
This has now been fixed with patch 545300.
----------------------------------------------------------------------