[Python-bugs-list] [ python-Bugs-704996 ] HTMLParser fails on <![if ...]>

SourceForge.net noreply@sourceforge.net
Sun, 30 Mar 2003 06:53:43 -0800


Bugs item #704996, was opened at 2003-03-17 15:40
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=704996&group_id=5470

Category: Python Library
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Submitted By: gary aviv (aaii1025)
Assigned to: Martin v. Löwis (loewis)
Summary: HTMLParser fails on <![if ...]>

Initial Comment:
HTMLParser can't handle constructs such as:

<![if !supportLists]>
...
<![endif]>

which are found in MS Word documents saved as HTML.

The problem is in markupbase.py.

I suggest the following patch:

diff -u /usr/local/lib/python2.2/markupbase.py.orig /usr/local/lib/python2.2/markupbase.py
--- /usr/local/lib/python2.2/markupbase.py.orig Mon Mar 17 09:23:17 2003
+++ /usr/local/lib/python2.2/markupbase.py      Mon Mar 17 09:26:45 2003
@@ -3,7 +3,8 @@
 import re
 import string

-_declname_match = re.compile(r'[a-zA-Z][-_.a-zA-Z0-9]*\s*').match
+#_declname_match = re.compile(r'[a-zA-Z][-_.a-zA-Z0-9]*\s*').match  # was this
+_declname_match = re.compile(r'[a-zA-Z][-_.a-zA-Z0-9]*\s*|\[.*\]').match  # replace by this
 _declstringlit_match = re.compile(r'(\'[^\']*\'|"[^"]*")\s*').match

 del re
@@ -73,7 +74,7 @@
             if c == ">":
                 # end of declaration syntax
                 data = rawdata[i+2:j]
-                if decltype == "doctype":
+                if decltype == "doctype" or decltype[0] == '[':        # handle <![if !supportLists]>
                     self.handle_decl(data)
                 else:
                     self.unknown_decl(data)
@@ -310,7 +311,7 @@
             return string.lower(name), m.end()
         else:
             self.updatepos(declstartpos, i)
-            self.error("expected name token")
+            self.error("expected name token '%s'" % rawdata[i:i+10]) # improve error message

     # To be overridden -- handlers for unknown objects
     def unknown_decl(self, data):
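
To illustrate the first hunk, here is a minimal standalone sketch of the two regular expressions from the diff, run against the construct from the report. It is not part of the submitted patch. The names `original`, `patched`, `rawdata`, and `start` are made up for this example, and it assumes the name scan begins right after the `<!`.

```python
import re

# Pattern before the patch (copied from the "-" line of the diff).
original = re.compile(r'[a-zA-Z][-_.a-zA-Z0-9]*\s*')
# Pattern proposed in the patch (copied from the "+" line of the diff):
# it additionally accepts a bracketed section such as [if ...].
patched = re.compile(r'[a-zA-Z][-_.a-zA-Z0-9]*\s*|\[.*\]')

rawdata = "<![if !supportLists]>"
start = 2  # assumed scan position: the character right after "<!"

# The original pattern needs a letter first, so "[" yields no match.
original_match = original.match(rawdata, start)
# The patched pattern falls through to the \[.*\] alternative.
patched_match = patched.match(rawdata, start)

print(original_match)
print(patched_match.group())

# Caveat: ".*" is greedy, so when the opening and closing markers share
# one line, the match runs to the last "]" on that line.
one_line = "<![if !supportLists]>x<![endif]>"
greedy_match = patched.match(one_line, start)
print(greedy_match.group())
```

The missing match in the first case corresponds to the "expected name token" error path touched by the third hunk. The greedy case suggests that a non-greedy `\[.*?\]` may be worth considering if the markers can appear on the same line.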

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2003-03-30 16:53

Message:
This has now been fixed with patch 545300.
