[Python-bugs-list] [ python-Bugs-505747 ] markupbase handling of HTML declarations
noreply@sourceforge.net
Tue, 12 Mar 2002 22:05:24 -0800
Bugs item #505747, was opened at 2002-01-19 09:37
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=505747&group_id=5470
Category: Python Library
Group: Python 2.2
Status: Open
Resolution: None
>Priority: 6
Submitted By: Greg Chapman (glchapman)
Assigned to: Fred L. Drake, Jr. (fdrake)
Summary: markupbase handling of HTML declarations
Initial Comment:
Using Python 2.2, I tried to use websucker.py on this
page:
http://magix.fri.uni-lj.si/orange/start/
This resulted in an exception in ParserBase._scan_name
because _declname_match failed. Examining the source
for the page above, I see there are several tags that
look like: "<![endif]>" where the first character
after "<!" is a '[', not an alpha as mandated by
_declname_match. Perhaps this is badly formed HTML (I
see it was produced by FrontPage), but if not, it
appears that _scan_name may have to be modified. FYI,
here's the traceback from the exception:
Traceback (most recent call last):
  File "C:\Python22\Tools\webchecker\websucker.py", line 126, in ?
    sys.exit(main() or 0)
  File "C:\Python22\Tools\webchecker\websucker.py", line 43, in main
    c.run()
  File "C:\Python22\Tools\webchecker\webchecker.py", line 349, in run
    self.dopage(url)
  File "C:\Python22\Tools\webchecker\webchecker.py", line 403, in dopage
    page = self.getpage(url_pair)
  File "C:\Python22\Tools\webchecker\webchecker.py", line 507, in getpage
    return Page(text, url, maxpage=self.maxpage, checker=self)
  File "C:\Python22\Tools\webchecker\webchecker.py", line 671, in __init__
    self.parser.feed(self.text)
  File "c:\Python22\lib\sgmllib.py", line 95, in feed
    self.goahead(0)
  File "c:\Python22\lib\sgmllib.py", line 161, in goahead
    k = self.parse_declaration(i)
  File "c:\Python22\lib\markupbase.py", line 66, in parse_declaration
    decltype, j = self._scan_name(j, i)
  File "c:\Python22\lib\markupbase.py", line 313, in _scan_name
    self.error("expected name token")
  File "c:\Python22\lib\sgmllib.py", line 102, in error
    raise SGMLParseError(message)
sgmllib.SGMLParseError: expected name token
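
The failure can be reproduced without the parser classes. The sketch
below is only an illustration: the regular expression is assumed to
approximate the name pattern in markupbase.py, and scan_name is a
hypothetical stand-in for ParserBase._scan_name that raises ValueError
instead of calling self.error().

```python
import re

# Assumption: approximates the declaration-name pattern in markupbase.py.
# A name must start with an ASCII letter; trailing whitespace is consumed.
_declname_match = re.compile(r'[a-zA-Z][-_.a-zA-Z0-9]*\s*').match


def scan_name(rawdata, i):
    """Hypothetical stand-in for ParserBase._scan_name.

    Returns (lowercased name, index just past the name and any
    whitespace), or raises ValueError if no name starts at index i.
    """
    m = _declname_match(rawdata, i)
    if m is None:
        raise ValueError("expected name token")
    return m.group().strip().lower(), m.end()


# Index 2 is the first character after "<!".
print(scan_name("<!DOCTYPE html>", 2))

try:
    scan_name("<![endif]>", 2)
except ValueError as exc:
    # '[' is not a letter, so the pattern cannot match here.
    print("failed:", exc)
```

The first call succeeds because "D" is a letter. The second fails
because "[" cannot begin a name under this pattern, which is the same
condition that produces the SGMLParseError above.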
----------------------------------------------------------------------
Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2002-02-15 01:13
Message:
Logged In: YES
user_id=3066
Ugh! I don't think that's legal HTML at all. I'll have to
think about the right way to deal with it.
----------------------------------------------------------------------