[Python-bugs-list] [ python-Bugs-505747 ] markupbase handling of HTML declarations

noreply@sourceforge.net
Tue, 12 Mar 2002 22:05:24 -0800


Bugs item #505747, was opened at 2002-01-19 09:37
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=505747&group_id=5470

Category: Python Library
Group: Python 2.2
Status: Open
Resolution: None
>Priority: 6
Submitted By: Greg Chapman (glchapman)
Assigned to: Fred L. Drake, Jr. (fdrake)
Summary: markupbase handling of HTML declarations

Initial Comment:
Using Python 2.2, I tried to use websucker.py on this 
page:

http://magix.fri.uni-lj.si/orange/start/

This resulted in an exception in ParserBase._scan_name 
because _declname_match failed.  Examining the source 
for the page above, I see there are several tags that 
look like "<![endif]>", where the first character 
after "<!" is a '[' rather than the letter required by 
_declname_match.  Perhaps this is badly formed HTML (I 
see it was produced by FrontPage), but if not, it 
appears that _scan_name may have to be modified.  FYI, 
here's the traceback from the exception:

Traceback (most recent call last):
  File "C:\Python22\Tools\webchecker\websucker.py", line 126, in ?
    sys.exit(main() or 0)
  File "C:\Python22\Tools\webchecker\websucker.py", line 43, in main
    c.run()
  File "C:\Python22\Tools\webchecker\webchecker.py", line 349, in run
    self.dopage(url)
  File "C:\Python22\Tools\webchecker\webchecker.py", line 403, in dopage
    page = self.getpage(url_pair)
  File "C:\Python22\Tools\webchecker\webchecker.py", line 507, in getpage
    return Page(text, url, maxpage=self.maxpage, checker=self)
  File "C:\Python22\Tools\webchecker\webchecker.py", line 671, in __init__
    self.parser.feed(self.text)
  File "c:\Python22\lib\sgmllib.py", line 95, in feed
    self.goahead(0)
  File "c:\Python22\lib\sgmllib.py", line 161, in goahead
    k = self.parse_declaration(i)
  File "c:\Python22\lib\markupbase.py", line 66, in parse_declaration
    decltype, j = self._scan_name(j, i)
  File "c:\Python22\lib\markupbase.py", line 313, in _scan_name
    self.error("expected name token")
  File "c:\Python22\lib\sgmllib.py", line 102, in error
    raise SGMLParseError(message)
sgmllib.SGMLParseError: expected name token
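
A minimal standalone sketch of why the scan fails. It assumes the declaration-name pattern is roughly "a letter, then letters, digits, '-', '_' or '.'" (an approximation of markupbase's _declname_match, not a copy of it), and uses a hypothetical helper scan_decl_name that raises ValueError where the real _scan_name calls self.error():

```python
import re

# Assumption: approximates the name pattern markupbase uses for
# declaration names (a letter first, then letters, digits, '-', '_', '.').
declname_match = re.compile(r'[a-zA-Z][-_.a-zA-Z0-9]*\s*').match


def scan_decl_name(rawdata, i):
    """Hypothetical stand-in for _scan_name: return the lower-cased
    declaration name following '<!' at index i, or raise ValueError
    where the real parser reports 'expected name token'."""
    j = i + 2  # skip the leading '<!'
    m = declname_match(rawdata, j)
    if m is None:
        # '[' is not a letter, so "<![endif]>" ends up here
        raise ValueError("expected name token")
    return m.group().strip().lower()


print(scan_decl_name("<!DOCTYPE html>", 0))
try:
    scan_decl_name("<![endif]>", 0)
except ValueError as exc:
    print("error:", exc)
```

An ordinary declaration such as "<!DOCTYPE html>" yields the name "doctype", while "<![endif]>" is rejected at its first character.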



----------------------------------------------------------------------

Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2002-02-15 01:13

Message:
Logged In: YES 
user_id=3066

Ugh!  I don't think that's legal HTML at all.  I'll have to
think about the right way to deal with it.
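
One possible lenient approach, offered only as a sketch: recognise FrontPage/Internet Explorer style conditional markers such as "<![if ...]>" and "<![endif]>" and skip them before the normal declaration scan runs. The helper name skip_conditional_marker and the pattern below are hypothetical; they are not part of markupbase and not necessarily the fix that will be adopted.

```python
import re

# Hypothetical pattern for conditional markers like
# "<![if !supportEmptyParas]>" and "<![endif]>".
conditional_marker = re.compile(r'<!\[\s*(if\b[^\]]*|endif)\s*\]>',
                                re.IGNORECASE)


def skip_conditional_marker(rawdata, i):
    """If rawdata[i:] starts with a conditional marker, return the index
    just past it; otherwise return -1 so the caller can fall back to the
    normal declaration parser."""
    m = conditional_marker.match(rawdata, i)
    if m is None:
        return -1
    return m.end()


text = "a<![endif]>b"
print(text[skip_conditional_marker(text, 1):])
```

Returning -1 rather than raising keeps the check optional: a parser could try it first and only call its usual declaration handling when no marker is found.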
