[Python-bugs-list] [ python-Bugs-505747 ] markupbase handling of HTML declarations
SourceForge.net
noreply@sourceforge.net
Sun, 30 Mar 2003 06:52:56 -0800
Bugs item #505747, was opened at 2002-01-19 15:37
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=505747&group_id=5470
Category: Python Library
Group: Not a Bug
Status: Closed
>Resolution: Fixed
Priority: 6
Submitted By: Greg Chapman (glchapman)
Assigned to: Fred L. Drake, Jr. (fdrake)
Summary: markupbase handling of HTML declarations
Initial Comment:
Using Python 2.2, I tried to use websucker.py on this
page:
http://magix.fri.uni-lj.si/orange/start/
This resulted in an exception in ParserBase._scan_name
because _declname_match failed. Examining the source
for the page above, I see there are several tags that
look like: "<![endif]>" where the first character
after "<!" is a '[', not an alpha as mandated by
_declname_match. Perhaps this is badly formed HTML (I
see it was produced by FrontPage), but if not, it
appears that _scan_name may have to be modified. FYI,
here's the traceback from the exception:
Traceback (most recent call last):
  File "C:\Python22\Tools\webchecker\websucker.py", line 126, in ?
    sys.exit(main() or 0)
  File "C:\Python22\Tools\webchecker\websucker.py", line 43, in main
    c.run()
  File "C:\Python22\Tools\webchecker\webchecker.py", line 349, in run
    self.dopage(url)
  File "C:\Python22\Tools\webchecker\webchecker.py", line 403, in dopage
    page = self.getpage(url_pair)
  File "C:\Python22\Tools\webchecker\webchecker.py", line 507, in getpage
    return Page(text, url, maxpage=self.maxpage, checker=self)
  File "C:\Python22\Tools\webchecker\webchecker.py", line 671, in __init__
    self.parser.feed(self.text)
  File "c:\Python22\lib\sgmllib.py", line 95, in feed
    self.goahead(0)
  File "c:\Python22\lib\sgmllib.py", line 161, in goahead
    k = self.parse_declaration(i)
  File "c:\Python22\lib\markupbase.py", line 66, in parse_declaration
    decltype, j = self._scan_name(j, i)
  File "c:\Python22\lib\markupbase.py", line 313, in _scan_name
    self.error("expected name token")
  File "c:\Python22\lib\sgmllib.py", line 102, in error
    raise SGMLParseError(message)
sgmllib.SGMLParseError: expected name token
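
For readers who want to reproduce the failure in isolation, the following is a minimal sketch, not the actual markupbase code. The regular expression is an assumed approximation of _declname_match (the exact pattern in Python 2.2 may differ), and scan_name is a hypothetical stand-in for ParserBase._scan_name. It only illustrates why a '[' directly after "<!" cannot be read as a declaration name.

```python
import re

# Assumed approximation of the name pattern in markupbase: a letter,
# followed by name characters, then optional trailing whitespace.
declname_match = re.compile(r'[a-zA-Z][-_.a-zA-Z0-9]*\s*').match


def scan_name(rawdata, i):
    """Hypothetical stand-in for ParserBase._scan_name.

    Returns (lowercased name, index just past the name and any
    trailing whitespace), or raises ValueError if no name starts at i.
    """
    m = declname_match(rawdata, i)
    if m is None:
        raise ValueError("expected name token")
    return m.group().strip().lower(), m.end()


if __name__ == "__main__":
    # Index 2 is the first character after "<!".
    print(scan_name("<!DOCTYPE html>", 2))
    try:
        scan_name("<![endif]>", 2)
    except ValueError as exc:
        print("failed:", exc)
```

The first call succeeds because "DOCTYPE" starts with a letter. The second raises because the character at that position is '[', which matches the "expected name token" error in the traceback above.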
----------------------------------------------------------------------
>Comment By: Martin v. Löwis (loewis)
Date: 2003-03-30 16:52
Message:
Logged In: YES
user_id=21627
This has now been fixed with patch 545300, on grounds of
conformance with SGML.
----------------------------------------------------------------------
Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2002-06-14 03:39
Message:
Logged In: YES
user_id=3066
Ok, here's what I think.
This is not an actual bug in the interpretation of HTML, and
there has not been a recurring pattern of complaints about
this. Given that we do not want to encourage the creation
of broken HTML, this edge case will not be allowed to
further complicate the code.
----------------------------------------------------------------------
Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2002-02-15 07:13
Message:
Logged In: YES
user_id=3066
Ugh! I don't think that's legal HTML at all. I'll have to
think about the right way to deal with it.
----------------------------------------------------------------------