[ python-Bugs-592441 ] Webchecker error on http://www.naleo.org
SourceForge.net
noreply at sourceforge.net
Sat Aug 7 22:49:59 CEST 2004
Bugs item #592441, was opened at 2002-08-08 05:40
Message generated for change (Comment added) made by mwh
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=592441&group_id=5470
Category: Demos and Tools
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Submitted By: Carlos Conti (mcsolrac)
Assigned to: Jeremy Hylton (jhylton)
Summary: Webchecker error on http://www.naleo.org
Initial Comment:
Webchecker version 1.25.6.1 on Windows 2000 Professional.
Run webchecker with this argument:
http://www.naleo.org/WSJArticle002.htm
Webchecker returns this traceback:
Traceback (most recent call last):
  File "C:\Python22\Tools\webchecker\webchecker.py", line 858, in ?
    main()
  File "C:\Python22\Tools\webchecker\webchecker.py", line 222, in main
    c.run()
  File "C:\Python22\Tools\webchecker\webchecker.py", line 349, in run
    self.dopage(url)
  File "C:\Python22\Tools\webchecker\webchecker.py", line 403, in dopage
    page = self.getpage(url_pair)
  File "C:\Python22\Tools\webchecker\webchecker.py", line 507, in getpage
    return Page(text, url, maxpage=self.maxpage, checker=self)
  File "C:\Python22\Tools\webchecker\webchecker.py", line 671, in __init__
    self.parser.feed(self.text)
  File "C:\Python22\lib\sgmllib.py", line 95, in feed
    self.goahead(0)
  File "C:\Python22\lib\sgmllib.py", line 161, in goahead
    k = self.parse_declaration(i)
  File "C:\Python22\lib\markupbase.py", line 66, in parse_declaration
    decltype, j = self._scan_name(j, i)
  File "C:\Python22\lib\markupbase.py", line 313, in _scan_name
    self.error("expected name token")
  File "C:\Python22\lib\sgmllib.py", line 102, in error
    raise SGMLParseError(message)
sgmllib.SGMLParseError: expected name token
I believe this is caused by the XML in the page source
(see WSJArticle002_source.txt, attached to this bug
report).
Even if the code in this page is poorly formatted,
webchecker should be able to continue checking the other
links in this domain rather than stopping. For example,
webchecker could report that it was unable to check
http://www.naleo.org/WSJArticle002.htm, print a
traceback like the one above, and then continue with the
rest of the domain.
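The behavior suggested above (report the page, show its traceback, keep going) amounts to isolating each page's fetch-and-parse step in its own try/except. The following is a minimal sketch under stated assumptions, not webchecker's code and not the change made in revision 1.30: `crawl`, `fetch`, `extract_links`, `ParseError`, and the example.org URLs are all hypothetical, and a dictionary of whitespace-separated links stands in for real HTML.

```python
import traceback
from collections import deque
from typing import Callable, Dict, List, Tuple


class ParseError(Exception):
    """Hypothetical stand-in for a parser error such as sgmllib.SGMLParseError."""


def crawl(start: str,
          fetch: Callable[[str], str],
          extract_links: Callable[[str], List[str]],
          ) -> Tuple[List[str], Dict[str, str]]:
    """Breadth-first crawl that reports failing pages instead of aborting.

    Returns the URLs that were checked successfully and a mapping from each
    URL that could not be checked to its formatted traceback.
    """
    queue = deque([start])
    seen = {start}
    checked: List[str] = []
    failures: Dict[str, str] = {}
    while queue:
        url = queue.popleft()
        try:
            links = extract_links(fetch(url))
        except Exception:
            # Record "unable to check <url>" with its traceback, then move on
            # to the next queued URL instead of letting the whole crawl die.
            failures[url] = traceback.format_exc()
            continue
        checked.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return checked, failures


# Hypothetical in-memory "site": whitespace-separated links stand in for HTML.
SITE = {
    "http://example.org/": "http://example.org/a.htm http://example.org/b.htm",
    "http://example.org/a.htm": "MALFORMED",
    "http://example.org/b.htm": "",
}


def fetch(url: str) -> str:
    return SITE[url]


def extract_links(text: str) -> List[str]:
    # Simulate a parser that chokes on one malformed page.
    if text == "MALFORMED":
        raise ParseError("expected name token")
    return text.split()


checked, failures = crawl("http://example.org/", fetch, extract_links)
print(checked)  # ['http://example.org/', 'http://example.org/b.htm']
for url, tb in failures.items():
    print("unable to check", url)
    print(tb)
```

Catching `Exception` rather than a single parser error class is a deliberate choice in this sketch, matching the request that any page failure be reported rather than fatal; a narrower `except` would let genuine programming errors still stop the run.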
----------------------------------------------------------------------
>Comment By: Michael Hudson (mwh)
Date: 2004-08-07 21:49
Message:
Logged In: YES
user_id=6656
On the #python-dev IRC channel, jlgijsbers reports that this
is fixed by revision 1.30 of webchecker.py.
----------------------------------------------------------------------
Comment By: Jeremy Hylton (jhylton)
Date: 2002-08-13 14:36
Message:
Logged In: YES
user_id=31392
No need to apologize. Everyone is welcome to submit bug
reports here. There are, however, lots of programmers who
submit bugs, so I find it helpful to ask :-). I'll look
into this, but it's not the highest priority.
----------------------------------------------------------------------
Comment By: Carlos Conti (mcsolrac)
Date: 2002-08-08 23:06
Message:
Logged In: YES
user_id=591396
I'd love to submit a patch, but I am a newbie to both Python
and programming. I apologize if this space is only intended
for programmers; I am a QA engineer just getting acquainted
with the wonderful world of Python.
----------------------------------------------------------------------
Comment By: Jeremy Hylton (jhylton)
Date: 2002-08-08 20:20
Message:
Logged In: YES
user_id=31392
I've seen a variety of parsing problems kill webchecker. I
agree that these exceptions should be caught somewhere so
that they are not fatal. Care to submit a patch?
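Catching the exception "somewhere" can also mean catching it where it arises, in the page object's parse step, so that a malformed page still appears in the results with no links and an error note. A minimal sketch, assuming a hypothetical `Page` class loosely modeled on the traceback's `Page.__init__` calling `self.parser.feed`; `ParseError`, `strict_parse_links`, and `LinkCollector` are illustrative names, and the regular expression only imitates the `expected name token` check (a `<!` not followed by a name) rather than reproducing `sgmllib`. The `<![if ...]>` input is an assumed example, not the actual content of WSJArticle002.htm.

```python
import re
from html.parser import HTMLParser
from typing import List, Optional


class ParseError(Exception):
    """Hypothetical stand-in for sgmllib.SGMLParseError."""


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Simplified imitation of the check seen in the traceback: "<!" must be
# followed by a name (or "-" for a comment), otherwise parsing fails.
_BAD_DECL = re.compile(r"<!(?![A-Za-z-])")


def strict_parse_links(text: str) -> List[str]:
    """Hypothetical strict parser: raises ParseError on an unscannable "<!"."""
    if _BAD_DECL.search(text):
        raise ParseError("expected name token")
    collector = LinkCollector()
    collector.feed(text)
    collector.close()
    return collector.links


class Page:
    """Hypothetical, simplified analogue of webchecker's Page class."""

    def __init__(self, text: str, url: str) -> None:
        self.url = url
        self.error: Optional[str] = None
        try:
            self.links = strict_parse_links(text)
        except ParseError as exc:
            # Only the parser's own error is caught; other bugs still surface.
            self.links = []
            self.error = f"Error parsing {url}: {exc}"


good = Page('<a href="a.htm">A</a> <a href="b.htm">B</a>', "http://example.org/")
bad = Page('<![if !supportLists]><a href="c.htm">C</a><![endif]>',
           "http://example.org/x.htm")
print(good.links, good.error)  # ['a.htm', 'b.htm'] None
print(bad.links, bad.error)    # [] Error parsing http://example.org/x.htm: expected name token
```

Compared with wrapping the whole crawl loop, this placement keeps the failure local to one page and catches only the parser's error type, at the cost of having to know which exception class the parser raises.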
----------------------------------------------------------------------
More information about the Python-bugs-list mailing list