[ python-Bugs-1612729 ] webchecker/urllib chokes on 404 pages
SourceForge.net
noreply at sourceforge.net
Sun Dec 10 20:35:39 CET 2006
Bugs item #1612729, was opened at 2006-12-10 20:35
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1612729&group_id=5470
Category: Demos and Tools
Group: Python 2.5
Status: Open
Resolution: None
Priority: 7
Private: No
Submitted By: Fredrik Lundh (effbot)
Assigned to: Nobody/Anonymous (nobody)
Summary: webchecker/urllib chokes on 404 pages
Initial Comment:
platform: standard Python 2.5 on Windows XP.
webchecker chokes on response code 404, which is a bit unfortunate...
the error occurs deep down in urllib, but a plain urllib request to the same page doesn't result in the same error, so it's probably related to how webchecker is using the library.
here's an example:
C:\Python25\Tools\webchecker> python webchecker.py http://www.python.org/foo
webchecker version 50851
Round 1 (1 total, 1 to do, 0 done, 0 bad)
No need to save checkpoint
Traceback (most recent call last):
  File "webchecker.py", line 892, in <module>
    main()
  File "webchecker.py", line 222, in main
    c.run()
  File "webchecker.py", line 349, in run
    self.dopage(url)
  File "webchecker.py", line 404, in dopage
    page = self.getpage(url_pair)
  File "webchecker.py", line 509, in getpage
    text, nurl = self.readhtml(url_pair)
  File "webchecker.py", line 523, in readhtml
    f, url = self.openhtml(url_pair)
  File "webchecker.py", line 531, in openhtml
    f = self.openpage(url_pair)
  File "webchecker.py", line 543, in openpage
    return self.urlopener.open(url)
  File "c:\python25\lib\urllib.py", line 190, in open
    return getattr(self, name)(url)
  File "c:\python25\lib\urllib.py", line 334, in open_http
    return self.http_error(url, fp, errcode, errmsg, headers)
  File "c:\python25\lib\urllib.py", line 351, in http_error
    return self.http_error_default(url, fp, errcode, errmsg, headers)
  File "c:\python25\lib\urllib.py", line 357, in http_error_default
    raise IOError, ('http error', errcode, errmsg, headers)
TypeError: EnvironmentError expected at most 3 arguments, got 4
running the same test under Python 2.4 works fine:
C:\python24\Tools\webchecker>python webchecker.py http://www.python.org/foo
webchecker version 36560
Round 1 (1 total, 1 to do, 0 done, 0 bad)
Error ('http error', 404, 'Not Found')
HREF http://www.python.org/foo
from <root>
Final Report (1 total, 0 to do, 1 done, 1 bad)
Error Report:
Error in <root>
HREF http://www.python.org/foo
msg ('http error', 404, 'Not Found')
Saving checkpoint to @webchecker.pickle ...
Done.
Use ``webchecker.py -R'' to restart.
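
The 2.5 traceback shows a four-item tuple being passed to IOError, while the 2.4 run reports only the three-item form ('http error', 404, 'Not Found'). The sketch below is one way to carry all four values without relying on how many positional arguments a built-in exception accepts. It is only an illustration, not the fix applied for this report: HTTPFetchError and check_status are hypothetical names that appear in neither webchecker nor urllib, and the class deliberately derives from Exception rather than IOError, so code that catches IOError would not catch it unchanged.

```python
class HTTPFetchError(Exception):
    """Hypothetical error type; not part of webchecker or urllib."""

    def __init__(self, errcode, errmsg, headers=None):
        # Hand the base class the same three values that the 2.4 run reports,
        # and keep the headers as a separate attribute instead of a 4th arg.
        Exception.__init__(self, 'http error', errcode, errmsg)
        self.errcode = errcode
        self.errmsg = errmsg
        self.headers = headers


def check_status(errcode, errmsg, headers=None):
    """Hypothetical helper: raise for error responses, else return the code."""
    # Assumption for this sketch: any status of 400 or above counts as an error.
    if errcode >= 400:
        raise HTTPFetchError(errcode, errmsg, headers)
    return errcode


if __name__ == '__main__':
    try:
        check_status(404, 'Not Found', {'Content-Type': 'text/html'})
    except HTTPFetchError as err:
        # err.args holds the three-item form; headers stay reachable separately.
        print(err.args)
        print(err.headers)
```

A caller that only wants the message tuple can use err.args, while one that needs the response headers reads err.headers.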