[Python-bugs-list] [ python-Bugs-813986 ] robotparser interactively prompts for username and password
SourceForge.net
noreply at sourceforge.net
Sun Sep 28 09:06:03 EDT 2003
Bugs item #813986, was opened at 2003-09-28 09:06
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=813986&group_id=5470
Category: Python Library
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Erik Demaine (edemaine)
Assigned to: Nobody/Anonymous (nobody)
Summary: robotparser interactively prompts for username and password
Initial Comment:
This is a rare occurrence, but if a /robots.txt file is
password-protected on an http server, robotparser
interactively prompts (via raw_input) for a username
and password, because that is urllib's default
behavior. One example of such a URL, at least at the
time of this writing, is
http://www.cosc.canterbury.ac.nz/robots.txt
Given that robotparser and robots.txt are all about
*robots* (not interactive users), I don't think this
interactive behavior is terribly appropriate. Attached
is a simple patch to robotparser.py to fix this
behavior, forcing urllib to return the 401 error that
it ought to.
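
A minimal sketch of the kind of override described above, not the attached
patch itself. It assumes that urllib.FancyURLopener asks for credentials
through its prompt_user_passwd(host, realm) hook; the names NoPromptMixin and
RobotURLopener are hypothetical. The override is written as a mixin so the
snippet runs on its own, without importing urllib.

```python
class NoPromptMixin(object):
    """Hypothetical mixin: never ask a human for credentials."""

    def prompt_user_passwd(self, host, realm):
        # urllib.FancyURLopener calls this hook when a server demands
        # authentication and expects a (user, passwd) tuple back.
        # Returning (None, None) means "no credentials available",
        # so the interactive raw_input prompt is never reached.
        return None, None


# Assumed usage under Python 2.x (shown as a comment, not executed here):
#
#   import urllib
#
#   class RobotURLopener(NoPromptMixin, urllib.FancyURLopener):
#       pass
#
#   f = RobotURLopener().open("http://www.cosc.canterbury.ac.nz/robots.txt")
```

With an opener like this, a password-protected /robots.txt should surface as
an error status for robotparser to interpret rather than as a prompt on the
terminal.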
Another issue is whether a 401 (Authorization Required)
URL means that everything should be allowed or
everything should be disallowed. I'm not sure what's
"right". Reading the spec, it says 'This file must be
accessible via HTTP on the local URL "/robots.txt"'
which I would read to mean it should be accessible
without username/password. On the other hand, the
current robotparser.py code says "if self.errcode ==
401 or self.errcode == 403: self.disallow_all = 1"
which has the opposite effect. I'll leave deciding
which is most appropriate to the powers that be.
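
To make the two readings concrete, here is a rough sketch of the decision as
a standalone function. The function name robots_policy and the
auth_means_allow flag are hypothetical and are not part of robotparser. The
401/403 branch mirrors the code quoted above; treating other 4xx/5xx codes as
"allow everything" is an assumption added only to make the example complete.

```python
def robots_policy(errcode, auth_means_allow=False):
    """Map the HTTP status of a /robots.txt fetch to a crawl policy.

    Returns "disallow_all", "allow_all", or "parse" (meaning: read the
    file body and apply its rules).
    """
    if errcode in (401, 403):
        # Behaviour quoted in the report: disallow everything.
        # auth_means_allow=True models the other reading of the spec,
        # where a robots.txt hidden behind a password counts as absent.
        if auth_means_allow:
            return "allow_all"
        return "disallow_all"
    if errcode >= 400:
        # Assumption for this sketch: any other error means no usable
        # robots.txt, so nothing is restricted.
        return "allow_all"
    return "parse"


if __name__ == "__main__":
    for code in (200, 401, 403, 404):
        print("%d -> %s / %s" % (
            code,
            robots_policy(code),
            robots_policy(code, auth_means_allow=True),
        ))
```

Either choice is a one-line difference, so the question is purely one of
policy rather than implementation effort.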
----------------------------------------------------------------------