[Python-bugs-list] [ python-Bugs-522898 ] Robotparser does not handle empty paths
noreply@sourceforge.net
Tue, 26 Feb 2002 02:40:53 -0800
Bugs item #522898, was opened at 2002-02-26 02:40
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=522898&group_id=5470
Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Costas Malamas (cmalamas)
Assigned to: Nobody/Anonymous (nobody)
Summary: Robotparser does not handle empty paths
Initial Comment:
The robotparser module incorrectly handles empty paths
in the Allow/Disallow directives.
According to http://www.robotstxt.org/wc/norobots-rfc.html,
the following rule should be a global *allow*:
User-agent: *
Disallow:
My reading of the RFC is that an empty path is always
a global allow (for both Allow and Disallow
directives), so that the syntax stays backward
compatible: there was no Allow directive in the
original syntax.
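The reading above can be sketched as a small matcher. This is a hypothetical illustration, not robotparser's code: the function name `is_allowed`, the (allow, path) tuple representation, prefix matching of paths, and allowing a path that no rule matches are all assumptions made for the sketch.

```python
from typing import List, Tuple

# Hypothetical rule representation: (is_allow_directive, path)
Rule = Tuple[bool, str]


def is_allowed(rules: List[Rule], path: str) -> bool:
    """Return True if `path` may be fetched under `rules`.

    Assumptions: rules are checked in order, the first match wins,
    paths match by prefix, and a path no rule matches is allowed.
    """
    for allow, rule_path in rules:
        if rule_path == "":
            # Empty path: a global allow, from either Allow: or Disallow:
            return True
        if path.startswith(rule_path):
            return allow
    return True


# "User-agent: *" followed by an empty "Disallow:" line
print(is_allowed([(False, "")], "/private/page.html"))
# A non-empty Disallow: still blocks its prefix
print(is_allowed([(False, "/private")], "/private/page.html"))
```

The first call allows the page because the empty Disallow: acts as a global allow; the second is refused because "/private" is a prefix of the requested path.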
Suggested fix:
robotparser.RuleLine.applies_to() becomes:
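    def applies_to(self, filename):
        # An empty path makes the rule a global allow,
        # whether it came from an Allow: or a Disallow: line.
        if not self.path:
            self.allowance = 1
        return self.path == "*" or re.match(self.path, filename)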
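To see what the suggested applies_to() does, here is a minimal sketch that places it inside a hypothetical stand-in for robotparser.RuleLine. Only the `path` and `allowance` attributes are modelled, and the helper `allowed()` (first matching rule wins, no match means allowed) is an assumption for illustration, not robotparser's API.

```python
import re


class RuleLine:
    """Hypothetical stand-in for robotparser.RuleLine, reduced to the
    two attributes the suggested fix touches."""

    def __init__(self, path, allowance):
        self.path = path            # path from the Allow:/Disallow: line
        self.allowance = allowance  # 1 for Allow:, 0 for Disallow:

    def applies_to(self, filename):
        # Suggested fix: an empty path turns the rule into a global allow.
        if not self.path:
            self.allowance = 1
        # re.match("", filename) always matches, so an empty rule
        # applies to every filename.
        return self.path == "*" or re.match(self.path, filename)


def allowed(rules, filename):
    """Hypothetical helper: first matching rule wins; no match means allowed."""
    for rule in rules:
        if rule.applies_to(filename):
            return bool(rule.allowance)
    return True


# "User-agent: *" / "Disallow:" -> everything is allowed
print(allowed([RuleLine("", 0)], "/any/page.html"))
# A non-empty Disallow: still blocks matching paths
print(allowed([RuleLine("/tmp", 0)], "/tmp/page.html"))
```

Because the suggested method passes the rule path to re.match(), paths are treated as regular expressions anchored at the start of the filename; an empty pattern matches every filename, which is why the empty rule applies everywhere.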