[Python-bugs-list] [ python-Bugs-522898 ] Robotparser does not handle empty paths

noreply@sourceforge.net noreply@sourceforge.net
Thu, 28 Feb 2002 07:32:04 -0800


Bugs item #522898, was opened at 2002-02-26 11:40
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=522898&group_id=5470

Category: Python Library
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Submitted By: Costas Malamas (cmalamas)
Assigned to: Nobody/Anonymous (nobody)
Summary: Robotparser does not handle empty paths

Initial Comment:
The robotparser module incorrectly handles empty paths 
in the Allow/Disallow directives.

According to http://www.robotstxt.org/wc/norobots-rfc.html,
the following rule should be a global *allow*:
User-agent: *
Disallow: 
      
My reading of the RFC is that an empty path is always 
a global allow (for both Allow and Disallow 
directives), so that the syntax stays backwards 
compatible: there was no Allow directive in the 
original syntax.
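
The expected behaviour can be checked with a short script. This is only
a sketch, assuming a current Python 3 where the module lives at
urllib.robotparser (the report concerns the older top-level robotparser
module); the user-agent name and example.com URL are placeholders, not
taken from the report.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt body quoted in the report: a Disallow with an empty path.
robots_txt = """\
User-agent: *
Disallow:
"""

parser = RobotFileParser()
# parse() takes an iterable of lines, so no network access is needed.
parser.parse(robots_txt.splitlines())

# "ExampleBot" and the example.com URL are placeholder values.
allowed = parser.can_fetch("ExampleBot", "http://example.com/any/page.html")
print(allowed)
```

If the empty path is treated as a global allow, can_fetch() should
report that the URL may be fetched.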

Suggested fix:
robotparser.RuleLine.applies_to() becomes:
    def applies_to(self, filename):
        if not self.path:
            self.allowance = 1
        return self.path=="*" or re.match(self.path, filename)
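
To see the effect of the suggested change in isolation, here is a
minimal, self-contained sketch. The RuleLine class below is a
hypothetical stand-in written for illustration, not the class from
robotparser.py, and it is not necessarily how the bug was eventually
fixed in the library. The only deliberate difference from the snippet
above is that the match result is turned into a plain boolean.

```python
import re


class RuleLine:
    """Hypothetical stand-in for robotparser.RuleLine (illustration only)."""

    def __init__(self, path, allowance):
        self.path = path
        self.allowance = allowance  # 1 = Allow, 0 = Disallow

    def applies_to(self, filename):
        # Suggested fix: an empty path always means "allow everything".
        if not self.path:
            self.allowance = 1
        # As in the report, the path is used as a regex anchored at the start.
        return self.path == "*" or re.match(self.path, filename) is not None


# "Disallow:" with an empty path: the rule matches and flips to allow.
empty_rule = RuleLine("", 0)
empty_matches = empty_rule.applies_to("/any/page.html")

# "Disallow: /private" keeps its normal meaning.
private_rule = RuleLine("/private", 0)
private_matches = private_rule.applies_to("/private/data.html")
public_matches = private_rule.applies_to("/public/index.html")

print(empty_matches, empty_rule.allowance)
print(private_matches, private_rule.allowance, public_matches)
```

Note that this version changes self.allowance as a side effect of the
lookup; setting it once when the rule is constructed would be an
alternative design with the same outcome.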

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-02-28 16:32

Message:
Logged In: YES 
user_id=21627

This is fixed in robotparser.py 1.11.

----------------------------------------------------------------------
