RERP 0.8.0, an alternative robots.txt parser

Philip Semanchuk NikitaTheSpider at
Mon Jun 12 03:57:53 CEST 2006

RERP (Robot Exclusion Rules Parser) is an alternative to Python's 
standard robotparser module. I was motivated to write it because 
Python's robotparser doesn't gracefully handle non-ASCII characters, 
which occur in about 0.1% of robots.txt files. RERP handles non-ASCII 
and also adds a few other niceties, such as the ability to customize 
the user-agent string sent when fetching a robots.txt file.
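
For illustration only, here is a minimal sketch of those two ideas 
(tolerating non-ASCII bytes and sending a custom user-agent) built on 
the standard library rather than on RERP itself; it does not show 
RERP's API. The helper names parse_robots_bytes and fetch_robots, the 
"ExampleBot" user-agent, and the choice to decode as UTF-8 with 
replacement characters are all assumptions made for this example.

```python
from urllib import robotparser
from urllib.request import Request, urlopen


def parse_robots_bytes(raw):
    """Parse raw robots.txt bytes, tolerating non-ASCII content.

    Hypothetical helper: decodes leniently, then hands the lines to
    the standard library parser.
    """
    # Undecodable bytes become U+FFFD instead of raising an exception.
    text = raw.decode("utf-8", errors="replace")
    parser = robotparser.RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def fetch_robots(url, user_agent="ExampleBot/1.0"):
    """Fetch a robots.txt file with a custom User-Agent header.

    Hypothetical helper; the default user-agent is a placeholder.
    """
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=10) as response:
        return parse_robots_bytes(response.read())


# A robots.txt body containing a Latin-1 byte that is not valid UTF-8.
sample = "User-agent: *\nDisallow: /private/\nDisallow: /privé/\n".encode("latin-1")
rules = parse_robots_bytes(sample)

print(rules.can_fetch("ExampleBot", "http://example.com/index.html"))
print(rules.can_fetch("ExampleBot", "http://example.com/private/page.html"))
```

Decoding with errors="replace" keeps the parse from failing on stray 
bytes, at the cost of not matching the affected path exactly; a parser 
that is aware of the file's real encoding could do better.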

The code, documentation, background, discussion of the specs and 
examples are all here:


Bulk HTML validation, link checking and more

More information about the Python-announce-list mailing list