RERP 0.8.0, an alternative robots.txt parser

Philip Semanchuk NikitaTheSpider at gmail.com
Mon Jun 12 03:57:53 CEST 2006


RERP (Robot Exclusion Rules Parser) is an alternative to Python's 
standard robotparser module. I was motivated to write it because 
Python's robotparser doesn't gracefully handle non-ASCII characters, 
which occur in about 0.1% of robots.txt files. RERP handles non-ASCII 
input and also adds a few other niceties, such as the ability to 
customize the user-agent string sent when fetching a robots.txt file. 
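
To make those two points concrete, here is a minimal sketch that uses 
only the standard library (urllib.robotparser, the Python 3 name of the 
robotparser module). It is not RERP's API. The helper names 
parse_robots_bytes and fetch_robots, the choice to decode as UTF-8 with 
replacement characters, the 10-second timeout and the sample rules are 
all assumptions made for illustration.

```python
import urllib.request
from urllib.robotparser import RobotFileParser


def parse_robots_bytes(raw):
    """Build a parser from raw robots.txt bytes, tolerating non-ASCII input."""
    # Assumption: treat the file as UTF-8 and substitute U+FFFD for any
    # bytes that do not decode, so odd input never raises an exception.
    text = raw.decode("utf-8", errors="replace")
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def fetch_robots(url, user_agent):
    """Fetch robots.txt while sending a caller-chosen User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_robots_bytes(response.read())


# Hypothetical robots.txt content containing a non-ASCII path.
sample = "User-agent: *\nDisallow: /privé/\nAllow: /\n".encode("utf-8")
rules = parse_robots_bytes(sample)

# Hypothetical Latin-1 encoded file; the 0xE9 byte is not valid UTF-8.
latin1_rules = parse_robots_bytes(b"User-agent: *\nDisallow: /priv\xe9/\n")
```

With a live site, something like fetch_robots("http://example.com/robots.txt", 
"ExampleBot/1.0") would return the same kind of parser object, and 
can_fetch(agent, url) then answers whether a given URL is allowed.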

The code, documentation, background, discussion of the specs and 
examples are all here:
http://NikitaTheSpider.com/articles/rerp.html


Enjoy!

-- 
Philip
http://NikitaTheSpider.com/
Bulk HTML validation, link checking and more


More information about the Python-announce-list mailing list