RERP 0.8.0, an alternative robots.txt parser
Philip Semanchuk
NikitaTheSpider at gmail.com
Mon Jun 12 03:57:53 CEST 2006
RERP (Robot Exclusion Rules Parser) is an alternative to Python's
standard robotparser module. I was motivated to write it because
Python's robotparser doesn't gracefully handle non-ASCII characters, which
occur in about 0.1% of robots.txt files. RERP handles non-ASCII and
also adds a few other niceties, such as the ability to customize the
user-agent string sent when fetching a robots.txt file.
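For readers on a modern Python, the two problems above can be sketched with only the Python 3 standard library. This is not RERP's API. It assumes that falling back to Latin-1 is an acceptable way to decode a robots.txt that isn't valid UTF-8 (the stdlib's own `RobotFileParser.read()` decodes as UTF-8 and raises on anything else). The helper names `parse_robots` and `fetch_robots` and the agent string "MyBot" are hypothetical.

```python
import urllib.request
import urllib.robotparser


def parse_robots(raw: bytes) -> urllib.robotparser.RobotFileParser:
    """Hypothetical helper: decode robots.txt bytes leniently, then parse them."""
    # Assumption: try UTF-8 first, then fall back to Latin-1,
    # which maps every byte to a character and so never fails.
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        text = raw.decode("latin-1")
    parser = urllib.robotparser.RobotFileParser()
    # Record a fetch time; can_fetch() refuses every URL until one is set.
    parser.modified()
    parser.parse(text.splitlines())
    return parser


def fetch_robots(url: str, user_agent: str) -> urllib.robotparser.RobotFileParser:
    """Hypothetical helper: fetch robots.txt while sending a custom User-Agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_robots(response.read())
```

Usage with a Latin-1 encoded file, which `raw.decode("utf-8")` alone would reject:

```python
rp = parse_robots("User-agent: *\nDisallow: /caf\u00e9/\n".encode("latin-1"))
rp.can_fetch("MyBot", "http://example.com/index.html")     # True
rp.can_fetch("MyBot", "http://example.com/caf%C3%A9/menu")  # False
```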
The code, documentation, background, discussion of the specs and
examples are all here:
http://NikitaTheSpider.com/articles/rerp.html
Enjoy!
--
Philip
http://NiktaTheSpider.com/
Bulk HTML validation, link checking and more
More information about the Python-announce-list mailing list