[New-bugs-announce] [issue27065] robotparser user agent considered hostile by mod_security rules.
report at bugs.python.org
Thu May 19 19:21:48 EDT 2016
New submission from John Nagle:
"robotparser" uses the default Python user agent when reading the "robots.txt" file, and there's no parameter for changing that.
Unfortunately, the "mod_security" add-on for the Apache web server, when used with the standard OWASP rule set, blacklists the default Python user agent in Rule 990002, User Agent Identification. The rule rejects certain HTTP User-Agent values, one of which is "python-httplib2". So any Python program that accesses the web site with a blacklisted agent will trigger this rule and be blocked from access.
For regular HTTP accesses, it's possible to work around this by setting a custom User-Agent header on the Request object. But "robotparser" has no such option.
Worse, if "robotparser" has its read of "robots.txt" rejected, it interprets that as a "deny all" robots.txt file, and from then on returns False for every "can_fetch()" request.
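A possible workaround (a sketch, not part of the report; the helper names are hypothetical) is to fetch robots.txt yourself with a custom User-Agent header and hand the text to RobotFileParser.parse(), bypassing read() and its default "Python-urllib" agent:

```python
import urllib.request
import urllib.robotparser

def read_robots_txt(url, user_agent):
    """Fetch robots.txt with a custom User-Agent header (hypothetical helper)."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8", errors="replace")

def parser_from_text(text):
    """Build a RobotFileParser from already-fetched robots.txt text."""
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(text.splitlines())
    return rp

# Offline demonstration: parse() accepts text from any source,
# so can_fetch() works without robotparser ever issuing a request itself.
sample = """\
User-agent: *
Disallow: /private/
"""
rp = parser_from_text(sample)
print(rp.can_fetch("MyBot/1.0", "https://example.com/public/page"))
print(rp.can_fetch("MyBot/1.0", "https://example.com/private/page"))
```

This sidesteps both problems: the fetch goes out with whatever agent string you choose, and a transport failure raises an exception you can handle instead of silently becoming "deny all".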
components: Library (Lib)
title: robotparser user agent considered hostile by mod_security rules.
versions: Python 2.7, Python 3.2, Python 3.3, Python 3.4, Python 3.5, Python 3.6
Python tracker <report at bugs.python.org>