[New-bugs-announce] [issue15851] Lib/robotparser.py doesn't accept setting a user agent string, instead it uses the default.
Eduardo A. Bustamante López
report at bugs.python.org
Sun Sep 2 20:36:04 CEST 2012
New submission from Eduardo A. Bustamante López:
I found that http://en.wikipedia.org/robots.txt returns a 403 response if the provided user agent is on a specific blacklist.
And since robotparser doesn't provide a mechanism to change the default user agent used by the opener, it becomes unusable for that site (and sites that have a similar policy).
I think the user should be able to set a specific user agent string, to better identify their bot.
I attach a patch that allows the user to change the opener used by RobotFileParser, in case some specific behavior is needed.
I also attach a simple example of how it solves the issue, at least with Wikipedia.
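For readers without the attachment, here is a minimal sketch of the general idea, not the attached patch itself. It assumes Python 3's urllib.robotparser rather than the Python 2.7 Lib/robotparser.py this report targets, and it works around the problem without patching: the robots.txt file is fetched with an explicit User-Agent header and the text is handed to RobotFileParser.parse(). The helper name build_parser, the bot name "MyBot/1.0", and the optional opener parameter are hypothetical and chosen only for illustration.

```python
import urllib.request
import urllib.robotparser


def build_parser(robots_url, user_agent, opener=None):
    """Fetch robots.txt with an explicit User-Agent and return a parser.

    Hypothetical helper: `opener` can be any object with an
    open(request, timeout=...) method, such as the result of
    urllib.request.build_opener().
    """
    opener = opener or urllib.request.build_opener()
    # Send our own User-Agent instead of the library default.
    request = urllib.request.Request(
        robots_url, headers={"User-Agent": user_agent}
    )
    with opener.open(request, timeout=10) as response:
        text = response.read().decode("utf-8", errors="replace")

    parser = urllib.robotparser.RobotFileParser(robots_url)
    # parse() takes an iterable of lines, so read() is never called
    # and the default opener is bypassed entirely.
    parser.parse(text.splitlines())
    return parser
```

A caller might then write something like `build_parser("http://en.wikipedia.org/robots.txt", "MyBot/1.0").can_fetch("MyBot/1.0", some_url)`. Whether a given site accepts a particular user agent string is up to that site's policy.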
components: Library (Lib)
title: Lib/robotparser.py doesn't accept setting a user agent string, instead it uses the default.
versions: Python 2.7
Added file: http://bugs.python.org/file27100/robotparser.py.diff
Python tracker <report at bugs.python.org>