[New-bugs-announce] [issue15851] Lib/robotparser.py doesn't accept setting a user agent string; instead it uses the default.

Eduardo A. Bustamante López report at bugs.python.org
Sun Sep 2 20:36:04 CEST 2012


New submission from Eduardo A. Bustamante López:

I found that http://en.wikipedia.org/robots.txt returns HTTP 403 if the provided user agent string is on a specific blacklist.

Since robotparser doesn't provide a way to change the default user agent used by its opener, the module is unusable for that site (and for sites with a similar policy).

I think the user should be able to set a specific user agent string to better identify their bot.

I attach a patch that allows the user to change the opener used by RobotFileParser, in case some specific behavior is needed.

I also attach a simple example of how it solves the issue, at least with Wikipedia.
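
For reference, a minimal sketch of the kind of workaround the example demonstrates (this is not the attached file itself): fetch robots.txt with urllib2 using an explicit User-Agent header, then feed the lines to the stock parser via RobotFileParser.parse() instead of read(). The bot name "MyCrawler/1.0" and the contact URL are placeholders.

    import robotparser
    import urllib2

    # Fetch robots.txt ourselves so we control the User-Agent header;
    # "MyCrawler/1.0" and the contact URL below are placeholder values.
    req = urllib2.Request(
        "http://en.wikipedia.org/robots.txt",
        headers={"User-Agent": "MyCrawler/1.0 (+http://example.org/bot)"})
    lines = urllib2.urlopen(req).read().splitlines()

    # Feed the fetched lines to the parser instead of calling read(),
    # which would use the default opener (and its default user agent).
    rp = robotparser.RobotFileParser()
    rp.set_url("http://en.wikipedia.org/robots.txt")
    rp.parse(lines)

    print rp.can_fetch("MyCrawler",
                       "http://en.wikipedia.org/wiki/Python_(programming_language)")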

----------
components: Library (Lib)
files: robotparser.py.diff
keywords: patch
messages: 169718
nosy: Eduardo.A..Bustamante.López
priority: normal
severity: normal
status: open
title: Lib/robotparser.py doesn't accept setting a user agent string; instead it uses the default.
type: enhancement
versions: Python 2.7
Added file: http://bugs.python.org/file27100/robotparser.py.diff

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue15851>
_______________________________________

