simple spider in python

Lawrence D'Oliveiro ldo at geek-central.gen.new_zealand
Sat Sep 1 02:55:52 EDT 2007


In message <mailman.202.1187902138.32294.python-list at python.org>, Michael
Bentley wrote:

> First thing to know is that google doesn't like the User-agent header
> urllib2 uses by default -- you'll have to masquerade as a browser
> (google throws a 403 error if you connect as 'User-Agent: Python-
> urllib/2.5': look into urllib2.build_opener()).

A bit small-minded of Google, don't you think? They also block the default
user-agent header for wget.
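
For anyone following along, here is a rough sketch of the build_opener()
approach Michael mentions. It is written against urllib.request, the
Python 3 counterpart of urllib2, so the module name differs from the one
in the quoted text. The BROWSER_UA string and the make_opener() helper
are made up for illustration; whether any particular site accepts a
given string is not something this sketch can promise.

```python
import urllib.request

# Hypothetical browser-like identification string (an assumption, not a
# value taken from the thread).
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) ExampleSpider/0.1"


def make_opener(user_agent=BROWSER_UA):
    """Return an opener that sends user_agent instead of the default."""
    opener = urllib.request.build_opener()
    # A fresh opener carries a single default header of the form
    # ("User-agent", "Python-urllib/x.y"); replace it wholesale.
    opener.addheaders = [("User-agent", user_agent)]
    return opener


opener = make_opener()
# opener.open("http://www.example.com/") would now send the custom header.
```

Setting addheaders on the opener affects every request made through it,
which is convenient for a spider that fetches many pages. For a one-off
request, passing headers={"User-Agent": ...} to urllib.request.Request
should work as well.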


