simple spider in python
Lawrence D'Oliveiro
ldo at geek-central.gen.new_zealand
Sat Sep 1 02:55:52 EDT 2007
In message <mailman.202.1187902138.32294.python-list at python.org>, Michael
Bentley wrote:
> First thing to know is that google doesn't like the User-agent header
> urllib2 uses by default -- you'll have to masquerade as a browser
> (google throws a 403 error if you connect as 'User-Agent: Python-
> urllib/2.5': look into urllib2.build_opener()).
A bit small-minded of Google, don't you think? They also block the default
user-agent header for wget.
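
For reference, the build_opener() route mentioned in the quoted message might
look roughly like the sketch below. It is written against urllib.request, the
Python 3 successor to urllib2; on Python 2.5 the same two calls should be
available as urllib2.build_opener() and opener.addheaders. The BROWSER_UA
string and the make_opener() helper are made-up names for illustration, not
anything from the original thread.

```python
import urllib.request

# Hypothetical browser-like identifier, chosen only for illustration.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) ExampleSpider/0.1"


def make_opener(user_agent=BROWSER_UA):
    """Return an opener that sends user_agent instead of the default
    'Python-urllib/x.y' header."""
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) tuples; replacing the list
    # drops the default User-agent entry.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


opener = make_opener()

# The actual fetch is left commented out so the sketch does not touch
# the network when run:
# html = opener.open("http://www.google.com/search?q=python").read()
```

Whether a given site accepts the request still depends on that site's policy,
so a changed header is no guarantee against a 403.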