Trying to read search results from Google's web page
James Matthews
nytrokiss at gmail.com
Thu Aug 21 02:04:12 EDT 2008
I use this code.
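import urllib2

def urlopen(url):
    # Build the request and identify the client with a custom User-Agent.
    request = urllib2.Request(url)
    request.add_header('User-Agent',
                       'Mozilla/5.0 (compatible;JewelersLoungeBot/1.0)')
    # Open the request and return the file-like response object.
    opener = urllib2.build_opener()
    web_page = opener.open(request)
    return web_page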
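
For anyone on Python 3, where urllib2 was folded into urllib.request, a rough equivalent might look like the sketch below. It also leaves room for the extra browser-style headers suggested in the quoted reply. The names DEFAULT_HEADERS and build_request, the Accept and Accept-Language values, and the timeout default are my own assumptions for illustration, not anything Google requires, and setting headers does not change what Google's terms allow.

```python
import urllib.request

# Hypothetical header values, chosen for illustration only.
DEFAULT_HEADERS = {
    "User-Agent": "Mozilla/5.0 (compatible;JewelersLoungeBot/1.0)",
    "Accept": "text/html",
    "Accept-Language": "en",
}

def build_request(url, headers=None):
    """Return a Request carrying the default headers plus any overrides."""
    merged = dict(DEFAULT_HEADERS)
    if headers:
        merged.update(headers)
    return urllib.request.Request(url, headers=merged)

def urlopen(url, headers=None, timeout=10):
    """Open url with the headers attached; the caller closes the response."""
    return urllib.request.urlopen(build_request(url, headers), timeout=timeout)
```

Keeping build_request separate from urlopen makes it possible to inspect the headers without touching the network, for example build_request(url).header_items().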
On Wed, Aug 20, 2008 at 6:58 AM, Wojtek Walczak <gminick at bzt.bzt> wrote:
> On Wed, 20 Aug 2008 05:42:34 -0700 (PDT), tedpottel at gmail.com wrote:
>
> > the web page. When I try to load a URL with the search results,
> > 'http://www.google.com/search?hl=en&q=ted', I get a web page that says
> > I do not have permission. Is there a way around this, or is Google
> > just too smart?
>
> Try to imitate a web browser. Add a 'User-Agent' header (with the
> add_header method) to your HTTP request. If that doesn't help, try adding
> more browser-specific fields to your headers. Also, take a look at
> mechanize and its Browser class:
>
> http://wwwsearch.sourceforge.net/mechanize/
>
> FYI and AFAIK, Google doesn't allow its search engine to be used
> in this way. They even block an IP address if it's constantly
> abusing the search engine with too many requests.
>
> --
> Regards,
> Wojtek Walczak,
> http://tosh.pl/gminick/
--
http://www.goldwatches.com/