using urllib2

Maric Michaud maric at aristote.info
Fri Jun 27 12:27:16 CEST 2008


On Friday 27 June 2008 10:43:06 Alexnb, you wrote:
> I have never used the urllib or the urllib2. I really have looked online
> for help on this issue, and mailing lists, but I can't figure out my
> problem because people haven't been helping me, which is why I am here! :].
> Okay, so basically I want to be able to submit a word to dictionary.com and
> then get the definitions. However, to start off learning urllib2, I just
> want to do a simple google search. Before you get mad, what I have found on
> urllib2 hasn't helped me. Anyway, how would you go about doing this? No, I
> did not post the html, but I mean if you want, right click on your browser
> and hit view source of the google homepage. Basically what I want to know
> is how to submit the values (the search term) and then search for that
> value. Here's what I know:
>
> import urllib2
> response = urllib2.urlopen("http://www.google.com/")
> html = response.read()
> print html
>
> Now I know that all this does is print the source, but that's about all I
> know. I know it may be a lot to ask to have someone show/help me, but I
> really would appreciate it.

This example uses Google. Of course, pygoogle would be easier in this
particular case, but the same approach works in the general case:

>>>[207]: import urllib, urllib2

The server may reject urllib2's default User-Agent, so you need to trick it
by sending a made-up, browser-like User-Agent header.

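>>>[208]: def google_search(terms):
   .....:     # urlencode escapes the search terms (spaces, '&', ...) for the URL
   .....:     query = urllib.urlencode({'hl': 'fr', 'q': terms})
   .....:     request = urllib2.Request(
   .....:         "http://www.google.com/search?" + query,
   .....:         headers={'User-Agent': 'MyNav 1.0 (compatible; MSIE 6.0; Linux)'})
   .....:     return urllib2.urlopen(request).read()
   .....: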
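
If you want to see what such a request looks like before sending anything,
here is a minimal sketch that only builds the Request object and never opens
it. The name build_search_request is my own (hypothetical), and the
try/except import is an assumption added so the snippet runs on both
Python 2 (urllib/urllib2) and Python 3 (urllib.parse/urllib.request):

```python
try:
    # Python 2, as used in this thread
    from urllib import urlencode
    from urllib2 import Request
except ImportError:
    # Python 3 equivalents (assumption: same behaviour for this use)
    from urllib.parse import urlencode
    from urllib.request import Request

# Made-up browser-like User-Agent, same idea as in the session above.
USER_AGENT = 'MyNav 1.0 (compatible; MSIE 6.0; Linux)'


def build_search_request(terms, lang='fr'):
    """Build (but do not send) a GET request for a search query.

    Hypothetical helper for illustration only.
    """
    # A list of pairs keeps the parameter order stable on every version.
    query = urlencode([('hl', lang), ('q', terms)])
    return Request("http://www.google.com/search?" + query,
                   headers={'User-Agent': USER_AGENT})


req = build_search_request("python & co")
print(req.get_full_url())
print(req.get_header('User-agent'))
```

The first print shows that urlencode turned the space into '+' and the '&'
into '%26', so the search terms cannot be mistaken for a second parameter.
Note that Request stores header names capitalized, hence 'User-agent' in
get_header.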

>>>[212]: res = google_search("python & co")

Now that you have the whole HTML response, you'll have to parse it to recover
the data. Here is a quick and dirty attempt on the Google results page:

>>>[213]: import re

>>>[214]: [ re.sub('<.+?>', '', e) for e in re.findall('<h2 class=r>.*?</h2>', res) ]
...[229]:
['Python Gallery',
 'Coffret Monty Python And Co 3 DVD : La Premi\xe8re folie des Monty ...',
 'Re: os x, panther, python &amp; co: msg#00041',
 'Re: os x, panther, python &amp; co: msg#00040',
 'Cardiff Web Site Design, Professional web site design services ...',
 'Python Properties',
 'Frees &lt; Programs &lt; Python &lt; Bin-Co',
 'Torb: an interface between Tcl and CORBA',
 'Royal Python Morphs',
 'Python &amp; Co']
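
The titles above still contain HTML entities such as &amp;amp; and &amp;lt;. As a
rough sketch, the same regex approach can be wrapped in a function that also
decodes them. Everything here is illustrative: extract_titles and SAMPLE_PAGE
are hypothetical names, the sample markup is made up (it is not a real Google
response), and the try/except import is an assumption so it runs on Python 2
and Python 3 alike:

```python
import re

try:
    from html import unescape            # Python 3.4+
except ImportError:
    from HTMLParser import HTMLParser    # Python 2
    unescape = HTMLParser().unescape


def extract_titles(page):
    """Return the text of every <h2 class=r> element, tags stripped
    and HTML entities decoded. Quick and dirty, not a real HTML parser."""
    titles = []
    # Non-greedy match so each <h2>...</h2> block is captured separately.
    for block in re.findall(r'<h2 class=r>.*?</h2>', page, re.DOTALL):
        text = re.sub(r'<.+?>', '', block)   # drop every tag in the block
        titles.append(unescape(text))        # '&amp;' becomes '&'
    return titles


# Made-up sample markup, for illustration only.
SAMPLE_PAGE = (
    '<div>'
    '<h2 class=r><a href="http://example.com/a">Python <b>Gallery</b></a></h2>'
    '<h2 class=r><a href="http://example.com/b">Python &amp; Co</a></h2>'
    '</div>'
)

print(extract_titles(SAMPLE_PAGE))
```

This stays fragile: it breaks as soon as the page markup changes, so for
anything serious a real HTML parser is the safer choice.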


-- 
_____________

Maric Michaud


