using urllib2

Jeff McNeil jeff at jmcneil.net
Fri Jun 27 22:10:03 CEST 2008


I stumbled across this article a while back: http://www.voidspace.org.uk/python/articles/urllib2.shtml.
It covers quite a bit of ground. The urllib2 module is pretty straightforward
once you've used it a few times. Some of the class naming and whatnot
takes a bit of getting used to (I found that to be the most confusing
part).
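For anyone reading this on Python 3: urllib2 was split up there. The Request class and urlopen() now live in urllib.request, and urlencode() lives in urllib.parse. Here is a minimal sketch of the naming that tends to confuse people. It assumes Python 3 and uses a placeholder URL (www.example.com). A Request object only describes a request, and passing data is what turns it into a POST instead of a GET. build_request() is a hypothetical helper, not part of the library, and nothing here touches the network.

```python
# Python 3 sketch: urllib2's Request and urlopen() now live in urllib.request,
# and urlencode() lives in urllib.parse.
from urllib.parse import urlencode
from urllib.request import Request


def build_request(url, params, as_post=False):
    """Build (but do not send) a Request carrying `params`.

    Hypothetical helper. GET: params go into the URL's query string.
    POST: params are urlencoded into the request body as bytes.
    """
    encoded = urlencode(params)
    if as_post:
        # Supplying `data` is what turns the request into a POST.
        return Request(url, data=encoded.encode("ascii"))
    return Request(url + "?" + encoded)


get_req = build_request("http://www.example.com/search", {"q": "python"})
post_req = build_request("http://www.example.com/search", {"q": "python"}, as_post=True)
print(get_req.get_method(), get_req.full_url)  # GET http://www.example.com/search?q=python
print(post_req.get_method(), post_req.data)    # POST b'q=python'
```

Actually sending either one is then just urlopen(get_req) or urlopen(post_req).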

On Jun 27, 1:41 pm, Alexnb <alexnbr... at gmail.com> wrote:
> Okay, I tried to follow that, and it is kinda hard. But since you obviously
> know what you are doing, where did you learn this? Or where can I learn
> this?
>
> Maric Michaud wrote:
>
> > On Friday 27 June 2008 10:43:06, Alexnb wrote:
> >> I have never used urllib or urllib2. I really have looked online
> >> for help on this issue, and on mailing lists, but I can't figure out my
> >> problem because people haven't been helping me, which is why I am here!
> >> :].
> >> Okay, so basically I want to be able to submit a word to dictionary.com
> >> and then get the definitions. However, to start off learning urllib2, I
> >> just want to do a simple Google search. Before you get mad, what I have
> >> found on urllib2 hasn't helped me. Anyway, how would you go about doing
> >> this? No, I did not post the HTML, but if you want, right-click in your
> >> browser and hit View Source on the Google homepage. Basically, what I want
> >> to know is how to submit the value (the search term) and then search for
> >> that value. Here's what I know:
>
> >> import urllib2
> >> response = urllib2.urlopen("http://www.google.com/")
> >> html = response.read()
> >> print html
>
> >> Now, I know that all this does is print the source, but that's about all I
> >> know. I know it may be a lot to ask to have someone show or help me, but I
> >> would really appreciate it.
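Here is a Python 3 version of the snippet above, as a sketch. In Python 3, read() returns bytes, so the body has to be decoded before printing it as text. fetch_html() is a hypothetical helper. Falling back to UTF-8 for servers that don't declare a charset is an assumption.

```python
# Python 3 version of the urllib2 snippet (urllib2 became urllib.request).
from urllib.request import urlopen


def fetch_html(url, default_charset="utf-8"):
    """Fetch `url` and return its body as text (hypothetical helper).

    read() returns bytes in Python 3, so decode with the charset the server
    declares, falling back to `default_charset` (an assumption) otherwise.
    """
    with urlopen(url) as response:
        body = response.read()
        charset = response.headers.get_content_charset() or default_charset
    return body.decode(charset, errors="replace")
```

Calling print(fetch_html("http://www.google.com/")) then does what the urllib2 snippet did.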
>
> > This example is for Google (of course, using pygoogle is easier in this
> > case), but it is a valid example for the general case:
>
> > >>> import urllib, urllib2
>
> > You need to trick the server with a made-up User-Agent header.
>
> > >>> def google_search(terms):
> > ...     url = "http://www.google.com/search?" + urllib.urlencode({'hl': 'fr', 'q': terms})
> > ...     headers = {'User-Agent': 'MyNav 1.0 (compatible; MSIE 6.0; Linux)'}
> > ...     return urllib2.urlopen(urllib2.Request(url, headers=headers)).read()
> > ...
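Here is the same function ported to Python 3, as a sketch under a few assumptions. build_search_request() is a hypothetical helper, split out so the URL and headers can be checked without a network call. Google's current parameters, markup, or terms of use may not match what worked in 2008.

```python
# Python 3 sketch of google_search(); build_search_request() is a
# hypothetical helper so the request can be inspected offline.
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Made-up browser identification, as in the original example.
USER_AGENT = "MyNav 1.0 (compatible; MSIE 6.0; Linux)"


def build_search_request(terms, lang="fr"):
    """Return a GET Request for Google's /search URL with a custom User-Agent."""
    query = urlencode({"hl": lang, "q": terms})  # "&", spaces, etc. get escaped
    return Request("http://www.google.com/search?" + query,
                   headers={"User-Agent": USER_AGENT})


def google_search(terms, lang="fr"):
    """Send the request and return the raw HTML as bytes (needs network access)."""
    with urlopen(build_search_request(terms, lang)) as response:
        return response.read()
```

Note that urlencode() is what answers the "how do I submit the search term" part. For "python & co" it produces q=python+%26+co, so the & doesn't get mistaken for a parameter separator.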
>
> > >>> res = google_search("python & co")
>
> > Now that you have the whole HTML response, you'll have to parse it to
> > extract the data. Here is a quick-and-dirty try on Google's results page:
>
> > >>> import re
>
> > >>> [re.sub('<.+?>', '', e) for e in re.findall('<h2 class=r>.*?</h2>', res)]
> > ['Python Gallery',
> >  'Coffret Monty Python And Co 3 DVD : La Premi\xe8re folie des Monty ...',
> >  'Re: os x, panther, python &amp; co: msg#00041',
> >  'Re: os x, panther, python &amp; co: msg#00040',
> >  'Cardiff Web Site Design, Professional web site design services ...',
> >  'Python Properties',
> >  'Frees &lt; Programs &lt; Python &lt; Bin-Co',
> >  'Torb: an interface between Tcl and CORBA',
> >  'Royal Python Morphs',
> >  'Python &amp; Co']
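Regexes over HTML break easily. Here is a sketch of the same extraction using Python 3's standard html.parser module instead. The h2/class="r" pair is copied from the regex above and is an assumption about the results-page markup. HeadingTextParser and extract_headings() are hypothetical names. A side benefit is that entities such as &amp; and &lt; come out decoded, unlike in the list above.

```python
# Sketch: collect the text of <h2 class="r"> elements with the standard
# library's HTMLParser rather than regular expressions.
from html.parser import HTMLParser


class HeadingTextParser(HTMLParser):
    """Gathers the text inside <tag class="css_class"> elements (hypothetical helper)."""

    def __init__(self, tag="h2", css_class="r"):
        super().__init__()  # convert_charrefs=True by default: &amp; arrives as "&"
        self.tag = tag
        self.css_class = css_class
        self.inside = False
        self.current = []
        self.results = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self.inside = True
            self.current = []

    def handle_endtag(self, tag):
        if tag == self.tag and self.inside:
            self.inside = False
            self.results.append("".join(self.current).strip())

    def handle_data(self, data):
        # Text inside nested tags such as <a> or <b> is kept; the tags are not.
        if self.inside:
            self.current.append(data)


def extract_headings(html, tag="h2", css_class="r"):
    parser = HeadingTextParser(tag, css_class)
    parser.feed(html)
    parser.close()
    return parser.results


sample = ('<h2 class=r><a href="/x">Python &amp; Co</a></h2>'
          '<h2>not a result</h2>'
          '<h2 class="r">Python <b>Gallery</b></h2>')
print(extract_headings(sample))  # ['Python & Co', 'Python Gallery']
```

Feed it decoded HTML text (a str, not bytes).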
>
> > --
> > Maric Michaud