[Tutor] downloader-script

Fri, 27 Sep 2002 16:30:03 +0700

How to download a web page? Try this ...
#####
import urllib

url = 'http://www.python.org/'
f = urllib.urlopen(url)   # open the URL like a read-only file
html = f.read()           # read the whole page into one string
f.close()

print html
#####
It uses the "urllib" module. Set url to the address of the page you wish to
download.
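
If you need to download more than one page, the same steps can be wrapped in
a function. This is only a sketch: the name fetch is made up for
illustration, and the import fallback is my assumption so that it also runs
on Python 3, where urlopen lives in urllib.request.

```python
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib import urlopen          # Python 2, as in the snippet above


def fetch(url):
    """Return the raw contents of url (hypothetical helper name)."""
    f = urlopen(url)
    try:
        return f.read()
    finally:
        # Close the connection even if read() raises an error.
        f.close()
```

Calling fetch('http://www.python.org/') gives you the same data as the html
variable above. Note that on Python 3 the result is a byte string, not text.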

How to convert an HTML page to text?
It won't be easy, but you can use "HTMLParser" or "sgmllib". The two modules
work in almost the same way. Try this ...
#####
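from HTMLParser import HTMLParser

class MyHTMLParser(HTMLParser):
    # called for every opening tag, e.g. <a href="...">
    def handle_starttag(self, tag, attrs):
        print 'Tag & attrs:', tag, attrs
    # called for every closing tag, e.g. </a>
    def handle_endtag(self, tag):
        print 'Tag:', tag
    # called for the text between tags
    def handle_data(self, data):
        print 'Data:', data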

import urllib
url = 'http://www.python.org/'
f = urllib.urlopen(url)
html = f.read()
f.close()

p = MyHTMLParser()
p.feed(html)
p.close()
#####
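
The parser above only prints what it finds. To get the text without the
tags, one possible approach is to keep what handle_data receives and ignore
everything else. The sketch below is not the only way to do it. The names
TextExtractor and html_to_text are made up for illustration, skipping the
contents of <script> and <style> is my own assumption about what you want,
and the import fallback is there so it also runs on Python 3, where the
class lives in html.parser.

```python
try:
    from html.parser import HTMLParser   # Python 3
except ImportError:
    from HTMLParser import HTMLParser    # Python 2, as in the example above


class TextExtractor(HTMLParser):
    """Collect the text between tags and throw the tags away."""

    # Assumption: the contents of these elements are not wanted as text.
    SKIP = ('script', 'style')

    def __init__(self):
        HTMLParser.__init__(self)
        self.chunks = []    # pieces of text seen so far
        self.skipping = 0   # > 0 while inside a SKIP element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skipping += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skipping:
            self.skipping -= 1

    def handle_data(self, data):
        if not self.skipping:
            self.chunks.append(data)

    def get_text(self):
        # Join the pieces, then collapse runs of whitespace into one space.
        return ' '.join(''.join(self.chunks).split())


def html_to_text(html):
    """Return the text content of an HTML string (hypothetical helper)."""
    p = TextExtractor()
    p.feed(html)
    p.close()
    return p.get_text()
```

For example, html_to_text('<p>Hello <b>world</b>!</p>') returns
'Hello world!'. Two caveats: text from neighbouring block elements is joined
without a separator, and on Python 2 entities such as &amp; are dropped
unless you also define handle_entityref. On Python 3, urlopen(...).read()
returns bytes, so decode first, e.g. html.decode('utf-8', 'replace').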

To make it work the way you want, you should read the Python documentation
for these modules. Hope this helps.

-----
Arief

>-----Original Message-----
>From: tutor-admin@python.org [mailto:tutor-admin@python.org]On Behalf Of
>nano
>Sent: Friday, 27 September 2002 3:11 PM
>To: python-tutor
>Subject: [Tutor] downloader-script
>
>
>hi pythoners,
>
>from the mail subject u guys can guess that i'm a newbie too.
>now i'm using webware to develop a website. i want to know how to make a
>function for downloading an object from the page. the object is in
>html-format, (contains html tag) and i want to convert them (without the
>tags) into text format.
>anybody can help me (sure there are...)
>sorry for my english.
>
>thanks in advance,
>
>nano'