scraping a web page
grante at visi.com
Tue Sep 11 17:03:15 CEST 2001
In article <mailman.1000174453.25270.python-list at python.org>, Richard Jones wrote:
> On Tuesday 11 September 2001 11:58, Tom Harris wrote:
>> I used to have a script that would automatically renew my
>> library books by talking to the library catalogue via telnet. I
>> tried to use it again and I find that they have removed the
>> telnet option, and just left the CGI web page. What is the
>> suggested Pythonic solution for retrieving information from http
>> pages? I could use a regex, but it would take some care to make it
>> less than extremely fragile, and it would probably break every
>> time a small change was made to the page.
> Use the HTMLParser class from the standard library's htmllib module.
> It lets you define do_foo() methods when you extend it, where "foo"
> is the name of the tag you wish to handle. For example,
> do_img(self, attributes) will be called for every <img> tag in the
> source, with the tag's attributes passed in as a list of 2-tuples.
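The do_foo() callbacks described above belong to the old htmllib/sgmllib parser, which no longer exists in Python 3. What follows is a minimal sketch of the same idea using today's html.parser.HTMLParser instead. Its handle_starttag() method receives the tag name plus a list of (name, value) pairs, which makes it the closest modern equivalent of do_img(). The class and helper names (ImgCollector, images_in, images_at) are hypothetical, and so is the sample markup.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class ImgCollector(HTMLParser):
    """Collect the attributes of every <img> tag in a page.

    handle_starttag() plays the role of the old do_img() method: it is
    called once per start tag, with attrs as a list of (name, value) pairs.
    """

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names before calling us.
        if tag == "img":
            self.images.append(dict(attrs))


def images_in(html_text):
    """Return a list of attribute dicts, one per <img> tag in html_text."""
    parser = ImgCollector()
    parser.feed(html_text)
    parser.close()
    return parser.images


def images_at(url):
    """Hypothetical helper: fetch a page over HTTP and scan it for <img> tags."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return images_in(response.read().decode(charset, errors="replace"))


if __name__ == "__main__":
    page = '<p>Due: <img src="due.gif" alt="overdue"> <IMG SRC=renew.gif></p>'
    print(images_in(page))
```

Because the parser works on tags rather than raw text, attribute order, letter case and quoting style do not matter. That is exactly where a hand-written regex tends to break when the page changes slightly.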
Or you could take the books back...
Grant Edwards  grante at visi.com  Yow! Can I have an IMPULSE ITEM instead?