scraping a web page

Grant Edwards grante at
Tue Sep 11 17:03:15 CEST 2001

In article <mailman.1000174453.25270.python-list at>, Richard Jones wrote:
> On Tuesday 11 September 2001 11:58, Tom Harris wrote:

>> I used to have a script that would automatically renew my
>> library books by talking to the library catalogue via telnet. I
>> tried to use it again and I find that they have removed the
>> telnet option, and just left the CGI web page. What is the
>> suggested Pythonic solution for retrieving information from HTTP
>> pages? I could use a regex, but it would take some care to make
>> it less than extremely fragile, and it would probably break
>> every time a small change was made to the page.
> Use HTMLParser from the standard library. It lets you define
> do_foo() methods when you extend it - where "foo" is the tag
> name you wish to handle. For example, do_img(self, attributes)
> will be called for every <img> tag in the source, with the
> attributes of the tag passed in as a list of 2-tuples.
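
A minimal sketch of the approach described above, assuming current Python 3, where the parser lives in html.parser and calls handle_starttag(tag, attrs) for each opening tag instead of looking up do_foo() methods itself. The ImageCollector class, the do_<tag> dispatch inside handle_starttag, and the SAMPLE_PAGE string are illustrative assumptions, not part of the original posts.

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collect the attributes of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # html.parser calls this for every opening tag; attrs is a list
        # of (name, value) 2-tuples. Forward to a do_<tag>() method when
        # one exists, imitating the convention described in the post.
        handler = getattr(self, "do_" + tag, None)
        if handler is not None:
            handler(attrs)

    def do_img(self, attributes):
        # Called once per <img> tag with its attribute list.
        self.images.append(attributes)


# Hypothetical page content standing in for the library catalogue.
SAMPLE_PAGE = (
    '<html><body><p>Due soon: '
    '<img src="book.png" alt="cover"></p></body></html>'
)

parser = ImageCollector()
parser.feed(SAMPLE_PAGE)
parser.close()
print(parser.images)
```

For a live page, the text passed to feed() would come from an HTTP fetch (for example urllib.request.urlopen) rather than a literal string. Because the parser reacts to tags rather than to exact character sequences, small layout changes in the page are less likely to break it than a regex would be.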

Or you could take the books back...


Grant Edwards                   grante             Yow!  Can I have an IMPULSE
                                  at               ITEM instead?
