scraping a web page

Grant Edwards grante at visi.com
Tue Sep 11 11:03:15 EDT 2001


In article <mailman.1000174453.25270.python-list at python.org>, Richard Jones wrote:
> On Tuesday 11 September 2001 11:58, Tom Harris wrote:

>> I used to have a script that would automatically renew my
>> library books by talking to the library catalogue via telnet. I
>> tried to use it again and I find that they have removed the
>> telnet option, and just left the CGI web page. What is the
>> suggested Pythonic solution to retrieving information from HTTP
>> pages? I could use a regex, but this would take some care to be
>> less than extremely fragile, and would probably break every
>> time a small change was made to the page.
> 
> Use HTMLParser from the standard library. It lets you define
> do_foo() methods when you extend it - where "foo" is the tag
> name you wish to handle. For example, do_img(self, attributes)
> will be called for every <img> tag in the source, with the
> attributes of the tag passed in as a list of 2-tuples.
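
For anyone reading this in the archive: the do_foo() convention described above belongs to the older htmllib-style parser. A rough sketch of the same idea with html.parser.HTMLParser from current Python 3 might look like the following, where a single handle_starttag() method receives the tag name plus the attributes as a list of 2-tuples. The ImgCollector class and the SAMPLE_PAGE markup are made up for illustration; a real library catalogue page will look different.

```python
from html.parser import HTMLParser


class ImgCollector(HTMLParser):
    """Collect the attributes of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) 2-tuples.
        # Self-closing tags such as <img .../> also end up here by default.
        if tag == "img":
            self.images.append(dict(attrs))


# Hypothetical page content, standing in for the fetched CGI page.
SAMPLE_PAGE = """
<html><body>
<p>Books on loan</p>
<img src="cover1.png" alt="First book">
<img src="cover2.png" alt="Second book"/>
</body></html>
"""

parser = ImgCollector()
parser.feed(SAMPLE_PAGE)
parser.close()

for image in parser.images:
    print(image["src"], image.get("alt"))
```

Because the parser only reacts to the tags it is asked about, small layout changes elsewhere on the page should not break it the way a page-wide regex would.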

Or you could take the books back...

;)

-- 
Grant Edwards                   grante             Yow!  Can I have an IMPULSE
                                  at               ITEM instead?
                               visi.com            


