[Tutor] HTML scraping
Bob Gailer
bgailer at sbcglobal.net
Thu Jun 30 04:49:18 CEST 2005
At 10:36 AM 6/26/2005, Nathan Hughes wrote:
>I've been looking for a way to scrape the data from an HTML table, but I don't
>even know where to start, or how to do it.
>
>An example of the table can be found here:
>http://www.dragon256.plus.com/timer.html
>I'd like to extract all the data except for the delete column and then
>just print each row.
Use the urllib2 module to obtain the page source:
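import urllib2
# Python 2: urlopen returns a file-like object for the page
page = urllib2.urlopen("http://www.dragon256.plus.com/timer.html")
html = page.readlines()  # one string per line of HTML source
page.close()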
You now have a list of lines.
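In Python 3, urllib2 was folded into urllib.request, and urlopen returns bytes rather than text. The following is a minimal sketch of the same step, assuming the page declares its charset in the Content-Type header and falling back to UTF-8 otherwise (the real page's encoding is not stated here); fetch_lines is a hypothetical helper name.

```python
from urllib.request import urlopen


def fetch_lines(url, default_encoding="utf-8"):
    """Return the page at url as a list of text lines (Python 3 sketch)."""
    with urlopen(url) as page:
        # Use the charset from the Content-Type header if there is one;
        # otherwise assume default_encoding.
        charset = page.headers.get_content_charset() or default_encoding
        # readlines() yields bytes, so decode each line to str.
        return [line.decode(charset) for line in page.readlines()]
```

For example, fetch_lines("http://www.dragon256.plus.com/timer.html") would give a list of str lines, like html in the Python 2 code but already decoded.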
Now you can use any of several string-parsing tools to locate lines starting
with <tr> (each new row), then <td> (each cell), and then skip past the
tag(s) to reach the cell text.
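The row/cell/text steps above can be sketched with regular expressions. This is a minimal Python 3 sketch, not a robust HTML parser. It assumes well-formed <tr>/<td> markup with no nested tables and no ">" inside attribute values, and cell_texts is a hypothetical name. It joins the lines into one string first, because a cell that wraps onto a second line would be missed by a strictly line-by-line search.

```python
import re

FLAGS = re.IGNORECASE | re.DOTALL  # match <TR>/<tr>, and let "." cross newlines


def cell_texts(html):
    """Return a list of rows, each a list of cell texts with tags stripped."""
    rows = []
    # Each <tr ...>...</tr> is one row.
    for row_html in re.findall(r"<tr\b.*?</tr>", html, FLAGS):
        # Each <td ...>...</td> is one cell; capture what is between the tags.
        cells = re.findall(r"<td\b.*?>(.*?)</td>", row_html, FLAGS)
        # Remove any tags nested in the cell (such as <a ...>) and trim spaces.
        rows.append([re.sub(r"<[^>]*>", "", cell).strip() for cell in cells])
    return rows
```

With the Python 2 list from above you would call cell_texts("".join(html)). A checkbox-only cell comes back as an empty string, so the delete column still needs to be dropped afterwards.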
You have 3 kinds of cell to deal with (a link, plain text, and a checkbox):
<td class='normal' align='left'><a href='javascript:OnTimer(1)'>Glastonbury 2005</a></td>
<td class='normal' align='left'>BBC THREE</td>
<td class='normal' align='middle'><input type='checkbox' onclick='OnDelete(1)'></td>
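As an alternative to hand-written string searching, the standard library's HTMLParser (html.parser in Python 3; the HTMLParser module in Python 2) can handle all three cases. This is a minimal Python 3 sketch, assuming, as in the sample cells, that the delete column is the only one containing an <input>. TimerTableParser is a hypothetical name.

```python
from html.parser import HTMLParser


class TimerTableParser(HTMLParser):
    """Collect <td> text row by row, dropping cells that contain an <input>."""

    def __init__(self):
        super().__init__()
        self.rows = []           # finished rows, each a list of cell strings
        self._row = None         # cells of the <tr> currently being read
        self._cell = None        # text fragments of the <td> currently being read
        self._has_input = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row, self._cell = [], None
        elif tag == "td" and self._row is not None:
            self._cell, self._has_input = [], False
        elif tag == "input" and self._cell is not None:
            # Case 3: the checkbox (delete) cell; flag it so it is skipped.
            self._has_input = True

    def handle_data(self, data):
        # Cases 1 and 2: text inside <a>...</a> or directly inside <td>.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            if not self._has_input:
                self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:        # ignore rows where no cells were kept
                self.rows.append(self._row)
            self._row, self._cell = None, None
```

To use it, create a parser, call parser.feed("".join(lines)) with the page text, and print each entry of parser.rows. Because the parser reacts to tags rather than line starts, it does not care how the source is split into lines.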
Is that enough to get you started?
Bob Gailer
mailto:bgailer at alum.rpi.edu
510 558 3275 home
720 938 2625 cell