extracting html table rows into a list
Peter Campbell
pc at acs.co.nz
Thu Nov 22 16:49:55 EST 2001
I have something that does this (sort of): it extracts stock prices from a
web site.
The main code revolves around the regexp command. I am looking for SPAN
tags here, but you will want to change it to look for TR or TD tags, plus
anything else as appropriate.
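As a small illustration of what regexp hands back when called with -inline
and -all, here is a sketch on a made-up fragment of markup (the sample string
and variable names are only for illustration). The result is one flat list in
which every match contributes two elements, the whole match and then the
captured text, which is why the script needs the d1, d2, ... placeholders in
its foreach.

```tcl
# Made-up sample markup, standing in for a fetched page.
set html {<SPAN class="a">ABC</SPAN><SPAN class="b">1.25</SPAN>}

# -inline returns the matched text instead of a count; -all finds every match.
# Each match adds two list elements: the whole match, then the capture group.
set matches [regexp -inline -all {<SPAN[^>]*>([^<]*)</SPAN} $html]

# Walk the list two elements at a time, discarding the whole-match element.
set values {}
foreach {whole text} $matches {
    lappend values $text
}
puts $values
```

Here $matches has four elements and the script prints "ABC 1.25".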
# stock price grabber (c) Peter Campbell, 5th July 2001. Email pc at acs.co.nz
package require http 2.3
load ./fbsql.so
sql connect localhost fbase ""
sql selectdb STOCKS

# grab the data for each letter of the alphabet
set letters {A B C D E F G H I J K L M N O P Q R S T U V W X Y Z}
set date [clock format [clock seconds] -format "%Y-%m-%d"]

foreach letter $letters {
    set token [http::geturl "www.nzse.co.nz/market/price_by_stock/[string tolower $letter].html"]
    set data [http::data $token]
    http::cleanup $token

    # remove all non-breaking spaces
    regsub -all {&nbsp;} $data {} data

    # each match yields two elements (whole match, captured text);
    # drop the first 26 elements, which come before the price rows
    set matches [lrange [regexp -inline -all {<SPAN[^>]*\>([^<]*)\</SPAN} $data] 26 end]

    # strip thousands separators and dollar signs so the values are plain numbers
    regsub -all {,} $matches "" matches
    regsub -all {\$} $matches "" matches

    # d1..d10 soak up the whole-match elements; the named variables get the text
    foreach {d1 code d2 bid d3 offer d4 first d5 high d6 low d7 last d8 market
             d9 volume d10 value} $matches {
        # skip empty cells and anything that is not an upper-case stock code
        if {$code == ""} { continue }
        if {[string toupper $code] != $code} { continue }

        # update today's row if it already exists, otherwise insert it
        if {[sql "SELECT STOCK FROM STOCK_PRICE
                  where STOCK = '$code' and DATE = '$date'"] != ""} {
            if {[catch {sql "UPDATE STOCK_PRICE SET BID = $bid, OFFER = $offer,
                    FIRST = $first, HIGH = $high, LOW = $low,
                    LAST = $last, MARKET = $market, VOLUME = $volume, VALUE = $value
                    where STOCK = '$code' and DATE = '$date'"} msg]} {
                puts "Error updating $code, $msg"
            }
        } else {
            if {[catch {sql "INSERT INTO STOCK_PRICE (STOCK, DATE, BID, OFFER, FIRST,
                    HIGH, LOW, LAST, MARKET, VOLUME, VALUE)
                    VALUES
                    ('$code','$date',$bid,$offer,$first,$high,$low,$last,$market,$volume,$value)
                    "} msg]} {
                puts "Error inserting $code, $msg"
            }
        }
    }
}
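
For the TR/TD case asked about, a minimal sketch of the same regexp approach
might look like the following. The proc name tableRows and the sample markup
are hypothetical, and it assumes reasonably clean HTML with no nested tables;
for anything messier a real HTML parser is the safer choice. Every quantifier
is written non-greedy (*?) on purpose: in Tcl the first quantifier in a regexp
sets the preference for the whole expression, so mixing greedy and non-greedy
quantifiers would make the row pattern run on to the last </tr>.

```tcl
# Hypothetical helper: return a list of rows, each row a list of cell texts.
proc tableRows {html} {
    set rows {}
    # -nocase accepts <tr> as well as <TR>; the match stops at the first </tr>.
    foreach {whole rowBody} [regexp -inline -all -nocase {<tr[^>]*?>(.*?)</tr>} $html] {
        set cells {}
        # Cells may be header cells (<th>) or data cells (<td>).
        foreach {whole cell} [regexp -inline -all -nocase {<t[dh][^>]*?>(.*?)</t[dh]>} $rowBody] {
            # Drop any tags nested inside the cell, then trim whitespace.
            regsub -all {<[^>]+>} $cell {} cell
            lappend cells [string trim $cell]
        }
        lappend rows $cells
    }
    return $rows
}

# Made-up sample table.
set html {
<table>
<tr><th>Stock</th><th>Bid</th></tr>
<tr><td>ABC</td><td><b>1.25</b></td></tr>
</table>
}
set rows [tableRows $html]
puts $rows
```

Each row comes back as its own sub-list, so [lindex $rows 1 0] gives the first
cell of the second row.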
damien Wetzel <dwetzel at altern.org> wrote in message
news:9aadf0f.0111220750.719466de at posting.google.com...
> Hi,
> Does anybody have a script which parses a big table from an HTML file
> and creates a list of rows?
> Thanks for any responses.