extracting html table rows into a list

Peter Campbell pc at acs.co.nz
Thu Nov 22 16:49:55 EST 2001


I have something that does this (sort of): it extracts stock prices from a
web site.
The main code revolves around the regexp command. I am looking for SPAN tags
here, but you will want to change it so you are looking for TR or TD tags,
plus anything else as appropriate.
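
For the TR/TD case, a minimal sketch of the same regexp idea might look like
the following. The proc name table_rows and the sample HTML are made up for
illustration and are not part of my script. One thing to watch: Tcl decides
whether a whole expression is greedy or non-greedy from its first quantifier,
so every quantifier below is written non-greedy. The patterns are rough (they
assume no nested tables) and will need adjusting for real pages.

```tcl
# Hypothetical helper, not part of the original script: return a list of
# rows, where each row is a list of cell texts.
proc table_rows {html} {
    set rows {}
    # Each match contributes two list elements: the whole match and the
    # captured row body. "." also matches newlines by default in Tcl.
    foreach {rowMatch rowBody} \
            [regexp -inline -all -nocase {<tr[^>]*?>(.*?)</tr>} $html] {
        set cells {}
        foreach {cellMatch cell} \
                [regexp -inline -all -nocase {<t[dh][^>]*?>(.*?)</t[dh]>} $rowBody] {
            # Drop any tags nested inside the cell, then trim whitespace.
            regsub -all {<[^>]+>} $cell {} cell
            lappend cells [string trim $cell]
        }
        lappend rows $cells
    }
    return $rows
}

# Made-up sample input.
set html {<table>
<tr><th>Stock</th><th>Bid</th></tr>
<tr><td>ABC</td><td><b>1.25</b></td></tr>
</table>}

puts [table_rows $html]
# prints: {Stock Bid} {ABC 1.25}
```

Each row comes back as its own sublist, so [lindex $rows 1 0] would give the
first cell of the second row.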



# stock price grabber (c) Peter Campbell, 5th July 2001. Email pc at acs.co.nz

package require http 2.3
load ./fbsql.so

sql connect localhost fbase ""
sql selectdb STOCKS

# grab the data for each letter of the alphabet
set letters {A B C D E F G H I J K L M N O P Q R S T U V W X Y Z}
set date [clock format [clock seconds] -format "%Y-%m-%d"]

foreach code $letters {
 set token [http::geturl "www.nzse.co.nz/market/price_by_stock/[string tolower $code].html"]
 set data [set $token\(body)]
 http::cleanup $token

 # remove all non-breaking spaces
 regsub -all {&nbsp;} $data {} data

 set matches [lrange [regexp -inline -all {<SPAN[^>]*\>([^<]*)\</SPAN} $data] 26 end]
 regsub -all {,} $matches "" matches
 regsub -all {\$} $matches "" matches

 foreach {d1 code d2 bid d3 offer d4 first d5 high d6 low d7 last d8 market d9 volume d10 value} $matches {
  if {$code == ""} { continue }
  if {[string toupper $code] != $code} { continue }

  if {[sql "SELECT STOCK FROM STOCK_PRICE where STOCK = '$code' and DATE = '$date'"] != ""} {
   if {[catch {sql "UPDATE STOCK_PRICE SET BID = $bid, OFFER = $offer, FIRST = $first, HIGH = $high, LOW = $low,
   LAST = $last, MARKET = $market, VOLUME = $volume, VALUE = $value
   where STOCK = '$code' and DATE = '$date'"} msg]} {
    puts "Error updating $code, $msg"
   }
  } else {
   if {[catch {sql "INSERT INTO STOCK_PRICE (STOCK, DATE, BID, OFFER, FIRST, HIGH, LOW, LAST, MARKET, VOLUME, VALUE)
   VALUES ('$code','$date',$bid,$offer,$first,$high,$low,$last,$market,$volume,$value)"} msg]} {
    puts "Error inserting $code, $msg"
   }
  }
 }
}
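
In case the d1 ... d10 variables in the script look odd: regexp -inline -all
with one capture group returns a flat list in which every match contributes
two elements, the whole match followed by the captured text. The d variables
just soak up the whole-match elements, and "lrange ... 26 end" skips the first
13 SPAN tags for the same reason (13 matches x 2 elements). A small sketch with
made-up input (the class attribute and values are only for illustration):

```tcl
# Made-up sample input; the real page layout is not shown here.
set html {<SPAN class="x">ABC</SPAN><SPAN class="x">1.25</SPAN>}

# One capture group, so the result alternates: whole match, captured text.
set matches [regexp -inline -all {<SPAN[^>]*>([^<]*)</SPAN} $html]

# d1 and d2 receive the whole-match elements and are simply ignored.
foreach {d1 stock d2 bid} $matches {
    puts "$stock $bid"
}
# prints: ABC 1.25
```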



damien Wetzel <dwetzel at altern.org> wrote in message
news:9aadf0f.0111220750.719466de at posting.google.com...
> hi ,
> does any body has a script which parse a big table from an html file
> and create a list of rows ?
> thanks for any responses




