Extracting text using Beautifulsoup

TC no.one at no.where.com
Sun Oct 25 20:13:27 CET 2009


Greetings all.

Working with data from 'http://www.finviz.com/quote.ashx?t=SRS', I was able 
to get the info using re; however, I thought BeautifulSoup would be a more 
elegant approach.
I'm having a bit of a problem, though...

Trying to extract this text:

SMA20 -1.77%
SMA50 -9.73%

using the body=[Distance from 20-Day Simple Moving Average] text that sits 
inside the title attribute of the <td ...> tags.

From:
----------------------- HTML snippet -----------------------
      <td width="7%" class="snapshot-td2-cp" align="left" 
title="cssbody=[tooltip_short_bdy] cssheader=[tooltip_short_hdr] 
body=[Distance from 20-Day Simple Moving Average] offsetx=[10] offsety=[20] 
delay=[300]">
       SMA20
      </td>
      <td width="8%" class="snapshot-td2" align="left">
       <b>
        <span style="color:#aa0000;">
         -1.77%
        </span>
       </b>
      </td>
      <td width="7%" class="snapshot-td2-cp" align="left" 
title="cssbody=[tooltip_short_bdy] cssheader=[tooltip_short_hdr] 
body=[Distance from 50-Day Simple Moving Average] offsetx=[10] offsety=[20] 
delay=[300]">
       SMA50
      </td>
      <td width="8%" class="snapshot-td2" align="left">
       <b>
        <span style="color:#aa0000;">
         -9.73%
        </span>
       </b>
      </td>
----------------------- HTML snippet -----------------------
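Since re already worked for this page, here is a minimal sketch of the pure-re 
route for comparison, run against the snippet above. It matches a label cell 
whose title contains the moving-average tooltip and then grabs the percentage 
from the <span> in the following cell (the trimmed html string and the pattern 
are my own, not from the original post):

```python
import re

# The snippet from the page, trimmed to the relevant cells.
html = '''
<td width="7%" class="snapshot-td2-cp" align="left" title="cssbody=[tooltip_short_bdy] cssheader=[tooltip_short_hdr] body=[Distance from 20-Day Simple Moving Average] offsetx=[10] offsety=[20] delay=[300]">
 SMA20
</td>
<td width="8%" class="snapshot-td2" align="left"><b><span style="color:#aa0000;">-1.77%</span></b></td>
<td width="7%" class="snapshot-td2-cp" align="left" title="cssbody=[tooltip_short_bdy] cssheader=[tooltip_short_hdr] body=[Distance from 50-Day Simple Moving Average] offsetx=[10] offsety=[20] delay=[300]">
 SMA50
</td>
<td width="8%" class="snapshot-td2" align="left"><b><span style="color:#aa0000;">-9.73%</span></b></td>
'''

# Find a cell whose title mentions the N-Day Simple Moving Average,
# capture its SMAxx label, then the percentage in the next cell's <span>.
pattern = re.compile(
    r'body=\[Distance from \d+-Day Simple Moving Average\][^>]*>'
    r'\s*(SMA\d+)\s*</td>.*?<span[^>]*>\s*(-?[\d.]+%)\s*</span>',
    re.S)

pairs = pattern.findall(html)
print(pairs)  # [('SMA20', '-1.77%'), ('SMA50', '-9.73%')]
```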
Using:

import urllib
from BeautifulSoup import BeautifulSoup

archives_url = 'http://www.finviz.com/quote.ashx?t=SRS'
archives_html = urllib.urlopen(archives_url).read()
soup = BeautifulSoup(archives_html)
g = open('finviz_tables.txt', 'w')  # the output file was never opened
for table in soup.findAll('table'):
    g.write(str(table.name) + '\r\n')
    for tr in table.findAll('tr'):
        g.write('\r\n\t')
        for td in tr.findAll('td'):
            # title is an attribute of the td tag, not a child tag, so
            # td.find(name='title') always returns None; use td.get()
            tooltip = td.get('title', '')
            g.write('\t\t' + str(td) + '\r\n')
g.close()

Total failure of course.
Any ideas?
Thanks in advance... 
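For what it's worth, once a cell's title attribute is in hand (e.g. via 
td['title'] in BeautifulSoup), the body=[...] text can be pulled out with a 
small regex; a sketch, using the title value from the snippet above (the \b 
anchor keeps it from matching inside cssbody=[...]):

```python
import re

# Example title attribute copied from the snippet above.
title = ('cssbody=[tooltip_short_bdy] cssheader=[tooltip_short_hdr] '
         'body=[Distance from 20-Day Simple Moving Average] '
         'offsetx=[10] offsety=[20] delay=[300]')

# \b stops the pattern from matching the tail of cssbody=[...].
m = re.search(r'\bbody=\[([^\]]*)\]', title)
body_text = m.group(1) if m else None
print(body_text)  # Distance from 20-Day Simple Moving Average
```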




More information about the Python-list mailing list