[Tutor] weather scraping with Beautiful Soup

Sat Jul 18 02:32:50 CEST 2009

> Date: Sat, 18 Jul 2009 01:09:32 +0200
> From: sander.sweers at gmail.com
> To: tutor at python.org
> Subject: Re: [Tutor] weather scraping with Beautiful Soup
> 
> 2009/7/17 Che M <pine508 at hotmail.com>:
> > table = soup.find("td",id="dataTable tm10")
> 
> Almost right. attrs should normally be a dict, so {'class':'dataTable tm10'},
> but you can use a shortcut; read on.
> 
> > ------------------------
> >
> > When I look at the page source for that page, there is this section, which
> > contains the "dataTable tm10" table I want to zoom in on:
> >
> > ------------
> > <table cellspacing="0" cellpadding="0" class="dataTable tm10">
> > 		<thead>
> > 		<tr>
> > 		<td style="width: 25%;">&nbsp;</td>
> > 		<td>Current:</td>
> > 		<td>High:</td>
> > 		<td>Low:</td>
> > 		<td>Average:</td>
> > 		</tr>
> > 		</thead>
> > 		<tbody>
> > 		<tr>
> > 		<td>Temperature:</td>
> > 		<td>
> >   <span class="nobr"><span class="b">73.6</span>&nbsp;&#176;F</span>
> > </td>
> > 		<td>
> >   <span class="nobr"><span class="b">83.3</span>&nbsp;&#176;F</span>
> > </td>
> > 		<td>
> >   <span class="nobr"><span class="b">64.2</span>&nbsp;&#176;F</span>
> > </td>
> > 		<td>
> >   <span class="nobr"><span class="b">74.1</span>&nbsp;&#176;F</span>
> > </td>
> > --------------
> 
> The tag you are looking for is table not td. The tag td is inside the
> table tag. So with shortcut it looks like,
> 
> table = soup.find("table","dataTable tm10")
> 
> or without shortcut,
> 
> table = soup.find("table",{'class':'dataTable tm10'})
> 
> Greets
> Sander

Thank you.  I was able to find the table in the soup this way.  

After a surprising amount of tinkering (for some reason this Soup is still more
like chowder than broth for me), I was able to get what I was after, the 74.1
above, using this:

-------------------
import urllib2
from BeautifulSoup import BeautifulSoup

url = "http://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=KPAJAMES1&month=7&day=16&year=2009"
page = urllib2.urlopen(url)
soup = BeautifulSoup(page)

table = soup.find("table","dataTable tm10")  #find the table
tbody = table.find("tbody")                  #find the table's body

alltd = tbody.findAll('td')                  #find all the td's
temp_full = alltd[4]                         #index 4 is the 5th td (Average), the one I want.
print 'temp_full = ', temp_full
temp = temp_full.findNext('span','b').renderContents()  #next span with class "b", rendered as text
print 'temp = ', temp
----------------------

Does this seem like the right (most efficient/readable) way to do this?
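
For comparison, here is a rough sketch of a variant that looks the value up by
its column label instead of by a hard-coded index. It rests on several
assumptions that are not in the code above: it uses the newer bs4 package on
Python 3 rather than BeautifulSoup 3 on Python 2, it parses a trimmed copy of
the quoted page source (with the closing tags filled in) instead of fetching
the live page, it treats the first row of the tbody as the Temperature row,
and the names HTML and temperature_row are made up for illustration.

```python
from bs4 import BeautifulSoup  # assumption: Beautiful Soup 4 (bs4) on Python 3

# Trimmed copy of the table from the quoted page source; closing tags added.
HTML = """
<table cellspacing="0" cellpadding="0" class="dataTable tm10">
<thead><tr>
<td style="width: 25%;">&nbsp;</td>
<td>Current:</td><td>High:</td><td>Low:</td><td>Average:</td>
</tr></thead>
<tbody><tr>
<td>Temperature:</td>
<td><span class="nobr"><span class="b">73.6</span>&nbsp;&#176;F</span></td>
<td><span class="nobr"><span class="b">83.3</span>&nbsp;&#176;F</span></td>
<td><span class="nobr"><span class="b">64.2</span>&nbsp;&#176;F</span></td>
<td><span class="nobr"><span class="b">74.1</span>&nbsp;&#176;F</span></td>
</tr></tbody>
</table>
"""


def temperature_row(html):
    """Return the first body row as a {column label: float} dict (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"class": "dataTable tm10"})
    # Column labels come from the header cells; skip the blank first cell.
    header_cells = table.thead.find_all("td")[1:]
    labels = [td.get_text(strip=True).rstrip(":") for td in header_cells]
    # The numbers sit in <span class="b"> inside the first body row.
    row = table.tbody.find("tr")
    values = [float(span.get_text()) for span in row.find_all("span", "b")]
    return dict(zip(labels, values))


temps = temperature_row(HTML)
print("temp =", temps["Average"])
```

Pairing the header labels with the values means the lookup does not depend on
remembering that the average sits at index 4, though it still assumes the
header and the row have the same number of columns.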

Thanks for your time.
CM
