[Tutor] weather scraping with Beautiful Soup
Che M
pine508 at hotmail.com
Sat Jul 18 02:32:50 CEST 2009
> Date: Sat, 18 Jul 2009 01:09:32 +0200
> From: sander.sweers at gmail.com
> To: tutor at python.org
> Subject: Re: [Tutor] weather scraping with Beautiful Soup
>
> 2009/7/17 Che M <pine508 at hotmail.com>:
> > table = soup.find("td",id="dataTable tm10")
>
> Almost right. attrs should normally be a dict, so {'class':'dataTable
> tm10'}, but you can use a shortcut; read on.
>
> > ------------------------
> >
> > When I look at the page source for that page, there is this section, which
> > contains the "dataTable tm10" table I want to zoom in on:
> >
> > ------------
> > <table cellspacing="0" cellpadding="0" class="dataTable tm10">
> > <thead>
> > <tr>
> > <td style="width: 25%;"> </td>
> > <td>Current:</td>
> > <td>High:</td>
> > <td>Low:</td>
> > <td>Average:</td>
> > </tr>
> > </thead>
> > <tbody>
> > <tr>
> > <td>Temperature:</td>
> > <td>
> > <span class="nobr"><span class="b">73.6</span> °F</span>
> > </td>
> > <td>
> > <span class="nobr"><span class="b">83.3</span> °F</span>
> > </td>
> > <td>
> > <span class="nobr"><span class="b">64.2</span> °F</span>
> > </td>
> > <td>
> > <span class="nobr"><span class="b">74.1</span> °F</span>
> > </td>
> > --------------
>
> The tag you are looking for is table not td. The tag td is inside the
> table tag. So with shortcut it looks like,
>
> table = soup.find("table","dataTable tm10")
>
> or without shortcut,
>
> table = soup.find("table",{'class':'dataTable tm10'})
>
> Greets
> Sander
Thank you. I was able to find the table in the soup this way.
After a surprising amount of tinkering (for some reason this Soup is still more
like chowder than broth for me), I was able to reach my goal, the 74.1 above,
using this:
-------------------
import urllib2
from BeautifulSoup import BeautifulSoup
url = "http://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=KPAJAMES1&month=7&day=16&year=2009"
page = urllib2.urlopen(url)
soup = BeautifulSoup(page)
table = soup.find("table","dataTable tm10") #find the table
tbody = table.find("tbody") #find the table's body
alltd = tbody.findAll('td') #find all the td's
temp_full = alltd[4] #index 4 is the fifth td (the Average column), the one I want.
print 'temp_full = ', temp_full
temp = temp_full.findNext('span','b').renderContents() #find the next span with class "b" and render its contents
print 'temp = ', temp
----------------------
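To double-check which td index holds the average, the table fragment quoted above can also be parsed offline, with no network access. This is only a minimal sketch, assuming Python 3 and the standard-library html.parser rather than Beautiful Soup, so it is a different technique from the script above. SAMPLE_HTML and CellCollector are hypothetical names, and the closing </tr>, </tbody> and </table> tags are my assumption, since the quoted fragment is cut off before them.

```python
from html.parser import HTMLParser

# Hard-coded copy of the table fragment quoted earlier in this message.
# The closing </tr></tbody></table> tags are assumed; the real page has more rows.
SAMPLE_HTML = """
<table cellspacing="0" cellpadding="0" class="dataTable tm10">
<thead>
<tr>
<td style="width: 25%;"> </td>
<td>Current:</td>
<td>High:</td>
<td>Low:</td>
<td>Average:</td>
</tr>
</thead>
<tbody>
<tr>
<td>Temperature:</td>
<td><span class="nobr"><span class="b">73.6</span> °F</span></td>
<td><span class="nobr"><span class="b">83.3</span> °F</span></td>
<td><span class="nobr"><span class="b">64.2</span> °F</span></td>
<td><span class="nobr"><span class="b">74.1</span> °F</span></td>
</tr>
</tbody>
</table>
"""


class CellCollector(HTMLParser):
    """Collect the text of every <td> that sits inside a <tbody>."""

    def __init__(self):
        super().__init__()
        self.in_tbody = False
        self.in_td = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "tbody":
            self.in_tbody = True
        elif tag == "td" and self.in_tbody:
            # Start a new, empty cell; text is appended in handle_data.
            self.in_td = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "tbody":
            self.in_tbody = False
        elif tag == "td":
            self.in_td = False

    def handle_data(self, data):
        if self.in_td:
            self.cells[-1] += data


parser = CellCollector()
parser.feed(SAMPLE_HTML)
parser.close()

cells = [c.strip() for c in parser.cells]
# cells[0] is the row label; cells[1] to cells[4] are Current, High, Low, Average.
average = cells[4].split()[0]  # drop the trailing unit, keep the number
print("row label =", cells[0])
print("temp =", average)
```

Counting from zero, the row label sits at index 0 and the average at index 4, which matches the alltd[4] lookup in the script above.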
Does this seem like the right (most efficient/readable) way to do this?
Thanks for your time.
CM