Beautiful Soup Table Parsing
Andreas Perstinger
andipersti at gmail.com
Thu Aug 9 03:25:49 EDT 2012
On 09.08.2012 01:58, Tom Russell wrote:
> For instance this code below:
>
> soup = BeautifulSoup(urlopen('http://online.wsj.com/mdc/public/page/2_3021-tradingdiary2.html?mod=mdc_pastcalendar'))
>
> table = soup.find("table",{"class": "mdcTable"})
> for row in table.findAll("tr"):
>     for cell in row.findAll("td"):
>         print cell.findAll(text=True)
>
> brings in a list that looks like this:
[snip]
> What I want to do is only be getting the data for NYSE and nothing
> else so I do not know if that's possible or not. Also I want to do
> something like:
>
> If cell.contents[0] == "Advances":
> Advances = next cell or whatever??---> this part I am not sure how to do.
>
> Can someone help point me in the right direction to get the first data
> point for the Advances row? I have others I will get as well but
> figure once I understand how to do this I can do the rest.
To get the header row you could do something like:
header_row = table.find(lambda tag: tag.name == "tr" and tag.td is not None
                                    and tag.td.string == "NYSE")
From there you can look for the next row you are interested in:
advances_row = header_row.findNextSibling(
    lambda tag: tag.td is not None and tag.td.string == "Advances")
You could also iterate through all next siblings of the header_row:
for row in header_row.findNextSiblings("tr"):
    # do something
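Putting the pieces together, here is a minimal sketch of the whole lookup. It assumes the newer bs4 package on Python 3 (where the methods are spelled find_all and find_next_sibling) rather than the BeautifulSoup 3 calls used above. SAMPLE_HTML is made-up markup standing in for the WSJ page, whose real structure may differ, and first_cell_text and value_after_header are hypothetical helper names, not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the WSJ table; the real page's markup may differ.
SAMPLE_HTML = """
<table class="mdcTable">
  <tr><td>NYSE</td><td>Latest close</td></tr>
  <tr><td>Issues traded</td><td>3,100</td></tr>
  <tr><td>Advances</td><td>1,500</td></tr>
  <tr><td>Declines</td><td>1,400</td></tr>
  <tr><td>Nasdaq</td><td>Latest close</td></tr>
  <tr><td>Advances</td><td>1,200</td></tr>
</table>
"""


def first_cell_text(row):
    """Return the stripped text of the row's first <td>, or None if it has none."""
    cell = row.find("td")
    return cell.get_text(strip=True) if cell is not None else None


def value_after_header(table, section, label):
    """Find the first row labelled `label` after the `section` header row
    and return the text of the cell next to the label (None if not found)."""
    # Locate the header row, e.g. the <tr> whose first cell reads "NYSE".
    header_row = table.find(
        lambda tag: tag.name == "tr" and first_cell_text(tag) == section
    )
    if header_row is None:
        return None
    # Search only the rows that follow the header row.
    row = header_row.find_next_sibling(
        lambda tag: tag.name == "tr" and first_cell_text(tag) == label
    )
    if row is None:
        return None
    cells = row.find_all("td")
    # The value is assumed to sit in the cell right after the label.
    return cells[1].get_text(strip=True) if len(cells) > 1 else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.find("table", {"class": "mdcTable"})
advances = value_after_header(table, "NYSE", "Advances")
print(advances)
```

Comparing stripped text instead of tag.td.string is a deliberate choice here: .string is None when a cell contains nested tags, and surrounding whitespace would otherwise make the comparison fail.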
Bye, Andreas