Suitable Python code to scrape specific details from web pages.
Denis McMahon
denismfmcmahon at gmail.com
Wed Aug 13 10:53:41 EDT 2014
On Tue, 12 Aug 2014 13:00:30 -0700, Simon Evans wrote:
> in accessing from the 'Racing Post' on a daily basis. Anyhow, the code
Following is some starter code. You will have to look at the output,
compare it to the web page, and work out how you want to process it
further. Note that I use BeautifulSoup (bs4) and requests. The program
prints the HTML of each table cell, with a line of "+" characters
marking the end of each table row. I suggest you read the BeautifulSoup
documentation at http://www.crummy.com/software/BeautifulSoup/bs4/doc/
to work out how to select the table cells that hold the data you are
interested in, and how to extract that data.
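#!/usr/bin/python
"""
Program to extract data from racingpost.
"""
from __future__ import print_function  # same print() syntax on Python 2 and 3
from bs4 import BeautifulSoup
import requests

# Adjacent string literals are joined, so the long URL can span two lines
url = ("http://www.racingpost.com/horses2/cards/card.sd?"
       "race_id=607466&r_date=2014-08-13#raceTabs=sc_")
r = requests.get(url)
if r.status_code == 200:
    # Name the parser explicitly so results don't depend on what is installed
    soup = BeautifulSoup(r.content, "html.parser")
    table = soup.find("table", id="sc_horseCard")
    if table is None:
        print("Table sc_horseCard not found on the page")
    else:
        for row in table.find_all("tr"):
            for cell in row.find_all("td"):
                print(cell)
            # Mark the end of each table row
            print("+++++++++++++++++++++++++++++++++++++")
else:
    print("HTTP Status", r.status_code)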
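Once you know which cells you want, a next step might be to pull out
just their text rather than the raw HTML. The following is a minimal
sketch, not tested against the live Racing Post page: it assumes the
table still has the id "sc_horseCard", and the function name table_text
and the sample HTML are my own inventions for illustration. It returns
one list of strings per table row and skips rows with no <td> cells
(such as a header row made of <th> cells).

```python
from bs4 import BeautifulSoup


def table_text(html, table_id="sc_horseCard"):
    """Return the text of each <td> in the given table, one list per row.

    table_text is a hypothetical helper; the default id is the one used
    by the starter code and may not match the current page.
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None:
        return []
    rows = []
    for row in table.find_all("tr"):
        # stripped_strings yields each text fragment with whitespace removed;
        # joining with a space keeps text from nested tags apart
        cells = [" ".join(cell.stripped_strings) for cell in row.find_all("td")]
        if cells:  # skip rows with no <td> cells, e.g. header rows of <th>
            rows.append(cells)
    return rows


if __name__ == "__main__":
    # Made-up sample HTML, not taken from the real page
    sample = """
    <table id="sc_horseCard">
      <tr><th>No.</th><th>Horse</th></tr>
      <tr><td> 1 </td><td><a href="#">Example Horse</a></td></tr>
    </table>
    """
    print(table_text(sample))  # [['1', 'Example Horse']]
```

In the real program you would call table_text(r.content) in place of
the nested loops, then pick out the columns you need by index.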
--
Denis McMahon, denismfmcmahon at gmail.com