Suitable Python code to scrape specific details from web pages.
Roy Smith
roy at panix.com
Tue Aug 12 17:28:15 EDT 2014
In article <a8f10c4f-d4a0-48ed-ae92-2a43e9a094c3 at googlegroups.com>,
Simon Evans <musicalhacksaw at yahoo.co.uk> wrote:
> Dear Programmers,
> I have been looking at the YouTube 'Web Scraping Tutorials' of Chris Reeves.
> I have tried a few of his Python programs in the Python27 command prompt, but
> altered them from accessing data using links, say from the Dow Jones index, to
> accessing the details I would be interested in accessing from the 'Racing
> Post' on a daily basis. Anyhow, the output in the example I am going to give
> is not the information I am seeking: instead of returning the given odds on
> a horse, it only returns a [], which isn't much use.
> I would be glad if you could tell me where I am going wrong.
Rather than comment on your specific code (but, thank you for posting
it), I'll make a couple of more generic suggestions.
First, if you're doing anything with fetching web pages, install the
wonderful requests module (http://docs.python-requests.org/en/latest/).
It's so much easier to work with than urllib.
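For what it's worth, fetching a page with requests looks roughly like
this. It's only a minimal sketch: fetch_html is a name I made up, the
example.com URL is a placeholder, and the 10-second timeout is an
arbitrary choice.

```python
import requests


def fetch_html(url, session=None, timeout=10):
    """Return the body of `url` as text, raising on HTTP errors."""
    # Reuse a caller-supplied session if there is one; otherwise create a
    # throwaway session for this single request.
    http = session if session is not None else requests.Session()
    response = http.get(url, timeout=timeout)
    # Turn 4xx/5xx responses into exceptions instead of silently
    # handing back an error page.
    response.raise_for_status()
    return response.text


if __name__ == "__main__":
    # Placeholder URL, for illustration only.
    html = fetch_html("http://www.example.com/")
    print(html[:200])
```

Checking the status before parsing matters here: if the site sends back
an error page, you find out right away rather than wondering why the
parse came up empty.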
Second, if you're going to be parsing web pages, trying to use regexes
is a losing game. You need something that knows how to parse HTML. The
canonical answer is lxml (http://lxml.de/), but Beautiful Soup
(http://www.crummy.com/software/BeautifulSoup/) is less intimidating to
use.
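To give a flavor of the Beautiful Soup approach, here's a rough sketch.
The HTML below is invented for illustration (I have not looked at how
the Racing Post actually lays out its pages), so the "horse" and "odds"
class names, the sample rows, and the extract_odds function are all
assumptions you would need to adapt to the real markup.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration; the real page will be laid out
# differently, so the tag and class names here are assumptions.
SAMPLE_HTML = """
<table id="runners">
  <tr><td class="horse">Horse A</td><td class="odds">5/1</td></tr>
  <tr><td class="horse">Horse B</td><td class="odds">11/4</td></tr>
</table>
"""


def extract_odds(html):
    """Return a list of (horse, odds) pairs found in `html`."""
    # "html.parser" is the parser bundled with Python; Beautiful Soup can
    # also use lxml if it is installed.
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for row in soup.find_all("tr"):
        horse = row.find("td", class_="horse")
        odds = row.find("td", class_="odds")
        # Skip rows (headers, spacers) that lack either cell.
        if horse is not None and odds is not None:
            results.append(
                (horse.get_text(strip=True), odds.get_text(strip=True))
            )
    return results


if __name__ == "__main__":
    for horse, odds in extract_odds(SAMPLE_HTML):
        print(horse, odds)
```

The point is that you ask for elements by tag and attribute and let the
parser cope with whitespace, attribute order and nesting, which is
exactly where hand-written regexes tend to come back with [].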