[Tutor] more scraping and saving
Tommy Kaas
tommy.kaas at kaasogmulvad.dk
Mon Jan 3 13:03:32 CET 2011
Hi - I was helped the other day with scraping and saving a simple
web page. Now I'm using what I learned on another page. It should be very
simple, but only the first row of names gets saved.
Can anybody explain why?
(It's a public list of doctors with known connections to the
pharmaceutical industry.)
This time I'm trying to start at the right table (the third on the page) by
selecting it by its class attribute. Does that make sense?
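Selecting by class is generally sturdier than counting tables: if another table is added above the list, a position-based index shifts, while the class stays with the table. The sketch below uses Python 3's standard-library `html.parser` instead of BeautifulSoup so it runs on its own. It records the class attribute of every `<table>` in document order and reports where the class-matched table sits. The names `TableClassFinder` and `find_table_index` and the sample HTML are hypothetical, not taken from the real page.

```python
from html.parser import HTMLParser


class TableClassFinder(HTMLParser):
    """Record the class attribute of every <table>, in document order."""

    def __init__(self):
        super().__init__()
        self.table_classes = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            # attrs is a list of (name, value) pairs; a value may be None
            self.table_classes.append(dict(attrs).get("class") or "")


def find_table_index(html, table_class):
    """Return the 0-based position of the first table whose class
    attribute contains `table_class`, or None if there is none."""
    finder = TableClassFinder()
    finder.feed(html)
    finder.close()
    for index, classes in enumerate(finder.table_classes):
        # A class attribute can hold several space-separated names
        if table_class in classes.split():
            return index
    return None


# Hypothetical page: two layout tables before the list of names
SAMPLE = """
<table class="layout"><tr><td>menu</td></tr></table>
<table class="layout"><tr><td>search</td></tr></table>
<table class="tableLeftRight3030"><tr><td>Anna</td></tr></table>
"""

print(find_table_index(SAMPLE, "tableLeftRight3030"))  # 2, i.e. the third table
```

Note that matching the class only finds the table element itself; the rows still have to be read from inside it.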
Thanks in advance - here is the code:
import urllib2
from BeautifulSoup import BeautifulSoup
import codecs
f = codecs.open("laeger.txt", "w", encoding="Latin-1")
soup = BeautifulSoup(urllib2.urlopen('http://www.laegemiddelstyrelsen.dk/include/8806/tilladelse_laeger.asp').read())
for row in soup('table', {'class' : 'tableLeftRight3030'}):
    tds = row('td')
    output = ";".join(tds[i].string for i in (0, 1, 2, 3, 4))
    f.write(output + '\n')
f.close()
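A likely explanation: `soup('table', {'class' : 'tableLeftRight3030'})` returns the matching tables, not their rows, so the loop body runs once for the whole table. `row('td')` then returns every cell in that table, and only the first five cells (presumably the first row) are joined and written. The usual fix is to loop over the table's `<tr>` rows and read the cells of each row. The sketch below illustrates that row-by-row approach with Python 3's standard-library `html.parser` instead of BeautifulSoup, so it runs on its own; `TableRowParser`, `table_to_lines` and the sample HTML are hypothetical. It also collects all the text inside a cell, because BeautifulSoup's `.string` is `None` when a cell contains more than one child node (for example bold text plus plain text), which would make `";".join(...)` fail.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of each <td> cell, row by row, for the first
    <table> whose class attribute contains `table_class`."""

    def __init__(self, table_class):
        super().__init__()
        self.table_class = table_class
        self.rows = []       # one list of cell strings per <tr>
        self._depth = 0      # table nesting depth inside the target table
        self._done = False   # only the first matching table is read
        self._row = None     # cells of the row being read
        self._cell = None    # text fragments of the cell being read

    def _close_cell(self):
        if self._cell is not None:
            # Join all text pieces and collapse whitespace
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row:  # skip rows with no <td>, e.g. a <th> header row
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            if self._depth:
                self._depth += 1
            elif not self._done and self.table_class in (dict(attrs).get("class") or "").split():
                self._depth = 1
        elif self._depth == 1 and tag == "tr":
            self._close_row()  # tolerate a missing </tr>
            self._row = []
        elif self._depth == 1 and tag == "td" and self._row is not None:
            self._close_cell()  # tolerate a missing </td>
            self._cell = []

    def handle_endtag(self, tag):
        if not self._depth:
            return
        if tag == "table":
            if self._depth == 1:
                self._close_row()
                self._done = True
            self._depth -= 1
        elif self._depth == 1 and tag == "td":
            self._close_cell()
        elif self._depth == 1 and tag == "tr":
            self._close_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_to_lines(html, table_class, columns=5):
    """One semicolon-separated line per table row, like the original
    script's output; rows with fewer cells are kept as they are."""
    parser = TableRowParser(table_class)
    parser.feed(html)
    parser.close()
    return [";".join(row[:columns]) for row in parser.rows]


# Hypothetical markup, not the real page
SAMPLE = """
<table class="menu"><tr><td>Home</td></tr></table>
<table class="tableLeftRight3030">
  <tr><th>Navn</th><th>Adresse</th></tr>
  <tr><td>Anna</td><td>Vej 1</td></tr>
  <tr><td>Bent</td><td><b>Gade</b> 2</td></tr>
</table>
"""

for line in table_to_lines(SAMPLE, "tableLeftRight3030"):
    print(line)  # Anna;Vej 1, then Bent;Gade 2
```

In the original BeautifulSoup 3 script, the same idea would be to find the table once with `table = soup.find('table', {'class' : 'tableLeftRight3030'})`, then loop with `for row in table('tr'):` and skip any row where `row('td')` is empty, such as a header row.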
Tommy Kaas
Kaas & Mulvad
Lykkesholms Alle 2A, 3.
1902 Frederiksberg C
Mobil: 27268818
Mail: tommy.kaas at kaasogmulvad.dk
Web: www.kaasogmulvad.dk