[Tutor] more scraping and saving

Tommy Kaas tommy.kaas at kaasogmulvad.dk
Mon Jan 3 13:03:32 CET 2011


Hi - I was helped the other day in an attempt to scrape and save a simple
web page. I'm using what I learned and trying another. It should be very
simple, but I only get the first row of names saved.

Can anybody help with an explanation?

 

(It's a public list of names of doctors with known connections to the
pharmaceutical industry.)

This time I'm trying to start at the right table (the third on the page) by
selecting it through its class attribute. Does that make sense?

Thanks in advance - here is the code:

 

import urllib2
from BeautifulSoup import BeautifulSoup
import codecs

f = codecs.open("laeger.txt", "w", encoding="Latin-1")

soup = BeautifulSoup(urllib2.urlopen(
    'http://www.laegemiddelstyrelsen.dk/include/8806/tilladelse_laeger.asp').read())

for row in soup('table', {'class' : 'tableLeftRight3030'}):
    tds = row('td')
    output = ";".join(tds[i].string for i in (0, 1, 2, 3, 4))
    f.write(output + '\n')

f.close()
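
One likely explanation for getting only the first row: soup('table', {...}) returns the matching tables, so the loop variable is a whole table rather than a row. The loop body then runs once per table, and the first five td cells of that table are simply the cells of its first row. The sketch below shows the usual alternative, which is to find the table first and then loop over its tr elements. It is only an illustration under assumptions: it uses Python 3 and the newer bs4 package instead of the BeautifulSoup 3 import in the post, the HTML string and the helper name table_rows are made up, and the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: one table with the target class.
HTML = """
<table class="tableLeftRight3030">
  <tr><td>Anna</td><td>A1</td><td>A2</td><td>A3</td><td>A4</td></tr>
  <tr><td>Bent</td><td>B1</td><td>B2</td><td>B3</td><td>B4</td></tr>
  <tr><td>Carl</td><td>C1</td><td>C2</td><td>C3</td><td>C4</td></tr>
</table>
"""


def table_rows(html, css_class, n_cols=5):
    """Return one ';'-joined string per row of the first matching table."""
    soup = BeautifulSoup(html, "html.parser")
    # Pick the table by class, as in the post.
    table = soup.find("table", {"class": css_class})
    lines = []
    if table is None:
        return lines
    # Loop over the rows of that table, not over the tables themselves.
    for tr in table.find_all("tr"):
        tds = tr.find_all("td")
        if len(tds) < n_cols:
            continue  # skip header rows or rows with too few cells
        # get_text() also works when a cell contains nested tags.
        lines.append(";".join(td.get_text(strip=True) for td in tds[:n_cols]))
    return lines


lines = table_rows(HTML, "tableLeftRight3030")
print("\n".join(lines))
```

With the BeautifulSoup 3 calls from the post, the same idea would be an outer lookup of the table followed by a loop over table('tr'). Note that .string is None for a cell that contains more than one child node, in which case the join in the posted code would fail.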

 

 

Tommy Kaas

Kaas & Mulvad
Lykkesholms Alle 2A, 3.
1902 Frederiksberg C

Mobil: 27268818
Mail: tommy.kaas at kaasogmulvad.dk
Web: www.kaasogmulvad.dk

 
