[Tutor] scraping and saving in file

Tommy Kaas tommy.kaas at kaasogmulvad.dk
Wed Dec 29 10:54:02 CET 2010


Hi,

I’m trying to learn basic web scraping, starting from scratch. I’m using
ActivePython 2.6.6.

 

I have uploaded a simple table to my web page, and I am trying to scrape it
and save the result in a text file, with the columns separated by #.

It works fine, but besides the # I also get a space on each side of it
between the columns in the text file. How do I avoid that?

 

This is the script:

 

import urllib2
from BeautifulSoup import BeautifulSoup

f = open('tabeltest.txt', 'w')

soup = BeautifulSoup(urllib2.urlopen('http://www.kaasogmulvad.dk/unv/python/tabeltest.htm').read())

rows = soup.findAll('tr')

for tr in rows:
    cols = tr.findAll('td')
    print >> f, cols[0].string,'#',cols[1].string,'#',cols[2].string,'#',cols[3].string

f.close()

 

And the text file looks like this:

 

Kommunenr # Kommune # Region # Regionsnr
101 # København # Hovedstaden # 1084
147 # Frederiksberg # Hovedstaden # 1084
151 # Ballerup # Hovedstaden # 1084
153 # Brøndby # Hovedstaden # 1084
155 # Dragør # Hovedstaden # 1084
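
As far as I can tell, the spaces come from the print statement itself: in Python 2, every comma in "print a, '#', b" puts one space between the items. A possible way around it, given here only as a minimal sketch, is to join the cell strings with '#' and write the finished line to the file. The sample rows are copied from the output above and stand in for the scraped cells, and the names format_row and tabeltest_sample.txt are made up for this illustration.

```python
# Sample rows copied from the output above. In the real script each value
# would come from cols[i].string instead (assumption: all cells are strings).
rows = [
    ['Kommunenr', 'Kommune', 'Region', 'Regionsnr'],
    ['147', 'Frederiksberg', 'Hovedstaden', '1084'],
    ['151', 'Ballerup', 'Hovedstaden', '1084'],
]


def format_row(cells):
    # join() puts only the separator between the items, so no extra
    # spaces are added the way the comma-separated print statement does.
    return '#'.join(cells)


lines = [format_row(cells) for cells in rows]

# Hypothetical file name, chosen so the original tabeltest.txt is untouched.
f = open('tabeltest_sample.txt', 'w')
for line in lines:
    f.write(line + '\n')  # write() adds nothing, so the newline is explicit
f.close()
```

In the scraping loop this would presumably mean building the list from the first four entries of cols and writing format_row of that list instead of using print.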

 

Thanks in advance

 

Tommy Kaas

 

Kaas & Mulvad

Lykkesholms Alle 2A, 3.

1902 Frederiksberg C

 

Mobil: 27268818

Mail: tommy.kaas at kaasogmulvad.dk

Web: www.kaasogmulvad.dk

 
