[Tutor] scraping and saving in file

Peter Otten __peter__ at web.de
Wed Dec 29 11:46:20 CET 2010


Tommy Kaas wrote:

> I’m trying to learn basic web scraping and starting from scratch. I’m
> using Activepython 2.6.6

> I have uploaded a simple table on my web page and try to scrape it and
> will save the result in a text file. I will separate the columns in the
> file with #.
 
> It works fine but besides # I also get spaces between the columns in the
> text file. How do I avoid that?

> This is the script:

> import urllib2
> from BeautifulSoup import BeautifulSoup
> f = open('tabeltest.txt', 'w')
> soup = BeautifulSoup(urllib2.urlopen('http://www.kaasogmulvad.dk/unv/python/tabeltest.htm').read())
 
> rows = soup.findAll('tr')

> for tr in rows:
>     cols = tr.findAll('td')
>     print >> f, cols[0].string,'#',cols[1].string,'#',cols[2].string,'#',cols[3].string
> 
> f.close()

> And the text file looks like this:

> Kommunenr # Kommune # Region # Regionsnr
> 101 # København # Hovedstaden # 1084
> 147 # Frederiksberg # Hovedstaden # 1084
> 151 # Ballerup # Hovedstaden # 1084
> 153 # Brøndby # Hovedstaden # 1084

The print statement automatically inserts a space between the items you 
separate with commas, so you can either resort to the write method

for i in range(4):
    if i:
        f.write("#")
    f.write(cols[i].string)
f.write("\n")  # unlike print, write does not add the newline for you

which is a bit clumsy, or you build the complete line and then print it as a 
whole:

print >> f, "#".join(col.string for col in cols)
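
To see the effect of the join approach without fetching the page, here is a 
small sketch. The rows list and the format_row helper are made up for the 
illustration; plain unicode strings stand in for the col.string values that 
BeautifulSoup would give you.

```python
# Stand-in data: in the real script each value would come from col.string.
rows = [
    [u"Kommunenr", u"Kommune", u"Region", u"Regionsnr"],
    [u"101", u"K\xf8benhavn", u"Hovedstaden", u"1084"],  # \xf8 is the letter ø
]

def format_row(cells, sep=u"#"):
    """Join the cell texts with sep and nothing else between them."""
    return sep.join(cells)

# One finished line per table row, with no spaces around the separator.
lines = [format_row(cells) for cells in rows]
```

Because each line is built as a single string first, print (or write) only 
ever sees one item and has nowhere to insert extra spaces.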

Note that you have non-ascii characters in your data -- I'm surprised that 
writing to a file works for you. I would expect that

import codecs
f = codecs.open("tmp.txt", "w", encoding="utf-8")

is needed to successfully write your data to a file.
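
A minimal sketch of what that looks like in practice, assuming the line has 
already been built as a unicode string. The sample line and the scratch file 
in a temporary directory are my own stand-ins, not part of your script.

```python
import codecs
import os
import tempfile

# Stand-in for one joined table row; \xf8 is the letter ø.
line = u"101#K\xf8benhavn#Hovedstaden#1084"

# Hypothetical scratch file, not the tabeltest.txt from the original script.
path = os.path.join(tempfile.mkdtemp(), "tmp.txt")

# The file object returned by codecs.open encodes unicode to UTF-8 on write.
f = codecs.open(path, "w", encoding="utf-8")
try:
    f.write(line + u"\n")
finally:
    f.close()

# Read the raw bytes back to check how the non-ascii character was stored.
raw_file = open(path, "rb")
try:
    raw = raw_file.read()
finally:
    raw_file.close()
```

In the resulting file the ø is stored as the two bytes C3 B8, which is its 
UTF-8 encoding.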

Peter


