[Tutor] scraping and saving in file

Tommy Kaas tommy.kaas at kaasogmulvad.dk
Wed Dec 29 11:55:52 CET 2010


Steven D'Aprano wrote:
> But in your case, the best way is not to use print at all. You are writing
> to a file -- write to the file directly, don't mess about with print.
> Untested:
> 
> 
> f = open('tabeltest.txt', 'w')
> url = 'http://www.kaasogmulvad.dk/unv/python/tabeltest.htm'
> soup = BeautifulSoup(urllib2.urlopen(url).read())
> rows = soup.findAll('tr')
> for tr in rows:
>      cols = tr.findAll('td')
>      output = "#".join(cols[i].string for i in (0, 1, 2, 3))
>      f.write(output + '\n')  # don't forget the newline after each row
> f.close()

Steven, thanks for the advice. 
I see the point. But now I have problems with the Danish characters. I get
this:

Traceback (most recent call last):
  File "C:/pythonlib/kursus/kommuner-regioner_ny.py", line 36, in <module>
    f.write(output + '\n')  # don't forget the newline after each row
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf8' in position 5: ordinal not in range(128)

I have tried adding # -*- coding: utf-8 -*- to the top of the script, but it
doesn't help.

Tommy
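
A note on the error above: the coding line only tells Python how to decode
string literals in the source file itself. It does not change how text is
encoded when it is written to a file, so f.write() still falls back to the
ascii codec when handed a unicode string. One possible fix is to open the
output file with an explicit encoding. The following is a minimal sketch,
assuming Python 2.6 or later (it also runs on Python 3.3+). The rows list is a
hypothetical stand-in for the cell text BeautifulSoup would return, and
write_rows and the temporary output path are illustrative names, not part of
the original script.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Hypothetical sample data standing in for the scraped table cells.
rows = [
    [u"Kommune", u"Region"],
    [u"Tønder", u"Syddanmark"],
    [u"Ærø", u"Syddanmark"],
]


def write_rows(path, rows, sep=u"#"):
    """Write each row as one sep-joined line, encoded as UTF-8."""
    # io.open returns a text-mode file that encodes unicode on write,
    # so no implicit ascii conversion takes place.
    with io.open(path, "w", encoding="utf-8") as f:
        for cols in rows:
            f.write(sep.join(cols) + u"\n")


# Illustrative output location; the original script wrote 'tabeltest.txt'.
path = os.path.join(tempfile.gettempdir(), "tabeltest.txt")
write_rows(path, rows)
```

An alternative under the same assumptions is to keep the plain open() call and
encode each line by hand, e.g. f.write((output + u'\n').encode('utf-8')) on
Python 2.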



