[Tutor] scraping and saving in file SOLVED

Tommy Kaas tommy.kaas at kaasogmulvad.dk
Wed Dec 29 12:25:19 CET 2010


With Steven's help on writing and Peter's help on importing codecs, and once I used \r\n instead of \r to give me new lines, everything worked.
I had thought that \n alone would be all that was needed?
Thanks.
Tommy
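
On the \r\n question, one likely explanation (an assumption, since the final script is not shown in this thread): codecs.open always opens the underlying file in binary mode, so a "\n" written to it is not translated to the Windows line ending "\r\n" the way it is for a file opened with plain open(..., 'w'). A minimal sketch of controlling the line ending explicitly with io.open (in the standard library since Python 2.6, and the same function as the built-in open in Python 3). The file name and the sample rows are made up for illustration:

```python
import io
import os
import tempfile

# Hypothetical sample rows standing in for the scraped table cells.
rows = [[u"101", u"K\xf8benhavn"], [u"147", u"Frederiksberg"]]

# Hypothetical output path in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "newline_demo.txt")

# newline="\r\n" makes every "\n" that is written come out as "\r\n",
# regardless of the platform the script runs on.
with io.open(path, "w", encoding="utf-8", newline="\r\n") as f:
    for cols in rows:
        f.write(u"#".join(cols) + u"\n")

# Read the raw bytes back to see which line endings ended up on disk.
with io.open(path, "rb") as f:
    raw = f.read()
```

With this approach the script itself only ever writes "\n", and the choice of line ending lives in one place, the call that opens the file.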

> -----Original message-----
> From: tutor-bounces+tommy.kaas=kaasogmulvad.dk at python.org
> [mailto:tutor-bounces+tommy.kaas=kaasogmulvad.dk at python.org] On
> behalf of Peter Otten
> Sent: 29 December 2010 11:46
> To: tutor at python.org
> Subject: Re: [Tutor] scraping and saving in file
> 
> Tommy Kaas wrote:
> 
> > I’m trying to learn basic web scraping and am starting from scratch. I’m
> > using ActivePython 2.6.6.
> 
> > I have uploaded a simple table to my web page and am trying to scrape it
> > and save the result in a text file. I want to separate the columns in
> > the file with #.
> 
> > It works fine but besides # I also get spaces between the columns in
> > the text file. How do I avoid that?
> 
> > This is the script:
> 
> > import urllib2
> > from BeautifulSoup import BeautifulSoup
> >
> > f = open('tabeltest.txt', 'w')
> > soup = BeautifulSoup(urllib2.urlopen('http://www.kaasogmulvad.dk/unv/python/tabeltest.htm').read())
> 
> > rows = soup.findAll('tr')
> 
> > for tr in rows:
> >     cols = tr.findAll('td')
> >     print >> f, cols[0].string,'#',cols[1].string,'#',cols[2].string,'#',cols[3].string
> >
> > f.close()
> 
> > And the text file looks like this:
> 
> > Kommunenr # Kommune # Region # Regionsnr
> > 101 # København # Hovedstaden # 1084
> > 147 # Frederiksberg # Hovedstaden # 1084
> > 151 # Ballerup # Hovedstaden # 1084
> > 153 # Brøndby # Hovedstaden # 1084
> 
> The print statement automatically inserts a space between the items you
> separate with commas, so you can either resort to the write method
> 
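> for i in range(4):
>     if i:
>         f.write("#")  # separator before every column except the first
>     f.write(cols[i].string)
> f.write("\n")  # unlike print, write() does not add a newline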
> 
> which is a bit clumsy, or you build the complete line and then print it as a
> whole:
> 
> print >> f, "#".join(col.string for col in cols)
> 
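
To see the two approaches side by side, here is a small sketch. It is written for Python 3, where print is a function rather than the statement used in the script above, but the automatic space between separate arguments behaves the same way. The sample cells are invented for illustration, and in-memory buffers stand in for the output file:

```python
import io

# Hypothetical cell values standing in for one scraped table row.
cols = ["101", "K\u00f8benhavn", "Hovedstaden", "1084"]

# Separate arguments: print puts a space between each of them.
spaced = io.StringIO()
print(cols[0], "#", cols[1], "#", cols[2], "#", cols[3], file=spaced)

# One pre-built string: only the "#" separators appear.
joined = io.StringIO()
print("#".join(cols), file=joined)

# Python 3 only: sep="" switches off the automatic spaces.
no_sep = io.StringIO()
print(cols[0], "#", cols[1], "#", cols[2], "#", cols[3], sep="", file=no_sep)
```

The join version also keeps working if the table gains or loses a column, since it does not hard-code the indexes 0 to 3.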
> Note that you have non-ascii characters in your data -- I'm surprised that
> writing to a file works for you. I would expect that
> 
> import codecs
> f = codecs.open("tmp.txt", "w", encoding="utf-8")
> 
> is needed to successfully write your data to a file.
> 
> Peter
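
A short sketch of the encoding point, assuming the scraped cell text is Unicode: a string containing a letter such as ø cannot be encoded as ASCII, while a file opened through codecs.open with encoding="utf-8" accepts it. The sample line and the temporary path are made up for illustration:

```python
import codecs
import os
import tempfile

# Hypothetical output line containing a non-ASCII letter.
line = u"101#K\xf8benhavn#Hovedstaden#1084"

# Encoding to ASCII fails because of the non-ASCII letter.
try:
    line.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Hypothetical output path in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "tmp.txt")

# codecs.open encodes each string as UTF-8 when it is written.
f = codecs.open(path, "w", encoding="utf-8")
try:
    f.write(line + u"\n")
finally:
    f.close()

# Reading through the same codec gives the original text back.
f = codecs.open(path, "r", encoding="utf-8")
try:
    read_back = f.read()
finally:
    f.close()
```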
> 


