[Tutor] scraping and saving in file SOLVED
Tommy Kaas
tommy.kaas at kaasogmulvad.dk
Wed Dec 29 12:25:19 CET 2010
With Steven's help on writing and Peter's help on import codecs, and once I used \r\n instead of \r to give me new lines, everything worked.
I thought that just \n would be necessary?
Thanks.
Tommy
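
A note on the \r\n question: codecs.open always opens the file in binary mode, so a "\n" is written as-is and is not converted to the Windows line ending the way it is for a file opened in text mode with plain open(). The following is only a sketch of one alternative, assuming io.open (available from Python 2.6 on) is acceptable in place of codecs.open. Its newline argument does the conversion, so the script can write a plain "\n". The sample rows are copied from the output quoted below; the temporary directory is a placeholder for illustration.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Sample cell texts taken from the table in this thread (assumption: the
# real script would get these from col.string for each <td>).
rows = [
    [u"101", u"København", u"Hovedstaden", u"1084"],
    [u"147", u"Frederiksberg", u"Hovedstaden", u"1084"],
]

# Placeholder location so the sketch does not overwrite a real file.
path = os.path.join(tempfile.mkdtemp(), "tabeltest.txt")

# newline="\r\n" makes io.open translate every "\n" written into "\r\n".
with io.open(path, "w", encoding="utf-8", newline="\r\n") as f:
    for cells in rows:
        f.write(u"#".join(cells) + u"\n")

# Read the raw bytes back to inspect the line endings actually stored.
with open(path, "rb") as fh:
    raw = fh.read()
```

Here raw ends each line with the two bytes \r\n even though the loop only wrote "\n".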
> -----Original message-----
> From: tutor-bounces+tommy.kaas=kaasogmulvad.dk at python.org
> [mailto:tutor-bounces+tommy.kaas=kaasogmulvad.dk at python.org] On
> behalf of Peter Otten
> Sent: 29 December 2010 11:46
> To: tutor at python.org
> Subject: Re: [Tutor] scraping and saving in file
>
> Tommy Kaas wrote:
>
> > I’m trying to learn basic web scraping, starting from scratch. I’m
> > using ActivePython 2.6.6.
>
> > I have uploaded a simple table to my web page and am trying to scrape
> > it and save the result in a text file. I want to separate the columns
> > in the file with #.
>
> > It works fine but besides # I also get spaces between the columns in
> > the text file. How do I avoid that?
>
> > This is the script:
>
> > import urllib2
> > from BeautifulSoup import BeautifulSoup
> > f = open('tabeltest.txt', 'w')
> > soup = BeautifulSoup(urllib2.urlopen('http://www.kaasogmulvad.dk/unv/python/tabeltest.htm').read())
>
> > rows = soup.findAll('tr')
>
> > for tr in rows:
> >     cols = tr.findAll('td')
> >     print >> f, cols[0].string,'#',cols[1].string,'#',cols[2].string,'#',cols[3].string
> >
> > f.close()
>
> > And the text file looks like this:
>
> > Kommunenr # Kommune # Region # Regionsnr
> > 101 # København # Hovedstaden # 1084
> > 147 # Frederiksberg # Hovedstaden # 1084
> > 151 # Ballerup # Hovedstaden # 1084
> > 153 # Brøndby # Hovedstaden # 1084
>
> The print statement automatically inserts spaces, so you can either resort to
> the write method
>
> for i in range(4):
>     if i:
>         f.write("#")
>     f.write(cols[i].string)
>
> which is a bit clumsy, or you build the complete line and then print it as a
> whole:
>
> print >> f, "#".join(col.string for col in cols)
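
The join approach can be tried without fetching the page. This is a minimal sketch, assuming the cell texts are already available as unicode strings; the sample rows are copied from the output quoted above, and the in-memory buffer is a stand-in for the real output file.

```python
# -*- coding: utf-8 -*-
import io

# Sample cell texts (assumption: in the real script these come from
# col.string for each <td> in a row).
rows = [
    [u"Kommunenr", u"Kommune", u"Region", u"Regionsnr"],
    [u"101", u"København", u"Hovedstaden", u"1084"],
]

buf = io.StringIO()  # in-memory stand-in for the output file
for cells in rows:
    # join() puts exactly one "#" between fields and adds no spaces.
    buf.write(u"#".join(cells) + u"\n")

result = buf.getvalue()
```

Because the whole line is built first, the only separator that ends up in result is the "#" passed to join.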
>
> Note that you have non-ascii characters in your data -- I'm surprised that
> writing to a file works for you. I would expect that
>
> import codecs
> f = codecs.open("tmp.txt", "w", encoding="utf-8")
>
> is needed to successfully write your data to a file.
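
The concern about non-ASCII data can be illustrated on a single value. This is only a sketch, assuming the cell text is a unicode string such as the "København" entry from the table; it shows that the text cannot be encoded as ASCII but can be encoded as UTF-8, which is the conversion an explicit encoding argument performs on write.

```python
# -*- coding: utf-8 -*-
name = u"København"  # sample value from the table in this thread

try:
    name.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    # "ø" has no ASCII representation, so the encode call fails.
    ascii_ok = False

# UTF-8 can represent every character; "ø" becomes two bytes.
utf8_bytes = name.encode("utf-8")
```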
>
> Peter
>