[Tutor] How to avoid writing a newline?
Tommy Kaas
tommy.kaas at kaasogmulvad.dk
Wed Jan 12 16:39:46 CET 2011
I'm using ActivePython 2.6.6 on a PC running Windows 7.
I have made a small scraper script as an exercise for myself.
It scrapes the name and some details of the first 25 billionaires on the
Forbes list.
It works and writes the result to a text file, with the columns separated by
"#".
It takes the name from the link (t = i.string), opens the link, and scrapes
the details from the next page.
But I can't find a way to write the name (the variable t) once, and only
once, at the beginning of the line.
As t is written now, I get it at the beginning of the line, but I also get a
newline right after it.
Can I avoid that in a simple way?
Thanks in advance for any help
Tommy

from BeautifulSoup import BeautifulSoup
from mechanize import Browser

f = open("forbes.txt", "w")
br = Browser()
url = "http://www.forbes.com/lists/2010/10/billionaires-2010_The-Worlds-Billionaires_Rank.html"
page = br.open(url)
html = page.read()
soup = BeautifulSoup(html)
table = soup.find("table")
l = table.findAll('a')
for i in l[5:]:
    t = i.string
    print t #to the monitor
    br.follow_link(text_regex=r"(.*?)"+t+"(.*?)")
    tekst = br.response().read()
    soup = BeautifulSoup(tekst)
    table1 = soup.find('table', id='billTable')
    rows = table1.findAll('tr')
    print >> f, t,"#"
    for tr in rows:
        tds = tr.findAll(text=True)
        print >> f, tds[1].string,"#",tds[2].string,"#",
    print >> f, '\r\n'
f.close()
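
One possible way to get the name and its details onto a single line, given as a rough sketch rather than a definitive fix: in Python 2, a print statement ends its output with a newline unless the statement itself ends with a trailing comma, which is why the line that prints t followed by "#" breaks right after the name. Instead of juggling trailing commas, the fields can be collected first and joined into one string, which is then written with a single newline at the end. The helper name format_record and the sample values below are hypothetical and not part of the script above.

```python
def format_record(name, details, sep=" # "):
    """Return the name followed by the detail fields as one line of text.

    No newline is added here; the caller decides where the line ends.
    """
    return sep.join([name] + list(details))


# Hypothetical sample values standing in for the scraped table cells.
sample_name = "Example Name"
sample_details = ["Net Worth", "1.0", "Age", "50"]

line = format_record(sample_name, sample_details)
```

In the loop above, this would presumably mean building a list of the tds[1] and tds[2] strings for each row, then calling f.write(format_record(t, fields) + "\n") once per billionaire, so that exactly one newline is written per record.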