<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-Type content="text/html; charset=iso-8859-1"><meta name=Generator content="Microsoft Word 14 (filtered medium)"><style><!--
/* Font Definitions */
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0cm;
        margin-bottom:.0001pt;
        font-size:11.0pt;
        font-family:"Calibri","sans-serif";
        mso-fareast-language:EN-US;}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
span.EmailStyle17
        {mso-style-type:personal-compose;
        font-family:"Calibri","sans-serif";
        color:windowtext;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-family:"Calibri","sans-serif";
        mso-fareast-language:EN-US;}
@page WordSection1
        {size:612.0pt 792.0pt;
        margin:3.0cm 2.0cm 3.0cm 2.0cm;}
div.WordSection1
        {page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]--></head><body lang=DA link=blue vlink=purple><div class=WordSection1><p class=MsoNormal><span lang=EN-US>Hi,<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>I’m learning basic web scraping from scratch, using ActivePython 2.6.6.<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US>I have uploaded a simple table to my web page, and I am trying to scrape it and save the result to a text file, with the columns separated by #.<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>It works, but besides the # I also get a space on each side of it in the text file. How do I avoid those spaces?<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US>This is the script:<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US>import urllib2 <o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US>from BeautifulSoup import BeautifulSoup <o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US>f = open('tabeltest.txt', 'w')<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US>soup = BeautifulSoup(urllib2.urlopen('http://www.kaasogmulvad.dk/unv/python/tabeltest.htm').read())<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US>rows = soup.findAll('tr')<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US>for tr in rows:<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US> cols = tr.findAll('td')<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US> print >> f, cols[0].string,'#',cols[1].string,'#',cols[2].string,'#',cols[3].string<o:p></o:p></span></p><p 
class=MsoNormal><span lang=EN-US>f.close()<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US>And the text file looks like this:<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p><p class=MsoNormal>Kommunenr # Kommune # Region # Regionsnr<o:p></o:p></p><p class=MsoNormal>101 # København # Hovedstaden # 1084<o:p></o:p></p><p class=MsoNormal>147 # Frederiksberg # Hovedstaden # 1084<o:p></o:p></p><p class=MsoNormal>151 # Ballerup # Hovedstaden # 1084<o:p></o:p></p><p class=MsoNormal>153 # Brøndby # Hovedstaden # 1084<o:p></o:p></p><p class=MsoNormal><span lang=EN-US>155 # Dragør # Hovedstaden # 1084<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US>Thanks in advance<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US style='mso-fareast-language:DA'>Tommy Kaas<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='mso-fareast-language:DA'><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US style='mso-fareast-language:DA'>Kaas & Mulvad<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='mso-fareast-language:DA'>Lykkesholms </span><span style='mso-fareast-language:DA'>Alle 2A, 3.<o:p></o:p></span></p><p class=MsoNormal><span style='mso-fareast-language:DA'>1902 Frederiksberg C<o:p></o:p></span></p><p class=MsoNormal><span style='mso-fareast-language:DA'><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US style='mso-fareast-language:DA'>Mobil: 27268818<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='mso-fareast-language:DA'>Mail: </span><span style='mso-fareast-language:DA'><a href="mailto:tommy.kaas@kaasogmulvad.dk"><span lang=EN-US>tommy.kaas@kaasogmulvad.dk</span></a></span><span lang=EN-US style='mso-fareast-language:DA'><o:p></o:p></span></p><p class=MsoNormal><span 
style='mso-fareast-language:DA'>Web: <a href="http://www.kaasogmulvad.dk">www.kaasogmulvad.dk</a><o:p></o:p></span></p><p class=MsoNormal><o:p> </o:p></p></div></body></html>