[Tutor] BeautifulSoup - getting cells without new line characters

jonasmg at softhome.net jonasmg at softhome.net
Fri Mar 31 18:47:05 CEST 2006


Kent Johnson writes: 

> jonasmg at softhome.net wrote:
>> You are right, but the problem is that some cells contain anchors.
>> Sorry, I forgot to mention it.
>> 
>> and using:  
>> 
>> for row in table('tr'):
>>     cellText = [cell.string for cell in row('td')]
>>     print cellText  
>> 
>> I get None for the cells that contain anchors.
> 
> Can you give an example of your actual data and the result you want to 
> generate from it? I can't give you a correct answer if you don't tell me 
> the real question. 
> 
> Kent 

List of states:
http://en.wikipedia.org/wiki/U.S._state 

: soup = BeautifulSoup(html)
: # Get the second table (list of states).
: table = soup.first('table').findNext('table')
: print table 

...
<tr>
<td>WY</td>
<td>Wyo.</td>
<td><a href="/wiki/Wyoming" title="Wyoming">Wyoming</a></td>
<td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne, Wyoming">Cheyenne</a></td>
<td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne, Wyoming">Cheyenne</a></td>
<td><a href="/wiki/Image:Flag_of_Wyoming.svg" class="image" title=""><img src="http://upload.wikimedia.org/wikipedia/commons/thumb/b/bc/Flag_of_Wyoming.svg/45px-Flag_of_Wyoming.svg.png" width="45" alt="" height="30" longdesc="/wiki/Image:Flag_of_Wyoming.svg" /></a></td>
</tr>
</table> 

From each row (tr), I want to get cells (td) 1, 3 and 4
(postal abbreviation, state, capital). But cells 3 and 4 contain anchors,
so their text sits inside a nested <a> tag rather than directly in the cell.
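
To make the goal concrete, here is a minimal sketch of the desired result. It is not BeautifulSoup code: it swaps in the standard-library html.parser module (Python 3) so that it runs without any extra install. The names CellTextParser and pick_cells are made up for this illustration, and the sample is a shortened copy of the Wyoming row above. The idea is to join every piece of text found under a cell, including text nested inside an anchor, rather than reading only the cell's direct string.

```python
from html.parser import HTMLParser


class CellTextParser(HTMLParser):
    """Collect the text of every <td>, including text nested in <a> tags."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being parsed
        self._cell = None   # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags (such as <a>) is delivered here as well.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def pick_cells(html, wanted=(0, 2, 3)):
    """Return the wanted cells (0-based indexes) of each sufficiently long row."""
    parser = CellTextParser()
    parser.feed(html)
    parser.close()
    return [[row[i] for i in wanted]
            for row in parser.rows if len(row) > max(wanted)]


# Shortened copy of the Wyoming row (flag cell omitted).
sample = """<table><tr>
<td>WY</td>
<td>Wyo.</td>
<td><a href="/wiki/Wyoming" title="Wyoming">Wyoming</a></td>
<td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne, Wyoming">Cheyenne</a></td>
</tr></table>"""

print(pick_cells(sample))  # [['WY', 'Wyoming', 'Cheyenne']]
```

Cells 1, 3 and 4 correspond to the 0-based indexes 0, 2 and 3. Rows with too few td cells, such as a header row made of th cells, are skipped.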

Thanks Kent. 

