[Tutor] BeautifulSoup - getting cells without new line characters

Kent Johnson kent37 at tds.net
Sat Apr 1 13:12:02 CEST 2006


jonasmg at softhome.net wrote:
> Kent Johnson writes: 
> 
> 
>>jonasmg at softhome.net wrote: 
>>
>>
>>>List of states:
>>>http://en.wikipedia.org/wiki/U.S._state  
>>>
>>>: soup = BeautifulSoup(html)
>>>: # Get the second table (list of states).
>>>: table = soup.first('table').findNext('table')
>>>: print table  
>>>
>>>...
>>><tr>
>>><td>WY</td>
>>><td>Wyo.</td>
>>><td><a href="/wiki/Wyoming" title="Wyoming">Wyoming</a></td>
>>><td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne, 
>>>Wyoming">Cheyenne</a></td>
>>><td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne, 
>>>Wyoming">Cheyenne</a></td>
>>><td><a href="/wiki/Image:Flag_of_Wyoming.svg" class="image" title=""><img 
>>>src="http://upload.wikimedia.org/wikipedia/commons/thumb/b/bc/Flag_of_Wyomin 
>>>g.svg/45px-Flag_of_Wyoming.svg.png" width="45" alt="" height="30" 
>>>longdesc="/wiki/Image:Flag_of_Wyoming.svg" /></a></td>
>>></tr>
>>></table>  
>>>
>>>Of each row (tr), I want to get the cells (td): 1,3,4 
>>>(postal,state,capital). But cells 3 and 4 have anchors. 
>>
>>So dig into the cells and get the data from the anchor. 
>>
>>cells = row('td')
>>cells[0].string
>>cells[2].a.string
>>cells[3].a.string 
>>
>>Kent 
>>
> 
> 
> for row in table('tr'):
>    cells = row('td')
>    print cells[0] 
> 
> IndexError: list index out of range 

It works for me:


In [1]: from BeautifulSoup import BeautifulSoup as bs

In [2]: soup=bs('''<tr>
    ...: <td>WY</td>
    ...: <td>Wyo.</td>
    ...: <td><a href="/wiki/Wyoming" title="Wyoming">Wyoming</a></td>
    ...: <td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne,
    ...: Wyoming">Cheyenne</a></td>
    ...: <td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne,
    ...: Wyoming">Cheyenne</a></td>
    ...: <td><a href="/wiki/Image:Flag_of_Wyoming.svg" class="image" title=""><img
    ...: src="http://upload.wikimedia.org/wikipedia/commons/thumb/b/bc/Flag_of_Wyomin
    ...: g.svg/45px-Flag_of_Wyoming.svg.png" width="45" alt="" height="30"
    ...: longdesc="/wiki/Image:Flag_of_Wyoming.svg" /></a></td>
    ...: </tr>
    ...: </table> ''')

In [18]: rows=soup('tr')

In [19]: rows
Out[19]:
[<tr>
<td>WY</td>
<td>Wyo.</td>
<td><a href="/wiki/Wyoming" title="Wyoming">Wyoming</a></td>
<td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne,
Wyoming">Cheyenne</a></td>
<td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne,
Wyoming">Cheyenne</a></td>
<td><a href="/wiki/Image:Flag_of_Wyoming.svg" class="image" 
title=""><img src="http://upload.

g.svg/45px-Flag_of_Wyoming.svg.png" width="45" alt="" height="30" 
longdesc="/wiki/Image:Flag_
</tr>]

In [21]: cells=rows[0]('td')

In [22]: cells
Out[22]:
[<td>WY</td>,
  <td>Wyo.</td>,
  <td><a href="/wiki/Wyoming" title="Wyoming">Wyoming</a></td>,
  <td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne,
Wyoming">Cheyenne</a></td>,
  <td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne,
Wyoming">Cheyenne</a></td>,
  <td><a href="/wiki/Image:Flag_of_Wyoming.svg" class="image" 
title=""><img src="http://upload
n
g.svg/45px-Flag_of_Wyoming.svg.png" width="45" alt="" height="30" 
longdesc="/wiki/Image:Flag_

In [23]: cells[0].string
Out[23]: 'WY'

In [24]: cells[2].a.string
Out[24]: 'Wyoming'

In [25]: cells[3].a.string
Out[25]: 'Cheyenne'
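
As for the IndexError on the full table, one possible cause (an assumption, nothing in this thread confirms it) is a row with no <td> cells at all, such as a header row built from <th> cells. For that row, row('td') is an empty list and cells[0] fails. Below is a minimal sketch of a loop that skips such rows and strips surrounding whitespace from each cell. It assumes the current bs4 package on Python 3 rather than the BeautifulSoup module used in the session above; the header row in the sample HTML and the names SAMPLE_HTML and extract_states are made up for illustration.

```python
from bs4 import BeautifulSoup

# Sample markup: the header row is hypothetical, the data row is
# shortened from the Wyoming row quoted in this thread.
SAMPLE_HTML = """<table>
<tr><th>Postal</th><th>Abbr.</th><th>State</th><th>Capital</th></tr>
<tr>
<td>WY</td>
<td>Wyo.</td>
<td><a href="/wiki/Wyoming" title="Wyoming">Wyoming</a></td>
<td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne, Wyoming">Cheyenne</a></td>
</tr>
</table>"""


def extract_states(html):
    """Return (postal, state, capital) tuples, one per data row."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for row in soup.find_all("tr"):
        cells = row.find_all("td")
        # Rows without enough <td> cells (e.g. header rows) are skipped
        # instead of being indexed, which avoids the IndexError.
        if len(cells) < 4:
            continue
        postal = cells[0].get_text(strip=True)
        state = cells[2].get_text(strip=True)
        capital = cells[3].get_text(strip=True)
        results.append((postal, state, capital))
    return results


if __name__ == "__main__":
    for postal, state, capital in extract_states(SAMPLE_HTML):
        print(postal, state, capital)
```

get_text(strip=True) drops leading and trailing whitespace, newlines included, from each piece of text in the cell, so it may be a reasonable fit for the question in the subject line. It also reads through the anchor, so there is no need to reach for .a explicitly.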


Kent


