[Tutor] BeautifulSoup - getting cells without new line characters
Kent Johnson
kent37 at tds.net
Sat Apr 1 13:12:02 CEST 2006
jonasmg at softhome.net wrote:
> Kent Johnson writes:
>
>
>>jonasmg at softhome.net wrote:
>>
>>
>>>List of states:
>>>http://en.wikipedia.org/wiki/U.S._state
>>>
>>>: soup = BeautifulSoup(html)
>>>: # Get the second table (list of states).
>>>: table = soup.first('table').findNext('table')
>>>: print table
>>>
>>>...
>>><tr>
>>><td>WY</td>
>>><td>Wyo.</td>
>>><td><a href="/wiki/Wyoming" title="Wyoming">Wyoming</a></td>
>>><td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne,
>>>Wyoming">Cheyenne</a></td>
>>><td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne,
>>>Wyoming">Cheyenne</a></td>
>>><td><a href="/wiki/Image:Flag_of_Wyoming.svg" class="image" title=""><img
>>>src="http://upload.wikimedia.org/wikipedia/commons/thumb/b/bc/Flag_of_Wyomin
>>>g.svg/45px-Flag_of_Wyoming.svg.png" width="45" alt="" height="30"
>>>longdesc="/wiki/Image:Flag_of_Wyoming.svg" /></a></td>
>>></tr>
>>></table>
>>>
>>>Of each row (tr), I want to get the cells (td): 1,3,4
>>>(postal,state,capital). But cells 3 and 4 have anchors.
>>
>>So dig into the cells and get the data from the anchor.
>>
>>cells = row('td')
>>cells[0].string
>>cells[2].a.string
>>cells[3].a.string
>>
>>Kent
>>
>
>
> for row in table('tr'):
>     cells = row('td')
>     print cells[0]
>
> IndexError: list index out of range
It works for me:
In [1]: from BeautifulSoup import BeautifulSoup as bs
In [2]: soup=bs('''<tr>
...: <td>WY</td>
...: <td>Wyo.</td>
...: <td><a href="/wiki/Wyoming" title="Wyoming">Wyoming</a></td>
...: <td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne,
...: Wyoming">Cheyenne</a></td>
...: <td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne,
...: Wyoming">Cheyenne</a></td>
...: <td><a href="/wiki/Image:Flag_of_Wyoming.svg" class="image"
...: title=""><img
...: src="http://upload.wikimedia.org/wikipedia/commons/thumb/b/bc/Flag_of_Wyomin
...: g.svg/45px-Flag_of_Wyoming.svg.png" width="45" alt="" height="30"
...: longdesc="/wiki/Image:Flag_of_Wyoming.svg" /></a></td>
...: </tr>
...: </table> ''')
In [18]: rows=soup('tr')
In [19]: rows
Out[19]:
[<tr>
<td>WY</td>
<td>Wyo.</td>
<td><a href="/wiki/Wyoming" title="Wyoming">Wyoming</a></td>
<td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne,
Wyoming">Cheyenne</a></td>
<td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne,
Wyoming">Cheyenne</a></td>
<td><a href="/wiki/Image:Flag_of_Wyoming.svg" class="image"
title=""><img src="http://upload.
g.svg/45px-Flag_of_Wyoming.svg.png" width="45" alt="" height="30"
longdesc="/wiki/Image:Flag_
</tr>]
In [21]: cells=rows[0]('td')
In [22]: cells
Out[22]:
[<td>WY</td>,
<td>Wyo.</td>,
<td><a href="/wiki/Wyoming" title="Wyoming">Wyoming</a></td>,
<td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne,
Wyoming">Cheyenne</a></td>,
<td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne,
Wyoming">Cheyenne</a></td>,
<td><a href="/wiki/Image:Flag_of_Wyoming.svg" class="image"
title=""><img src="http://upload
n
g.svg/45px-Flag_of_Wyoming.svg.png" width="45" alt="" height="30"
longdesc="/wiki/Image:Flag_
In [23]: cells[0].string
Out[23]: 'WY'
In [24]: cells[2].a.string
Out[24]: 'Wyoming'
In [25]: cells[3].a.string
Out[25]: 'Cheyenne'
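
The session above handles a single row. For the whole table, one plausible cause of the IndexError (an assumption, nothing in this thread confirms it) is a header row whose cells are <th> rather than <td>, so row('td') comes back empty. A rough sketch of the full loop follows. Note that it swaps in the standard library's html.parser instead of BeautifulSoup so that it runs without extra installs; CellCollector, postal_state_capital and SAMPLE are made-up names for illustration. It skips rows with too few cells and collapses whitespace so wrapped lines leave no newline characters in the results.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being parsed, or None
        self._cell = None   # text fragments of the open cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags such as <a> also arrives here.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # Collapse runs of whitespace, including newlines, to one space.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def postal_state_capital(html):
    """Return (postal, state, capital) for each row with enough <td> cells."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    # Header rows built from <th> yield no <td> cells, so skip short rows
    # instead of indexing into them.
    return [(c[0], c[2], c[3]) for c in parser.rows if len(c) >= 4]


# Hypothetical sample: a header row plus the Wyoming row quoted above
# (flag cell omitted for brevity).
SAMPLE = """<table>
<tr><th>Postal</th><th>Abbr.</th><th>State</th><th>Capital</th></tr>
<tr>
<td>WY</td>
<td>Wyo.</td>
<td><a href="/wiki/Wyoming" title="Wyoming">Wyoming</a></td>
<td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne,
Wyoming">Cheyenne</a></td>
</tr>
</table>"""

print(postal_state_capital(SAMPLE))
```

The same length check carries over to the BeautifulSoup loop: test len(cells) before reading cells[0].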
Kent