[Tutor] BeautifulSoup - getting cells without new line characters

jonasmg at softhome.net jonasmg at softhome.net
Fri Mar 31 18:29:33 CEST 2006


Kent Johnson writes: 

> jonasmg at softhome.net wrote:
>> Kent Johnson writes:  
>> 
>> 
>>>jonasmg at softhome.net wrote: 
>>>
>>>> From a table, I want to get the cells and then choose only some of them.   
>>>>
>>>><table>
>>>><tr>
>>>><td>WY</td>
>>>><td>Wyo.</td>
>>>></tr>
>>>>...
>>>></table>   
>>>>
>>>>Using:   
>>>>
>>>>for row in table('tr'): print row.contents   
>>>>
>>>>   ['\n', <td>WY</td>, '\n', <td>Wyo.</td>, '\n']
>>>>   [...]   
>>>>
>>>>I get a new line character between each cell.   
>>>>
>>>>Is it possible to get them without those '\n'? 
>>>
>>>Well, the newlines are in your data, so you need to strip them or ignore 
>>>them somewhere. 
>> 
>> For each row, I only want to get the cells at certain positions (e.g. 
>> row.contents[0], row.contents[2]) 
> 
> It sounds like you should just work with row('td') instead of 
> row.contents. That will give you a list of just the <td> elements. 
> 
> Kent 
> 

You are right, but the problem is that some cells contain anchors.
Sorry, I forgot to mention that. 

Using: 

for row in table('tr'):
    cellText = [cell.string for cell in row('td')]
    print cellText 

I get None for the cells that contain anchors. 
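
For comparison, here is a minimal sketch of what I am after: the text of each
cell, with the newlines between cells skipped and the text inside an anchor
kept. It does not use BeautifulSoup; it assumes Python 3 and the standard
library's html.parser. CellTextParser is a hypothetical helper name, and the
href in the sample markup is made up for illustration.

```python
from html.parser import HTMLParser


class CellTextParser(HTMLParser):
    """Collect the text of every <td>, grouped by <tr> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []       # finished rows, each a list of cell strings
        self._row = None     # cells of the row currently being parsed
        self._cell = None    # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a <td> (such as the '\n' between cells) is ignored.
        # Text inside nested tags such as <a> still counts as cell text.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Sample markup; the anchor and its href are assumptions for illustration.
html = """<table>
<tr>
<td><a href="wy.html">WY</a></td>
<td>Wyo.</td>
</tr>
</table>"""

parser = CellTextParser()
parser.feed(html)
parser.close()
for row in parser.rows:
    print(row)
```

With this, picking positions such as row[0] and row[1] works on plain strings,
whether or not a cell wraps its text in an anchor.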

