newb: BeautifulSoup
crybaby
joemystery123 at gmail.com
Fri Sep 21 08:42:34 EDT 2007
I added extra td tags to your example, and for some reason I am
getting None when I do the following:
print all_tds[0].string
print all_tds[8].string
from BeautifulSoup import BeautifulSoup
doc = """
<html>
<head>
<title></title>
</head>
<body>
<table>
</table>
<table>
<tr><td>hello</td></tr>
<tr><td>world</td><td>goodbye</td></tr>
<tr>
<td width=1 height=0 bgcolor="#800000"><img src="/img/spacer.gif" width=1 height=0 alt="|"/></td>
<td align=right width=80><font size=2 face="New Times Roman,Times,Serif"> 48.884 </font></td>
<td width=1 height=0 bgcolor="#800000"><img src="/img/spacer.gif" width=1 height=0 alt="|"/></td>
<td align=right width=80><font size=2 face="New Times Roman,Times,Serif"> 49.950 </font></td>
<td width=1 height=0 bgcolor="#800000"><img src="/img/spacer.gif" width=1 height=0 alt="|"/></td>
<td align=right width=80><font size=2 face="New Times Roman,Times,Serif"> 69.322 </font></td>
<td width=1 height=0 bgcolor="#800000"><img src="/img/spacer.gif" width=1 height=0 alt="|"/></td>
<td align=right width=80><font size=2 face="New Times Roman,Times,Serif"> 99.740 </font></td>
<td width=1 height=0 bgcolor="#800000"><img src="/img/spacer.gif" width=1 height=0 alt="|"/></td>
</tr>
</table>
</body>
</html>
"""
soup = BeautifulSoup(doc)
tables = soup.findAll('table')
target_table = tables[1]
all_tds = target_table.findAll('td')
print all_tds[0].string
print all_tds[8].string
tds_str = all_tds[8].string
print tds_str
The output I am getting is the following:
>>> hello
None
None
I am not sure why I am getting None for these lines:
print all_tds[8].string
print tds_str
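A likely explanation, assuming the BeautifulSoup 3 behaviour where .string is only set when a tag's single child is a text node: all_tds[8] holds a <font> tag rather than bare text, so its .string is None, while all_tds[0] holds only "hello". With BeautifulSoup itself, all_tds[8].font.string or ''.join(all_tds[8].findAll(text=True)) should reach the nested text. The sketch below shows the same idea of collecting every piece of text under a td. It uses only the Python 3 standard library (html.parser) instead of BeautifulSoup, TdTextCollector is a hypothetical name rather than part of any library, and it assumes tables are not nested.

```python
from html.parser import HTMLParser


class TdTextCollector(HTMLParser):
    """Hypothetical helper: gather the text inside each <td>, nested or not."""

    def __init__(self):
        super().__init__()
        self.cells = []       # one text string per <td>, in document order
        self._in_td = False   # True while the parser is inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        # Text inside <font> still arrives here while we are inside the <td>.
        if self._in_td:
            self.cells[-1] += data


# A cut-down version of the table from the message above.
doc = """
<table>
<tr><td>hello</td></tr>
<tr>
<td width=1><img src="/img/spacer.gif" alt="|"/></td>
<td align=right><font size=2> 69.322 </font></td>
</tr>
</table>
"""

parser = TdTextCollector()
parser.feed(doc)
parser.close()

cells = [text.strip() for text in parser.cells]
print(cells)  # ['hello', '', '69.322']
```

The cell that only contains an <img> yields an empty string, and the cell that wraps its number in <font> yields the number, which is the case that came back as None above.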
On Sep 21, 3:38 am, 7stud <bbxx789_0... at yahoo.com> wrote:
> On Sep 20, 9:04 pm, crybaby <joemystery... at gmail.com> wrote:
>
> > I need to traverse an HTML page with a big table that has many rows and
> > columns. For example, how do I go to the 35th td tag and use a regex to
> > retrieve the content? After that is done, how do I move down 15 td tags
> > from the 35th tag (35+15) and use a regex to retrieve the content?
>
> 1) You can find your table using one of these methods:
>
> a)
> target_table = soup.find('table', id='car_parts')
>
> b)
> tables = soup.findAll('table')
> target_table = tables[2]
>
> The tables are put in a list in the order that they appear on the
> page.
>
> 2) You can get all the td's in the table using this statement:
>
> all_tds = target_table.findAll('td')
>
> 3) You can get the contents of the tags using these statements:
>
> print all_tds[34].string
> print all_tds[49].string
>
> Here is an example:
>
> from BeautifulSoup import BeautifulSoup
>
> doc = """
> <html>
> <head>
> <title></title>
> </head>
> <body>
> <table>
> </table>
>
> <table>
> <tr><td>hello</td></tr>
> <tr><td>world</td><td>goodbye</td></tr>
> </table>
> </body>
> </html>
> """
>
> soup = BeautifulSoup(doc)
>
> tables = soup.findAll('table')
> target_table = tables[1]
>
> all_tds = target_table.findAll('td')
> print all_tds[0].string
> print all_tds[2].string
>
> --output:--
> hello
> goodbye
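The original question also asked about running a regex on the 35th td and then on the td 15 positions further down. A minimal sketch of that step, written for Python 3 and assuming the cell texts have already been pulled out into a plain list: cell_texts, NUMBER, and extract_at are hypothetical names for illustration, and the decimal-number pattern is only a guess at what needs matching.

```python
import re

# Hypothetical cell texts, standing in for what all_tds[i].string returns.
cell_texts = ["hello", "world", "goodbye", " 48.884 ", " 49.950 ", " 69.322 "]

# Assumed pattern: a decimal number such as 48.884.
NUMBER = re.compile(r"\d+\.\d+")


def extract_at(texts, start, step, pattern):
    """Return the first regex match from every `step`-th cell, from `start`.

    Cells that are None (no direct text) or that do not match give None.
    """
    results = []
    for text in texts[start::step]:
        match = pattern.search(text) if text is not None else None
        results.append(match.group(0) if match else None)
    return results


# Start at index 3 and take every 2nd cell after it.
values = extract_at(cell_texts, 3, 2, NUMBER)
print(values)  # ['48.884', '69.322']
```

For the case in the question, the same call with start=34 and step=15 would visit the 35th td, then the 50th, and so on, since list indexes start at 0.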