python-parser running Beautiful Soup needs to be reviewed

Stef Mientki stef.mientki at gmail.com
Sat Dec 11 16:38:43 EST 2010


On 11-12-2010 17:24, Martin Kaspar wrote:
> Hello community
>
> I am new to Python and to Beautiful Soup as well!
> It is said to be a great tool for parsing and extracting content, so here I
> am:
>
> I want to extract the content of a <td> tag in a table in an HTML
> document. For example, I have this table:
>
> <table class="bp_ergebnis_tab_info">
>     <tr>
>             <td>
>                      This is a sample text
>             </td>
>
>             <td>
>                      This is the second sample text
>             </td>
>     </tr>
> </table>
>
> How can I use Beautiful Soup to extract the text "This is a sample text"?
>
> Should I use
> soup.findAll('table', attrs={'class': 'bp_ergebnis_tab_info'}) to get
> the whole table?
>
> The target page is http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=799.601437941842&SchulAdresseMapDO=142323
>
> Well, what do we have to do first?
>
> The first thing is to find the table.
>
> I do this with find rather than findAll: find returns the first
> match, whereas findAll returns a list of all matches, in which case
> we'd have to add an extra [0] to take the first element of the list:
>
>
> table = soup.find('table', attrs={'class': 'bp_ergebnis_tab_info'})
>
> Then use find again, this time on the table, to get its first td:
>
> first_td = table.find('td')
>
> Then we use renderContents() to render the cell's contents as a string:
>
> text = first_td.renderContents()
>
> ... and the job is done (though we may also want to use strip() to
> remove leading and trailing whitespace):
>
> trimmed_text = text.strip()
>
> This should give us:
>
>
> print trimmed_text
> This is a sample text
>
> as desired.
>
>
> What do you think about the code? I'd love to hear from you!
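For readers on current Beautiful Soup (the `bs4` package under Python 3), the steps above might be sketched as follows. This assumes `bs4` is installed and uses the standard-library `"html.parser"` backend; `find_all`, `class_` and `get_text` are the bs4 spellings that take over from BS3's `findAll`, `attrs={'class': ...}` and `renderContents()` in this use. The function name `first_cell_text` is hypothetical.

```python
from bs4 import BeautifulSoup

HTML = """
<table class="bp_ergebnis_tab_info">
    <tr>
        <td>
            This is a sample text
        </td>
        <td>
            This is the second sample text
        </td>
    </tr>
</table>
"""


def first_cell_text(html: str) -> str:
    """Return the stripped text of the first <td> in the result table (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # find() returns the first match or None; find_all() would return a list.
    table = soup.find("table", class_="bp_ergebnis_tab_info")
    if table is None:
        raise ValueError("result table not found")
    # Search inside the table, not the whole document.
    first_td = table.find("td")
    # get_text(strip=True) trims the whitespace around the cell's text.
    return first_td.get_text(strip=True)


print(first_cell_text(HTML))  # This is a sample text
```

Calling `find` on `table` rather than on `soup` matters on a real page, where other `<td>` elements may appear before the result table.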
I've no opinion.
I'm just struggling with BeautifulSoup myself, finding it one of the toughest libs I've seen ;-)

So the simplest solution I came up with:

Text = """
<table class="bp_ergebnis_tab_info">
    <tr>
            <td>
                     This is a sample text
            </td>

            <td>
                     This is the second sample text
            </td>
    </tr>
</table>
"""
Content = BeautifulSoup(Text)
# first <td>, then its first child (the text node), whitespace stripped
print(Content.find('td').contents[0].strip())
# output: This is a sample text
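One caveat with `.contents[0]`: it is only the first child node of the cell, so it works here because the `<td>` holds a single text node. The sketch below, assuming bs4 with the `"html.parser"` backend and a made-up cell containing a `<b>` tag, contrasts it with `get_text()`, which collects every string inside the cell.

```python
from bs4 import BeautifulSoup

# Hypothetical cell whose text is split around a nested tag.
cell_html = "<table><tr><td> <b>Bold</b> text </td></tr></table>"
soup = BeautifulSoup(cell_html, "html.parser")
td = soup.find("td")

# contents[0] is just the first child node: the lone space before <b>.
first_child = td.contents[0].strip()

# get_text() walks all strings in the cell; strip=True trims each one and
# drops the empty ones, and the remaining pieces are joined with " ".
all_text = td.get_text(" ", strip=True)

print(repr(first_child))  # ''
print(repr(all_text))     # 'Bold text'
```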

And now I wonder how to get the contents of the next td!
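Two common ways to reach the second cell are sketched below, assuming bs4 with the `"html.parser"` backend: step from the first `<td>` to its next `<td>` sibling, or collect all cells with `find_all`. In Beautiful Soup 3 the corresponding names were `findNextSibling` and `findAll`.

```python
from bs4 import BeautifulSoup

TEXT = """
<table class="bp_ergebnis_tab_info">
    <tr>
        <td>
            This is a sample text
        </td>
        <td>
            This is the second sample text
        </td>
    </tr>
</table>
"""

soup = BeautifulSoup(TEXT, "html.parser")
first_td = soup.find("td")

# Option 1: move from one cell to the next <td> at the same level;
# the "td" filter skips the whitespace strings between the cells.
second_td = first_td.find_next_sibling("td")
print(second_td.get_text(strip=True))  # This is the second sample text

# Option 2: collect the text of every cell in document order.
cells = [td.get_text(strip=True) for td in soup.find_all("td")]
print(cells)  # ['This is a sample text', 'This is the second sample text']
```

The sibling approach suits walking along one row; `find_all` is simpler when every cell is wanted.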

cheers,
Stef
> greetings
> matze




More information about the Python-list mailing list