python-parser running Beautiful Soup needs to be reviewed

Martin Kaspar martin.kaspar at campus-24.com
Sat Dec 11 11:24:26 EST 2010


Hello community,

I am new to Python and to Beautiful Soup as well!
It is said to be a great tool for parsing and extracting content. So here I
am:

I want to take the content of a <td> tag from a table in an HTML
document. For example, I have this table:

<table class="bp_ergebnis_tab_info">
    <tr>
            <td>
                     This is a sample text
            </td>

            <td>
                     This is the second sample text
            </td>
    </tr>
</table>

How can I use Beautiful Soup to take the text "This is a sample text"?

Should I use
soup.findAll('table', attrs={'class':'bp_ergebnis_tab_info'}) to get
the whole table?

See the target http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=799.601437941842&SchulAdresseMapDO=142323

Well, what do we have to do first?

The first thing is to find the table.

I do this with find rather than findAll: find returns the first match
directly, whereas findAll returns a list of all matches, in which case
we'd have to add an extra [0] to take the first element of the list:


table = soup.find('table' ,attrs={'class':'bp_ergebnis_tab_info'})
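To make the difference between the two calls concrete, here is a minimal sketch. It assumes the newer bs4 package (where findAll is spelled find_all) and Python's built-in "html.parser"; the variable names and the shortened HTML string are only for illustration.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the real page (assumption: one matching table).
html = '<table class="bp_ergebnis_tab_info"><tr><td>This is a sample text</td></tr></table>'
soup = BeautifulSoup(html, "html.parser")

# find_all returns a list-like result holding every match ...
all_tables = soup.find_all("table", attrs={"class": "bp_ergebnis_tab_info"})

# ... while find returns only the first match (or None if nothing matches).
first_table = soup.find("table", attrs={"class": "bp_ergebnis_tab_info"})

# So find(...) gives the same element as find_all(...)[0].
same_element = all_tables[0] == first_table
```

Note that find returns None when nothing matches, so on a real page it may be worth checking the result before going on.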

Then use find again, this time on the table, to find the first td:

first_td = table.find('td')

Then we have to use renderContents() to extract the textual contents:

text = first_td.renderContents()

... and the job is done (though we may also want to use strip() to
remove leading and trailing whitespace):

trimmed_text = text.strip()

This should give us:


print trimmed_text
This is a sample text

as desired.
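Put together, the steps might look like the sketch below. It is only an assumption-laden version of the idea: it uses the newer bs4 package and Python 3 rather than the original BeautifulSoup module, parses the sample table from a string instead of fetching the target page, and uses get_text() in place of renderContents() so that the result is a plain string.

```python
from bs4 import BeautifulSoup

# The sample table from above, embedded as a string for illustration.
html = """
<table class="bp_ergebnis_tab_info">
    <tr>
        <td>
            This is a sample text
        </td>
        <td>
            This is the second sample text
        </td>
    </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Step 1: find the table by its class attribute.
table = soup.find("table", attrs={"class": "bp_ergebnis_tab_info"})

# Step 2: search inside the table (not the whole document) for the first td.
first_td = table.find("td")

# Step 3: extract the text and strip the surrounding whitespace.
trimmed_text = first_td.get_text().strip()

print(trimmed_text)  # prints: This is a sample text
```

Searching from table rather than from soup matters on a real page, where an earlier table could contain a td that comes first in the document.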


What do you think about the code? I would love to hear from you!

greetings
matze


