[Tutor] parsing html.

Kent Johnson kent37 at tds.net
Wed Jan 16 12:46:35 CET 2008


Shriphani Palakodety wrote:
> Hello,
> I have an HTML document here which goes like this:
> 
> <A name=4></a><b>Table of Contents</b>
> .........
> <A name=5></a><b>Preface</b>
> 
> Can someone tell me how I can get the string inside the <b> tag that
> follows an <a> tag with a given value of the name attribute?

In [30]: from BeautifulSoup import BeautifulSoup
In [31]: text = '''<A name=4></a><b>Table of Contents</b>
    ....: .........
    ....: <A name=5></a><b>Preface</b>'''
In [32]: soup = BeautifulSoup(text)
In [40]: soup.find('a', dict(name='5'))
Out[40]: <a name="5"></a>
In [41]: soup.find('a', dict(name='5')).next
Out[41]: <b>Preface</b>
In [42]: soup.find('a', dict(name='5')).next.string
Out[42]: u'Preface'

Note that BeautifulSoup lower-cases tag names, so the search uses 'a'
even though the source has <A>.
http://www.crummy.com/software/BeautifulSoup/
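
If installing BeautifulSoup is not an option, the same lookup can be
sketched with the standard library's html.parser module (Python 3). This
is a swapped-in alternative, not the BeautifulSoup approach shown above.
The names BoldAfterAnchor and bold_text_after_anchor are made up for
illustration, and the sketch assumes the <b> you want is the first one
after the matching <a> and that it is properly closed.

```python
from html.parser import HTMLParser


class BoldAfterAnchor(HTMLParser):
    """Collect the text of the first <b> that follows <a name=TARGET>."""

    def __init__(self, target):
        super().__init__()
        self.target = target
        self.seen_anchor = False  # True once the matching <a> has been seen
        self.in_bold = False      # True while inside the <b> we want
        self.done = False         # True once that <b> has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lower case.
        if self.done:
            return
        if tag == 'a' and dict(attrs).get('name') == self.target:
            self.seen_anchor = True
        elif tag == 'b' and self.seen_anchor:
            self.in_bold = True

    def handle_endtag(self, tag):
        if tag == 'b' and self.in_bold:
            self.in_bold = False
            self.done = True

    def handle_data(self, data):
        if self.in_bold:
            self.parts.append(data)


def bold_text_after_anchor(html, name):
    """Return the <b> text after <a name=NAME>, or None if not found."""
    parser = BoldAfterAnchor(name)
    parser.feed(html)
    parser.close()
    return ''.join(parser.parts) if parser.done else None
```

With the sample text from the question, bold_text_after_anchor(text, '5')
returns 'Preface', and an unknown name returns None.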

Kent

