Parsing markup.
Javier Collado
javier.collado at gmail.com
Fri Nov 26 00:11:07 EST 2010
Hello,
2010/11/26 Joe Goldthwaite <joe at goldthwaites.com>:
> I’m attempting to parse some basic tagged markup.
>
> ElementTree and lxml seem to want a fully formatted
> page, not a small segment like this one.
BeautifulSoup (http://www.crummy.com/software/BeautifulSoup/) can
help with the parsing. With the markup from your message in a string
named text:
>>> from BeautifulSoup import BeautifulSoup as Soup
>>> s = Soup(text)
>>> print s.prettify()
<p>
 This is a paragraph with
 <b>
  bold
 </b>
 and
 <i>
  italic
 </i>
 elements in it
</p>
<p>
 It can be made up of multiple lines separated by paragraph tags.
</p>
>>> s.findAll('b')
[<b>bold</b>]
>>> s.findAll('i')
[<i>italic</i>]
>>> s.findAll('p')
[<p>This is a paragraph with <b>bold</b> and <i>italic</i> elements in it</p>,
<p>It can be made up of multiple lines separated by paragraph tags.</p>]
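If you would rather stay with the standard library, one possible
workaround is to wrap the segment in a dummy root element so that
ElementTree sees a single complete document. This is only a minimal
sketch and assumes the segment is well-formed XML (every tag closed and
properly nested). The helper name parse_fragment, the wrapper tag "root"
and the sample string are made up for illustration, modelled on the
markup above.

```python
import xml.etree.ElementTree as ET


def parse_fragment(fragment):
    """Parse a markup segment by wrapping it in a dummy root element.

    Hypothetical helper: ElementTree needs exactly one top-level
    element, so the segment is enclosed in <root>...</root> first.
    """
    return ET.fromstring("<root>%s</root>" % fragment)


# Sample segment modelled on the markup in this thread (an assumption).
text = ("<p>This is a paragraph with <b>bold</b> and <i>italic</i> "
        "elements in it</p>"
        "<p>It can be made up of multiple lines separated by "
        "paragraph tags.</p>")

root = parse_fragment(text)

# Text of every <b> and <i> element, in document order.
bold = [el.text for el in root.iter("b")]
italic = [el.text for el in root.iter("i")]

# Plain text of each paragraph, with the inline tags stripped.
paragraphs = ["".join(p.itertext()) for p in root.iter("p")]

print(bold)
print(italic)
print(paragraphs[0])
```

Unlike BeautifulSoup, this raises a ParseError on unclosed or badly
nested tags, so it only suits input you know to be well-formed.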
Best regards,
Javier