Parsing markup.

Javier Collado javier.collado at gmail.com
Fri Nov 26 06:11:07 CET 2010


Hello,

2010/11/26 Joe Goldthwaite <joe at goldthwaites.com>:
> I’m attempting to parse some basic tagged markup.
>
> ElementTree and lxml seem to want a fully formatted
> page, not a small segment like this one.

BeautifulSoup (http://www.crummy.com/software/BeautifulSoup/) can parse
a fragment like this without needing a full page:

>>> from BeautifulSoup import BeautifulSoup as Soup
>>> # text is the markup string from the original post
>>> s = Soup(text)
>>> print s.prettify()
<p>
 This is a paragraph with
 <b>
  bold
 </b>
 and
 <i>
  italic
 </i>
 elements in it
</p>
<p>
 It can be made up of multiple lines separated by pagagraph tags.
</p>
>>> s.findAll('b')
[<b>bold</b>]
>>> s.findAll('i')
[<i>italic</i>]
>>> s.findAll('p')
[<p>This is a paragraph with <b>bold</b> and <i>italic</i> elements in it</p>,
 <p>It can be made up of multiple lines separated by pagagraph tags.</p>]
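
If you would rather stay with ElementTree, one possible workaround is to
wrap the fragment in a dummy root element before parsing. This is only a
sketch: the value of text below is reconstructed from the output above
rather than copied from your message, and parse_fragment and the "root"
wrapper tag are names made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed input: reconstructed from the BeautifulSoup output shown above,
# not copied from the original post.
text = (
    "<p>This is a paragraph with <b>bold</b> and <i>italic</i> "
    "elements in it</p>"
    "<p>It can be made up of multiple lines separated by pagagraph tags.</p>"
)


def parse_fragment(fragment):
    """Wrap a markup fragment in a dummy root so ElementTree accepts it."""
    # "root" is an arbitrary wrapper name; any valid tag name would do.
    return ET.fromstring("<root>" + fragment + "</root>")


root = parse_fragment(text)

# iter() walks the whole tree, so nested <b> and <i> tags are found.
bold = [el.text for el in root.iter("b")]
italic = [el.text for el in root.iter("i")]

# itertext() joins the text of a paragraph with that of its child tags.
paragraphs = ["".join(p.itertext()) for p in root.findall("p")]

print(bold)
print(italic)
print(paragraphs)
```

This only works when the fragment is well-formed XML (every tag closed,
no HTML-only entities such as &nbsp;). For sloppier markup BeautifulSoup
is the more forgiving choice.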

Best regards,
    Javier


