Parsing markup.

Joe Goldthwaite joe at goldthwaites.com
Fri Nov 26 04:28:17 CET 2010


I'm attempting to parse some basic tagged markup.  The TinyMCE editor
returns a string that looks something like this:

 

<p>This is a paragraph with <b>bold</b> and <i>italic</i> elements in
it</p><p>It can be made up of multiple lines separated by paragraph
tags.</p>

 

I'm trying to render the paragraphs into a bitmapped image.  I need to parse
the string into its paragraphs and the bold/italic pieces within each one.
I'm not sure of the best way to approach it.  ElementTree and lxml seem to
want a complete, well-formed document, not a small fragment like this one.
When I tried to feed a line similar to the above to lxml I got an error:
"XMLSyntaxError: Extra content at the end of the document".
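
For illustration, here is a minimal sketch of one possible approach, not a
definitive answer.  The error is likely caused by the fragment having two
top-level <p> elements while an XML parser expects a single root, so the
sketch wraps the fragment in a made-up <root> element before parsing it with
the standard library's xml.etree.ElementTree.  The names parse_runs and
_collect are hypothetical helpers, and the sketch assumes the editor output
is well-formed XML with no HTML-only entities such as &nbsp;.

```python
import xml.etree.ElementTree as ET

markup = ("<p>This is a paragraph with <b>bold</b> and <i>italic</i> elements in "
          "it</p><p>It can be made up of multiple lines separated by paragraph "
          "tags.</p>")


def _collect(elem, styles, runs):
    """Append (text, styles) runs for elem and its children, in document order."""
    if elem.text:
        runs.append((elem.text, styles))
    for child in elem:
        # Only <b> and <i> change the active style set; other tags pass through.
        if child.tag in ("b", "i"):
            child_styles = styles | {child.tag}
        else:
            child_styles = styles
        _collect(child, child_styles, runs)
        # Text following the child's closing tag belongs to the parent's style.
        if child.tail:
            runs.append((child.tail, styles))


def parse_runs(fragment):
    """Return one list of (text, frozenset of style tags) runs per <p>."""
    # Assumption: wrapping in a single dummy root makes the fragment parseable.
    root = ET.fromstring("<root>%s</root>" % fragment)
    paragraphs = []
    for p in root.findall("p"):
        runs = []
        _collect(p, frozenset(), runs)
        paragraphs.append(runs)
    return paragraphs


if __name__ == "__main__":
    for number, runs in enumerate(parse_runs(markup), 1):
        for text, styles in runs:
            print(number, sorted(styles), repr(text))
```

Each run pairs a piece of text with the set of style tags active for it, so
a renderer could pick a font per run and start a new block per paragraph.
For the sample above, parse_runs(markup) gives two paragraphs, the first of
which holds five runs.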

 
