<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=us-ascii">
<meta name=Generator content="Microsoft Word 11 (filtered medium)">
<style>
<!--
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman";}
a:link, span.MsoHyperlink
{color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{color:purple;
text-decoration:underline;}
span.EmailStyle17
{mso-style-type:personal-compose;
font-family:Arial;
color:windowtext;}
@page Section1
{size:8.5in 11.0in;
margin:1.0in 1.25in 1.0in 1.25in;}
div.Section1
{page:Section1;}
-->
</style>
</head>
<body lang=EN-US link=blue vlink=purple>
<div class=Section1>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>I’m attempting to parse some basic tagged markup. The
output of the TinyMCE editor returns a string that looks something like this;<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'><o:p> </o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'><p>This is a paragraph with <b>bold</b>
and <i>italic</i> elements in it</p><p>It can be made
up of multiple lines separated by pagagraph tags.</p><o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'><o:p> </o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>I’m trying to render the paragraph into a bit mapped
image. I need to parse it out into the various paragraph and bold/italic pieces.
I’m not sure the best way to approach it. Elementree and lxml seem to
want a full formatted page, not a small segment like this one. When I tried to
feed a line similar to the above to lxml I got an error; “XMLSyntaxError:
Extra content at the end of the document”.<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'><o:p> </o:p></span></font></p>
</div>
</body>
</html>