[XML-SIG] Round-tripping HTML fragment to XML node

Andrew Ittner andrew.ittner@usa.net
Sat, 26 Apr 2003 08:53:52 -0700


I have an HTML fragment: <P>this is<BR>a paragraph</P>
I want to convert it to XHTML: <p>this is<br/>a paragraph</p>
And store it as a Node in an XML document.
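
One way the forward direction could be sketched, assuming a current Python
standard library (html.parser plus xml.dom.minidom) rather than the PyXML
release of the time. The names FragmentBuilder, store_fragment, VOID_TAGS and
the <weblog>/<entry> layout are hypothetical, chosen only for illustration.
The parser builds DOM nodes directly inside the target document, so the
result is well-formed by construction:

```python
from html.parser import HTMLParser
from xml.dom import minidom

# Assumption: the set of HTML elements that never take a closing tag.
VOID_TAGS = {"br", "img", "hr"}


class FragmentBuilder(HTMLParser):
    """Builds minidom nodes under `parent` from a lenient HTML fragment."""

    def __init__(self, doc, parent):
        super().__init__(convert_charrefs=True)
        self.doc = doc
        self.stack = [parent]  # stack[-1] is the element currently open

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        elem = self.doc.createElement(tag)
        for name, value in attrs:
            # Valueless HTML attributes get their own name as the value.
            elem.setAttribute(name, value if value is not None else name)
        self.stack[-1].appendChild(elem)
        if tag not in VOID_TAGS:
            self.stack.append(elem)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        # Close the nearest matching open element; never pop the parent.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tagName == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].appendChild(self.doc.createTextNode(data))


def store_fragment(doc, parent, html):
    builder = FragmentBuilder(doc, parent)
    builder.feed(html)
    builder.close()


doc = minidom.parseString("<weblog><entry/></weblog>")
entry = doc.getElementsByTagName("entry")[0]
store_fragment(doc, entry, "<P>this is<BR>a paragraph</P>")
xhtml = entry.toxml()
print(xhtml)
```

Because <BR> is created as an element with no children, minidom serializes it
as <br/> without any special casing.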

Then, I want to pull the Node back out and convert back to an HTML fragment.
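
A minimal sketch of the reverse direction, again assuming the current
standard-library xml.dom.minidom. The helpers to_html and fragment_of, the
VOID_TAGS set, and the choice to emit uppercase tag names are assumptions made
for illustration. The idea is to walk the stored node and write empty elements
without the trailing slash that an HTML-only viewer may not understand:

```python
from html import escape
from xml.dom import minidom

# Assumption: elements that HTML writes with no closing tag.
VOID_TAGS = {"br", "img", "hr"}


def to_html(node):
    """Serialize one DOM node as old-style HTML text."""
    if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE):
        return escape(node.data, quote=False)
    if node.nodeType != node.ELEMENT_NODE:
        return ""  # comments, processing instructions, etc. are dropped
    tag = node.tagName.upper()
    attrs = "".join(
        ' %s="%s"' % (name, escape(value))
        for name, value in node.attributes.items()
    )
    if node.tagName.lower() in VOID_TAGS:
        return "<%s%s>" % (tag, attrs)
    inner = "".join(to_html(child) for child in node.childNodes)
    return "<%s%s>%s</%s>" % (tag, attrs, inner, tag)


def fragment_of(parent):
    """Return the HTML for the children of `parent`, not parent itself."""
    return "".join(to_html(child) for child in parent.childNodes)


doc = minidom.parseString("<entry><p>this is<br/>a paragraph</p></entry>")
html = fragment_of(doc.documentElement)
print(html)
```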

I want to do this automatically (not using regexp, etc.) because:
-each HTML fragment is a separate weblog entry (for Yet Another Weblog Maker
(c))
-I store it in XML to publish using XSL
-even though I'm probably not going to use any empty elements (singletons)
besides <BR> and <IMG>, I want the parser to handle conversion to well-formed
XML automagically
-my HTML viewer (courtesy wxPython) needs HTML and cannot understand XHTML

I tried converting the fragment to a full XHTML document (works OK), pulling
the body element's child nodes out (I can't get that to work), and copying
them into the XML doc (that fails too).  The reverse, converting the XHTML
back to HTML, is failing as well.
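
For the step of moving the body's children into another document, a sketch
under the assumption that both documents are minidom documents and that
Document.importNode is available (it is in the current standard library).
The <weblog>/<entry> structure is hypothetical. A node belongs to the
document that created it, so it is imported (deep-copied) into the target
document before being appended:

```python
from xml.dom import minidom

# The full XHTML document produced by the first conversion step.
xhtml_doc = minidom.parseString(
    "<html><body><p>this is<br/>a paragraph</p></body></html>"
)
# The target XML document; its layout is an assumption for this example.
weblog = minidom.parseString("<weblog><entry/></weblog>")

body = xhtml_doc.getElementsByTagName("body")[0]
entry = weblog.getElementsByTagName("entry")[0]

for child in body.childNodes:
    # importNode(node, deep=True) returns a copy owned by `weblog`;
    # the source document is left unchanged.
    entry.appendChild(weblog.importNode(child, True))

result = weblog.documentElement.toxml()
print(result)
```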

Since I've only used PyXML's xml.dom.minidom for XML work, I haven't yet
figured out how to do this.  Any ideas?

Andrew Ittner
http://rhymingpanda.com/