
On 12:56 am, phil@bubblehouse.org wrote:
On Mar 28, 2008, at 7:33 PM, Jean-Paul Calderone wrote:
On Fri, 28 Mar 2008 22:59:21 -0000, glyph@divmod.com wrote:
On 02:55 pm, exarkun@divmod.com wrote:
On Fri, 28 Mar 2008 10:51:10 -0400, Phil Christensen phil@bubblehouse.org wrote:
>>> from twisted.web.microdom import parseString
>>> s = '<div><span>hello</span> <span>world</span></div>'
>>> parseString(s).toxml()
'<?xml version="1.0"?><div><span>hello</span><span>world</span></div>'
So if you need such advanced XML features as correct whitespace handling, steer clear. ;)
I have to say, I don't find this to be that big an issue. I think if you're using XML as a data interchange format (as I know the original poster was), whitespace is generally syntactically meaningless.
Like many things in Microdom, whitespace handling does not strive to be particularly spec-compliant (the spec does say "An XML processor MUST always pass all characters in a document that are not markup through to the application."), but to be useful for simple cases and stable enough that your code won't break. If you want whitespace you can probably cram it in there. For example, it has a creative misinterpretation of the "xml:space" attribute:
>>> from twisted.web.microdom import parseString
>>> s = '<div xml:space="preserve"><span>hello</span> <span>world</span></div>'
>>> parseString(s).toxml()
'<?xml version="1.0"?><div xml:space="preserve"><span>hello</span> <span>world</span></div>'
It is also hard-coded to preserve space in <pre> tags. That is broken too: it doesn't really honor namespaces, so it has no idea whether your document is HTML or not, and it can't read DTDs, so it doesn't know whether your elements have this attribute set implicitly (and so on and so on).
This could be made into *slightly* less of a hack with a preserveSpace argument to parse*(), of course; the implementation would probably be very straightforward (cf. MicroDOMParser.shouldPreserveSpace). Maybe someone who actually likes Microdom, such as Phil, will add one, since all I'm committing to here is not totally hating it ;).
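To make the idea concrete, here is a rough sketch of what such a flag might look like. It is not Microdom code and not a patch against it: it uses the standard library's xml.dom.minidom (which keeps all whitespace by default) and strips whitespace-only text nodes afterwards. The names parse_string, preserve_space and strip_ignorable_whitespace are made up for illustration, and unlike Microdom it treats xml:space as inherited by child elements.

```python
from xml.dom import minidom


def strip_ignorable_whitespace(node, preserve=False):
    """Remove whitespace-only text nodes below node unless preserving."""
    # Copy the child list because we remove nodes while iterating.
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE:
            if not preserve and not child.data.strip():
                node.removeChild(child)
        elif child.nodeType == child.ELEMENT_NODE:
            # An explicit xml:space attribute overrides the inherited setting.
            setting = child.getAttribute("xml:space")
            child_preserve = preserve
            if setting == "preserve":
                child_preserve = True
            elif setting == "default":
                child_preserve = False
            strip_ignorable_whitespace(child, child_preserve)


def parse_string(source, preserve_space=False):
    """Hypothetical wrapper: parse, then strip unless preserve_space is set."""
    document = minidom.parseString(source)
    strip_ignorable_whitespace(document, preserve_space)
    return document


s = '<div><span>hello</span> <span>world</span></div>'
stripped = parse_string(s).documentElement.toxml()
kept = parse_string(s, preserve_space=True).documentElement.toxml()
```

With the default, the space between the two spans is dropped, as in the first Microdom example above; with preserve_space=True, or with xml:space="preserve" on the <div>, it survives. The caller decides, which avoids guessing from tag names like <pre>.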