On 7 June 2012 12:47, Andreas Maier <andreas.r.maier@gmx.de> wrote:
Hi,
-> Is there a way to get lxml's HTML serialization support to produce the explicitly closed form ?
I actually found a solution to this problem, by specifying more parameters to the etree.tostring() function:
fp.write(etree.tostring(out_tree, encoding="utf-8", method="html", xml_declaration=None, pretty_print=False, with_tail=True, standalone=None, doctype='<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">'))
However, this means that the output parameters specified in the XSLT become meaningless, because they are overwritten in the Python program.
-> Is there a way to let lxml honor the output parameters specified in the XSLT ? It seems to me that would be a good default behavior.
Yes, an XSLTResultTree has a __str__ method which respects the <xsl:output> options set, so you should be using: fp.write(str(out_tree)) see: http://lxml.de/FAQ.html#what-is-the-difference-between-str-xslt-doc-and-xslt... When no <xsl:output method="..."> is specified then the XSLT spec says method="html" should be used when the root element is <html>. That will invoked the HTMLSerializer (lxml.html.tostring) and give you HTML output (<br> instead of <br />). If you need HTML compatible XHTML output (i.e. <br /> rather than <br> along with no self-closing script tags) then you need to specify <xsl:output method="xml"> along with an XHTML 1.0 doctype.
-> Also, is there a way to get any HTML comments in the element tree to be serialized ? The tostring() parameter with_comments is documented to apply to the C14N output method only, and does not cause the HTML comments to be created when set to True.
To generate comment elements make sure that your XSLT is not inadvertently stripping them out, that probably means you are missing the identity template somewhere. Laurence