Mechiel Lukkien wrote:
just tried lxml-0.8 and then the subversion lxml trunk, but both versions manage to segfault my python. the configuration is libxml-2.6.16, libxslt-1.1.12, Pyrex-0.9.3, python-2.3.5, openbsd 3.8.
in short: funicode at src/lxml/etree.pyx:1841 can get called with null/None as argument, after which isutf8 segfaults on it.
this happens when i call tostring() on an lxml.etree.XSLT() object, with an empty document as argument (which was a result of a transformation). this is the code i ran:
    import sys
    import lxml.etree

    xsltfile = sys.argv[1]
    xmlfile = sys.argv[2]
    xsltdoc = lxml.etree.parse(open(xsltfile, 'r'))
    xslt = lxml.etree.XSLT(xsltdoc)
    xml = lxml.etree.parse(open(xmlfile, 'r'))
    result = xslt.apply(xml)
    print result
    print xslt.tostring(result)  # can segfault python if result contains "empty document"
this is the xslt file:
    <?xml version="1.0"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:template match="/">
    </xsl:template>
    </xsl:stylesheet>
and near-empty xml file:
    <?xml version="1.0"?>
    <blah></blah>
i admit that i know next to nothing about xslt and very little about xml (i was just playing around), but lxml should never make python segfault, whatever stupid thing i do.
my quick fix was to return an empty string at the start of funicode if the string is null. after this, it stopped segfaulting on this small example. good chance that breaks tostring() though.
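The quick fix described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the actual Pyrex code in src/lxml/etree.pyx: `is_utf8` and `funicode_guarded` are hypothetical stand-ins modeled on the `isutf8` and `funicode` names mentioned in the report, and `None` plays the role of the null pointer that an empty transformation result produces.

```python
def is_utf8(data):
    """Return True if the byte string contains any non-ASCII byte."""
    # Hypothetical stand-in for the C helper: it walks the buffer,
    # so it cannot cope with a missing (None) buffer.
    for byte in bytearray(data):
        if byte & 0x80:
            return True
    return False


def funicode_guarded(data):
    """Decode bytes to text, treating a missing buffer as an empty string."""
    if data is None:
        # The guard from the quick fix: never hand a null buffer to is_utf8.
        return ''
    if is_utf8(data):
        return data.decode('utf-8')
    return data.decode('ascii')


print(repr(funicode_guarded(None)))
print(repr(funicode_guarded(b'<blah/>')))
```

Whether an empty string is the right return value for callers such as tostring() is a separate question; the sketch only shows where the guard sits.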
Hi!

Thank you for the bug report. I can reproduce this both on the trunk and my branch using the test case below. It's simply modeled after your example. I'll check if I can figure out something.

Stefan

Index: src/lxml/tests/test_etree.py
===================================================================
--- src/lxml/tests/test_etree.py (Revision 19669)
+++ src/lxml/tests/test_etree.py (Arbeitskopie)
@@ -2192,6 +2192,21 @@
         etree.tostring(result.getroot())
 
+    def test_xslt_empty(self):
+        # could segfault if result contains "empty document"
+        xml = '<blah/>'
+        xslt = '''
+        <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
+          <xsl:template match="/" />
+        </xsl:stylesheet>
+        '''
+
+        source = self.parse(xml)
+        styledoc = self.parse(xslt)
+        style = etree.XSLT(styledoc)
+        result = style.apply(source)
+        style.tostring(result)
+
     def test_xslt_shortcut(self):
         tree = self.parse('<a><b>B</b><c>C</c></a>')
         style = self.parse('''\