[lxml-dev] Test case that segfaults lxml RelaxNG validation

Hi!

The following test case results in memory corruption with lxml 0.8, with an error message like:

    *** glibc detected *** double free or corruption (!prev): 0x08100440 ***

Note the missing ".org" after "http://relaxng", i.e. the RNG namespace is wrong. I stumbled over this when I tried integrating David Mertz' RNC parser into lxml, which seems to output this wrong namespace.

The problem is at the end of RelaxNG.__init__:

    if self._c_schema is NULL:
        relaxng.xmlRelaxNGFreeParserCtxt(parser_ctxt)
        raise RelaxNGParseError, "Document is not valid Relax NG"

The test case triggers this exception, which seems to free the document twice (I'm not sure about that). Does anyone know if xmlRelaxNGFreeParserCtxt already frees the associated document? That would be the wrong thing to do...

Stefan

Test case:

Index: src/lxml/tests/test_etree.py
===================================================================
--- src/lxml/tests/test_etree.py (revision 19500)
+++ src/lxml/tests/test_etree.py (working copy)
@@ -2113,6 +2113,13 @@
         self.assertRaises(etree.RelaxNGParseError,
                           etree.RelaxNG, schema)
 
+    def test_relaxng_invalid_schema2(self):
+        schema = self.parse('''\
+<element name="a" xmlns="http://relaxng/ns/structure/1.0" />
+''')
+        self.assertRaises(etree.RelaxNGParseError,
+                          etree.RelaxNG, schema)
+
     def test_relaxng_include(self):
         # this will only work if we access the file through path or
         # file object..
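
The wrong-namespace condition described above can be spotted before a schema ever reaches the validator. The following is a minimal sketch using only the Python standard library; the helper names `root_namespace` and `has_relaxng_namespace` are hypothetical and not part of lxml. It only compares the root element's namespace with the RELAX NG structure namespace and makes no claim about whether the schema is otherwise valid.

```python
import xml.etree.ElementTree as ET

# The RELAX NG structure namespace (note the ".org").
RELAXNG_NS = "http://relaxng.org/ns/structure/1.0"


def root_namespace(schema_text):
    """Return the namespace URI of the root element, or None if it has none."""
    tag = ET.fromstring(schema_text).tag
    # ElementTree spells namespace-qualified names as "{uri}local".
    if tag.startswith("{"):
        return tag[1:].split("}", 1)[0]
    return None


def has_relaxng_namespace(schema_text):
    """Hypothetical pre-check: is the root element in the RELAX NG namespace?"""
    return root_namespace(schema_text) == RELAXNG_NS


# The schema from the test case, with ".org" missing from the namespace.
broken = '<element name="a" xmlns="http://relaxng/ns/structure/1.0" />'
# Same element with the namespace spelled correctly (still not a complete schema).
right_ns = '<element name="a" xmlns="http://relaxng.org/ns/structure/1.0" />'

print(has_relaxng_namespace(broken))
print(has_relaxng_namespace(right_ns))
```

A check like this would only be a convenience for catching generator output such as the one mentioned above; the validator itself still has to reject such a schema cleanly.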

Hi,

On Fri, 2005-11-04 at 13:53 +0100, Stefan Behnel wrote:
> Hi!
> The following test case results in memory corruption with lxml 0.8, error message being something like:
> *** glibc detected *** double free or corruption (!prev): 0x08100440 ***
> Note the missing ".org" after "http://relaxng", i.e. the RNG namespace is wrong. I happened to stumble over this when I tried integrating David Mertz' RNC parser into lxml, which seems to output this wrong namespace.
> The problem is at the end of RelaxNG.__init__:
>     if self._c_schema is NULL:
>         relaxng.xmlRelaxNGFreeParserCtxt(parser_ctxt)
>         raise RelaxNGParseError, "Document is not valid Relax NG"
> The test case triggers this exception which seems to free the document twice (I'm not sure about that). Does anyone know if xmlRelaxNGFreeParserCtxt already frees the associated document? That would be the wrong thing to do...
[...]

Yes, xmlRelaxNGFreeParserCtxt() frees the parsed RelaxNG XML document. Looks like this xmlDoc is intended for internal use only, i.e. for the parsing process only. Even if xmlRelaxNGNewDocParserCtxt() is used to create the context, only a copy of the given xmlDoc is used.

Regards,

Kasimier
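
The ownership question discussed above can be illustrated with a toy model in plain Python. This is only a sketch under stated assumptions: `Doc`, `ParserCtxt` and `caller_cleanup` are hypothetical stand-ins, not libxml2 or lxml APIs, and the model does not claim to reproduce what either library actually does. It shows why it matters whether a context that frees its document holds the caller's document or a copy of it.

```python
class Doc:
    """Stand-in for an xmlDoc; freeing it twice is reported as an error."""

    def __init__(self, name):
        self.name = name
        self.freed = False

    def copy(self):
        return Doc(self.name)

    def free(self):
        if self.freed:
            raise RuntimeError("double free of " + self.name)
        self.freed = True


class ParserCtxt:
    """Stand-in for a parser context that frees the document it holds."""

    def __init__(self, doc, copy_doc):
        # copy_doc=True models "only a copy of the given xmlDoc is used".
        self.doc = doc.copy() if copy_doc else doc

    def free(self):
        self.doc.free()


def caller_cleanup(copy_doc):
    """Free the context first, then the caller's own document."""
    doc = Doc("schema")
    ctxt = ParserCtxt(doc, copy_doc)
    ctxt.free()
    try:
        doc.free()
    except RuntimeError as exc:
        return str(exc)
    return "ok"


print(caller_cleanup(copy_doc=True))
print(caller_cleanup(copy_doc=False))
```

In this model the caller can safely free its own document only when the context worked on a copy; if both sides hold the same object, the second free is the one that fails.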

Another try on this one.

Stefan Behnel wrote:
> The following test case results in memory corruption with lxml 0.8, error message being something like:
> *** glibc detected *** double free or corruption (!prev): 0x08100440 ***
> The problem is at the end of RelaxNG.__init__:
>     if self._c_schema is NULL:
>         relaxng.xmlRelaxNGFreeParserCtxt(parser_ctxt)
>         raise RelaxNGParseError, "Document is not valid Relax NG"
> The test case triggers this exception which seems to free the document twice

The complete extract:

            self._c_schema = relaxng.xmlRelaxNGParse(parser_ctxt)
            if self._c_schema is NULL:
problem->       relaxng.xmlRelaxNGFreeParserCtxt(parser_ctxt)
                raise RelaxNGParseError, "Document is not valid Relax NG"
            relaxng.xmlRelaxNGFreeParserCtxt(parser_ctxt)

Actually the problem is not at the position where the exception is raised but where we free the parser context, so it looks like it is wrong to free it here. I couldn't check that, but maybe libxml2 already releases it when parsing fails. That also sounds somewhat wrong IMHO, so it may just as well be a bug in libxml2.

Still, removing the FreeParserCtxt call in the "if" statement keeps it from segfaulting, so I'll commit that to trunk and branches for now, until we know what the correct solution is. I will also add a couple of test cases for broken RNGs.

"memory leaks are less harmful than segfaults"-ly,

Stefan

Test case:

Index: src/lxml/tests/test_etree.py
===================================================================
--- src/lxml/tests/test_etree.py (revision 19500)
+++ src/lxml/tests/test_etree.py (working copy)
@@ -2113,6 +2113,13 @@
         self.assertRaises(etree.RelaxNGParseError,
                           etree.RelaxNG, schema)
 
+    def test_relaxng_invalid_schema2(self):
+        schema = self.parse('''\
+<element name="a" xmlns="mynamespace" />
+''')
+        self.assertRaises(etree.RelaxNGParseError,
+                          etree.RelaxNG, schema)
+
     def test_relaxng_include(self):
         # this will only work if we access the file through path or
         # file object..

Participants (2): Kasimier Buchcik, Stefan Behnel