[lxml-dev] Validation against an external DTD

Hello,

I read through the documentation and did not find a way to validate an XML file against an external DTD with lxml. I searched the mailing list archive and found several posts, but I still do not know whether this functionality is available. Right now I am using PyXML for validation, but since the whole process will switch to XML Schema, I am looking for a tool that can validate against both (XSD and DTD). Is it possible to extend lxml to do this? I know that I can use a tool like "trang" to convert the DTDs, but since DTDs get added to and removed from the process, this means a lot of manual work.

Kind regards,
Michael

Hello,

Since I did not get any answer and the mailing list seems to be a little more alive now, I am asking again. Is it possible to extend lxml to validate against external DTDs in the same way it is possible with RELAX NG and XSD files now? I will have to validate against both (DTDs and XSDs) in the near future, and I would prefer to use only ONE XML library rather than PyXML and lxml together.

Kind regards,
Michael

Greetings!
help(etree.XMLParser)
Help on class XMLParser:

class XMLParser(_BaseParser)
 |  The XML parser. Parsers can be supplied as additional argument to
 |  various parse functions of the lxml API. A default parser is always
 |  available and can be replaced by a call to the global function
 |  'set_default_parser'. New parsers can be created at any time without a
 |  major run-time overhead.
 |
 |  The keyword arguments in the constructor are mainly based on the libxml2
 |  parser configuration. A DTD will also be loaded if validation or
 |  attribute default values are requested.
 |
 |  Available boolean keyword arguments:
 |  * attribute_defaults - read default attributes from DTD
 |  * dtd_validation     - validate (if DTD is available)
 |  * load_dtd           - use DTD for parsing
 |  * no_network         - prevent network access
 |  * ns_clean           - clean up redundant namespace declarations
 |  * recover            - try hard to parse through broken XML
 |  * remove_blank_text  - discard blank text nodes
lxml-dev mailing list lxml-dev@codespeak.net http://codespeak.net/mailman/listinfo/lxml-dev
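
The keyword arguments listed in the help output can be tried out as follows. This is only a minimal sketch, assuming a document that carries its DTD in an internal subset so that the parser finds it without any file access. The sample document and the helper name `parses_cleanly` are made up for illustration and are not from the thread.

```python
from lxml import etree

# A document that carries its own DTD in an internal subset (made-up sample),
# so the parser has something to validate against.
VALID_DOC = b"""<!DOCTYPE note [
<!ELEMENT note (to, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note><to>Michael</to><body>Hello</body></note>"""

# Same DTD, but the required <body> child is missing.
INVALID_DOC = VALID_DOC.replace(b"<body>Hello</body>", b"")


def parses_cleanly(data):
    """Return True if `data` parses and validates against its own DTD."""
    # dtd_validation and no_network are two of the boolean keyword
    # arguments listed in the help text above.
    parser = etree.XMLParser(dtd_validation=True, no_network=True)
    try:
        etree.fromstring(data, parser)
    except etree.XMLSyntaxError:
        # With dtd_validation=True, a validity error is reported as a
        # parse failure.
        return False
    return True
```

Note that this only covers the case where the DTD is declared in the document itself, which is exactly the limitation raised in the follow-up message.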

On Feb 6, 2007, at 9:52 PM, Lee Brown wrote:
class XMLParser(_BaseParser)
 |  The XML parser. Parsers can be supplied as additional argument to
 |  various parse functions of the lxml API. A default parser is always
 |  available and can be replaced by a call to the global function
 |  'set_default_parser'. New parsers can be created at any time without a
 |  major run-time overhead.
I had a look at this as well, but I do not understand how to specify the DTD that should be used for validation. I understand that the parser validates against a DTD if it is specified in the XML file and found by the parser during execution. But in my case I need something like this PyXML example:

    dtd = xmldtd.load_dtd("my dtd file")
    parser = xmlproc.XMLProcessor()
    parser.set_application(xmlval.ValidationApp(dtd, parser))
    ....
    parser.parse_file("my xml file that needs to be validated")

Kind regards,
Michael

Greetings!

I do not know if lxml can load a DTD from an external file. And the docinfo attributes on the etree instance are read-only, so there's no help there. As a workaround, though, you might be able to prepend a DOCTYPE string to the beginning of the file before you parse it.
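
The DOCTYPE workaround suggested above could look roughly like this. It is a sketch under stated assumptions, not a tested recipe from the thread: the input is assumed to have no DOCTYPE of its own, and the function name `prepend_doctype`, the root element name, and the DTD path are all hypothetical. The one subtlety is that a DOCTYPE has to come after the XML declaration, if there is one, so it cannot always be placed at the very start of the file.

```python
import re

# Matches an XML declaration such as <?xml version="1.0"?> at the very
# start of the text (the only place a declaration is allowed).
_XML_DECL = re.compile(r"<\?xml\s[^>]*\?>")


def prepend_doctype(xml_text, root_name, dtd_path):
    """Return xml_text with a DOCTYPE pointing at dtd_path inserted.

    Hypothetical helper: assumes xml_text has no DOCTYPE yet and that
    root_name matches the document's root element.
    """
    doctype = '<!DOCTYPE %s SYSTEM "%s">' % (root_name, dtd_path)
    match = _XML_DECL.match(xml_text)
    if match is None:
        # No XML declaration: the DOCTYPE can go first.
        return doctype + "\n" + xml_text
    # Splice the DOCTYPE in right after the declaration.
    return xml_text[:match.end()] + "\n" + doctype + xml_text[match.end():]
```

The resulting text could then be handed to a parser created with dtd_validation enabled, so that the DTD named in the new DOCTYPE is loaded and used.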

Hi, Lee Brown wrote:
I do not know if lxml can load a DTD from an external file. And the docinfo attributes on the etree instance are read-only, so there's no help there.
lxml does not currently have support for adding/updating DTD subsets, though we already had a couple of requests to make this work - patches are very welcome.
As a workaround, though, you might be able to prepend a DOCTYPE string to the beginning of the file before you parse it.
No guarantee, but that should generally work. Stefan
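
For validating against a DTD that the document itself does not reference, lxml's documented API includes an `etree.DTD` validator class alongside `RelaxNG` and `XMLSchema`. Whether it was available in the release discussed in this thread is not stated here, so the following is only a sketch against a current lxml. The sample DTD and documents are made up, and the in-memory `BytesIO` object stands in for an external DTD file.

```python
from io import BytesIO

from lxml import etree

# Stand-in for an external DTD file (made-up content). etree.DTD also
# accepts a filename instead of a file-like object.
dtd = etree.DTD(BytesIO(
    b"<!ELEMENT note (to, body)>"
    b"<!ELEMENT to (#PCDATA)>"
    b"<!ELEMENT body (#PCDATA)>"
))

# Neither document declares a DOCTYPE; the DTD is applied from outside.
good = etree.XML(b"<note><to>Michael</to><body>Hello</body></note>")
bad = etree.XML(b"<note><to>Michael</to></note>")

good_ok = dtd.validate(good)
bad_ok = dtd.validate(bad)

# The error log holds the messages from the most recent validate() call.
errors = [entry.message for entry in dtd.error_log.filter_from_errors()]
```

This keeps parsing and validation as separate steps, much like the PyXML example earlier in the thread, so the same parsed tree could also be checked against an XSD with `etree.XMLSchema`.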
participants (4)
- Lee Brown
- Michael Guntsche
- mike@it-loops.com
- Stefan Behnel