[lxml-dev] Validating xml files against a DTD
I have a dumb question. This is my first experience with libxml2 and lxml and I was wondering how to validate an xml file given a DTD either on the internet or in a catalog (whatever that is). I have read the api.txt file and it has a nice example on how to do this with a RelaxNG file but it does not show an example of how to validate against a DTD. Is it enough to have the declaration at the top of the xml file and the parser will automatically validate it? If that is the case, I would think you would want to turn the validation off in certain cases. I briefly looked at the etree.c file but given that I've not done any wrapping of 'C' code, it's kind of hard to figure out the api. ...jeff
Jeff_Gayle@sil.org wrote:
I have a dumb question.
No it's not, especially as I'm not able to give a good answer. :)
This is my first experience with libxml2 and lxml and I was wondering how to validate an xml file given a DTD either on the internet or in a catalog (whatever that is). I have read the api.txt file and it has a nice example on how to do this with a RelaxNG file but it does not show an example of how to validate against a DTD.
I haven't looked into DTDs at all yet in the context of lxml, I'm afraid. Perhaps helpful, though not really an answer to your question, is to point out that there is a tool called 'trang' which, I believe, can translate DTDs into Relax NG schemas automatically. You could then perhaps use the RelaxNG facilities of lxml.
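The RelaxNG route suggested above can be sketched as follows. This is a minimal illustration, assuming an lxml release that exposes `etree.RelaxNG`; the schema is a small hand-written stand-in for what a DTD-to-RelaxNG conversion might produce, and the names `NOTE_SCHEMA` and `is_valid` are hypothetical.

```python
from lxml import etree

# Hand-written RelaxNG schema used for illustration only; in practice this
# would be the output of converting the DTD (for example with trang).
NOTE_SCHEMA = b"""\
<element name="note" xmlns="http://relaxng.org/ns/structure/1.0">
  <element name="to"><text/></element>
  <element name="body"><text/></element>
</element>
"""

# Compile the schema once and reuse it for every document.
relaxng = etree.RelaxNG(etree.fromstring(NOTE_SCHEMA))


def is_valid(xml_bytes):
    """Return True if the document conforms to the RelaxNG schema."""
    doc = etree.fromstring(xml_bytes)
    return relaxng.validate(doc)


if __name__ == "__main__":
    print(is_valid(b"<note><to>Jeff</to><body>hi</body></note>"))
    print(is_valid(b"<note><to>Jeff</to></note>"))
```

When validation fails, the reasons should be available from `relaxng.error_log`.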
Is it enough to have the declaration at the top of the xml file and the parser will automatically validate it?
I suspect that this will indeed happen; this is the normal XML pattern and libxml2 definitely has a validating parser. It's possible I've turned this off somewhere, but I'm not sure.
If that is the case, I would think you would want to turn the validation off in certain cases.
I think that is probably possible on the libxml2 C level.
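For readers on a more recent lxml, the on/off switch discussed here can be sketched at the Python level. This assumes a release whose `etree.XMLParser` accepts the `dtd_validation` keyword, which is not something this thread confirms for the version under discussion; `INVALID_NOTE` and `parse_note` are hypothetical names, and the DTD lives in the document's internal subset so nothing is fetched from the network.

```python
from lxml import etree

# The internal DTD requires <to> followed by <body>; this document omits
# <body>, so it is well-formed but not valid.
INVALID_NOTE = b"""<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note><to>Jeff</to></note>"""


def parse_note(data, validate):
    """Parse data, validating against its declared DTD only if asked to."""
    parser = etree.XMLParser(dtd_validation=validate)
    return etree.fromstring(data, parser)


if __name__ == "__main__":
    # Validation off: the document parses because it is well-formed.
    print(parse_note(INVALID_NOTE, validate=False).tag)

    # Validation on: the parser rejects the document.
    try:
        parse_note(INVALID_NOTE, validate=True)
    except etree.XMLSyntaxError as exc:
        print("invalid:", exc)
```

Validation is opt-in in this sketch, so a DOCTYPE declaration alone does not cause the document to be rejected.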
I briefly looked at the etree.c file but given that I've not done any wrapping of 'C' code, it's kind of hard to figure out the api.
You should be looking at the etree.pyx file; the etree.c file is autogenerated by Pyrex and you shouldn't normally have to look at it at all. Pyrex allows one to write C wrappers on a higher level. Still, the libxml2 API is huge, not extensively documented, and fairly intimidating, so to figure this out would still be a bit of a task. I myself lack the time to work on it (unless I got paid to do it of course :), and DTD support is not currently high on my agenda, but if you want to discuss this and work towards patches (and tests) to support this, I'd be happy to give you lots of assistance.

Regards,
Martijn
On Wed, 27 Apr 2005 14:29:37 +0200, Martijn Faassen wrote:
I haven't looked into DTDs at all yet in the context of lxml, I'm afraid. ... Still, the libxml2 API is huge, not extensively documented, and fairly intimidating, so to figure this out would still be a bit of a task. I myself lack the time to work on it (unless I got paid to do it of course :), and DTD support is not currently high on my agenda, but if you want to discuss this and work towards patches (and tests) to support this, I'd be happy to give you lots of assistance.
I recently had the same problem, wanting to validate against DTDs specified in the XML files. I came up with the following libxml2 code, which takes a string parameter and appears to work fine:

    import libxml2

    def _getValidationErrors(payload):
        errors = []
        def store_error(ctx, str):
            errors.append(str)
        libxml2.registerErrorHandler(store_error, None)
        ctx = libxml2.createMemoryParserCtxt(payload, len(payload))
        ctx.validate(1)
        ctx.parseDocument()
        ctx.doc().freeDoc()
        if ctx.isValid():
            return None
        else:
            return "\n".join(errors)

I don't have time to work out where/how best to integrate this into lxml, so I just thought I'd post the code here in case it helps somebody else either to do that integration or just to validate their files.

Thanks,
Malcolm.

--
[] j a m k i t
web solutions for charities
malcolm cleaton
T: 020 7549 0520
F: 020 7490 1152
M: 07986 563852
W: www.jamkit.com
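A rough lxml counterpart to the function above might look like the sketch below. It assumes a later lxml release that provides the `etree.DTD` class, and it differs from the libxml2 version in taking the DTD as a separate string rather than reading it from the document's DOCTYPE; `get_validation_errors` and `NOTE_DTD` are hypothetical names.

```python
import io

from lxml import etree

# Illustrative DTD, not taken from the thread.
NOTE_DTD = """\
<!ELEMENT note (to, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT body (#PCDATA)>
"""


def get_validation_errors(payload, dtd_text):
    """Return None if payload is valid, else the error messages as one string."""
    dtd = etree.DTD(io.StringIO(dtd_text))
    root = etree.fromstring(payload)
    if dtd.validate(root):
        return None
    # Collect the messages recorded by the failed validation run.
    errors = dtd.error_log.filter_from_errors()
    return "\n".join(entry.message for entry in errors)


if __name__ == "__main__":
    print(get_validation_errors(b"<note><to>Jeff</to><body>hi</body></note>", NOTE_DTD))
    print(get_validation_errors(b"<note><to>Jeff</to></note>", NOTE_DTD))
```

As in the original, the function returns None for a valid document and a newline-joined string of messages otherwise, so callers can treat the result as "errors or nothing".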
participants (3)
- Jeff_Gayle@sil.org
- Malcolm Cleaton
- Martijn Faassen