Jeff_Gayle@sil.org wrote:
> I have a dumb question.
No, it's not, especially as I'm not able to give a good answer. :)
> This is my first experience with libxml2 and lxml, and I was wondering how to validate an XML file against a DTD, either one on the internet or one in a catalog (whatever that is). I have read the api.txt file; it has a nice example of how to do this with a RelaxNG file, but it does not show how to validate against a DTD.
I haven't looked into DTDs at all yet in the context of lxml, I'm afraid. Perhaps helpful, though not really an answer to your question: there is a tool called 'trang' which, I believe, can translate DTDs into Relax NG schemas automatically. You could then perhaps use lxml's RelaxNG facilities on the result.
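For readers finding this thread later: a minimal sketch, assuming a current lxml release that provides `etree.DTD` and `etree.RelaxNG` (the release discussed in this thread may not have had DTD support). It validates the same documents both against a small DTD and against a hand-written Relax NG equivalent, the kind of schema trang would produce from it. The `book`/`title` vocabulary is hypothetical, and the Relax NG text is not literal trang output. As far as I know, `etree.DTD` can also load a DTD from a file path or URL, or look up a public identifier in the XML catalog through its `external_id` keyword.

```python
from io import StringIO

from lxml import etree

# Hypothetical DTD: a <book> holds exactly one <title> of text.
DTD_TEXT = """\
<!ELEMENT book (title)>
<!ELEMENT title (#PCDATA)>
"""

# Hand-written Relax NG equivalent of the DTD above (not actual trang output).
RNG_TEXT = """\
<element name="book" xmlns="http://relaxng.org/ns/structure/1.0">
  <element name="title">
    <text/>
  </element>
</element>
"""

dtd = etree.DTD(StringIO(DTD_TEXT))
relaxng = etree.RelaxNG(etree.parse(StringIO(RNG_TEXT)))

good = etree.fromstring("<book><title>lxml</title></book>")
bad = etree.fromstring("<book><author>Martijn</author></book>")

# validate() returns True/False; details of the last failure go to error_log.
dtd_ok, dtd_bad = dtd.validate(good), dtd.validate(bad)
rng_ok, rng_bad = relaxng.validate(good), relaxng.validate(bad)

print("DTD:", dtd_ok, dtd_bad)
print("RelaxNG:", rng_ok, rng_bad)
for entry in dtd.error_log:
    print("DTD error:", entry.message)
```

Both validators share the same `validate()` interface, so code written against the Relax NG route can switch to a DTD (or back) without other changes.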
> Is it enough to have the DOCTYPE declaration at the top of the XML file, so that the parser will automatically validate it?
I suspect that this will indeed happen; this is the normal XML pattern and libxml2 definitely has a validating parser. It's possible I've turned this off somewhere, but I'm not sure.
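On automatic validation from the DOCTYPE: a minimal sketch, assuming a current lxml, where passing `dtd_validation=True` to `etree.XMLParser` asks libxml2 to validate each document against its own DOCTYPE while parsing and to raise `etree.XMLSyntaxError` when it does not conform. The internal-subset DTD and the element names are hypothetical.

```python
from lxml import etree

# Validating parser: the DOCTYPE is enforced during parsing.
parser = etree.XMLParser(dtd_validation=True)

VALID = b"""<?xml version="1.0"?>
<!DOCTYPE book [
  <!ELEMENT book (title)>
  <!ELEMENT title (#PCDATA)>
]>
<book><title>lxml</title></book>
"""
# Same document, but with an element the DTD does not declare.
INVALID = VALID.replace(b"<title>lxml</title>", b"<author>Martijn</author>")

root = etree.fromstring(VALID, parser)
print(root.tag)

try:
    etree.fromstring(INVALID, parser)
    rejected = False
except etree.XMLSyntaxError as err:
    rejected = True
    print("rejected:", err)
```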
> If that is the case, I would think you would want to be able to turn validation off in certain cases.
I think that is probably possible on the libxml2 C level.
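As for turning validation off: in current lxml releases this is a per-parser option, so no C-level work is needed. The sketch below assumes such a release and uses a hypothetical `book` document that declares a DTD it does not follow. The helper `parse` is illustrative only; it builds a parser per call, and with `dtd_validation=False` (also the default) the DOCTYPE is read but not enforced. As I understand the defaults, the parser also refuses network access (`no_network=True`), so a DTD referenced by an http URL is not fetched unless you change that.

```python
from lxml import etree

# Hypothetical document: it declares a DTD but does not conform to it.
INVALID = b"""<?xml version="1.0"?>
<!DOCTYPE book [
  <!ELEMENT book (title)>
  <!ELEMENT title (#PCDATA)>
]>
<book><author>Martijn</author></book>
"""


def parse(xml_bytes, validate):
    """Parse xml_bytes, enforcing the DOCTYPE only when validate is True."""
    parser = etree.XMLParser(dtd_validation=validate)
    return etree.fromstring(xml_bytes, parser)


# Validation off: the document parses even though it breaks its own DTD.
root = parse(INVALID, validate=False)
print(root[0].tag)

# Validation on: the same bytes are rejected.
try:
    parse(INVALID, validate=True)
    strict_rejected = False
except etree.XMLSyntaxError:
    strict_rejected = True
print(strict_rejected)
```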
> I briefly looked at the etree.c file, but given that I've not done any wrapping of C code, it's kind of hard to figure out the API.
You should be looking at the etree.pyx file; etree.c is generated automatically by Pyrex, and you shouldn't normally have to look at it at all. Pyrex lets one write C wrappers at a higher level. Still, the libxml2 API is huge, not extensively documented, and fairly intimidating, so figuring this out would be a bit of a task. I myself lack the time to work on it (unless I got paid to do it, of course :), and DTD support is not currently high on my agenda. But if you want to discuss this and work towards patches (and tests) to support it, I'd be happy to give you lots of assistance.

Regards,

Martijn