lxml precaching DTD for document verification.
Stefan Behnel
stefan_ml at behnel.de
Mon Nov 28 02:38:12 EST 2011
Gelonida N, 27.11.2011 18:57:
> I'd like to verify some (x)html / html5 / xml documents from a server.
>
> These documents have a very limited number of different doc types / DTDs.
>
> So what I would like to do is to build a small DTD cache and some code
> that would avoid fetching the DTDs from the net over and over.
>
> What would be the best way to do this?
Configure your XML catalogues.
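As a rough sketch of what that can look like: an OASIS XML catalog is just a small XML file that maps public IDs to local copies of the DTDs. The helper name `write_catalog` and the file paths below are made up for illustration. As far as I know, libxml2 reads the `XML_CATALOG_FILES` environment variable when it first initialises its catalogs, so it has to be set before the first parse.

```python
import os
import pathlib
import xml.etree.ElementTree as ET

# Namespace of the OASIS XML Catalogs format.
CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"


def write_catalog(path, entries):
    """Write a catalog mapping public IDs to local DTD files.

    `entries` is a dict {public_id: local_dtd_path}. Hypothetical helper,
    not part of lxml or libxml2.
    """
    ET.register_namespace("", CATALOG_NS)
    root = ET.Element("{%s}catalog" % CATALOG_NS)
    for public_id, dtd_path in entries.items():
        ET.SubElement(
            root,
            "{%s}public" % CATALOG_NS,
            publicId=public_id,
            # Catalog entries point at URIs, so turn the path into file://...
            uri=pathlib.Path(dtd_path).resolve().as_uri(),
        )
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)
    return path


if __name__ == "__main__":
    # Assumed layout: a local copy of the DTD under ./dtds/
    catalog = write_catalog(
        "catalog.xml",
        {"-//W3C//DTD XHTML 1.0 Strict//EN": "dtds/xhtml1-strict.dtd"},
    )
    # Must happen before libxml2 (and thus lxml) does its first catalog lookup.
    os.environ["XML_CATALOG_FILES"] = os.path.abspath(catalog)
```

The `xmlcatalog` command line tool that ships with libxml2 can be used to check what a given catalog resolves a public ID to.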
> I guess that
> the fields of an ElementTree that I have to look at are
> docinfo.public_id
> docinfo.system_url
Yes, catalogue lookups generally happen through the public ID.
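To see which IDs a given document declares, something along these lines should do. Note that lxml spells the second attribute `system_url`. The function name `doctype_ids` and the sample document are only there for illustration.

```python
from lxml import etree

XHTML_DOC = b"""<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>example</title></head>
  <body><p>hello</p></body>
</html>"""


def doctype_ids(data):
    """Return (public_id, system_url) taken from the document's DOCTYPE."""
    # The default parser does not load the DTD, so nothing is fetched here.
    tree = etree.fromstring(data).getroottree()
    info = tree.docinfo
    return info.public_id, info.system_url


if __name__ == "__main__":
    public_id, system_url = doctype_ids(XHTML_DOC)
    print(public_id)
    print(system_url)
```

The public ID is the one you would then expect to find as a `public` entry in your catalog.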
> There's also mention of a catalogue, but I don't know how to
> use a catalog and how to know what is inside my catalogue
> and what isn't.
Does this help?
http://lxml.de/resolvers.html#xml-catalogs
http://xmlsoft.org/catalog.html
They should normally come pre-configured on Linux distributions, but you
may have to install additional packages with the respective DTDs. Look for
any packages with "dtd" and "html" in their name, for example.
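If changing the system catalogs is not an option, the resolvers page linked above also describes custom resolvers, which can serve the same purpose from inside your program. This is an alternative to catalogs, not the approach recommended above, and only a minimal sketch: the class `LocalDTDResolver`, the function `make_validating_parser` and the mapping of public IDs to files are assumptions for illustration.

```python
from lxml import etree


class LocalDTDResolver(etree.Resolver):
    """Serve DTDs from local files, keyed by public ID (hypothetical helper)."""

    def __init__(self, dtd_files):
        super().__init__()
        # {public_id: path to a local copy of the DTD}
        self.dtd_files = dict(dtd_files)

    def resolve(self, system_url, public_id, context):
        path = self.dtd_files.get(public_id)
        if path is None:
            # Not in our cache: let other resolvers / the default lookup try.
            return None
        return self.resolve_filename(path, context)


def make_validating_parser(dtd_files):
    """Build a parser that validates against locally cached DTDs only."""
    parser = etree.XMLParser(
        load_dtd=True,        # actually read the DTD named in the DOCTYPE
        dtd_validation=True,  # reject documents that do not match it
        no_network=True,      # never go to the net for anything we lack
    )
    parser.resolvers.add(LocalDTDResolver(dtd_files))
    return parser
```

Parsing a document with such a parser, e.g. `etree.fromstring(data, parser)`, should raise `etree.XMLSyntaxError` when the document does not conform to the cached DTD.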
Stefan