lxml precaching DTD for document verification.

Gelonida N gelonida at gmail.com
Mon Nov 28 05:15:58 EST 2011


Thanks Stefan,


On 11/28/2011 08:38 AM, Stefan Behnel wrote:
> Gelonida N, 27.11.2011 18:57:
>> I'd like to verify some (x)html / html5 / xml documents from a server.
>>
>> These documents have a very limited number of different doc types / DTDs.
>>
>> So what I would like to do is to build a small DTD cache and some code,
>> that would avoid searching the DTDs over and over from the net.
>>
>> What would be the best way to do this?
> 
> Configure your XML catalogues.
. . .
> 
> Yes, catalogue lookups generally happen through the public ID.
> 
> . . . 
> Does this help?
> 
> http://lxml.de/resolvers.html#xml-catalogs
> 
> http://xmlsoft.org/catalog.html

These links look perfect.
> 
> They should normally come pre-configured on Linux distributions, but you
> may have to install additional packages with the respective DTDs. Look
> for any packages with "dtd" and "html" in their name, for example.
> 
Thanks once more.
Indeed the package w3c-dtd-xhtml wasn't installed on my Ubuntu host.

I'll check this later on today.
(I just have to remove my own hackish caching solution, which downloads a
DTD if it is not found in a cache dir.)
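
For the record, here is a rough, untested sketch of how I imagine the
catalog approach replacing my cache: write a small OASIS XML catalog that
maps public IDs to local DTD files, point libxml2 at it through the
XML_CATALOG_FILES environment variable, and let lxml validate with network
access disabled. The helper names build_catalog() and validate() are my own
inventions, and the DTD path in the usage note is only a guess at where a
distribution package might put the file.

```python
import os
import xml.etree.ElementTree as ET

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"


def build_catalog(entries):
    """Return an OASIS XML catalog (as a string) mapping public IDs to URIs.

    `entries` is a dict of {public_id: local_dtd_uri}.
    """
    # Serialize the catalog namespace as the default namespace.
    ET.register_namespace("", CATALOG_NS)
    root = ET.Element("{%s}catalog" % CATALOG_NS)
    for public_id, uri in entries.items():
        ET.SubElement(root, "{%s}public" % CATALOG_NS,
                      publicId=public_id, uri=uri)
    return ET.tostring(root, encoding="unicode")


def validate(xml_path, catalog_path):
    """Parse `xml_path` with DTD validation, resolving DTDs via the catalog.

    Assumption: libxml2 reads XML_CATALOG_FILES when it first initialises
    its catalogs, so the variable has to be set before the first parse
    (setting it before starting Python is the safer choice).
    """
    os.environ["XML_CATALOG_FILES"] = catalog_path
    from lxml import etree  # imported lazily; needs lxml installed

    # no_network=True keeps the parser from fetching DTDs from the net,
    # so a missing catalog entry shows up as an error, not a download.
    parser = etree.XMLParser(dtd_validation=True, no_network=True)
    return etree.parse(xml_path, parser)
```

Usage would be something like writing
build_catalog({"-//W3C//DTD XHTML 1.0 Strict//EN":
"file:///usr/share/xml/xhtml/xhtml1-strict.dtd"}) to a file and passing that
file's path to validate(). If the distribution's DTD packages already
register themselves in the system catalog, none of this should be needed.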





