lxml precaching DTD for document verification.
Gelonida N
gelonida at gmail.com
Mon Nov 28 05:15:58 EST 2011
Thanks Stefan,
On 11/28/2011 08:38 AM, Stefan Behnel wrote:
> Gelonida N, 27.11.2011 18:57:
>> I'd like to verify some (x)html / html5 / xml documents from a server.
>>
>> These documents have a very limited number of different doc types / DTDs.
>>
>> So what I would like to do is to build a small DTD cache and some code,
>> that would avoid searching the DTDs over and over from the net.
>>
>> What would be the best way to do this?
>
> Configure your XML catalogues.
. . .
>
> Yes, catalogue lookups generally happen through the public ID.
>
> . . .
> Does this help?
>
> http://lxml.de/resolvers.html#xml-catalogs
>
> http://xmlsoft.org/catalog.html
These links look perfect.
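
For the record, a catalog file of the kind those pages describe is small
enough to generate by script. The following is only a sketch, assuming the
DTDs are already saved locally. The function name write_catalog and the
{public_id: path} dict layout are made up for illustration; only the OASIS
catalog namespace and the <public> element come from the catalog format
itself.

```python
from pathlib import Path
from xml.sax.saxutils import quoteattr

# Namespace of the OASIS XML Catalogs format understood by libxml2.
CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"


def write_catalog(catalog_path, entries):
    """Write an XML catalog mapping public IDs to local DTD files.

    entries is a dict {public_id: local_dtd_path} (hypothetical layout).
    Returns catalog_path so the call can be chained.
    """
    lines = ['<?xml version="1.0"?>', '<catalog xmlns="%s">' % CATALOG_NS]
    for public_id, dtd_path in sorted(entries.items()):
        # Catalog entries want a URI, so turn the local path into file:...
        uri = Path(dtd_path).resolve().as_uri()
        lines.append(
            "  <public publicId=%s uri=%s/>"
            % (quoteattr(public_id), quoteattr(uri))
        )
    lines.append("</catalog>")
    with open(catalog_path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
    return catalog_path
```

Pointing the XML_CATALOG_FILES environment variable at the resulting file
before the first parse should make libxml2 (and therefore lxml) consult it,
as far as I understand the catalog page linked above.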
>
> They should normally come pre-configured on Linux distributions, but you
> may have to install additional packages with the respective DTDs. Look
> for any packages with "dtd" and "html" in their name, for example.
>
Thanks once more.
Indeed the package w3c-dtd-xhtml wasn't installed on my Ubuntu host.
I'll check this later on today.
(I just have to remove my own hackish caching solution, which downloads a
DTD if it is not found in a cache dir.)
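
For anyone curious what such a download-if-missing cache can look like,
here is a rough sketch. It is not my actual code: every name in it
(cache_path_for, cached_dtd_path, make_caching_parser, the cache file
naming) is invented for illustration. The lxml part relies on
etree.Resolver, resolve_filename() and parser.resolvers.add() as described
in the lxml resolver documentation. It does not try to handle entity files
that a DTD references by relative path.

```python
import hashlib
import os
import urllib.request


def cache_path_for(url, cache_dir):
    """Map a DTD URL to a stable file name inside cache_dir."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    name = os.path.basename(url.rstrip("/")) or "dtd"
    return os.path.join(cache_dir, digest + "-" + name)


def _download(url):
    """Fetch the raw bytes behind url (plain HTTP GET)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def cached_dtd_path(url, cache_dir, download=_download):
    """Return a local path for url, downloading it only if not cached."""
    path = cache_path_for(url, cache_dir)
    if not os.path.exists(path):
        os.makedirs(cache_dir, exist_ok=True)
        data = download(url)
        # Write to a temp file first so a failed download leaves no
        # half-written DTD behind in the cache.
        tmp = path + ".tmp"
        with open(tmp, "wb") as f:
            f.write(data)
        os.replace(tmp, path)
    return path


def make_caching_parser(cache_dir):
    """Build a validating lxml parser that resolves DTDs via the cache."""
    from lxml import etree  # imported here so the helpers work without lxml

    class CachingResolver(etree.Resolver):
        def resolve(self, system_url, public_id, context):
            if system_url and system_url.startswith(("http://", "https://")):
                local = cached_dtd_path(system_url, cache_dir)
                return self.resolve_filename(local, context)
            return None  # let lxml fall back to its default lookup

    parser = etree.XMLParser(load_dtd=True, dtd_validation=True)
    parser.resolvers.add(CachingResolver())
    return parser
```

With properly configured catalogs none of this should be needed, which is
why I intend to drop it.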