[issue2124] xml.sax and xml.dom fetch DTDs by default

ajaksu report at bugs.python.org
Mon Feb 18 00:58:11 CET 2008


ajaksu added the comment:

Martin, I agree that simply not resolving DTDs is an unreasonable
request (and said so in the blog post). But IMHO there are lots of
possible optimizations, and the most valuable would be those darn easy
for newcomers to understand and use.

In Python, a winning combo would be an arbitrary (and explicit) filesystem
"dtdcache" that people could use with a simple drop-in import (from a
third-party module?). Perhaps the cache lives in a pickled dictionary
with IDs, checksums and DTDs. It could also be a sqlite DB, if updating the
dict becomes problematic.
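
To make the pickled-dictionary idea a bit more concrete, here is a rough
sketch. All names (load_cache, store_dtd, lookup_dtd, save_cache) and the
entry layout are my own assumptions, not an existing module, and the cache
file has to be one you trust, since unpickling untrusted data is unsafe.

```python
import hashlib
import os
import pickle


def load_cache(path):
    """Return the cached dict, or an empty one if no cache file exists yet."""
    if not os.path.exists(path):
        return {}
    with open(path, "rb") as f:
        return pickle.load(f)


def store_dtd(cache, system_id, dtd_bytes):
    """Record a DTD under its system ID together with a checksum."""
    cache[system_id] = {
        "sha256": hashlib.sha256(dtd_bytes).hexdigest(),
        "dtd": dtd_bytes,
    }


def lookup_dtd(cache, system_id):
    """Return the cached DTD bytes, or None if missing or corrupted."""
    entry = cache.get(system_id)
    if entry is None:
        return None
    if hashlib.sha256(entry["dtd"]).hexdigest() != entry["sha256"]:
        return None
    return entry["dtd"]


def save_cache(cache, path):
    """Write the dict to a temporary file, then swap it into place."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(cache, f)
    os.replace(tmp, path)
```

Rewriting the whole pickle on every update is the part that could become
problematic with many DTDs or concurrent writers, which is where sqlite
would start to look better.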

In that scenario, AMK could save later W3C hits with:

# NB: "dtdcache" is a hypothetical module, it doesn't exist (yet)
from xml.sax import make_parser
from dtdcache.sax.saxutils import prepare_input_source # <- dtdcache
parser = make_parser()
inp = prepare_input_source('file:file.xhtml', cache="/tmp/xmlcache")
parser.parse(inp)

It might be interesting to have read-only, force-write and read-write
modes. Not sure how to map that on EntityResolver and DTD consumers (I'm
no XML user myself).
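
One possible mapping onto EntityResolver, as a sketch only: the class
name, the mode names ("ro", "rw", "force") and the injectable fetch
argument are all assumptions of mine. It uses one file per DTD in a cache
directory instead of a single dict, just to keep the example short.

```python
import hashlib
import io
import os
import urllib.request
from xml.sax.handler import EntityResolver
from xml.sax.xmlreader import InputSource


class CachingEntityResolver(EntityResolver):
    """Hypothetical resolver that keeps fetched entities in a directory.

    mode "ro":    use cached copies, fetch on a miss but never write
    mode "rw":    use cached copies, fetch and store on a miss
    mode "force": always fetch and overwrite the cached copy
    """

    def __init__(self, cache_dir, mode="rw", fetch=None):
        if mode not in ("ro", "rw", "force"):
            raise ValueError("mode must be 'ro', 'rw' or 'force'")
        self.cache_dir = cache_dir
        self.mode = mode
        # fetch is injectable so the resolver can be tried without a network
        self.fetch = fetch or self._fetch_url
        os.makedirs(cache_dir, exist_ok=True)

    @staticmethod
    def _fetch_url(system_id):
        with urllib.request.urlopen(system_id) as resp:
            return resp.read()

    def _path(self, system_id):
        # Hash the system ID so it is safe to use as a file name
        digest = hashlib.sha256(system_id.encode("utf-8")).hexdigest()
        return os.path.join(self.cache_dir, digest)

    def resolveEntity(self, publicId, systemId):
        path = self._path(systemId)
        if self.mode != "force" and os.path.exists(path):
            with open(path, "rb") as f:
                data = f.read()
        else:
            data = self.fetch(systemId)
            if self.mode != "ro":
                with open(path, "wb") as f:
                    f.write(data)
        source = InputSource(systemId)
        source.setByteStream(io.BytesIO(data))
        return source
```

It would be attached with parser.setEntityResolver(CachingEntityResolver(
"/tmp/xmlcache")). Whether the parser then asks the resolver for the DTD
depends on its external-entity settings, so treat that part as untested.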

Regarding the std-lib, I believe effective caching hooks for DTDs trump
shipping any particular in-memory or sqlite/FS cache implementation.
IMNSHO, correct, accessible support for catalogs shouldn't be the only
change, as caching should give better performance on both ends (for the
user parsing the document and for the server hosting the DTDs).

----------
nosy: +ajaksu2

__________________________________
Tracker <report at bugs.python.org>
<http://bugs.python.org/issue2124>
__________________________________

