[issue2124] xml.sax and xml.dom fetch DTDs by default

Jean-Paul Calderone report at bugs.python.org
Tue Feb 3 22:40:15 CET 2009

Jean-Paul Calderone <exarkun at divmod.com> added the comment:

Though it's inconvenient to do so, you can arrange to have the locator
available from the entity resolver.  The content handler's
setDocumentLocator method will be called early on with the locator
object.  So you can give your entity resolver a reference to your
content handler and save a reference to the document locator in the
content handler.  Then in the entity resolver's resolveEntity method you
can reach over into the content handler and grab the document locator to
call its getSystemId method.
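
The arrangement described above might be sketched roughly as follows.
This is a minimal illustration, not code from the tracker: the class
names, the example.invalid URLs, and the in-memory DTD are all
hypothetical, and it assumes a Python 3 interpreter, where the expat
reader only consults the entity resolver once feature_external_ges has
been switched on.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource


class LocatorKeepingHandler(ContentHandler):
    """Content handler that keeps the locator the parser hands it."""

    def __init__(self):
        super().__init__()
        self.locator = None

    def setDocumentLocator(self, locator):
        # Called by the parser before any other content event.
        self.locator = locator


class LocatorAwareResolver(EntityResolver):
    """Entity resolver that reaches into the content handler for the locator."""

    def __init__(self, content_handler):
        self.content_handler = content_handler
        self.seen = []  # (referring document, publicId, systemId)

    def resolveEntity(self, publicId, systemId):
        locator = self.content_handler.locator
        referrer = locator.getSystemId() if locator is not None else None
        self.seen.append((referrer, publicId, systemId))
        # Serve a canned DTD from memory instead of fetching anything.
        source = InputSource(systemId)
        source.setByteStream(io.BytesIO(b"<!ELEMENT root EMPTY>"))
        return source


DOC = b'<!DOCTYPE root SYSTEM "http://example.invalid/root.dtd"><root/>'

handler = LocatorKeepingHandler()
resolver = LocatorAwareResolver(handler)

parser = xml.sax.make_parser()
parser.setFeature(feature_external_ges, True)  # assumption: off by default
parser.setContentHandler(handler)
parser.setEntityResolver(resolver)

main_source = InputSource("http://example.invalid/docs/main.xml")
main_source.setByteStream(io.BytesIO(DOC))
parser.parse(main_source)

print(resolver.seen)
```

Passing the handler to the resolver's constructor is only one way to
wire the two together; a single object implementing both interfaces
would avoid the cross-reference entirely.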

Note that you have to be careful with the InputSources you return from
resolveEntity.  I wasn't aware of this before (and perhaps I've
misinterpreted what I observed), but I just noticed that if you return
an InputSource based on a file object, the file object's name will be
used as the document's system id!  This is probably not what you want.
InputSource has a setSystemId method, but even if you call it before
you call setByteStream, the system id will be the name of the file
object passed to setByteStream.  Perhaps calling these two methods in
the opposite order will fix this; I'm not sure, as I haven't tried.
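
One way to check which system id actually ends up on a source is a
small experiment like the one below.  It is only a sketch under stated
assumptions: NamedBytesIO and its file name are hypothetical stand-ins
for a real open file, the URL is made up, and the result may differ
between Python versions (the observation above dates from 2009).  It
compares an explicitly built InputSource with a bare file-like object
passed through xml.sax.saxutils.prepare_input_source, which is the
helper the SAX readers use to normalize their input and which copies
the object's name attribute into the system id when it wraps a
file-like object itself.

```python
import io
from xml.sax.saxutils import prepare_input_source
from xml.sax.xmlreader import InputSource


class NamedBytesIO(io.BytesIO):
    """Hypothetical stand-in for an open file: a byte stream with a .name."""
    name = "/tmp/local-copy.dtd"


WANTED_ID = "http://example.invalid/root.dtd"  # made-up system id

# Explicit InputSource: set the system id, then attach the named stream.
explicit = InputSource()
explicit.setSystemId(WANTED_ID)
explicit.setByteStream(NamedBytesIO(b"<!ELEMENT root EMPTY>"))
explicit_id = prepare_input_source(explicit).getSystemId()

# Bare file-like object: prepare_input_source builds the InputSource itself.
bare = NamedBytesIO(b"<!ELEMENT root EMPTY>")
wrapped_id = prepare_input_source(bare).getSystemId()

print("explicit InputSource:", explicit_id)
print("bare file object:    ", wrapped_id)
```

If the two printed ids differ on a given interpreter, returning a fully
populated InputSource from resolveEntity, rather than a raw file
object, is the safer choice there.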

Python tracker <report at bugs.python.org>
