[XML-SIG] [Baypiggies] News flash: Python possibly guilty in excessive DTD traffic

"Martin v. Löwis" martin at v.loewis.de
Sun Feb 24 06:54:14 CET 2008


> I think this is worth filing a bug, but I'd like to understand better
> where the call is made. I can't find any place in the standard xml
> package that does this -- but I'm not all that familiar with the code.
> Do you know if it's in the base xml package, or in etree, or in the
> separately distributed "XMLplus"? Any details you have would be
> appreciated (like a traceback from the point where the call is made).

In case you didn't get an answer yet: I don't know about the OP's
stack trace, but the standard library accesses the internet in
xml.sax.saxutils.prepare_input_source, which may in turn be called
from xml.sax.expatreader.ExpatParser.external_entity_ref (unless
feature_external_ges is turned off). That method, in turn, is called
by the parser when it sees the DOCTYPE declaration.
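
To illustrate the feature switch mentioned above, here is a minimal
sketch of turning feature_external_ges off on the standard SAX parser.
The sample document, the example.invalid URL, and the names ElementNames
and parse_without_external_entities are made up for illustration; the
comment about external_entity_ref reflects my reading of expatreader,
not a documented guarantee.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges

# Sample document (an assumption); the .invalid host can never resolve.
DOC = (b'<?xml version="1.0"?>\n'
       b'<!DOCTYPE html SYSTEM "http://example.invalid/sample.dtd">\n'
       b'<html><body/></html>')


class ElementNames(ContentHandler):
    """Hypothetical handler: collects element names to show parsing worked."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def parse_without_external_entities(data):
    """Parse bytes with external general entities switched off."""
    parser = xml.sax.make_parser()
    # With the feature off, external_entity_ref should return early
    # instead of calling prepare_input_source on the SYSTEM identifier.
    parser.setFeature(feature_external_ges, False)
    handler = ElementNames()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return handler.names
```

Calling parse_without_external_entities(DOC) should return the element
names without any attempt to contact the host in the DOCTYPE.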

The OP was referring to validation, so more likely he was talking
about the xmlproc parser (which is only in PyXML).

I also agree with Mike Brown: The author of this W3C article apparently
confuses a number of things, in particular whether an XML parser
*should* fetch the resource named by the SYSTEM identifier in a
document. According to the XML spec, it should indeed. Now, the other
question is whether there should be caching; and yes, there should be,
but no caching is implemented (except in xmlproc, which supports
catalogs). As for accessing URLs that are used as namespace URIs: our
XML libraries never do that.
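
As a rough sketch of what caching could look like at the application
level, a SAX EntityResolver can answer DTD requests from local copies
instead of the network. This is not how xmlproc's catalog support works
and not something the standard library ships; LocalDTDResolver,
parse_with_local_dtds, the sample DTD, and the example.invalid URL are
all hypothetical.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource

# Hypothetical system id and local copy of the DTD it names.
SYSTEM_ID = "http://example.invalid/sample.dtd"
LOCAL_DTDS = {SYSTEM_ID: b"<!ELEMENT html ANY>\n<!ELEMENT body EMPTY>\n"}
SAMPLE = (b'<?xml version="1.0"?>\n'
          b'<!DOCTYPE html SYSTEM "http://example.invalid/sample.dtd">\n'
          b'<html><body/></html>')


class LocalDTDResolver(EntityResolver):
    """Hypothetical resolver: serves DTDs from an in-memory mapping only."""

    def __init__(self, local_copies):
        self.local_copies = local_copies  # system id -> DTD bytes
        self.requested = []               # system ids the parser asked for

    def resolveEntity(self, publicId, systemId):
        self.requested.append(systemId)
        source = InputSource(systemId)
        # Unknown ids get an empty stream rather than a network fetch.
        source.setByteStream(io.BytesIO(self.local_copies.get(systemId, b"")))
        return source


class NameCollector(ContentHandler):
    """Hypothetical handler: records element names as they start."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def parse_with_local_dtds(data, local_copies):
    """Parse bytes, resolving external entities from local_copies."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_external_ges, True)
    resolver = LocalDTDResolver(local_copies)
    handler = NameCollector()
    parser.setEntityResolver(resolver)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return handler.names, resolver.requested
```

Because the returned InputSource already carries a byte stream,
prepare_input_source has no reason to open the URL itself. A real cache
would also need to populate and expire its local copies, which this
sketch leaves out.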

In any case, AMK created issue2124.

Regards,
Martin

