[XML-SIG] Python XML docs / questions

Lars Marius Garshol larsga@garshol.priv.no
19 Jun 2000 18:09:48 +0200

Hi Greg,

* Greg Wilson
| Hi, everyone. I'm playing with the Python SAX library, and have a
| couple of questions.  I'd be happy to turn their answers into
| contributions to the docs, if you think that it's worth adding to
| the SAX-1 docs at this point.  (Alternatively, I'd be happy to help
| with SAX-2 docs if that would be more useful.)

At this stage contributions to the SAX 2 docs would definitely be the
most useful thing to have.

| First, is there a standard 'EntityResolver' in the library that will
| handle or define all of the basic HTML entities, such as &lt; and
| &nbsp; (or is there an example of how to create such a beast)?

This can't be done, because the EntityResolver is only for external
entities, not for internal ones like the HTML character entities. And
in any case, the XML parser should take care of those for you when it
reads the DTD.

In SAX 2.0 you can deal with entities skipped by the SAX parser by
overriding the skippedEntity callback, which should be fired by XML
parsers that do not read the external DTD subset and thus haven't seen
the definitions for the internal entities. Unfortunately, I can't see
any way to implement it with pyexpat. (It is implemented for xmllib in
my private CVS tree, and not needed for xmlproc.)
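
As a rough sketch of what overriding skippedEntity could look like,
assuming the xml.sax and html.entities modules from a current Python 3
standard library: the class name EntityAwareHandler and its text()
helper are made up for illustration, and the callbacks are invoked by
hand here because whether a given driver fires skippedEntity at all
depends on the driver.

```python
from html.entities import name2codepoint
from xml.sax.handler import ContentHandler


class EntityAwareHandler(ContentHandler):
    """Hypothetical handler: collects text and expands skipped HTML entities."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)

    def skippedEntity(self, name):
        # Only called by drivers that report skipped entities.
        codepoint = name2codepoint.get(name)
        if codepoint is None:
            # Unknown name: keep the reference as written.
            self.chunks.append("&%s;" % name)
        else:
            self.chunks.append(chr(codepoint))

    def text(self):
        return "".join(self.chunks)


# Simulate the callbacks a reporting parser would make for "a&nbsp;b&bogus;".
handler = EntityAwareHandler()
handler.characters("a")
handler.skippedEntity("nbsp")
handler.characters("b")
handler.skippedEntity("bogus")
```

With a real driver you would pass the handler to the parser in the
usual way instead of calling the methods yourself.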

| Second, is there a way to access the current document location (line
| and column number) from within the handler, for tracing/debugging
| purposes?  The home page for SAX talks about a 'Locator' interface,
| but I can't find hooks for this in the Python version.

The ContentHandler has a setDocumentLocator callback that the parser
calls to give the application a Locator. This exists in the Python
version as well and is implemented by all the drivers. Using it should
be straightforward.
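
Something along these lines should work as a minimal sketch, assuming
the xml.sax package as shipped with a current Python 3 and its default
driver. TracingHandler and its trace list are names invented for the
example. The handler keeps the Locator it is given and asks it for the
position on every start tag.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TracingHandler(ContentHandler):
    """Hypothetical handler: records where each element starts."""

    def __init__(self):
        super().__init__()
        self.locator = None
        self.trace = []

    def setDocumentLocator(self, locator):
        # Called by the parser before any other document event.
        self.locator = locator

    def startElement(self, name, attrs):
        # The locator is only meaningful while a callback is running.
        self.trace.append((name,
                           self.locator.getLineNumber(),
                           self.locator.getColumnNumber()))


doc = b"<doc>\n  <item/>\n</doc>"
handler = TracingHandler()
xml.sax.parseString(doc, handler)

for name, line, column in handler.trace:
    print("%s starts at line %d, column %d" % (name, line, column))
```

Note that the Locator is only guaranteed to give sensible answers
during a callback, so read the position inside the handler method
rather than storing the locator and querying it later.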

--Lars M.