Validating XML DOM parser with PyXML (0.7.1)

Martin v. Löwis loewis at informatik.hu-berlin.de
Fri May 24 13:29:00 EDT 2002


"Gillou" <nospam at bigfoot.com> writes:

> * Returned objects should have the xml.dom.minidom (like) API.

Then I recommend that you create a SAX parser (e.g. through
xml.sax.sax2exts.XMLValParserFactory, or by passing
"xml.sax.drivers2.drv_xmlproc" directly to make_parser).

With the SAX parser, you build a 4DOM tree (using
xml.dom.ext.reader.Sax2).
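
The shape of that pipeline (SAX parser first, DOM tree built from its
events) can be sketched with the standard library alone. This is only
an approximation under stated assumptions: the default expat driver
checks well-formedness and does not validate, and minidom stands in
for 4DOM. The helper name parse_with_sax is made up for illustration.

```python
import xml.sax
from xml.dom import minidom


def parse_with_sax(text):
    """Build a minidom tree from the events of an explicitly created SAX parser."""
    # Stand-in for the validating parser: the stdlib default driver (expat)
    # is non-validating. Under PyXML, the driver name
    # "xml.sax.drivers2.drv_xmlproc" would be passed to make_parser instead.
    parser = xml.sax.make_parser()
    # minidom drives the supplied SAX parser and assembles the tree
    # from the events it reports.
    return minidom.parseString(text, parser=parser)


doc = parse_with_sax("<catalog><item id='1'>spam</item></catalog>")
items = doc.getElementsByTagName("item")
print(doc.documentElement.tagName, items[0].getAttribute("id"))
```

Whatever you configure on the parser before handing it over (error
handler, entity resolver) stays in effect while the tree is built.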

> * I want to "plug in" my own DTD in the document to validate XML without
> doctype declaration.

This *should* work by setting a SAX entity resolver (EntityResolver)
on the parser, but you will need to experiment.

I suggest that you get the rest of this working first, and change your
solution to incorporate that feature afterwards.
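
As a starting point for that experiment, here is a rough sketch of an
entity resolver that hands the parser a DTD held in memory. It uses
the standard library's expat driver, not xmlproc, so it shows only how
the resolver is wired in and does not validate. MY_DTD,
LocalDTDResolver, parse_with_local_dtd and "placeholder.dtd" are
made-up names. The sample document still carries a DOCTYPE; a document
without one gives the parser nothing to resolve, so that case is not
covered here.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource

# Hypothetical DTD text kept in memory; in practice this would be your own DTD.
MY_DTD = b"<!ELEMENT note (#PCDATA)>"


class LocalDTDResolver(EntityResolver):
    """Answer every external entity request with the in-memory DTD."""

    def __init__(self, dtd_bytes):
        self.dtd_bytes = dtd_bytes
        self.requested = []  # system ids the parser asked for

    def resolveEntity(self, publicId, systemId):
        self.requested.append(systemId)
        source = InputSource(systemId)
        source.setByteStream(io.BytesIO(self.dtd_bytes))
        return source


def parse_with_local_dtd(data):
    """Parse data and return the system ids the resolver was asked for."""
    resolver = LocalDTDResolver(MY_DTD)
    parser = xml.sax.make_parser()
    # Recent Python versions do not fetch external entities unless asked to.
    parser.setFeature(feature_external_ges, True)
    parser.setEntityResolver(resolver)
    parser.setContentHandler(ContentHandler())
    parser.parse(io.BytesIO(data))
    return resolver.requested


doc = b'<!DOCTYPE note SYSTEM "placeholder.dtd"><note>hello</note>'
print(parse_with_local_dtd(doc))
```

The resolver ignores the system id it is given and always returns the
same DTD, which is the "plug in my own DTD" part of the idea.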

> * Register my handlers for parser errors.  

You need to set an error handler with the SAX parser.
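
A minimal sketch of such a handler, written against the standard
xml.sax interfaces. CollectingErrorHandler and check are made-up
names. With the non-validating expat driver used here only fatal
(well-formedness) errors show up; a validating driver would normally
report validity problems through error().

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, ErrorHandler


class CollectingErrorHandler(ErrorHandler):
    """Record every problem the parser reports instead of printing it."""

    def __init__(self):
        self.problems = []

    def warning(self, exception):
        self.problems.append(("warning", exception.getMessage()))

    def error(self, exception):
        # Recoverable errors: record them and let parsing continue.
        self.problems.append(("error", exception.getMessage()))

    def fatalError(self, exception):
        # Parsing cannot continue after a fatal error, so re-raise.
        self.problems.append(("fatal", exception.getMessage()))
        raise exception


def check(data):
    """Parse data and return the list of reported problems."""
    handler = CollectingErrorHandler()
    parser = xml.sax.make_parser()
    parser.setContentHandler(ContentHandler())
    parser.setErrorHandler(handler)
    try:
        parser.parse(io.BytesIO(data))
    except xml.sax.SAXParseException:
        pass  # already recorded by fatalError
    return handler.problems


print(check(b"<a><b/></a>"))
print(check(b"<a><b></a>"))
```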

> * I need to get original encoding from xml declaration
> <?xml version="1.0" encoding="whatsthat"?>

You will need to look at the input_encoding attribute of the xmlproc
parser when parsing is done.

Notice that, in the presence of external entities, different parts of
the document may have different encodings - so "the original encoding"
may not be a meaningful term.

> Can someone post me some sample that does this or give me a "good"
> howto URL.

Sample code that does exactly this is not available, as your
requirements are quite specific.

I suggest that you, instead, post the fragments that you have (or will
come up with), and ask specific questions about those.

Regards,
Martin


