Validating XML DOM parser with PyXML (0.7.1)
Martin v. Löwis
loewis at informatik.hu-berlin.de
Fri May 24 13:29:00 EDT 2002
"Gillou" <nospam at bigfoot.com> writes:
> * Returned objects should have the xml.dom.minidom (like) API.
Then I recommend that you create a SAX parser (e.g. through
xml.sax.sax2exts.XMLValParserFactory, or by directly specifying
"xml.sax.drivers2.drv_xmlproc" to make_parser).
With the SAX parser, you build a 4DOM tree (using
xml.dom.ext.reader.Sax2).
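The general shape of "create a SAX parser, then build a DOM tree from
it" can be sketched with the standard library alone. This is only an
approximation of the PyXML route: it assumes the stdlib expat driver
(which does not validate) in place of xmlproc, and minidom in place of
4DOM. The helper name parse_via_sax is made up for illustration.

```python
import xml.sax
from xml.dom import minidom


def parse_via_sax(text):
    """Build a minidom tree from `text` using an explicitly created SAX parser."""
    # Assumption: the stdlib expat driver (non-validating) stands in for
    # the xmlproc driver named in the message.
    parser = xml.sax.make_parser()
    # When a parser is passed in, minidom builds the tree from its SAX events.
    return minidom.parseString(text, parser=parser)


doc = parse_via_sax('<root><item id="1">hello</item></root>')
item = doc.getElementsByTagName("item")[0]
print(doc.documentElement.tagName, item.getAttribute("id"))
```

The point of passing the parser explicitly is that a different SAX
driver can be swapped in later without touching the DOM-building code.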
> * I want to "plug in" my own DTD in the document to validate XML without
> doctype declaration.
This *should* work by specifying a SAX entity handler, but you will
need to experiment.
I suggest that you get the rest of this working first, and change your
solution to incorporate that feature afterwards.
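As a starting point for that experiment, here is a minimal sketch of
the SAX entity hook using the standard library's expat driver. It does
not inject a DTD into a document that lacks a doctype declaration; it
only shows how a resolver can supply local content for an external
entity. The names LOCAL_ENTITIES, LocalResolver, NameRecorder and
element_names, and the "greeting.xml" identifier, are invented for the
example.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource

# Hypothetical mapping from system identifiers to locally supplied content.
LOCAL_ENTITIES = {"greeting.xml": b"<entity/>"}


class LocalResolver(EntityResolver):
    """Serve external entities from memory instead of fetching them."""

    def resolveEntity(self, publicId, systemId):
        source = InputSource()
        source.setByteStream(io.BytesIO(LOCAL_ENTITIES[systemId]))
        return source


class NameRecorder(ContentHandler):
    """Record element names in document order."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def element_names(text):
    recorder = NameRecorder()
    parser = xml.sax.make_parser()
    # External general entities are off by default in current Python
    # versions; only enable them for input you trust.
    parser.setFeature(feature_external_ges, True)
    parser.setEntityResolver(LocalResolver())
    parser.setContentHandler(recorder)
    parser.feed(text)
    parser.close()
    return recorder.names


DOCUMENT = (
    '<!DOCTYPE doc [ <!ENTITY greeting SYSTEM "greeting.xml"> ]>'
    "<doc>&greeting;</doc>"
)
print(element_names(DOCUMENT))
```

Whether the same hook can be used to hand xmlproc a DTD for a document
without a doctype declaration is exactly the part that needs
experimenting.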
> * Register my handlers for parser errors.
You need to set an error handler on the SAX parser (setErrorHandler).
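A minimal sketch of such a handler, written against the standard
library's xml.sax interface. It assumes the stdlib expat driver, which
does not validate, so in practice only fatalError() fires here; by SAX
convention a validating parser reports validity errors through
error(). The names CollectingErrorHandler and collect_problems are
invented for the example.

```python
import xml.sax
from xml.sax.handler import ErrorHandler


class CollectingErrorHandler(ErrorHandler):
    """Record every reported problem as (severity, line, message)."""

    def __init__(self):
        self.problems = []

    def _record(self, severity, exc):
        self.problems.append((severity, exc.getLineNumber(), exc.getMessage()))

    def warning(self, exc):
        self._record("warning", exc)

    def error(self, exc):
        # Recoverable errors: record them and let parsing continue.
        self._record("error", exc)

    def fatalError(self, exc):
        # Well-formedness errors: record, then stop parsing.
        self._record("fatal", exc)
        raise exc


def collect_problems(text):
    handler = CollectingErrorHandler()
    parser = xml.sax.make_parser()
    parser.setErrorHandler(handler)
    try:
        parser.feed(text)
        parser.close()
    except xml.sax.SAXParseException:
        pass  # already recorded by the handler
    return handler.problems


print(collect_problems("<root><unclosed></root>"))
```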
> * I need to get original encoding from xml declaration <?xml
> version="1.0" encoding="whatsthat"?>
You will need to look at the input_encoding attribute of the xmlproc
parser when parsing is done.
Notice that, in presence of external entities, different parts of the
document may have different encodings - so "the original encoding" may
not be a meaningful term.
> Can someone post me some sample that does this or give me a "good"
> howto URL.
Sample code that does exactly this is not available, as your
requirements are quite specific.
I suggest that you, instead, post the fragments that you have (or will
come up with), and ask specific questions about those.
Regards,
Martin