[XML-SIG] Re: While we're on the subject of xmlproc, DTDs and validation ...

Lars Marius Garshol larsga@ifi.uio.no
09 Jun 1999 00:26:53 +0200


* Jim Fulton
| 
| I'd like to have a very fast and simple parser that can do
| validation.

Hmmm. Maybe a better option than what you've been looking at would be
RXP, which is an all-C validating parser. 

<URL: http://www.cogsci.ed.ac.uk/~richard/rxp.html>

It's a little bit slower than expat, but the difference should be
swamped by the time spent in the Python callbacks anyway.
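
A rough way to check the callback-cost claim on your own machine is
sketched below. It uses the expat binding in today's Python standard
library (xml.parsers.expat), not RXP, and the helper names
make_document and time_parse are made up for this illustration. It
parses the same document once with no handlers and once with trivial
Python handlers, so the gap between the two timings approximates the
callback overhead.

```python
import time
import xml.parsers.expat


def make_document(n_items=20000):
    """Build a synthetic XML document with n_items child elements."""
    items = "".join(
        '<item id="%d">value %d</item>' % (i, i) for i in range(n_items)
    )
    return "<root>%s</root>" % items


def time_parse(data, with_callbacks):
    """Parse data with expat and return (elapsed_seconds, counts).

    With with_callbacks=False no Python handlers are installed, so the
    time measured is almost entirely the C parser itself.
    """
    parser = xml.parsers.expat.ParserCreate()
    counts = {"start": 0, "chars": 0}

    if with_callbacks:
        def start(name, attrs):
            counts["start"] += 1

        def chars(text):
            counts["chars"] += len(text)

        parser.StartElementHandler = start
        parser.CharacterDataHandler = chars

    t0 = time.perf_counter()
    parser.Parse(data, True)
    return time.perf_counter() - t0, counts


if __name__ == "__main__":
    doc = make_document()
    bare, _ = time_parse(doc, with_callbacks=False)
    full, counts = time_parse(doc, with_callbacks=True)
    print("no callbacks:   %.4f s" % bare)
    print("with callbacks: %.4f s (%d start events)" % (full, counts["start"]))
```

The handlers here do almost nothing; real application callbacks would
normally cost more, which only widens the gap.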

I've been thinking about writing a Python interface to RXP, but am not
really into C extensions yet and haven't got the time at the moment.

|   - Using (or stealing parts of) xmlproc to parse DTDs,

This is easily done, and it will buy you some performance, although
probably not as much as you'd wish. (xmlproc is especially slow on
large DTDs.)
 
| (I plan to post an updated pyexpat that implements the full
| C expat interface defined in the latest stable expat release,
| unless someone beats me to it. ;)

Great! When you do, I'll update the SAX driver.
 
| I find that if I tell xmlproc to parse a file containing only a DTD,
| it will build the DTD related data structures for me, but:
| 
|   - I wonder if there is or should be a tool designed
|     just to do this.  Maybe there already is one that I've
|     missed.

xmlproc comes with a dtdparser.py module which gives you an
event-based interface to DTDs. Combined with the classes in xmldtd.py
this gives you the ability to parse a DTD without an associated
document. Look in the demo directory for dtddoc.py, which is an
example of this.
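
To give a feel for what an event-based DTD interface looks like, here
is a minimal sketch. It does not use xmlproc's dtdparser.py or
xmldtd.py; it is a stand-in built on the declaration handlers of the
expat binding in today's Python standard library. DTDCollector and
parse_dtd_text are hypothetical names chosen for this example. Since
expat has no entry point for a bare DTD, the sketch wraps the
declarations in the internal subset of a dummy document, which is an
assumption of this approach and not how xmlproc does it.

```python
import xml.parsers.expat


class DTDCollector:
    """Receives declaration events and stores them in plain dicts."""

    def __init__(self):
        self.elements = {}    # element name -> content model tuple from expat
        self.attributes = {}  # element name -> {attr name: (type, default, required)}

    def element_decl(self, name, model):
        self.elements[name] = model

    def attlist_decl(self, elname, attname, type_, default, required):
        self.attributes.setdefault(elname, {})[attname] = (
            type_, default, bool(required)
        )


def parse_dtd_text(dtd_text, root="dummy"):
    """Parse DTD declarations without a real document.

    The declarations are placed in the internal subset of a one-element
    dummy document, because expat only parses complete documents.
    """
    collector = DTDCollector()
    parser = xml.parsers.expat.ParserCreate()
    parser.ElementDeclHandler = collector.element_decl
    parser.AttlistDeclHandler = collector.attlist_decl
    doc = "<!DOCTYPE %s [\n%s\n]>\n<%s/>" % (root, dtd_text, root)
    parser.Parse(doc, True)
    return collector


if __name__ == "__main__":
    dtd = """
    <!ELEMENT note (to, body)>
    <!ELEMENT to (#PCDATA)>
    <!ELEMENT body (#PCDATA)>
    <!ATTLIST note id ID #REQUIRED>
    """
    result = parse_dtd_text(dtd)
    for name in sorted(result.elements):
        print(name, result.attributes.get(name, {}))
```

One limitation of the wrapping trick: inside an internal subset, XML
does not allow parameter entity references within a declaration, so
DTDs that lean on them will not parse this way.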
 
|   - Can I rely on the data structures created by the current
|     xmlproc?

Sorry, I don't understand the question. What do you mean by 'rely'?
 
| I'd like to have a tool for processing DTDs independent of
| parsing XML:
|
| [excellent reasons snipped]

Yup. These were all part of my motivation for making the DTD parsing
module of xmlproc separate from the rest.

--Lars M.