[XML-SIG] Re: While we're on the subject of xmlproc, DTDs and validation ...

Jim Fulton jim@digicool.com
Wed, 09 Jun 1999 01:25:45 +0000


Lars Marius Garshol wrote:
> 
> * Jim Fulton
> |
> | I'd like to have a very fast and simple parser that can do
> | validation.
> 
> Hmmm. Maybe a better option than what you've been looking at would be
> RXP, which is an all-C validating parser.
> 
> <URL: http://www.cogsci.ed.ac.uk/~richard/rxp.html>

I'll check it out.  I'm a little bit worried about the license, 
which is GPL. Maybe I can get him to change it to LGPL.

> It's a little bit slower than expat, but that should drown in the time
> occupied by the Python callbacks anyway.

True, although for a lot of our projects, we'll probably write 
many (most?) of the callbacks in C.
 
> I've been thinking about writing a Python interface to RXP, but am not
> really into C extensions yet and haven't got the time at the moment.
> 
> |   - Using (or stealing parts of) xmlproc to parse DTDs,
> 
> This is easily possible, and it will buy you some performance,
> although probably not as much as you'd wish. (Especially for large
> DTDs xmlproc is slow.)

In most cases, I'd expect to amortize DTD parsing over many documents, 
either by preprocessing standard DTDs or caching DTDs.
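
To make the caching idea concrete, here is a minimal sketch. It assumes
nothing about xmlproc's actual interface: DTDCache and the parse_dtd
callable are hypothetical names, and parse_dtd stands in for whatever
function ends up turning a DTD file into its parsed data structures.

```python
import os


class DTDCache:
    """Cache parsed DTD objects by file path.

    A DTD is reparsed only when the file's modification time changes,
    so the parsing cost is spread over every document that uses it.
    """

    def __init__(self, parse_dtd):
        # parse_dtd is a hypothetical callable: path -> parsed DTD object.
        # It is an assumption of this sketch, not part of xmlproc.
        self._parse_dtd = parse_dtd
        self._entries = {}  # path -> (mtime, parsed DTD)

    def get(self, path):
        mtime = os.path.getmtime(path)
        entry = self._entries.get(path)
        if entry is not None and entry[0] == mtime:
            # The file is unchanged since the last parse; reuse the result.
            return entry[1]
        dtd = self._parse_dtd(path)
        self._entries[path] = (mtime, dtd)
        return dtd
```

Each document would then call cache.get(path_to_dtd) rather than parsing
the DTD again. Preprocessing standard DTDs would amount to filling such a
cache ahead of time.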
 
(snip)
 
> | I find that if I tell xmlproc to parse a file containing only a DTD,
> | it will build the DTD related data structures for me, but:
> |
> |   - I wonder if there is or should be a tool designed
> |     just to do this.  Maybe there already is one that I've
> |     missed.
> 
> xmlproc comes with a dtdparser.py module which gives you an
> event-based interface to DTDs. Combined with the classes in xmldtd.py
> this gives you the ability to parse a DTD without an associated
> document.

I suspected this, but I had trouble figuring out the interface.

> Look in the demo directory for dtddoc.py, which is an
> example of this.

Ah, thanks.  That should help a lot.
 
> |   - Can I rely on the data structures created by the current
> |     xmlproc?
> 
> Sorry, I don't understand the question. What do you mean by 'rely'?

I'll write something that takes as input the data structures
created internally when xmlproc parses a document.  If you change 
those data structures, my software will break. :)
 
> | I'd like to have a tool for processing DTDs independent of
> | parsing XML:
> |
> | [excellent reasons snipped]
> 
> Yup. These were all part of my motivation for making the DTD parsing
> module of xmlproc separate from the rest.

Cool.

Jim

--
Jim Fulton           mailto:jim@digicool.com
Technical Director   (888) 344-4332              Python Powered!
Digital Creations    http://www.digicool.com     http://www.python.org

Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email
address may not be added to any commercial mail list with out my
permission.  Violation of my privacy with advertising or SPAM will
result in a suit for a MINIMUM of $500 damages/incident, $1500 for
repeats.