[XML-SIG] While we're on the subject of xmlproc, DTDs and validation ...

Jim Fulton jim@digicool.com
Tue, 08 Jun 1999 18:28:54 +0000


Some musings....

I'd like to have a very fast and simple parser that can do 
validation.  I'm looking at:

  - Using (or stealing parts of) xmlproc to parse DTDs,

  - Using pyexpat,

  - Writing a C thing that does the validation using the
    data structures produced by xmlproc (or structures
    derived from them),

  - Writing a simple C thing that plugs into the C
    validator, which plugs into pyexpat and takes tables of
    start and end tag handlers and processes XML to produce
    Python objects.
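
To make the last item concrete, here is a rough Python sketch of the
"tables of start and end tag handlers" idea (the real thing would be in C).
It uses the xml.parsers.expat module from today's standard library rather
than the pyexpat of this post, and the names build_objects, start_table
and end_table are made up for illustration.

```python
import xml.parsers.expat


def build_objects(text, start_table, end_table):
    """Parse text and build Python objects from handler tables.

    start_table maps a tag name to f(attrs) -> state.
    end_table maps a tag name to f(state, children, text) -> object.
    Both tables and this function are hypothetical, for illustration only.
    """
    # Each frame is [state, children, text_chunks]; frame 0 collects the root.
    frames = [[None, [], []]]

    def on_start(name, attrs):
        make_state = start_table.get(name, lambda attrs: attrs)
        frames.append([make_state(attrs), [], []])

    def on_end(name):
        state, children, chunks = frames.pop()
        finish = end_table.get(name, lambda state, children, text: children)
        # The finished object becomes a child of the enclosing element.
        frames[-1][1].append(finish(state, children, "".join(chunks)))

    def on_text(data):
        frames[-1][2].append(data)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.CharacterDataHandler = on_text
    parser.Parse(text, True)
    return frames[0][1][0]


start_table = {"list": lambda attrs: attrs.get("kind", "list")}
end_table = {
    "int": lambda state, children, text: int(text),
    "list": lambda state, children, text:
        tuple(children) if state == "tuple" else list(children),
}
result = build_objects(
    '<list kind="tuple"><int>1</int><int>2</int></list>',
    start_table, end_table)
```

Note that the handlers above do no validity checking at all; int(text)
simply assumes the content is well formed, which is the simplification
a separate validator would make safe.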

I've modified pyexpat so that it will spit out the DTD info.
(I plan to post an updated pyexpat that implements the full
C expat interface defined in the latest stable expat release,
unless someone beats me to it. ;)
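
For reference, the expat binding in the current Python standard library
(xml.parsers.expat, not the modified pyexpat described here) exposes
declaration handlers along these lines. A minimal sketch, assuming the DTD
is in the document's internal subset; collect_dtd is a name I made up:

```python
import xml.parsers.expat


def collect_dtd(text):
    """Return (elements, attributes) declared in the internal DTD subset."""
    elements = {}    # element name -> content model tuple from expat
    attributes = {}  # element name -> {attr name: (type, default, required)}

    def on_element(name, model):
        elements[name] = model

    def on_attlist(elname, attname, type_, default, required):
        attributes.setdefault(elname, {})[attname] = (
            type_, default, bool(required))

    parser = xml.parsers.expat.ParserCreate()
    parser.ElementDeclHandler = on_element
    parser.AttlistDeclHandler = on_attlist
    parser.Parse(text, True)
    return elements, attributes


doc = """<!DOCTYPE doc [
<!ELEMENT doc (item*)>
<!ELEMENT item (#PCDATA)>
<!ATTLIST item id CDATA #REQUIRED>
]>
<doc><item id="a">x</item></doc>"""

elements, attributes = collect_dtd(doc)
```

The content model arrives as a nested tuple of (type, quantifier, name,
children), which is the kind of structure a validator could be driven from.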

I find that if I tell xmlproc to parse a file containing only
a DTD, it will build the DTD-related data structures for me, but:

  - I wonder if there is or should be a tool designed
    just to do this.  Maybe there already is one that I've
    missed.

  - Can I rely on the data structures created by the current
    xmlproc?

I'd like to have a tool for processing DTDs independent of
parsing XML:

  - To make it possible to bolt validation onto non-validating
    parsers,

  - To separate implementation of validation from implementation
    of basic parsing and from application object building code.
    For example, I think handlers that build application objects
    can be a lot simpler if they don't have to check validity.

  - To allow applications to provide DTDs for documents that don't
    have them (e.g. XML-RPC marshals).
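
A very rough sketch of what "bolting on" might look like, in Python for
brevity: a checking layer sits between a non-validating parser (here the
standard library's xml.parsers.expat) and the application handlers, and
the rules come from the application, not from the document. Everything
named below (ValidityError, ChildSetValidator, parse_validated, the table
format) is hypothetical, and the check is deliberately crude: it only tests
which children an element may contain, not their order or number, so it is
far short of real DTD validation.

```python
import xml.parsers.expat


class ValidityError(Exception):
    pass


class ChildSetValidator:
    """Check events against an application-supplied table, then pass them on."""

    def __init__(self, allowed_children, root, start=None, end=None):
        self.allowed = allowed_children  # element name -> set of child names
        self.root = root
        self.start = start or (lambda name, attrs: None)
        self.end = end or (lambda name: None)
        self.stack = []

    def start_element(self, name, attrs):
        if name not in self.allowed:
            raise ValidityError("undeclared element: %s" % name)
        if not self.stack:
            if name != self.root:
                raise ValidityError("unexpected root: %s" % name)
        elif name not in self.allowed[self.stack[-1]]:
            raise ValidityError(
                "%s not allowed in %s" % (name, self.stack[-1]))
        self.stack.append(name)
        self.start(name, attrs)  # application handler sees only checked events

    def end_element(self, name):
        self.stack.pop()
        self.end(name)


def parse_validated(text, validator):
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = validator.start_element
    parser.EndElementHandler = validator.end_element
    parser.Parse(text, True)


# Rules supplied by the application for a document that carries no DTD.
table = {
    "methodCall": {"methodName", "params"},
    "methodName": set(),
    "params": {"param"},
    "param": set(),
}
seen = []
checker = ChildSetValidator(
    table, "methodCall", start=lambda name, attrs: seen.append(name))
parse_validated(
    "<methodCall><methodName>x</methodName>"
    "<params><param/></params></methodCall>", checker)
```

The application's start handler never has to check anything itself; an
invalid document raises ValidityError before the handler is called.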

Thoughts?

Jim

--
Jim Fulton           mailto:jim@digicool.com   Python Powered!        
Technical Director   (888) 344-4332            http://www.python.org  
Digital Creations    http://www.digicool.com   http://www.zope.org    

Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email
address may not be added to any commercial mail list without my
permission.  Violation of my privacy with advertising or SPAM will
result in a suit for a MINIMUM of $500 damages/incident, $1500 for
repeats.