[XML-SIG] Setting the DOCTYPE in a new XML DOM
Sylvain Thenault
Sylvain.Thenault@logilab.fr
Wed, 16 Jan 2002 09:37:09 +0100 (CET)
On 15 Jan 2002, Douglas Bates wrote:
> Sylvain Thenault <Sylvain.Thenault@logilab.fr> writes:
> > On 11 Jan 2002, Douglas Bates wrote:
> > > I have been unable to determine how to set the SYSTEM in the doctype
> > > of a document read by the PyExpat reader. I am rather new to this so
> > > it is possible that I am doing something foolish. I have mostly been
> > > following demos and examples as I haven't been able to track down a
> > > lot of documentation. A sample program is
> > >
> > > #!/usr/bin/env python2.2
> > >
> > > from xml.dom.ext.reader.PyExpat import Reader
> > > from xml.dom.ext import PrettyPrint
> > >
> > > if __name__ == "__main__":
> > >     reader = Reader()
> > >     doc = reader.fromUri("/tmp/foo1.xml")
> > >     PrettyPrint(doc)
> > >
> > > The file /tmp/foo1.xml begins
> > >
> > > <?xml version="1.0"?>
> > > <!DOCTYPE booklist SYSTEM "file:////home/deepayan/python/book.dtd">
> > > <booklist>
> > > <book>
> > >
> > > but the output file begins
> > >
> > > <?xml version='1.0' encoding='UTF-8'?>
> > > <!DOCTYPE booklist>
> > > <booklist>
> > > <book>
> > >
> > > Can anyone tell me what I do to maintain the SYSTEM designation?
> >
> > This is a bug in pyexpat. It should work if you use xmlproc instead of
> > pyexpat to generate your dom tree.
>
> Could you tell me how I would use xmlproc to create a reader or to
> somehow load an XML document from a URI? Is there a Reader class that uses
> xmlproc? I couldn't see one when I looked through the libraries.
Here is an example:
---------------------------------------------------------
from xml.dom.ext.reader import Sax2
from xml.sax import make_parser
parser = make_parser(['xml.sax.drivers2.drv_xmlproc'])
reader = Sax2.Reader(parser=parser)
---------------------------------------------------------
By default (without the "parser" argument), the Sax and Sax2 readers use
xmlproc when you ask for a validating parser. In other cases they try to
use a faster parser, such as pyexpat or sgmlop, which are Python wrappers
for C libraries.
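As a side note, the preference-then-fallback idea can be sketched with the
standard library's xml.sax alone. This is only an illustration: it assumes a
plain Python install, where make_parser tries the listed driver modules first
and then its own defaults, and pick_parser is a hypothetical helper name, not
part of any library.

```python
from xml.sax import make_parser
from xml.sax.xmlreader import XMLReader

# Preferred driver first; the module name is the xmlproc driver discussed
# in this thread. If it cannot be imported (for instance because PyXML is
# not installed), make_parser moves on to its built-in default drivers.
PREFERRED = ["xml.sax.drivers2.drv_xmlproc"]


def pick_parser(preferred=PREFERRED):
    """Return a SAX parser, trying the preferred drivers before the defaults.

    Hypothetical helper, for illustration only.
    """
    return make_parser(preferred)


parser = pick_parser()
# Show which driver module was actually selected on this installation.
print(type(parser).__module__)
```

The printed module name tells you which parser you ended up with, which is
worth checking before assuming xmlproc is in use.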
I have tried the xmlproc-based reader with your example and it works
correctly (it doesn't lose the doctype node :)
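If you want to check for yourself whether a given reader keeps the SYSTEM
identifier, a small test along these lines may help. It is a sketch under
assumptions: it uses xml.dom.minidom from the standard library rather than the
xml.dom.ext readers above, the document body is a shortened stand-in for your
file, and doctype_ids is a hypothetical helper name.

```python
from xml.dom.minidom import parseString

# Shortened stand-in for /tmp/foo1.xml; only the DOCTYPE line is taken
# from the original message, the body is a placeholder.
SOURCE = """<?xml version="1.0"?>
<!DOCTYPE booklist SYSTEM "file:////home/deepayan/python/book.dtd">
<booklist>
  <book/>
</booklist>
"""


def doctype_ids(xml_text):
    """Parse xml_text and return (name, publicId, systemId) of its doctype.

    Returns None when the document has no doctype node.
    Hypothetical helper, for illustration only.
    """
    doc = parseString(xml_text)
    doctype = doc.doctype
    if doctype is None:
        return None
    return (doctype.name, doctype.publicId, doctype.systemId)


ids = doctype_ids(SOURCE)
print(ids)
```

If the last element of the tuple is empty or None, the reader has dropped the
SYSTEM identifier, which is the symptom described above.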
cheers
--
Sylvain Thenault
LOGILAB http://www.logilab.org