[XML-SIG] DOCTYPE problem loading XML file.

Brendon Costa brendon at christian.net
Sat Apr 14 13:00:19 CEST 2007

Hi all,

I am writing a manual in DocBook format for a project I have been
developing. The manual contains "programlisting" nodes that show
output generated by some scripts.

I want to write a small application using the Python XML libraries that
will load this DocBook file and, for each programlisting node whose id
starts with script_..., execute the script ... and replace the
programlisting node's value with the resulting output.
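
To make the goal concrete, here is a rough sketch of the replacement
step. It is only an illustration under several assumptions: it uses
xml.dom.minidom from the Python 3 standard library, the names
replace_script_listings and run are made up for this example, and it
assumes that the text after "script_" in the id is a command that can
be executed directly.

```python
import subprocess
from xml.dom import minidom


def run_script(name):
    # Assumption: the part of the id after "script_" is an executable command.
    result = subprocess.run([name], capture_output=True, text=True, check=True)
    return result.stdout


def replace_script_listings(doc, run=run_script):
    """Replace the content of every <programlisting id="script_..."> node
    with the output returned by run(script_name)."""
    prefix = "script_"
    for node in doc.getElementsByTagName("programlisting"):
        node_id = node.getAttribute("id")
        if not node_id.startswith(prefix):
            continue  # leave listings without a script_ id untouched
        output = run(node_id[len(prefix):])
        # Remove the old children, then add the new output as a single text node.
        while node.firstChild is not None:
            node.removeChild(node.firstChild)
        node.appendChild(doc.createTextNode(output))
    return doc
```

The run argument is there so the replacement logic can be tried with a
stand-in function before any real script is executed.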

Firstly, does anyone know of an existing tool that could do this for me?
(I haven't been successful in finding one.)

Otherwise, I have been trying to create my own tool in Python. The first
stage is loading the DocBook XML file into Python using the DOM parser.
This is my first time dealing with Python and XML.

The code is so far VERY simple:

import sys
from xml.dom.ext.reader import Sax2
reader = Sax2.Reader()
doc = reader.fromStream(sys.argv[1])

Running that using:
python update_docbook.py manual.xml

fails to load the manual.xml file. The XML file has a DOCTYPE. For my
purposes in modifying the document I don't care about the DOCTYPE; I
just want to keep it intact as it is. Is there any way to tell the DOM
parser that I don't care about the DOCTYPE?
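
For comparison, here is a minimal sketch of the behaviour I am after,
written against xml.dom.minidom from the Python 3 standard library
instead of the Sax2 reader. As far as I can tell, minidom records the
DOCTYPE declaration without fetching the DTD it points to and writes
the declaration back out on serialisation. The function name
roundtrip_keeping_doctype and the sample document are made up for this
example. I have not checked what happens to entities that are defined
only in the DTD.

```python
from xml.dom import minidom

# Hypothetical sample; docbookx.dtd does not need to exist for this to parse.
SAMPLE = """<?xml version="1.0"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "docbookx.dtd">
<book><title>Manual</title></book>"""


def roundtrip_keeping_doctype(xml_text):
    """Parse xml_text without loading its DTD and return (document, serialised text)."""
    doc = minidom.parseString(xml_text)
    # doc.doctype keeps the name, public id and system id from the declaration.
    return doc, doc.toxml()
```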

If this is not possible, the following are the errors I get when trying
to load the DocBook XML file.

Firstly without a DTD available at all:
ValueError: unknown url type: docbookx.dtd

If I then copy my DTD files into the current directory (the DOCTYPE
currently references a file in the current directory, to avoid having
to go to the internet all the time), the parser seems to find the DTD
as I would expect, but there are still other errors:
xml.sax._exceptions.SAXParseException: dbnotnx.mod:60:80: error in
processing external entity reference

and if I change the DOCTYPE back to the correct URL, I get the same
error, but with the URL:
http://www.oasis-open.org/docbook/xml/4.5/dbnotnx.mod:60:80: error in
processing external entity reference

So how would I go about loading this DocBook XML file in Python using
DOM so I can then manipulate it? Would you recommend that I switch to
a SAX parser, and if so, can it be made to ignore the DOCTYPE?
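
In case it helps frame the SAX question, this is a small sketch of what
I imagine the SAX route would look like, using xml.sax from the Python 3
standard library. It turns off feature_external_ges so that the parser
should not try to load the external DTD, and it only collects the ids of
the programlisting elements. The names ListingIds and listing_ids are
made up for this example, and I am assuming the default expat-based
parser.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges


class ListingIds(ContentHandler):
    """Collect the id attribute of every <programlisting> element."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def startElement(self, name, attrs):
        listing_id = attrs.get("id")
        if name == "programlisting" and listing_id is not None:
            self.ids.append(listing_id)


def listing_ids(xml_bytes):
    """Parse xml_bytes with external entity loading disabled and return the ids found."""
    parser = xml.sax.make_parser()
    # Do not fetch external general entities, which includes the external DTD.
    parser.setFeature(feature_external_ges, False)
    handler = ListingIds()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.ids
```

A SAX handler like this only reads the document, so writing the modified
file back out would need extra work compared with the DOM approach.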

Thanks for any info.

