[XML-SIG] Bug in expatreader module...
Achim Gaedke
Achim.Gaedk@zpr.uni-koeln.de
Sat, 16 Jun 2001 16:12:06 +0200
(This is the same topic as in comp.lang.python)
Hello everybody!
My intention is to write a recursive parser for nested data structures.
To collect the data, the parser has to switch to a different content
handler at each nesting step.
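The handler-switching design can also be expressed without calling setContentHandler() during parsing. In this variant, one dispatcher stays installed for the whole parse and forwards each event to the handler on top of a stack. This is an alternative sketch for modern Python 3, not the approach used in this post. The `Node` and `Dispatcher` classes and the `child_for` hook are hypothetical names chosen for illustration.

```python
import xml.sax
import xml.sax.handler


class Node:
    """Hypothetical per-element collector: gathers text and child nodes."""

    def __init__(self, name):
        self.name = name
        self.text = []
        self.children = []

    def child_for(self, name):
        # Hypothetical hook: decides which collector handles a nested element.
        child = Node(name)
        self.children.append(child)
        return child


class Dispatcher(xml.sax.handler.ContentHandler):
    """Stays installed for the whole parse; forwards events to the top of a stack."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]

    def startElement(self, name, attrs):
        # Descend: the current collector creates the collector for the nested element.
        self.stack.append(self.stack[-1].child_for(name))

    def endElement(self, name):
        # Ascend: the enclosing collector becomes active again.
        self.stack.pop()

    def characters(self, content):
        # Always reaches the innermost active collector; no expat rebinding needed.
        self.stack[-1].text.append(content)


dispatcher = Dispatcher()
xml.sax.parseString(b'<?xml version="1.0"?><a>a1<b>b1</b>a2</a>', dispatcher)
a = dispatcher.root.children[0]
print(a.name, "".join(a.text), [(c.name, "".join(c.text)) for c in a.children])
```

Because the registered handler never changes, this design does not depend on how a particular expatreader version binds the character-data callback.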
Switching does NOT work for the character handler. This is my (lean) test
program:
import xml.sax.handler
parser=xml.sax.make_parser()
class second_ch(xml.sax.handler.ContentHandler):
    def startElement(self,name,attrs):
        print "start second"
    def endElement(self,name):
        print "end second"
    def characters(self,content):
        print "second: ",content.strip()
class first_ch(xml.sax.handler.ContentHandler):
    def startElement(self,name,attrs):
        print "start first"
        self.second=second_ch()
        parser.setContentHandler(self.second)
    def endElement(self,name):
        print "end first"
    def characters(self,content):
        print "first: ",content.strip()
first=first_ch()
parser.setContentHandler(first)
parser.parse('members.xml')
And this is the XML file, members.xml:
<?xml version="1.0"?>
<a>a1<b>b1</b>a2</a>
Nothing more is necessary. This is the output with Python 2.0 and
expat 1.95.2:
python2.0 xml_test.py
start first
first: a1
start second
first: b1
end second
first: a2
end second
After the first output line ("start first"), the second content handler
should receive all character data, but it still goes to the first handler!
The second test uses Python 2.1 and expat1_1:
python2.1 xml_test.py
start first
first: a1
start second
first: b1
end second
first: a2
end second
The result is the same. What a pity.
The expat reference states that changing handlers during parsing is
possible and expected.
I am running Red Hat Linux 7.1 with self-built Python interpreters.
OK, here is a workaround I found so that I can keep coding.
After each call to
parser.setContentHandler(new_handler)
add the following line, to supply the functionality that
parser.setContentHandler() is missing:
parser._parser.CharacterDataHandler=new_handler.characters
This line is taken from xml.sax.expatreader.ExpatReader.reset().
It only works after parser.reset() has been called, or after
parser.parse(...) has been called once (because parse() calls reset(),
which creates the underlying expat parser).
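The following Python 3 port of the test program applies the workaround line inside the first handler. It records events in a list instead of printing them, so the routing can be checked. The handler classes are renamed (`FirstHandler`/`SecondHandler`), and an in-memory document replaces members.xml. `_parser` is a private ExpatReader attribute, exactly as in the workaround. This sketch does not test whether a given Python version still needs the manual line. With the line in place, everything after the first start tag should be routed to the second handler in either case.

```python
import io
import xml.sax
import xml.sax.handler

# Each entry records (handler, event, payload) in the order SAX delivers it.
events = []


class SecondHandler(xml.sax.handler.ContentHandler):
    def startElement(self, name, attrs):
        events.append(("second", "start", name))

    def endElement(self, name):
        events.append(("second", "end", name))

    def characters(self, content):
        events.append(("second", "chars", content.strip()))


class FirstHandler(xml.sax.handler.ContentHandler):
    def __init__(self, parser):
        super().__init__()
        self.parser = parser

    def startElement(self, name, attrs):
        events.append(("first", "start", name))
        self.second = SecondHandler()
        self.parser.setContentHandler(self.second)
        # Workaround from the post: point expat's character-data callback at the
        # new handler by hand. _parser is a private ExpatReader attribute that only
        # holds an expat parser once reset() has run, which parse() guarantees here.
        self.parser._parser.CharacterDataHandler = self.second.characters

    def endElement(self, name):
        events.append(("first", "end", name))

    def characters(self, content):
        events.append(("first", "chars", content.strip()))


parser = xml.sax.make_parser()
parser.setContentHandler(FirstHandler(parser))
parser.parse(io.BytesIO(b'<?xml version="1.0"?><a>a1<b>b1</b>a2</a>'))
for event in events:
    print(event)
```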
I think this error should be fixed somewhere other than in my code, but
I don't know where, and I hope this message also reaches the
maintainers.
Please let me know when this bug is fixed.
Yours
Achim
--
Achim Gaedke, ZPR
Weyertal 80, 50931 Köln
Tel: +49 221 470 6021