[XML-SIG] [BUG] xml.dom.minidom on Windows.
Mark Bucciarelli
mark@easymailings.com
Wed, 9 Apr 2003 12:18:32 -0400
The attached code snippet produces peculiar results on Windows.
Namely, it truncates text node data when there is only one grandchild
tag.
This error occurs on Windows XP, with Python 2.2.2 (#37, Oct 14, 2002,
17:02:34) [MSC 32 bit (Intel)] but not on Linux with Python 2.2.1
(#1, Aug 30 2002, 12:15:30) [GCC 3.2 20020822 (Red Hat Linux Rawhide
3.2-4)]
The output I get on Windows is shown below. The test script includes
a very similar XML string that works correctly. What's kind of fun
is that if you start to reduce the length of the second grandchild
tag, from "abcdef" to "abcde" and then to "abcd", at some point the
"works" example will fail as well. The shorter you make it, the more
truncated the nodeValue of the text node becomes.
I'll log this at SourceForge. In the meantime, can anyone suggest a
workaround?
XML: <REQUEST><TYPE>AUTHORIZATION</TYPE><abcdef>123</abcdef></REQUEST>
Node: <TYPE>AUTHORIZATION</TYPE>
xml : AUTHORIZATION
data: "AUTHORIZATION"
Node: <abcdef>123</abcdef>
xml : 123
data: "123"
XML: <REQUEST><TYPE>AUTHORIZATION</TYPE></REQUEST>
Node: <TYPE>AUTHORIZATION</TYPE>
xml : AUTHORI
data: "AUTHORI"
Traceback (most recent call last):
  File "testparse.py", line 40, in ?
    assert does_not_work['TYPE'] == 'AUTHORIZATION', does_not_work['TYPE']
AssertionError: AUTHORI
Attachment: testparse.py
import xml.dom.minidom

def parseTwoDeep(strXml):
    retval = {}
    document = xml.dom.minidom.parseString(strXml)
    rootnode = document.firstChild
    for n in rootnode.childNodes:
        if n.nodeType == xml.dom.Node.ELEMENT_NODE:
            retval = parseOneDeep(n.toxml())
    document.unlink()
    return retval

def parseOneDeep(strXml):
    retval = {}
    document = xml.dom.minidom.parseString(strXml)
    rootnode = document.firstChild
    print 'XML:', strXml
    for n in rootnode.childNodes:
        if n.nodeType == xml.dom.Node.ELEMENT_NODE:
            print ' Node:', n.toxml()
            if n.hasChildNodes() and n.firstChild.nodeType == n.TEXT_NODE:
                print ' xml :', n.firstChild.toxml()
                print ' data: "%s"' % n.firstChild.nodeValue
                retval[n.localName] = n.firstChild.nodeValue.strip()
            else:
                retval[n.localName] = ''
    document.unlink()
    return retval

works = parseTwoDeep('<ROOT><REQUEST><TYPE>AUTHORIZATION</TYPE><abcdef>123</abcdef></REQUEST></ROOT>')
assert works['TYPE'] == 'AUTHORIZATION', works['TYPE']

does_not_work = parseTwoDeep('<ROOT><REQUEST><TYPE>AUTHORIZATION</TYPE></REQUEST></ROOT>')
assert does_not_work['TYPE'] == 'AUTHORIZATION', does_not_work['TYPE']
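
A possible workaround, offered as a sketch under stated assumptions rather
than a confirmed fix: in the failing output above, the "Node:" line prints
the complete <TYPE>AUTHORIZATION</TYPE> while firstChild holds only
"AUTHORI". That would be consistent with the text having been split across
several adjacent text nodes, although the report does not confirm this.
Assuming that is the cause, joining every text-node child instead of reading
only firstChild, and walking the DOM directly instead of re-parsing the
output of toxml(), may avoid the truncation. The names element_text,
children_as_dict and parse_two_deep are hypothetical and not part of the
original script.

```python
import xml.dom.minidom


def element_text(element):
    """Join the data of every direct text-node child of element."""
    parts = []
    for child in element.childNodes:
        if child.nodeType == child.TEXT_NODE:
            parts.append(child.data)
    return ''.join(parts)


def children_as_dict(element):
    """Map each child element's name to its stripped text ('' if none)."""
    retval = {}
    for n in element.childNodes:
        if n.nodeType == n.ELEMENT_NODE:
            retval[n.localName] = element_text(n).strip()
    return retval


def parse_two_deep(str_xml):
    """Parse once and walk two levels down, with no toxml() round trip."""
    document = xml.dom.minidom.parseString(str_xml)
    try:
        retval = {}
        for n in document.documentElement.childNodes:
            if n.nodeType == n.ELEMENT_NODE:
                # As in the original script, the last child element wins.
                retval = children_as_dict(n)
        return retval
    finally:
        document.unlink()


result = parse_two_deep(
    '<ROOT><REQUEST><TYPE>AUTHORIZATION</TYPE></REQUEST></ROOT>')
assert result['TYPE'] == 'AUTHORIZATION', result['TYPE']
```

Calling normalize() on the parent node before reading firstChild is another
option to try, since it merges adjacent text nodes. Whether either approach
sidesteps the Windows behavior reported here is untested.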