[XML-SIG] [BUG] xml.dom.minidom on Windows.

Mark Bucciarelli mark@easymailings.com
Wed, 9 Apr 2003 12:18:32 -0400



The attached script produces peculiar results on Windows: minidom
truncates the data of a text node when <REQUEST> contains only one
child tag (a grandchild of <ROOT>).

This error occurs on Windows XP with Python 2.2.2 (#37, Oct 14, 2002,
17:02:34) [MSC 32 bit (Intel)], but not on Linux with Python 2.2.1
(#1, Aug 30 2002, 12:15:30) [GCC 3.2 20020822 (Red Hat Linux Rawhide
3.2-4)].

The output I get on Windows is shown below.  The test script includes
a very similar string that works correctly.  What's kind of fun is
that if you start to reduce the length of the second grandchild tag,
from "abcdef" to "abcde" and then to "abcd", at some point the
"works" example fails as well.  The shorter you make the tag, the more
truncated the nodeValue of the text node becomes.

I'll log this at SourceForge.  In the meantime, can anyone suggest a
workaround?
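One possible workaround, offered as a sketch rather than a confirmed fix: in the failing output below, n.toxml() still prints the complete <TYPE>AUTHORIZATION</TYPE> while n.firstChild holds only "AUTHORI". That suggests (without proving it) that the parser stored the text as more than one adjacent text node. Assuming that is the cause, reading all text children instead of only firstChild should recover the full string. The helper name element_text is hypothetical, the sketch targets a current Python 3 rather than 2.2.2, and the demo builds the split by hand because the Windows behaviour itself is not reproduced here.

```python
import xml.dom.minidom
from xml.dom import Node


def element_text(element):
    """Return the concatenated data of every text/CDATA child of element.

    Hypothetical helper: unlike element.firstChild.nodeValue, this returns
    the whole string even when the text is stored as several adjacent
    text nodes.
    """
    parts = []
    for child in element.childNodes:
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
            parts.append(child.data)
    return ''.join(parts)


# Build the suspected situation by hand: <TYPE> with two adjacent text nodes.
# (minidom's appendChild does not merge adjacent text nodes.)
doc = xml.dom.minidom.parseString('<REQUEST><TYPE/></REQUEST>')
type_el = doc.getElementsByTagName('TYPE')[0]
type_el.appendChild(doc.createTextNode('AUTHORI'))
type_el.appendChild(doc.createTextNode('ZATION'))

print(type_el.toxml())                # <TYPE>AUTHORIZATION</TYPE>
print(type_el.firstChild.nodeValue)   # AUTHORI
print(element_text(type_el))          # AUTHORIZATION
```

The demo mirrors the reported symptom: the element serializes in full while its first child holds only part of the text. In parseOneDeep, the firstChild lookup would become retval[n.localName] = element_text(n).strip().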

XML: <REQUEST><TYPE>AUTHORIZATION</TYPE><abcdef>123</abcdef></REQUEST>
  Node: <TYPE>AUTHORIZATION</TYPE>
    xml : AUTHORIZATION
    data: "AUTHORIZATION"
  Node: <abcdef>123</abcdef>
    xml : 123
    data: "123"
XML: <REQUEST><TYPE>AUTHORIZATION</TYPE></REQUEST>
  Node: <TYPE>AUTHORIZATION</TYPE>
    xml : AUTHORI
    data: "AUTHORI"
Traceback (most recent call last):
  File "testparse.py", line 40, in ?
    assert does_not_work['TYPE'] == 'AUTHORIZATION', does_not_work['TYPE']
AssertionError: AUTHORI

Attachment: testparse.py

import xml.dom.minidom

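def parseTwoDeep(strXml):
    # Parse the document, then re-serialize each element child of the root
    # (here <REQUEST>) and parse it again with parseOneDeep.  retval is
    # reassigned on every pass, so only the last element child's result
    # is returned.
    retval = {}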
    document = xml.dom.minidom.parseString(strXml)
    rootnode = document.firstChild

    for n in rootnode.childNodes:
        if n.nodeType == xml.dom.Node.ELEMENT_NODE:
            retval = parseOneDeep(n.toxml())
    document.unlink()
    return retval

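def parseOneDeep(strXml):
    # Map each child element's tag name to its text.  Only firstChild is
    # read, so if the text were split across several adjacent text nodes,
    # only the first piece would be returned.
    retval = {}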
    document = xml.dom.minidom.parseString(strXml)
    rootnode = document.firstChild

    print 'XML:', strXml

    for n in rootnode.childNodes:
        if n.nodeType == xml.dom.Node.ELEMENT_NODE:
            print '  Node:', n.toxml()
            if n.hasChildNodes() and n.firstChild.nodeType == n.TEXT_NODE:
                print '    xml :', n.firstChild.toxml()
                print '    data: "%s"' % n.firstChild.nodeValue
                retval[n.localName] = n.firstChild.nodeValue.strip()
            else:
                retval[n.localName] = ''
    document.unlink()
    
    return retval

works = parseTwoDeep('<ROOT><REQUEST><TYPE>AUTHORIZATION</TYPE><abcdef>123</abcdef></REQUEST></ROOT>')
assert works['TYPE'] == 'AUTHORIZATION', works['TYPE']

does_not_work = parseTwoDeep('<ROOT><REQUEST><TYPE>AUTHORIZATION</TYPE></REQUEST></ROOT>')
assert does_not_work['TYPE'] == 'AUTHORIZATION', does_not_work['TYPE']

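A second possible workaround, under the same unconfirmed assumption that the text arrives as several adjacent text nodes: call the standard DOM normalize() method on the document before reading firstChild, since normalize() merges adjacent text nodes. The sketch is a Python 3 rewrite of parseOneDeep under the hypothetical name parse_one_deep; it also uses documentElement instead of firstChild to locate the root element. Whether normalize() behaves identically in the Python 2.2.2 build mentioned above has not been checked here.

```python
import xml.dom.minidom


def parse_one_deep(str_xml):
    """Hypothetical variant of parseOneDeep that normalizes the DOM first.

    document.normalize() merges adjacent text nodes, so firstChild holds
    the whole string even if the parser produced it in pieces.
    """
    document = xml.dom.minidom.parseString(str_xml)
    document.normalize()
    retval = {}
    # documentElement is the root element even if a comment or PI precedes it.
    for n in document.documentElement.childNodes:
        if n.nodeType == n.ELEMENT_NODE:
            if n.hasChildNodes() and n.firstChild.nodeType == n.TEXT_NODE:
                retval[n.localName] = n.firstChild.nodeValue.strip()
            else:
                retval[n.localName] = ''
    document.unlink()
    return retval


# Normal input: same result as the original parseOneDeep.
print(parse_one_deep('<REQUEST><TYPE>AUTHORIZATION</TYPE></REQUEST>'))
# {'TYPE': 'AUTHORIZATION'}

# Simulated split: normalize() joins the two pieces back into one node.
doc = xml.dom.minidom.parseString('<REQUEST><TYPE/></REQUEST>')
type_el = doc.getElementsByTagName('TYPE')[0]
type_el.appendChild(doc.createTextNode('AUTHORI'))
type_el.appendChild(doc.createTextNode('ZATION'))
doc.normalize()
print(type_el.firstChild.nodeValue)   # AUTHORIZATION
```

normalize() keeps the rest of the original logic unchanged, whereas concatenating all text children avoids mutating the tree; either approach addresses the same suspected cause.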