[ python-Bugs-1628902 ] xml.dom.minidom parse bug

SourceForge.net noreply at sourceforge.net
Fri Jan 5 17:37:21 CET 2007


Bugs item #1628902, was opened at 2007-01-05 17:37
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1628902&group_id=5470

Category: Python Library
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Pierre Imbaud (pmi)
Assigned to: Nobody/Anonymous (nobody)
Summary: xml.dom.minidom parse bug

Initial Comment:
xml.dom.minidom was unable to parse an XML file taken from an example provided by an official organization (http://www.iptc.org/IPTC4XMP).
The parsed file was somewhat hairy, but I have been able to reproduce the bug with a simplified
version, attached. (It ends with .xmp: it's supposed
to be an XMP file, the XMP standard being built on
XML. Well, that's the short story.)

The offending part is the one that goes: xmpPLUS='....'
It triggers an exception, ValueError: too many values to unpack,
in _parse_ns_name. Some debugging showed an obvious mistake
in the scanning of the name argument, which goes beyond the closing
" ' ".
I dug a little further through a pdb session, but the bug seems to be located in C code.
This is the very first time I have reported a bug, so chances are I am providing too much or too little information...
To whoever it may concern, here is the invoking code:
from xml.dom import minidom
...
class xmp(dict):
    def __init__(self, inStream):
        xmldoc = minidom.parse(inStream)
        ....

x = xmp('/home/pierre/devt/port/IPTCCore-Full/x.xmp')


traceback:
/home/pierre/devt/fileInfo/svnRep/branches/xml/xmpLib/xmpLib.py in __init__(self, inStream)
     26     def __init__(self, inStream):
     27         print minidom
---> 28         xmldoc = minidom.parse(inStream)
     29         xmpmeta = xmldoc.childNodes[1]
     30         rdf     = xmpmeta.childNodes[1]

/home/pierre/devt/fileInfo/svnRep/branches/xml/xmpLib/nxml/dom/minidom.py in parse(file, parser, bufsize)

/home/pierre/devt/fileInfo/svnRep/branches/xml/xmpLib/xml/dom/expatbuilder.py in parse(file, namespaces)
    922         fp = open(file, 'rb')
    923         try:
--> 924             result = builder.parseFile(fp)
    925         finally:
    926             fp.close()

/home/pierre/devt/fileInfo/svnRep/branches/xml/xmpLib/xml/dom/expatbuilder.py in parseFile(self, file)
    205                 if not buffer:
    206                     break
--> 207                 parser.Parse(buffer, 0)
    208                 if first_buffer and self.document.documentElement:
    209                     self._setup_subset(buffer)

/home/pierre/devt/fileInfo/svnRep/branches/xml/xmpLib/xml/dom/expatbuilder.py in start_element_handler(self, name, attributes)
    743     def start_element_handler(self, name, attributes):
    744         if ' ' in name:
--> 745             uri, localname, prefix, qname = _parse_ns_name(self, name)
    746         else:
    747             uri = EMPTY_NAMESPACE

/home/pierre/devt/fileInfo/svnRep/branches/xml/xmpLib/xml/dom/expatbuilder.py in _parse_ns_name(builder, name)
    125         localname = intern(localname, localname)
    126     else:
--> 127         uri, localname = parts
    128         prefix = EMPTY_PREFIX
    129         qname = localname = intern(localname, localname)

ValueError: too many values to unpack

The offending C statement:
/usr/src/packages/BUILD/Python-2.4/Modules/pyexpat.c(582)StartElement()
The returned 'name':
(Pdb) name
Out[5]: u'XMP Photographic Licensing Universal System (xmpPLUS, http://ns.adobe.com/xap/1.0/PLUS/) CreditLineReq xmpPLUS'
It's obvious the scanning went beyond the attribute.
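
To make the failure concrete, here is a minimal sketch that feeds the name shown by pdb through a simplified stand-in for _parse_ns_name. The helper split_ns_name is hypothetical; it only mirrors the two branches visible in the traceback (three parts, otherwise two) and assumes the real function splits the name on single spaces. It uses modern "except ... as" syntax, so it will not run on Python 2.4 as written. One possible reading, not confirmed in this report, is that the leading text is a namespace URI that itself contains spaces, which would explain why the split yields more parts than either branch expects.

```python
# The value of 'name' as reported by pdb in this bug report.
name = (u"XMP Photographic Licensing Universal System "
        u"(xmpPLUS, http://ns.adobe.com/xap/1.0/PLUS/) CreditLineReq xmpPLUS")


def split_ns_name(name):
    """Hypothetical, simplified stand-in for expatbuilder._parse_ns_name.

    Assumes the name is split on single spaces and that exactly
    two or three parts are expected, as the traceback suggests.
    """
    parts = name.split(' ')
    if len(parts) == 3:
        uri, localname, prefix = parts
    else:
        # Fails with ValueError when there are more than two parts.
        uri, localname = parts
        prefix = None
    return uri, localname, prefix


parts = name.split(' ')
print(len(parts))  # number of space-separated pieces in the name

try:
    split_ns_name(name)
    error = None
except ValueError as exc:
    error = str(exc)

print(error)
```

The last two pieces of the split are CreditLineReq and xmpPLUS, which look like a local name and a prefix; everything before them would then have to be a single URI for the three-part branch to apply.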