[ python-Bugs-1627096 ] xml.dom.minidom parse bug

Thu Jan 4 12:18:18 CET 2007

Bugs item #1627096, was opened at 2007-01-03 17:06
Message generated for change (Comment added) made by loewis
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1627096&group_id=5470

Category: Python Library
Group: Python 2.4
>Status: Closed
>Resolution: Invalid
Priority: 5
Private: No
Submitted By: Pierre Imbaud (pmi)
Assigned to: Nobody/Anonymous (nobody)
Summary: xml.dom.minidom parse bug

Initial Comment:
xml.dom.minidom was unable to parse an XML file taken from an example provided by an official organization (http://www.iptc.org/IPTC4XMP).
The original file was somewhat hairy, but I have been able to reproduce the bug with a simplified
version, attached. (It ends with .xmp: it's supposed
to be an XMP file, the XMP standard being built on
XML. Well, that's the short story.)

The offending part is the one that goes: xmpPLUS='....'
It triggers an exception, ValueError: too many values to unpack,
in _parse_ns_name. Some debugging showed an obvious mistake
in the scanning of the name argument, which goes beyond the closing
" ' ".
I dug a little further through a pdb session, but the bug seems to be located in C code.
This is the very first time I have reported a bug, so chances are I am providing too much or too little information...
To whom it may concern, here is the invoking code:
from xml.dom import minidom
...
class xmp(dict):
    def __init__(self, inStream):
        xmldoc = minidom.parse(inStream)
        ....

x = xmp('/home/pierre/devt/port/IPTCCore-Full/x.xmp')

traceback:
/home/pierre/devt/fileInfo/svnRep/branches/xml/xmpLib/xmpLib.py in __init__(self, inStream)
     26     def __init__(self, inStream):
     27         print minidom
---> 28         xmldoc = minidom.parse(inStream)
     29         xmpmeta = xmldoc.childNodes[1]
     30         rdf     = xmpmeta.childNodes[1]

/home/pierre/devt/fileInfo/svnRep/branches/xml/xmpLib/nxml/dom/minidom.py in parse(file, parser, bufsize)

/home/pierre/devt/fileInfo/svnRep/branches/xml/xmpLib/xml/dom/expatbuilder.py in parse(file, namespaces)
    922         fp = open(file, 'rb')
    923         try:
--> 924             result = builder.parseFile(fp)
    925         finally:
    926             fp.close()

/home/pierre/devt/fileInfo/svnRep/branches/xml/xmpLib/xml/dom/expatbuilder.py in parseFile(self, file)
    205                 if not buffer:
    206                     break
--> 207                 parser.Parse(buffer, 0)
    208                 if first_buffer and self.document.documentElement:
    209                     self._setup_subset(buffer)

/home/pierre/devt/fileInfo/svnRep/branches/xml/xmpLib/xml/dom/expatbuilder.py in start_element_handler(self, name, attributes)
    743     def start_element_handler(self, name, attributes):
    744         if ' ' in name:
--> 745             uri, localname, prefix, qname = _parse_ns_name(self, name)
    746         else:
    747             uri = EMPTY_NAMESPACE
/home/pierre/devt/fileInfo/svnRep/branches/xml/xmpLib/xml/dom/expatbuilder.py in _parse_ns_name(builder, name)
    125         localname = intern(localname, localname)
    126     else:
--> 127         uri, localname = parts
    128         prefix = EMPTY_PREFIX
    129         qname = localname = intern(localname, localname)

ValueError: too many values to unpack

The offending C statement:
/usr/src/packages/BUILD/Python-2.4/Modules/pyexpat.c(582)StartElement()
The returned 'name':
(Pdb) name
Out[5]: u'XMP Photographic Licensing Universal System (xmpPLUS, http://ns.adobe.com/xap/1.0/PLUS/) CreditLineReq xmpPLUS'
It's obvious the scanning went beyond the attribute.

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2007-01-04 12:18

Message:

This is not a bug in Python, but a bug in the XML document. According to
section 2.1 of

http://www.w3.org/TR/2006/REC-xml-names-20060816/

an XML namespace name must be a URI reference; according to RFC 3986, the
string "XMP Photographic Licensing Universal System (xmpPLUS,
http://ns.adobe.com/xap/1.0/PLUS/)" is not a URI reference, as it
contains spaces.
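
To illustrate why the spaces matter: in namespace mode the builder receives
each name as one string in which the namespace URI, the local name and the
prefix are joined by single spaces, and it splits that string on spaces again.
The sketch below is a simplified stand-in for that step, not the actual
expatbuilder code; split_ns_name and the example.org URI are made up for
illustration, and the failing name is the one from the pdb output in the
report.

```python
# Simplified stand-in for the name-splitting step (hypothetical helper,
# not the real xml.dom.expatbuilder implementation).
def split_ns_name(name):
    """Split a space-joined 'uri localname prefix' string into its parts."""
    parts = name.split(" ")
    if len(parts) == 3:
        uri, localname, prefix = parts
    else:
        # Raises ValueError unless there are exactly two pieces.
        uri, localname = parts
        prefix = None
    return uri, localname, prefix


# A namespace name without spaces splits cleanly (example.org is made up).
GOOD_NAME = "http://example.org/ns CreditLineReq xmpPLUS"

# The name reported in the pdb session: the "URI" itself contains spaces,
# so splitting on spaces yields nine pieces instead of two or three.
BAD_NAME = (
    "XMP Photographic Licensing Universal System "
    "(xmpPLUS, http://ns.adobe.com/xap/1.0/PLUS/) CreditLineReq xmpPLUS"
)

if __name__ == "__main__":
    print(split_ns_name(GOOD_NAME))
    try:
        split_ns_name(BAD_NAME)
    except ValueError as exc:
        print("ValueError:", exc)
```

The point is only that a separator which can also occur inside the namespace
name makes the joined string ambiguous; a namespace name that is a valid URI
reference cannot contain a space, so the split is safe for conforming
documents.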

Closing this report as invalid.

If you want to work around this bug, you can parse the file in
non-namespace mode, using

xml.dom.expatbuilder.parse("/tmp/x.xmp", namespaces=False)
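
A minimal sketch of this workaround, under some assumptions: the sample
document below is made up (the attached x.xmp is not reproduced here), and
parseString is used instead of parse so that the example needs no file on
disk. In non-namespace mode prefixed names are kept as plain strings, and
the xmlns:... declaration is treated as an ordinary attribute.

```python
from xml.dom import expatbuilder

# Made-up sample: the namespace declaration contains spaces, as in the report.
SAMPLE = (
    "<root xmlns:xmpPLUS='XMP Photographic Licensing Universal System "
    "(xmpPLUS, http://ns.adobe.com/xap/1.0/PLUS/)'>"
    "<xmpPLUS:CreditLineReq>False</xmpPLUS:CreditLineReq>"
    "</root>"
)


def parse_without_namespaces(xml_text):
    """Build a minidom document with namespace processing turned off."""
    return expatbuilder.parseString(xml_text, namespaces=False)


if __name__ == "__main__":
    doc = parse_without_namespaces(SAMPLE)
    # Elements are looked up by their full prefixed name, not by (uri, localname).
    for elem in doc.getElementsByTagName("xmpPLUS:CreditLineReq"):
        print(elem.tagName, "=", elem.firstChild.data)
    # The declaration is available as a normal attribute value.
    print(doc.documentElement.getAttribute("xmlns:xmpPLUS"))
```

The trade-off is that namespace-aware calls such as getElementsByTagNameNS
are of no use on a document built this way, so code that depends on them
would have to match on the prefixed names instead.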

----------------------------------------------------------------------
