SAXParseException: not well-formed (invalid token)
Stefan Behnel
stefan.behnel-n05pAM at web.de
Thu Aug 30 08:37:10 EDT 2007
Pablo Rey wrote:
> I am getting the following error with an XML page:
>
>>   File "/home/prey/RAL-CESGA/bin/voms2users/voms2users.py", line 69, in getItems
>>     d = minidom.parseString(xml.read())
>>   File "/usr/lib/python2.2/site-packages/_xmlplus/dom/minidom.py", line 967, in parseString
>>     return _doparse(pulldom.parseString, args, kwargs)
>>   File "/usr/lib/python2.2/site-packages/_xmlplus/dom/minidom.py", line 954, in _doparse
>>     toktype, rootNode = events.getEvent()
>>   File "/usr/lib/python2.2/site-packages/_xmlplus/dom/pulldom.py", line 265, in getEvent
>>     self.parser.feed(buf)
>>   File "/usr/lib/python2.2/site-packages/_xmlplus/sax/expatreader.py", line 208, in feed
>>     self._err_handler.fatalError(exc)
>>   File "/usr/lib/python2.2/site-packages/_xmlplus/sax/handler.py", line 38, in fatalError
>>     raise exception
>> xml.sax._exceptions.SAXParseException: <unknown>:553:48: not well-formed (invalid token)
>
>
>> def getItems(page):
>>     opener = urllib.URLopener(key_file=HOSTKEY, cert_file=HOSTCERT)
>>     try:
>>         xml = opener.open(page)
>>     except:
>>         return []
>>
>>     d = minidom.parseString(xml.read())
>>     items = d.getElementsByTagName('item')
>>     data = []
>>     for i in items:
>>         data.append(getText(i.childNodes))
>>
>>     return data
>
> The page is
> https://lcg-voms.cern.ch:8443/voms/cms/services/VOMSCompatibility?method=getGridmapUsers
> and the line with the invalid character is (the invalid character is the
> final é of Université):
>
> <item>/C=BE/O=BEGRID/OU=Physique/OU=Univesité Catholique de
> Louvain/CN=Roberfroid</item>
>
>
> I have tried several options but have not been able to avoid this
> problem. Any ideas?
Looks like the page is not well-formed XML (i.e. not XML at all). If it
doesn't declare an encoding (<?xml version="1.0" encoding="..."?>), the parser
has to assume UTF-8, and a lone Latin-1 "é" byte is not valid UTF-8. You can
try recoding the input, possibly decoding it from latin-1 and re-encoding it
as UTF-8, before passing it to the SAX parser.
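
A minimal sketch of that recoding, assuming the server really sends Latin-1
bytes and the document carries no encoding declaration of its own. It is
written for Python 3 rather than the Python 2.2 in the traceback; the helper
name parse_latin1_xml and the sample bytes are made up for illustration.

```python
from xml.dom import minidom


def parse_latin1_xml(raw_bytes):
    """Recode Latin-1 bytes to UTF-8, then parse them with minidom.

    Assumption: raw_bytes is Latin-1 encoded and has no XML declaration
    naming a different encoding.
    """
    text = raw_bytes.decode("latin-1")   # every byte maps to one character
    utf8_bytes = text.encode("utf-8")    # what the parser assumes by default
    return minidom.parseString(utf8_bytes)


# Hypothetical sample input: \xe9 is the Latin-1 byte for "é", which is
# not a valid UTF-8 sequence on its own.
raw = b"<items><item>OU=Universit\xe9 Catholique de Louvain</item></items>"

doc = parse_latin1_xml(raw)
item_text = doc.getElementsByTagName("item")[0].firstChild.data
```

In getItems this would take the place of the direct
minidom.parseString(xml.read()) call. If the page does declare an encoding in
its XML declaration, re-encoding to UTF-8 would contradict that declaration,
so it is worth checking the first line of the response before recoding.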
Alternatively, tell the page authors to fix their page.
Stefan