"encoding specified in XML declaration is incorrect"

Gustaf Liljegren gustafl at algonet.se
Thu Dec 2 08:59:14 CET 2004


I'm using xml.sax.parseString to parse an XML file. The file contains a 
few words in Russian and was saved as UTF-8 by a C# program. In the 
example below, MyParser() is my SAX ContentHandler class. My first try was:

f = open('words.xml', 'r')
s = f.read()
xml.sax.parseString(s, MyParser())

This produced the following error:

Traceback (most recent call last):
   File "sax5.py", line 87, in ?
     xml.sax.parseString(s, MyParser())
   File "D:\Python\lib\xml\sax\__init__.py", line 49, in parseString
     parser.parse(inpsrc)
   File "D:\Python\lib\xml\sax\expatreader.py", line 107, in parse
     xmlreader.IncrementalParser.parse(self, source)
   File "D:\Python\lib\xml\sax\xmlreader.py", line 125, in parse
     self.close()
   File "D:\Python\lib\xml\sax\expatreader.py", line 218, in close
     self._cont_handler.endDocument()
   File "sax5.py", line 81, in endDocument
     f.write(header + self.all + footer)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 745-751: ordinal not in range(128)

The XML declaration should be enough to tell the parser the encoding. 
Anyway, I read some previous posts and found that the unicode() function 
might help:

f = open('words.xml', 'r')
s = f.read()
u = unicode(s, "utf-8")
xml.sax.parseString(u, MyParser())

But I just got another error:

Traceback (most recent call last):
   File "sax5.py", line 87, in ?
     xml.sax.parseString(u, MyParser())
   File "D:\Python\lib\xml\sax\__init__.py", line 49, in parseString
     parser.parse(inpsrc)
   File "D:\Python\lib\xml\sax\expatreader.py", line 107, in parse
     xmlreader.IncrementalParser.parse(self, source)
   File "D:\Python\lib\xml\sax\xmlreader.py", line 123, in parse
     self.feed(buffer)
   File "D:\Python\lib\xml\sax\expatreader.py", line 211, in feed
     self._err_handler.fatalError(exc)
   File "D:\Python\lib\xml\sax\handler.py", line 38, in fatalError
     raise exception
xml.sax._exceptions.SAXParseException: <unknown>:1:30: encoding specified in XML declaration is incorrect

I see nothing wrong with my XML declaration:

<?xml version="1.0" encoding="utf-8"?>

And the file really is UTF-8 (otherwise I wouldn't be able to open it in 
Internet Explorer and Firefox). I tried removing the BOM, but it didn't 
help. What else could be wrong?

Gustaf
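
A note on the two tracebacks above, offered as a minimal sketch and not 
as a definitive fix. The first traceback ends in the f.write() call inside 
endDocument, so the parse itself had already finished; the failure is in 
writing non-ASCII text through the default ASCII codec. In the second 
attempt the parser receives already-decoded text, and my reading is that 
the encoding="utf-8" declaration then no longer matches what it is given. 
The sketch below assumes Python 2.6 or later (it also runs on Python 3): 
it passes the undecoded bytes to the parser and encodes explicitly on 
output. WordCollector, parse_words, save_text and the sample document are 
hypothetical stand-ins, not the original sax5.py.

```python
import io
import xml.sax


class WordCollector(xml.sax.ContentHandler):
    """Hypothetical stand-in for MyParser: gathers all character data."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.all = u''

    def characters(self, content):
        # The parser delivers decoded (Unicode) text here, whatever
        # encoding the file on disk uses.
        self.all += content


def parse_words(raw_bytes):
    """Parse undecoded UTF-8 bytes and return the collected text."""
    handler = WordCollector()
    # Hand over the bytes as read from disk (open(..., 'rb').read());
    # the parser applies the encoding named in the XML declaration.
    xml.sax.parseString(raw_bytes, handler)
    return handler.all


def save_text(path, text):
    """Write Unicode text as UTF-8 instead of the default codec."""
    out = io.open(path, 'w', encoding='utf-8')
    try:
        out.write(text)
    finally:
        out.close()


# Made-up sample standing in for words.xml (one Russian word).
sample = (u'<?xml version="1.0" encoding="utf-8"?>'
          u'<words><w>\u0441\u043b\u043e\u0432\u043e</w></words>'
          ).encode('utf-8')
text = parse_words(sample)
```

With a real file, the bytes would come from open('words.xml', 'rb').read(), 
and save_text(path, text) would take the place of the plain f.write() call 
in endDocument.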


