SAX unicode and ascii parsing problem

goldtech goldtech at worldpost.com
Tue Nov 30 21:43:55 CET 2010


Hi,

I'm trying to parse an xml file using SAX. About half-way through a
file I get this error:

Traceback (most recent call last):
  File "C:\Python26\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py", line 325, in RunScript
    exec codeObject in __main__.__dict__
  File "E:\sc\b2.py", line 58, in <module>
    parser.parse(open(r'ppb5.xml'))
  File "C:\Python26\Lib\xml\sax\expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "C:\Python26\Lib\xml\sax\xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "C:\Python26\Lib\xml\sax\expatreader.py", line 207, in feed
    self._parser.Parse(data, isFinal)
  File "C:\Python26\Lib\xml\sax\expatreader.py", line 304, in end_element
    self._cont_handler.endElement(name)
  File "E:\sc\b2.py", line 51, in endElement
    d.write(csv+"\n")
UnicodeEncodeError: 'ascii' codec can't encode characters in position 146-147: ordinal not in range(128)
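
For reference, the last frame of the traceback is the d.write(csv+"\n") call. In Python 2 the SAX parser hands text to the handler as unicode strings, and writing a unicode string to a file opened with the built-in open() makes Python encode it with the ASCII codec, which fails on the first non-ASCII character. The script itself is not shown, so how d was opened is an assumption. A minimal sketch of one way around it, assuming the output should be UTF-8; the function name write_rows, the file name out.csv and the sample rows are hypothetical stand-ins:

```python
import io
import os
import tempfile

def write_rows(path, rows):
    """Write unicode rows to `path`, encoding them as UTF-8 explicitly."""
    # io.open (available since Python 2.6) encodes text on write, so no
    # implicit ASCII conversion takes place. In Python 2 the strings
    # passed to write() must be unicode, not str.
    with io.open(path, 'w', encoding='utf-8') as out:
        for row in rows:
            out.write(row + u"\n")

# Hypothetical data standing in for the `csv` string built in endElement.
rows = [u"plain,ascii", u"caf\u00e9,na\u00efve"]
demo_path = os.path.join(tempfile.mkdtemp(), "out.csv")
write_rows(demo_path, rows)
```

An alternative with the same effect is to keep the existing file object and write csv.encode('utf-8') + "\n" instead.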

I'm using ActivePython 2.6. I'm trying to figure out the simplest fix.
If there's a Python way to just take the source XML file and convert/
process it so this will not happen, that would be best. Or should I
just upgrade to Python 3?

I tried this, thinking it might convert the file so that I could then
parse the new file, but nothing changed:

uc = open(r'E:\sc\ppb4.xml').read().decode('utf8')
ascii = uc.decode('ascii')
mex9 = open( r'E:\scrapes\ppb5.xml', 'w' )
mex9.write(ascii)
mex9.close()

Again, I'm looking for something simple, even if it's a few more lines
of code... or should I upgrade(?)

Thanks, appreciate any help.
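
A note on the conversion attempt above: uc is already a unicode string after .decode('utf8'), so calling .decode('ascii') on it makes Python 2 first encode it to ASCII implicitly, which raises the same UnicodeEncodeError. Going from unicode to bytes is .encode(). A minimal sketch of what may have been intended, assuming the source file really is UTF-8; the function name to_ascii_xml and the file names are hypothetical. The 'xmlcharrefreplace' error handler turns each non-ASCII character into a numeric character reference, which is safe in text and attribute values but not in element names, comments or CDATA sections:

```python
import io
import os
import tempfile

def to_ascii_xml(src_path, dst_path, src_encoding='utf-8'):
    """Copy an XML file, replacing non-ASCII characters with &#NNN; references."""
    with io.open(src_path, 'r', encoding=src_encoding) as src:
        text = src.read()
    # encode (not decode) converts unicode text to bytes.
    data = text.encode('ascii', 'xmlcharrefreplace')
    with open(dst_path, 'wb') as dst:
        dst.write(data)

# Hypothetical sample file standing in for the real XML input.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, 'in.xml')
dst_path = os.path.join(workdir, 'out.xml')
with io.open(src_path, 'w', encoding='utf-8') as f:
    f.write(u'<name>caf\u00e9</name>\n')
to_ascii_xml(src_path, dst_path)
```

This on its own would probably not remove the error: the parser resolves the references back into the original characters, so the handler still receives non-ASCII unicode text and the write in endElement still needs an explicit encoding.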


