Sax2 encoding

Juan M. Casillas juanm.casillas at
Fri Aug 30 12:20:43 CEST 2002

Hello folks!

I have an XML document that begins with only

<?xml version="1.0"?>

That is, without any information about the encoding. This document has
special characters encoded in ISO-8859-1 (Spanish characters such as
á or ñ). When I try to parse the document with expat it works OK, but
I have to give it the default encoding:

import xml.parsers.expat
import sys

p = xml.parsers.expat.ParserCreate('ISO-8859-1')


f = open(sys.argv[1])
# read the whole document and hand it to the parser in one go
xmldocument = f.read()
p.Parse(xmldocument, 1)
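
For reference, the expat behaviour described above can be reproduced
with a small self-contained sketch. The sample bytes and the helper
name expat_text are made up for illustration, and it assumes a current
Python where Parse() is given bytes:

```python
import xml.parsers.expat


def expat_text(data, encoding=None):
    """Parse bytes with expat and return the concatenated character data.

    `encoding` is passed to ParserCreate() and overrides whatever the
    document itself declares (here: nothing).
    """
    chunks = []
    parser = xml.parsers.expat.ParserCreate(encoding)
    parser.CharacterDataHandler = chunks.append
    parser.Parse(data, True)  # True marks this as the final chunk
    return ''.join(chunks)


# Hypothetical sample: ISO-8859-1 bytes, no encoding in the declaration.
sample = '<?xml version="1.0"?><doc>\xe1 \xf1</doc>'.encode('iso-8859-1')

# With the override, the accented characters come through.
text = expat_text(sample, 'ISO-8859-1')

# Without it, expat falls back to UTF-8 and rejects the same bytes.
try:
    expat_text(sample)
    failure = None
except xml.parsers.expat.ExpatError as err:
    failure = str(err)
```

The second call is expected to fail with the same kind of
"not well-formed (invalid token)" message as in the traceback further
down, which suggests the DOM reader is simply parsing with the default
encoding.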


But I need DOM, and here comes my problem: when I create the DOM
object in the same way that the documentation says:

import sys
from xml.dom.ext.reader import Sax2

# create Reader object
reader = Sax2.Reader()

# parse the document
f = open(sys.argv[1])
doc = reader.fromStream(f)

It bombs and gives me the following error:

Traceback (most recent call last):
  File "./", line 11, in ?
    doc = reader.fromStream(f)
  File "/usr/local/python2/lib/python2.2/site-packages/_xmlplus/dom/ext/reader/", line 373, in fromStream
  File "/usr/local/python2/lib/python2.2/site-packages/_xmlplus/sax/", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/local/python2/lib/python2.2/site-packages/_xmlplus/sax/", line 123, in parse
  File "/usr/local/python2/lib/python2.2/site-packages/_xmlplus/sax/", line 211, in feed
  File "/usr/local/python2/lib/python2.2/site-packages/_xmlplus/dom/ext/reader/", line 341, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: efe56.00.xml:6:43: not well-formed (invalid token)

and poking around the file, I found an 'á' character at this position.
So my question is: how can I set the default encoding for the Sax2
reader so the XML parser works for me?
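
One workaround that might do in the meantime is to transcode the bytes
to UTF-8 before building the DOM, since a document with no encoding
declaration is read as UTF-8 by default. This is only a sketch: it uses
the standard library's xml.dom.minidom instead of Sax2.Reader (a
substitution, not the PyXML API), and the helper name and sample bytes
are invented for the example:

```python
import xml.dom.minidom


def parse_legacy_bytes_as_dom(raw, source_encoding='iso-8859-1'):
    """Transcode `raw` to UTF-8 and build a minidom Document from it.

    Assumes the document carries no encoding declaration of its own;
    otherwise the declaration would no longer match the bytes.
    """
    utf8_bytes = raw.decode(source_encoding).encode('utf-8')
    return xml.dom.minidom.parseString(utf8_bytes)


# Hypothetical sample standing in for open(sys.argv[1], 'rb').read()
sample = b'<?xml version="1.0"?><doc>\xe1 \xf1</doc>'
doc = parse_legacy_bytes_as_dom(sample)
content = doc.documentElement.firstChild.data
```

The same transcoded bytes could presumably be handed to any reader that
accepts a stream, but that is untested with Sax2.Reader.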

Thanks in advance,
Python Rocks!

Juan M. Casillas
