Newbie XML SAX Parsing: How do I ignore an invalid token?
scott at crybabymaternity.com
Fri Jan 5 16:50:18 EST 2007
I've got an XML feed from a vendor that is not well-formed, and having
them change it is not an option. I'm trying to figure out how to
create an error handler that will ignore the invalid token and continue
parsing.
The file is large, so I'd prefer not to put it all in memory or save it
off and strip out the bad characters before I parse it.
I've included one of the problematic characters in a small XML snippet
below.
I'm new to Python, and I don't know how to accomplish this. Any help is
greatly appreciated!
-----------------------------------------------------------------
Here is my code:
from xml.sax import make_parser
from xml.sax.handler import ContentHandler
import StringIO
class ErrorHandler:
    def __init__(self, parser):
        self.parser = parser
    def warning(self, msg):
        print '*** (ErrorHandler.warning) msg:', msg
    def error(self, msg):
        print '*** (ErrorHandler.error) msg:', msg
    def fatalError(self, msg):
        print msg
class ContentHandler(ContentHandler):
    def __init__ (self):
        pass
    def startElement(self, name, attrs):
        pass
    def characters (self, ch):
        pass
    def endElement(self, name):
        pass
xmlstr = """
<cities>
<city>
<name>Tampa</name>
<description>A great city
and place to live</description>
</city>
<city>
<name>Clearwater</name>
<description>Beautiful beaches</description>
</city>
</cities>
"""
parser = make_parser()
curHandler = ContentHandler()
errorHandler = ErrorHandler(parser)
parser.setContentHandler(curHandler)
parser.setErrorHandler(errorHandler)
parser.parse(StringIO.StringIO(xmlstr))
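-----------------------------------------------------------------
One possible direction, offered as a rough sketch rather than a confirmed fix: as far as I understand SAX, a well-formedness error is fatal and parsing is expected to stop once fatalError returns, so an error handler on its own probably cannot skip the token. An alternative that still avoids loading the whole file is to read it in chunks, drop the offending bytes from each chunk, and pass the result to the parser's incremental feed() method. The sketch below assumes the bad tokens are ASCII control characters that XML 1.0 forbids and that the feed uses an ASCII-compatible encoding such as UTF-8 or Latin-1; it will not help with other problems such as unescaped ampersands. It is written for Python 3 (the code above is Python 2), and the names parse_skipping_control_bytes and TextCollector, as well as the \x1a byte in the sample, are made up for illustration.

```python
import io
import re
from xml.sax import make_parser
from xml.sax.handler import ContentHandler

# Assumption: the "invalid tokens" are C0 control bytes other than
# tab (0x09), LF (0x0A) and CR (0x0D), which XML 1.0 does not allow.
_ILLEGAL_BYTES = re.compile(b"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def parse_skipping_control_bytes(stream, handler, chunk_size=64 * 1024):
    """Feed a binary stream to a SAX parser chunk by chunk,
    removing illegal control bytes from each chunk first."""
    parser = make_parser()
    parser.setContentHandler(handler)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # Only one chunk is held in memory at a time.
        parser.feed(_ILLEGAL_BYTES.sub(b"", chunk))
    parser.close()


class TextCollector(ContentHandler):
    """Hypothetical demo handler: collects the text of each <description>."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.descriptions = []
        self._buf = None

    def startElement(self, name, attrs):
        if name == "description":
            self._buf = []

    def characters(self, content):
        if self._buf is not None:
            self._buf.append(content)

    def endElement(self, name):
        if name == "description":
            self.descriptions.append("".join(self._buf))
            self._buf = None


# Sample data with an illegal control byte (\x1a) inside a description.
sample = io.BytesIO(
    b"<cities><city><name>Tampa</name>"
    b"<description>A great city\x1a and place to live</description>"
    b"</city></cities>"
)
collector = TextCollector()
parse_skipping_control_bytes(sample, collector)
print(collector.descriptions)
```

For a real feed, the stream would be the file opened in binary mode, for example open(path, "rb"). Filtering at the byte level keeps the approach safe across chunk boundaries for UTF-8, because multi-byte sequences there never contain bytes below 0x80.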