Trouble with xml.sax and unknown entities
Antony Lesuisse
al2000 at udev.org
Sun Apr 27 00:30:55 EDT 2003
I'm not on the list, please cc: me the answers.
I'm having trouble parsing the following XML with the default Python xml.sax
API. I'm using python2.2 on Debian unstable powerpc (python2.2-xmlbase).
'<?xml version="1.0"?><html><body>hello &nbsp;</body></html>'
See the code at the end.
xml.sax._exceptions.SAXParseException: <unknown>:1:39: undefined entity
The parser halts on &nbsp; because it doesn't know about this entity. The
problem is that I cannot find a way to tell it what this entity is.
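The failure can be reproduced and inspected in a few lines. This is only a sketch, and it assumes Python 3 rather than the python2.2 used in this post, so that it runs on a current interpreter. The helper name parse_error_location is made up for illustration.

```python
import xml.sax
from xml.sax.handler import ContentHandler

# Same document as in the post, with the undeclared &nbsp; reference.
XMLSTR = b'<?xml version="1.0"?><html><body>hello &nbsp;</body></html>'

def parse_error_location(data):
    """Parse data; return (line, column, message) of the first fatal
    error, or None if the document parses cleanly."""
    try:
        xml.sax.parseString(data, ContentHandler())
    except xml.sax.SAXParseException as exc:
        return exc.getLineNumber(), exc.getColumnNumber(), exc.getMessage()
    return None

location = parse_error_location(XMLSTR)
print(location)
```

The exception carries the same position and message that appear in the traceback, which helps confirm that the entity reference itself is what stops the parser.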
(1)
Is there a way to have a callback invoked when the parser arrives at &nbsp;?
None of the following handler functions (resolveEntity, notationDecl,
unparsedEntityDecl) are called.
I thought resolveEntity would be called in that situation, but I probably
misunderstand the SAX API.
(2)
Is there a way to register entities before parsing begins?
Something like:
parser.registerEntity('&nbsp;','blahblah')
(3)
Or is there a way to register an external DTD where those entities can be
defined? Something like:
parser.registerExternalDTD('xhtml.dtd')
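One workaround that needs neither a new parser method nor an external file is to declare the entity in an internal DTD subset placed in front of the root element; expat expands internal entities declared this way. The following is a minimal sketch, assuming Python 3 and assuming it is acceptable to modify the input before parsing. The TextCollector class and the DOCTYPE line are illustrative additions, not part of the original code.

```python
import xml.sax
from xml.sax.handler import ContentHandler

class TextCollector(ContentHandler):
    """Collect character data; the parser may deliver it in several chunks."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)

# The internal subset declares nbsp as U+00A0 (no-break space).
DOC = (b'<?xml version="1.0"?>'
       b'<!DOCTYPE html [<!ENTITY nbsp "&#160;">]>'
       b'<html><body>hello &nbsp;</body></html>')

collector = TextCollector()
xml.sax.parseString(DOC, collector)
text = "".join(collector.chunks)
print(repr(text))
```

The chunks are joined because the text before the entity and its expansion are not guaranteed to arrive in a single characters() call.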
Thank you for your help.
-----------------------------------------------------------
#!/usr/bin/python
import StringIO,sys,xml.sax,xml.sax.handler

class CHandler(xml.sax.handler.ContentHandler):
    def startElement(self, name, attrs):
        print name
    def characters(self, ch):
        print ch.encode('Latin-1')

class EResolver(xml.sax.handler.EntityResolver):
    def resolveEntity(self,publicId,systemId):
        print " resolveEntity ",publicId,systemId
        sys.exit()

class DHandler(xml.sax.handler.DTDHandler):
    # self was missing from both method signatures
    def notationDecl(self, name, publicId, systemId):
        print " notationDecl ",publicId,systemId
        sys.exit()
    def unparsedEntityDecl(self, name, publicId, systemId, ndata):
        print " unparsedEntityDecl ",publicId,systemId,ndata
        sys.exit()

xmlstr = '<?xml version="1.0"?><html><body>hello &nbsp;</body></html>'
parser = xml.sax.make_parser()
parser.setContentHandler(CHandler())
parser.setEntityResolver(EResolver())
parser.setDTDHandler(DHandler())
parser.parse(StringIO.StringIO(xmlstr))
--
Antony Lesuisse
GPG EA2CCD66: 4B7F 6061 3DF5 F07A ACFF F127 6487 54F7 EA2C CD66