xml parsing with sax
Alex Martelli
aleaxit at yahoo.com
Fri Apr 27 07:26:58 EDT 2001
"Harald Kirsch" <kirschh at lionbioscience.com> wrote in message
news:yv28zkm63gi.fsf at lionsp093.lion-ag.de...
[snip]
> it possible to push a pseudo root element in front of a stream parsed
> with
>
> xml.sax.parse(sys.stdin, ...)
If the stream fits in memory, it's clearly easy:
import xml.sax, xml.sax.handler, sys, cStringIO

class Handler(xml.sax.handler.ContentHandler):
    def startElement(self, name, attrs):
        print "Element", name

def wrapWith(tag, fileob):
    result = cStringIO.StringIO()
    result.write("<%s>\n" % tag)
    result.write(fileob.read())
    result.write("</%s>\n" % tag)
    result.seek(0)
    return result

xml.sax.parse(wrapWith("fakedoc", sys.stdin), Handler())
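The snippet above is Python 2 (cStringIO, the print statement). Here is a minimal Python 3 sketch of the same in-memory approach, assuming io.StringIO in place of cStringIO. NameCollector and wrap_with are hypothetical names; the handler records element names instead of printing them so the result is easy to check.

```python
import io
import xml.sax
import xml.sax.handler


class NameCollector(xml.sax.handler.ContentHandler):
    """Hypothetical handler: records each element name instead of printing it."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def wrap_with(tag, fileob):
    """Read all of fileob into memory and return it as a text stream
    wrapped in <tag>...</tag> (only suitable when the input fits in memory)."""
    return io.StringIO("<%s>\n%s</%s>\n" % (tag, fileob.read(), tag))


# Two top-level elements: not well-formed on their own, fine once wrapped.
fragment = io.StringIO("<a/>\n<b><c/></b>\n")
handler = NameCollector()
xml.sax.parse(wrap_with("fakedoc", fragment), handler)
print(handler.names)  # ['fakedoc', 'a', 'b', 'c']
```

With real input you would pass sys.stdin instead of the StringIO fragment.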
If you have to cater for large streams, where
fileob.read() could exhaust memory, then wrapWith()
needs to become a factory callable that returns a
suitable file-like object. Not quite as easy, but of
course still doable; in much the same spirit as
cookbook recipe
http://www.activestate.com/ASPN/Python/Cookbook/Recipe/52295
we could have:
import xml.sax, xml.sax.handler, sys

class Handler(xml.sax.handler.ContentHandler):
    def startElement(self, name, attrs):
        print "Element", name

class FileWrappedWithTag:
    # state: 0 = nothing sent yet, 1 = open tag sent, 2 = close tag sent
    def __init__(self, tagname, file):
        self.tagname = tagname
        self.file = file
        self.state = 0
    def __getattr__(self, attr):
        # delegate everything else (close, name, ...) to the real file
        return getattr(self.file, attr)
    def read(self, size=-1):
        # nothing requested, or nothing left to send
        if size == 0 or self.state == 2:
            return ""
        if size < 0:
            # read whatever is left, adding whichever tags are still owed
            head = ""
            if self.state == 0:
                head = "<%s>\n" % self.tagname
            self.state = 2
            return "%s%s</%s>\n" % (head, self.file.read(), self.tagname)
        if self.state == 0:
            self.state = 1
            head = "<%s>\n" % self.tagname
            # never pass a negative count, which would read the whole file
            return head + self.file.read(max(size - len(head), 0))
        result = self.file.read(size)
        if result:
            return result
        self.state = 2
        return "</%s>\n" % self.tagname

def wrapWith(tag, fileob):
    return FileWrappedWithTag(tag, fileob)

xml.sax.parse(wrapWith("fakedoc", sys.stdin), Handler())
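In Python 3 the same streaming idea still works, with two wrinkles worth knowing about: xml.sax probes a file-like source with read(0) to decide whether it yields text or bytes, and recent versions call close() on the input once parsing ends. The sketch below (Python 3; TagWrappedReader and NameCollector are hypothetical names) keeps the three-state machine but answers read(0) without consuming anything and forwards close() explicitly rather than via __getattr__.

```python
import io
import xml.sax
import xml.sax.handler


class NameCollector(xml.sax.handler.ContentHandler):
    """Hypothetical handler: records each element name."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


class TagWrappedReader:
    """Hypothetical file-like wrapper: an opening <tag> line, then the wrapped
    text stream chunk by chunk, then a closing </tag> line."""

    def __init__(self, tag, fileob):
        self._head = "<%s>\n" % tag
        self._tail = "</%s>\n" % tag
        self._file = fileob
        self._state = 0  # 0: head not sent, 1: streaming body, 2: tail sent

    def read(self, size=-1):
        # xml.sax probes with read(0) to tell text from bytes; answer
        # without consuming anything or changing state.
        if size == 0 or self._state == 2:
            return ""
        if size < 0:
            # Read everything that is left, plus whichever tags are still owed.
            head = self._head if self._state == 0 else ""
            self._state = 2
            return head + self._file.read() + self._tail
        if self._state == 0:
            self._state = 1
            # Never pass a negative count: that would read the whole file.
            return self._head + self._file.read(max(size - len(self._head), 0))
        chunk = self._file.read(size)
        if chunk:
            return chunk
        self._state = 2
        return self._tail

    def close(self):
        # Recent xml.sax versions close the input after parsing; pass it on.
        self._file.close()


handler = NameCollector()
source = TagWrappedReader("fakedoc", io.StringIO("<a/>\n<b><c/></b>\n"))
xml.sax.parse(source, handler)
print(handler.names)  # ['fakedoc', 'a', 'b', 'c']
```

Reading in small chunks shows the wrapper never needs the whole input at once; each call returns at most one piece of the original stream (or one tag).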
Fortunately, the parser just calls .read(N) on its
file-object argument; it does *NOT* type-test it (and
a GOOD thing it is that it doesn't!), so Python's
usual signature-based polymorphism lets us do the
job in a reasonably clean and easy way.
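A push-style alternative, not from the original post: the default expat-based reader returned by xml.sax.make_parser() is an IncrementalParser with feed() and close(), so instead of disguising the stream as a file you can feed the pseudo-root tags and the stream's chunks to the parser yourself. A minimal Python 3 sketch, assuming a text-mode input stream (hence the "" sentinel); parse_wrapped and NameCollector are hypothetical names.

```python
import io
import xml.sax
import xml.sax.handler


class NameCollector(xml.sax.handler.ContentHandler):
    """Hypothetical handler: records each element name."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def parse_wrapped(tag, fileob, handler, chunk_size=65536):
    """Hypothetical helper: push '<tag>', the stream's chunks, and '</tag>'
    into an incremental SAX parser instead of wrapping the file object."""
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.feed("<%s>\n" % tag)
    # iter() with a sentinel keeps calling read() until it returns ""
    for chunk in iter(lambda: fileob.read(chunk_size), ""):
        parser.feed(chunk)
    parser.feed("</%s>\n" % tag)
    parser.close()  # signals end of input and fires endDocument()


handler = NameCollector()
# A tiny chunk_size shows that tags split across chunks are handled fine.
parse_wrapped("fakedoc", io.StringIO("<a/>\n<b><c/></b>\n"), handler, chunk_size=3)
print(handler.names)  # ['fakedoc', 'a', 'b', 'c']
```

The trade-off: this avoids the state machine entirely, but ties you to a parser that supports incremental feeding, whereas the wrapper object works with anything that accepts a file-like source.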
Alex