xml parsing with sax

Alex Martelli aleaxit at yahoo.com
Fri Apr 27 07:26:58 EDT 2001


"Harald Kirsch" <kirschh at lionbioscience.com> wrote in message
news:yv28zkm63gi.fsf at lionsp093.lion-ag.de...
    [snip]
> it possible to push a pseudo root element in front of a stream parsed
> with
>
>   xml.sax.parse(sys.stdin, ...)


If the stream fits in memory, it's clearly easy:


import xml.sax, xml.sax.handler, sys, cStringIO

class Handler(xml.sax.handler.ContentHandler):
    def startElement(self, name, attrs):
        print "Element",name

def wrapWith(tag, fileob):
    result = cStringIO.StringIO()
    result.write("<%s>\n"%tag)
    result.write(fileob.read())
    result.write("</%s>\n"%tag)
    result.seek(0)
    return result

xml.sax.parse(wrapWith("fakedoc",sys.stdin), Handler())


If you have to cater for large streams, such that
fileob.read() could blow up, then wrapWith() will
need to become a factory callable for an appropriate
object -- not quite as easy but of course still OK;
somewhat in the same spirit as cookbook recipe
http://www.activestate.com/ASPN/Python/Cookbook/Recipe/52295
we could have:

import xml.sax, xml.sax.handler, sys, cStringIO

class Handler(xml.sax.handler.ContentHandler):
    def startElement(self, name, attrs):
        print "Element",name

class FileWrappedWithTag:
    def __init__(self, tagname, file):
        self.tagname = tagname
        self.file = file
        self.state = 0

    def __getattr__(self, attr):
        return getattr(self.file, attr)

    def read(self, size=-1):
        if size<0:
            self.state = 2
            return "<%s>\n%s</%s>\n" % (
              self.tagname, self.file.read(), self.tagname)
        elif self.state == 0:
            self.state = 1
            return "<%s>\n%s" % (
              self.tagname, self.file.read(size-len(self.tagname)-3))
        elif self.state == 1:
            result = self.file.read(size)
            if result: return result
            self.state = 2
            return "</%s>\n" % self.tagname
        elif self.state == 2:
            return ""

def wrapWith(tag, fileob):
    return FileWrappedWithTag(tag, fileob)

xml.sax.parse(wrapWith("fakedoc",sys.stdin), Handler())


Fortunately, the parser just calls .read(N) on its
file-object argument -- it does *NOT* type-test,
and a GOOD thing it is that it doesn't! -- so that
Python's typical signature-based polymorphism lets
us do the job in a reasonably clean and easy way.


Alex






More information about the Python-list mailing list