[XML-SIG] easySAX

Lars Marius Garshol larsga@ifi.uio.no
05 May 1999 17:37:38 +0200


Here's a first sketch of what easySAX might look like.  It
incorporates Paul's text proposal, Geir Ove's text-in-one-block and
Paul's always-print-error-messages.  It should probably be extended to
call start_foo for <foo ...> events when start_foo is defined (falling
back to startElement otherwise), and likewise for end tags.
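
To make the start_foo idea concrete, here is a rough sketch of how the
dispatch might work. It is not part of the sketch below: the class name
DispatchingAdapter, the DemoHandler class and the choice to pass only attrs
to start_foo/end_foo are all assumptions made for illustration.

```python
class DispatchingAdapter:
    """Hypothetical: route element events to start_<name>/end_<name>
    methods on the handler when they exist, else to the generic ones."""

    def __init__(self, dh):
        self.dh = dh

    def startElement(self, name, attrs):
        # Look for e.g. start_foo; names that are not valid Python
        # identifiers (a:b, a-b) simply never match and fall through.
        method = getattr(self.dh, "start_" + name, None)
        if method is not None:
            method(attrs)
        else:
            self.dh.startElement(name, attrs)

    def endElement(self, name, attrs):
        method = getattr(self.dh, "end_" + name, None)
        if method is not None:
            method(attrs)
        else:
            self.dh.endElement(name, attrs)


class DemoHandler:
    """Hypothetical handler that records which methods were called."""

    def __init__(self):
        self.calls = []

    def start_foo(self, attrs):
        self.calls.append(("start_foo", attrs))

    def startElement(self, name, attrs):
        self.calls.append(("startElement", name, attrs))

    def endElement(self, name, attrs):
        self.calls.append(("endElement", name, attrs))
```

With this, a handler only pays for the elements it cares about: DemoHandler
defines start_foo but no end_foo, so </foo> still arrives via endElement.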

One addition that might be nice: allow users to define pi_<target>(data)
and ppi_<target>(attrs) methods. In the latter case the data would be
parsed into an attribute list.
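
A possible shape for the pi_/ppi_ dispatch, purely as a sketch. The helper
names parse_pseudo_attrs and dispatch_pi are made up here, the regular
expression only handles simple name="value" pairs, and letting ppi_<target>
win over pi_<target> when both exist is an assumption, not something that
has been decided.

```python
import re

# Matches name="value" or name='value' pseudo-attributes in PI data.
_PSEUDO_ATTR = re.compile(r"""([^\s=]+)\s*=\s*(?:"([^"]*)"|'([^']*)')""")


def parse_pseudo_attrs(data):
    """Parse PI data such as 'href="a.css" type="text/css"' into a dict."""
    attrs = {}
    for match in _PSEUDO_ATTR.finditer(data):
        name, double_quoted, single_quoted = match.groups()
        if double_quoted is not None:
            attrs[name] = double_quoted
        else:
            attrs[name] = single_quoted
    return attrs


def dispatch_pi(dh, target, data):
    """Call ppi_<target>(attrs), else pi_<target>(data), else the
    generic processingInstruction(target, data) on the handler."""
    ppi = getattr(dh, "ppi_" + target, None)
    if ppi is not None:
        ppi(parse_pseudo_attrs(data))
        return
    pi = getattr(dh, "pi_" + target, None)
    if pi is not None:
        pi(data)
        return
    dh.processingInstruction(target, data)
```

The adapter's processingInstruction method could then just call
dispatch_pi(self.dh, target, data).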

What do people think? Is this better than adding the suggested
improvements to the SAX core? This was just hacked together in 15
minutes, so please don't hesitate to slaughter it if you don't like
it.

--Lars M.


from xml.sax import saxlib,saxexts
import sys

class SAXAdapter(saxlib.DocumentHandler):

    def __init__(self,dh):
        self.dh=dh
    
    def startElement(self,name,attrs):
        self.dh.text=""
        self.dh.startElement(name,attrs)
        self.dh.element_stack.append((name,attrs))  # copy attrs??

    def characters(self,data,start,length):  # 'length' avoids shadowing len()
        self.dh.text=self.dh.text+data[start:start+length]

    def endElement(self,name):
        attrs=self.dh.element_stack[-1][1]
        del self.dh.element_stack[-1]
        self.dh.endElement(name,attrs)
        self.dh.text=""
        
    def processingInstruction(self,target,data):
        self.dh.processingInstruction(target,data)

    def setDocumentLocator(self,locator):
        self.dh.locator=locator  # otherwise dh.locator would always stay None

class DocHandler:

    def __init__(self):
        self.element_stack=[]  # stack of (name,attrs) tuples
        self.locator=None      # locator, if any
        self.text=""           # text seen so far in current element
                               # (reset whenever a tag is seen)
    
    def startElement(self,name,attrs):
        pass

    def endElement(self,name,attrs):
        pass

    def processingInstruction(self,target,data):
        pass

    
# --- Main program

dh=DocHandler()
p=saxexts.make_parser()
p.setDocumentHandler(SAXAdapter(dh))  # pass dh itself, not a second DocHandler
p.parse(sys.argv[1])