[XML-SIG] sax2 parsing from a string

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Thu, 27 Sep 2001 07:55:24 +0200


> Can someone give me a brief example showing how to create a
> namespace-aware sax2 parser and use it to parse a string containing an
> XML document?

I see several confusing pieces of information in your message; perhaps
you can help make sense of them.

> parser = xml.sax.make_parser()
> parser.setFeature(xml.sax.handler.feature_namespaces, 1)
> parser.setContentHandler(myhandler)
> inputsource = xml.sax.xmlreader.InputSource()
> inbuffer = cStringIO.StringIO()
> inbuffer.write(xmlstring)
> inbuffer.seek(0)
> inputsource.setByteStream(inbuffer)
> parser.parse(inputsource)
> parser.close()

You don't need to close the parser if you use the .parse method; close
is only needed when you use the parser as an IncrementalParser
(i.e. through feed).
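
To illustrate the difference, here is a minimal sketch of both styles. It
assumes a current Python 3, where io.BytesIO takes the place of cStringIO;
the NameCollector handler, the helper function names and the sample
document are made up for the example.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces
from xml.sax.xmlreader import InputSource


class NameCollector(ContentHandler):
    """Hypothetical handler: records the (uri, localname) of each element."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElementNS(self, name, qname, attrs):
        # With feature_namespaces on, name is a (uri, localname) tuple.
        self.names.append(name)


# Made-up sample document, given as bytes.
XML_BYTES = b'<a:root xmlns:a="urn:example"><a:item/></a:root>'


def make_ns_parser(handler):
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(handler)
    return parser


def parse_whole(data):
    """One-shot style: parse() handles the whole document, no close()."""
    handler = NameCollector()
    parser = make_ns_parser(handler)
    source = InputSource()
    source.setByteStream(io.BytesIO(data))
    parser.parse(source)
    return handler.names


def parse_incrementally(data, chunk_size=8):
    """Incremental style: feed() chunks, then close() to finish."""
    handler = NameCollector()
    parser = make_ns_parser(handler)
    for i in range(0, len(data), chunk_size):
        parser.feed(data[i:i + chunk_size])
    parser.close()
    return handler.names
```

Both functions should report the same element names; only the second one
has to call close, because it is the only one that uses feed.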

>     self._parser.Parse(data, isFinal)
>   File "extensions/pyexpat.c", line 522, in CharacterData
> TypeError: not enough arguments; expected 4, got 2

I cannot reproduce this problem. Can you please find out what content
handler exactly you gave to the expat reader? It appears that you
somehow put in a character data handler that expects 4 arguments,
whereas pyexpat will only pass 2 of them.

To find this out, please print myhandler, and perhaps
myhandler.characters.
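
A rough sketch of that check, assuming a current Python 3 (the
describe_characters helper is a made-up name, and inspect.signature is
just one convenient way to see how many arguments the method expects):

```python
import inspect
from xml.sax.handler import ContentHandler


def describe_characters(handler):
    """Return the handler's class name and its characters() signature."""
    method = handler.characters
    return type(handler).__name__, str(inspect.signature(method))


# The stock SAX2 ContentHandler stands in for your own handler here.
myhandler = ContentHandler()
print(myhandler)
print(myhandler.characters)
print(describe_characters(myhandler))
```

If the signature printed for your handler lists content, start and
length, it is a SAX1-style method and would explain the TypeError.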

> If I replace the line:
> inputsource.setByteStream(inbuffer)
> 
> with:
> inputsource.setCharacterStream(inbuffer)
> 
> 
> I get:
> Traceback (most recent call last):

This is not so surprising: the character stream interface is inherited
from Java, but it doesn't work in Python (yet?).

> Also (on a tangent), I think in xml.sax.saxutils.XMLGenerator and
> xml.sax.saxutils.XMLFilterBase that the characters() and
> ignorableWhitespace() methods need to have 4 arguments instead of 2...
> 
> For example:
>    def characters(self, content, start, length):
>        self._out.write(escape(content[start:start+length]))

No, they don't. A SAX2 characters handler has only a single content
argument; it was SAX1 where you had start and length arguments.
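
For comparison, a minimal sketch of a SAX2-style handler, assuming a
current Python 3 (TextCollector and collect_text are made-up names for
illustration):

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TextCollector(ContentHandler):
    """Hypothetical handler: gathers all character data of a document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # SAX2: one argument holding the text itself, so no slicing
        # with start and length is needed.
        self.chunks.append(content)


def collect_text(data):
    handler = TextCollector()
    xml.sax.parseString(data, handler)
    # The parser may deliver the text in several pieces; join them.
    return "".join(handler.chunks)
```

Note that the parser is free to split the text into several characters
calls, which is why the sketch joins the pieces at the end.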

Regards,
Martin