[XML-SIG] sax2 parsing from a string
Martin v. Loewis
martin@loewis.home.cs.tu-berlin.de
Thu, 27 Sep 2001 07:55:24 +0200
> Can someone give me a brief example showing how to create a
> namespace-aware sax2 parser and use it to parse a string containing an
> XML document?
I see some confusing information in your message; perhaps you
can help make sense of it.
> parser = xml.sax.make_parser()
> parser.setFeature(xml.sax.handler.feature_namespaces, 1)
> parser.setContentHandler(myhandler)
> inputsource = xml.sax.xmlreader.InputSource()
> inbuffer = cStringIO.StringIO()
> inbuffer.write(xmlstring)
> inbuffer.seek(0)
> inputsource.setByteStream(inbuffer)
> parser.parse(inputsource)
> parser.close()
You don't need to close the parser if you use the .parse method; close
is only for use as an IncrementalParser (i.e. through feed).
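As a rough sketch of the two styles, assuming a current Python 3 where io.BytesIO takes the place of cStringIO: the handler class ElementCollector and the helper names parse_xml_string and parse_incrementally are made up for illustration, and the string is encoded as UTF-8 on the assumption that it carries no conflicting encoding declaration.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces
from xml.sax.xmlreader import InputSource


class ElementCollector(ContentHandler):
    """Hypothetical handler: records the name of every element start."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def startElementNS(self, name, qname, attrs):
        # With feature_namespaces on, name is a (uri, localname) tuple.
        self.elements.append(name)


def make_ns_parser(handler):
    """Create a namespace-aware SAX2 parser bound to the given handler."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(handler)
    return parser


def parse_xml_string(xmlstring, handler):
    """One-shot parse: wrap the string in a byte stream and call parse()."""
    source = InputSource()
    # Assumption: the document has no encoding declaration other than UTF-8.
    source.setByteStream(io.BytesIO(xmlstring.encode("utf-8")))
    make_ns_parser(handler).parse(source)  # no close() after parse()
    return handler


def parse_incrementally(chunks, handler):
    """Incremental parse: feed() each chunk, then close() to finish."""
    parser = make_ns_parser(handler)
    for chunk in chunks:
        parser.feed(chunk)
    parser.close()  # close() belongs to the feed() style only
    return handler


sample = '<doc xmlns="urn:example"><item/></doc>'
one_shot = parse_xml_string(sample, ElementCollector())
incremental = parse_incrementally([sample[:18], sample[18:]], ElementCollector())
print(one_shot.elements)
print(incremental.elements)
```

Both calls should report the same element names; the only difference is who drives the parser.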
> self._parser.Parse(data, isFinal)
> File "extensions/pyexpat.c", line 522, in CharacterData
> TypeError: not enough arguments; expected 4, got 2
I cannot reproduce this problem. Can you please find out what content
handler exactly you gave to the expat reader? It appears that you
somehow put in a character data handler that expects 4 arguments,
whereas pyexpat will only pass 2 of them.
To find this out, please print myhandler, and perhaps
myhandler.characters.
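One possible way to do that check, sketched for a current Python 3: Sax1StyleHandler is a hypothetical stand-in for your handler, and describe_handler is a helper invented here, not part of xml.sax. Using inspect.signature is just one option for seeing how many arguments the method expects.

```python
import inspect
from xml.sax.handler import ContentHandler


class Sax1StyleHandler(ContentHandler):
    """Hypothetical handler written against the old SAX1 signature."""

    def characters(self, content, start, length):
        pass


def describe_handler(handler):
    """Return the handler's repr and the signature of its characters()."""
    # The signature of a bound method omits self.
    return repr(handler), str(inspect.signature(handler.characters))


handler_repr, characters_signature = describe_handler(Sax1StyleHandler())
print(handler_repr)
print(characters_signature)
```

If the printed signature lists anything beyond a single content argument, the handler does not match what the expat reader calls.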
> If I replace the line:
> inputsource.setByteStream(inbuffer)
>
> with:
> inputsource.setCharacterStream(inbuffer)
>
>
> I get:
> Traceback (most recent call last):
This is not so surprising: the character stream interface is inherited
from Java, but it doesn't work in Python (yet?).
> Also (on a tangent), I think in xml.sax.saxutils.XMLGenerator and
> xml.sax.saxutils.XMLFilterBase that the characters() and
> ignorableWhitespace() methods need to have 4 arguments instead of 2...
>
> For example:
> def characters(self, content, start, length):
>     self._out.write(escape(content[start:start+length]))
No, they don't. A SAX2 characters handler has only a single content
argument; it was SAX1 where you had start and length arguments.
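A minimal sketch of a SAX2-style handler, assuming a current Python 3; TextCollector is a name made up for this example. The parser may deliver text in several pieces, so the handler collects them and joins them afterwards.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TextCollector(ContentHandler):
    """Hypothetical handler: gathers all character data in the document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # SAX2: a single content argument, no start/length.
        self.chunks.append(content)


collector = TextCollector()
xml.sax.parseString(b"<greeting>hello <b>world</b></greeting>", collector)
text = "".join(collector.chunks)
print(text)
```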
Regards,
Martin