[lxml-dev] Generating lxml objects from expat

We have a C application that uses expat to parse an XML stream (XMPP). The stream only terminates when the connection closes, and we need to process (and respond to) it while it's open, so a DOM parser would not work. Inside the <stream/> (root element) are "stanzas", elements at depth=1 which are generally short and easy to parse, so it's desirable to pass each stanza to a DOM tree for further processing in Python. We can do this with cElementTree thanks to its exposed expat handlers, but I have not yet seen anything in lxml's C API which would suggest that it's capable of this. Is it possible to construct and populate an lxml.etree tree externally using startElement, endElement, etc. calls in a manner which is compatible with libxslt? It's also a bit frustrating that lxml's C headers are not installed on the system (and thus are not packaged on either Gentoo or Debian/Ubuntu).
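For comparison, here is the same idea at the Python level: both lxml.etree and the standard library's xml.etree.ElementTree provide a TreeBuilder class with start(), data(), end() and close() methods, so expat-style callbacks can be forwarded into it, one builder per depth-1 stanza. This is a minimal sketch, assuming Python's own xml.parsers.expat binding stands in for the C application's parser and that namespaces can be ignored. The class StanzaBuilder, its on_stanza callback and its feed() method are hypothetical names. It does not involve lxml's C API, and it does not establish anything about libxslt compatibility.

```python
import xml.parsers.expat

try:
    from lxml import etree  # the library discussed in this thread
except ImportError:  # fallback so the sketch also runs without lxml
    import xml.etree.ElementTree as etree


class StanzaBuilder:
    """Hypothetical helper: turns expat callbacks into one element per depth-1 stanza."""

    def __init__(self, on_stanza):
        self.on_stanza = on_stanza  # called with each completed stanza element
        self.depth = 0              # current nesting depth (root element = 1)
        self.builder = None         # TreeBuilder for the stanza being parsed, if any
        # No namespace_separator: tags arrive as plain strings (namespaces ignored).
        self.parser = xml.parsers.expat.ParserCreate()
        self.parser.StartElementHandler = self.start
        self.parser.EndElementHandler = self.end
        self.parser.CharacterDataHandler = self.data

    def start(self, tag, attrs):
        self.depth += 1
        if self.depth == 2:  # a new stanza directly inside the root element
            self.builder = etree.TreeBuilder()
        if self.builder is not None:
            self.builder.start(tag, attrs)

    def data(self, text):
        # Text between stanzas (depth 1) is dropped; text inside a stanza is kept.
        if self.builder is not None:
            self.builder.data(text)

    def end(self, tag):
        if self.builder is not None:
            self.builder.end(tag)
            if self.depth == 2:  # stanza complete: hand over the finished tree
                self.on_stanza(self.builder.close())
                self.builder = None
        self.depth -= 1

    def feed(self, chunk):
        # isfinal=False: the stream stays open, more data may follow.
        self.parser.Parse(chunk, False)
```

Stanzas are delivered as soon as their end tag has been parsed, without waiting for the closing </stream>:

```python
got = []
sb = StanzaBuilder(got.append)
sb.feed(b'<stream><message id="1"><body>hi</body></message>')
sb.feed(b'<presence/>')
print([e.tag for e in got])
```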

Arc Riley, 13.06.2010 19:15:
> We have a C application that uses expat to parse an XML stream (XMPP). The stream only terminates when the connection closes, and we need to process (and respond to) it while it's open, so a DOM parser would not work.
> Inside the <stream/> (root element) are "stanzas", elements at depth=1 which are generally short and easy to parse, so it's desirable to pass each stanza to a DOM tree for further processing in Python.
You didn't mention what your C application actually does and through what kind of 'connection' it gets its data, but this sounds like you should drop the C code entirely and just use iterparse(some_stream, tag='stanzas') in lxml.etree. When done with each element, .clear() it to discard its content from memory.

If you still need your C code for some reason, there may still be ways to interact easily, but you will need to provide more information.

Stefan
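The iterparse-and-clear approach can be sketched as follows. This is a minimal sketch, assuming the stanzas are simply the direct children of the root element: rather than the placeholder tag='stanzas', it tracks nesting depth, which also keeps it runnable with the standard library's xml.etree.ElementTree (whose iterparse() has no tag argument) if lxml is not installed. The function name iter_stanzas is hypothetical, and the in-memory sample stream stands in for the real connection. iterparse() reads its source in blocks, so on a live socket it may wait for more data before reporting a stanza; that is not addressed here.

```python
import io

try:
    from lxml import etree  # the library discussed in this thread
except ImportError:  # fallback so the sketch also runs without lxml
    import xml.etree.ElementTree as etree


def iter_stanzas(stream):
    """Hypothetical helper: yield each direct child of the root once it is fully parsed."""
    depth = 0
    for event, elem in etree.iterparse(stream, events=("start", "end")):
        if event == "start":
            depth += 1
        else:
            depth -= 1
            if depth == 1:  # elem was a direct child of the root, i.e. a stanza
                yield elem
                # Runs when the caller asks for the next stanza: discard this one's content.
                elem.clear()


# Usage with an in-memory sample standing in for the XMPP connection.
data = b'<stream><message id="1"><body>hi</body></message><presence/></stream>'
for stanza in iter_stanzas(io.BytesIO(data)):
    print(stanza.tag, stanza.get("id"), stanza.findtext("body"))
```

Because each element is cleared only after the caller's loop body has run, anything that should outlive a stanza has to be copied out (or the element deep-copied) before moving on to the next one.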
Participants (2): Arc Riley, Stefan Behnel