On Wednesday, June 11, 2014 18:34:26 Phil Mayers wrote:
(Bah sorry, hit wrong key)
I'm using lxml to parse Junoscript, which is a protocol that's a single infinite XML document, a bit like XMPP. It goes:
<junoscript>
  <pdu>...</pdu>
  <pdu>...</pdu>
  <pdu>...</pdu>
I've been using the construct:
    parser = XMLParser(target=SomeClass())
    while True:
        data = getdata()
        if not data:
            break
        parser.feed(data)
...and handling the start/end event callbacks to dispatch PDUs. I've run into problems in production where, all of a sudden, parser events aren't being dispatched when I expected. The difference seems to be that the data is chunked differently in production, for reasons of timing/load.
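For illustration, the construct described above can be sketched roughly as follows. This is a minimal sketch, not the code from the original post: PduTarget and feed_in_chunks are hypothetical names, the PDU payloads are assumed to be plain text, and the stdlib parser is used as a fallback only so the sketch runs without lxml installed. It records how many PDUs the target had seen after each feed() call, so the effect of different chunk sizes can be inspected.

```python
# Sketch only: PduTarget and feed_in_chunks are illustrative names,
# not taken from the original post.
try:
    from lxml.etree import XMLParser
except ImportError:
    # The stdlib parser accepts the same target interface.
    from xml.etree.ElementTree import XMLParser


class PduTarget:
    """Collects the text of each element that is a direct child of the root."""

    def __init__(self):
        self.depth = 0
        self.text = []
        self.pdus = []

    def start(self, tag, attrib):
        self.depth += 1
        if self.depth == 2:
            # A new top-level PDU begins: reset the text buffer.
            self.text = []

    def data(self, data):
        if self.depth >= 2:
            self.text.append(data)

    def end(self, tag):
        if self.depth == 2:
            # A top-level PDU is complete: "dispatch" it.
            self.pdus.append("".join(self.text))
        self.depth -= 1

    def close(self):
        return self.pdus


def feed_in_chunks(document, size):
    """Feed `document` (bytes) in chunks of `size` bytes.

    Returns the final PDU list and, for each feed() call, the number
    of PDUs the target had been told about by the time feed() returned.
    """
    target = PduTarget()
    parser = XMLParser(target=target)
    seen_after_feed = []
    for i in range(0, len(document), size):
        parser.feed(document[i:i + size])
        seen_after_feed.append(len(target.pdus))
    return parser.close(), seen_after_feed


if __name__ == "__main__":
    doc = b"<junoscript><pdu>one</pdu><pdu>two</pdu></junoscript>"
    for size in (1, 8, len(doc)):
        print(size, feed_in_chunks(doc, size))
```

The sketch only relies on the final list being complete once close() has been called. The per-feed counts are printed for inspection and may differ between chunk sizes and parser backends, which is the kind of timing difference described above.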
I found this thread:
http://thread.gmane.org/gmane.comp.python.lxml.devel/4871/focus=4881
...which suggests this is actually expected, and my understanding of the parser/target stuff is wrong - there's no guarantee that "end" will be called at any particular time. Is this correct?
My understanding of the thread you linked to is that it concerned a stream of XML documents, not a single large (endless?) document. That makes the stream essentially non-XML, as an XML document can only have one root node. Is this also the case in your situation, i.e. a series of <junoscript>...</junoscript> <junoscript>...</junoscript> ... "documents"?

Holger