<html>
  <head>
    <meta content="text/html; charset=ISO-8859-1"
      http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <br>
    <div class="moz-cite-prefix">On 05/09/14 16:55, Stefan Behnel wrote:<br>
    </div>
    <blockquote cite="mid:lkimp0$428$1@ger.gmane.org" type="cite">
      <pre wrap="">ElementTree has gained a nice API in
Py3.4 that supports this in a much saner way than SAX, using iterators.
Basically, you just dump in some data that you received and get back an
iterator over the elements (and their subtrees) that it generated from it.
Intercept on the right top elements and you get your next subtree as soon
as it's ready.</pre>
    </blockquote>
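    <br>
    (For reference, the Py3.4 API described above is, as far as I can
    tell, xml.etree.ElementTree.XMLPullParser. A minimal sketch of the
    feed-and-iterate pattern, assuming the standard library rather than
    lxml and a made-up input split into three arbitrary chunks:)<br>

```python
import xml.etree.ElementTree as ET

# Hypothetical input, split into arbitrary chunks as if received from a socket.
chunks = [b"<root><item>a</it", b"em><item>b</item>", b"</root>"]

parser = ET.XMLPullParser(events=("end",))
seen = []


def drain():
    # Collect whatever events the parser has produced so far.
    for event, element in parser.read_events():
        seen.append((event, element.tag, element.text))


for chunk in chunks:
    parser.feed(chunk)  # dump in whatever data has arrived
    drain()

parser.close()  # signal end of input
drain()         # pick up any events that were only emitted on close

print(seen)
```

    Each "end" event hands over a finished element together with its
    subtree, so intercepting on the right tag gives the next subtree as
    soon as enough data has been fed.<br>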
    <br>
    <br>
    Hi Stefan,<br>
    <br>
    Here's a small script:<br>
    <blockquote>
      <pre class="line-pre"><div class="line" id="file-incparsexml-py-LC23"><span class="n">events</span> <span class="o">=</span> <span class="n">etree</span><span class="o">.</span><span class="n">iterparse</span><span class="p">(</span><span class="n">istr</span><span class="p">,</span> <span class="n">events</span><span class="o">=</span><span class="p">(</span><span class="s">"start"</span><span class="p">,</span> <span class="s">"end"</span><span class="p">))</span></div><div class="line" id="file-incparsexml-py-LC24"><span class="n">stack</span> <span class="o">=</span> <span class="n">deque</span><span class="p">()</span></div><div class="line" id="file-incparsexml-py-LC25"><span class="k">for</span> <span class="n">event</span><span class="p">,</span> <span class="n">element</span> <span class="ow">in</span> <span class="n">events</span><span class="p">:</span></div><div class="line" id="file-incparsexml-py-LC26">  <span class="k">if</span> <span class="n">event</span> <span class="o">==</span> <span class="s">"start"</span><span class="p">:</span></div><div class="line" id="file-incparsexml-py-LC27">    <span class="n">stack</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">element</span><span class="p">)</span></div><div class="line" id="file-incparsexml-py-LC28">  <span class="k">elif</span> <span class="n">event</span> <span class="o">==</span> <span class="s">"end"</span><span class="p">:</span></div><div class="line" id="file-incparsexml-py-LC29">    <span class="n">stack</span><span class="o">.</span><span class="n">pop</span><span class="p">()</span></div><div class="line" id="file-incparsexml-py-LC30"> </div><div class="line" id="file-incparsexml-py-LC31">  <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">stack</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span></div><div class="line" id="file-incparsexml-py-LC32">    <span class="k">break</span></div><div class="line" id="file-incparsexml-py-LC33"> </div><div class="line" id="file-incparsexml-py-LC34">  <span class="k">print</span><span class="p">(</span><span class="n">istr</span><span class="o">.</span><span class="n">tell</span><span class="p">(),</span> <span class="s">"</span><span class="si">%5s</span><span class="s">, </span><span class="si">%4s</span><span class="s">, </span><span class="si">%s</span><span class="s">"</span> <span class="o">%</span> <span class="p">(</span><span class="n">event</span><span class="p">,</span> <span class="n">element</span><span class="o">.</span><span class="n">tag</span><span class="p">,</span> <span class="n">element</span><span class="o">.</span><span class="n">text</span><span class="p">))</span></div></pre>
    </blockquote>
    where istr is an input stream. (Fully working example:
    <a class="moz-txt-link-freetext" href="https://gist.github.com/plq/025005a71e8135c46800">https://gist.github.com/plq/025005a71e8135c46800</a>)<br>
    <br>
    I was expecting istr.tell() to return the position where the
    first root element ends, which would make it possible to continue
    parsing with another call to etree.iterparse(). Instead, istr.tell()
    already returns the EOF position after the first call to next() on
    the iterator that iterparse() returns. Without the stack check, the
    loop eventually throws an exception, and the offset value in that
    exception is None.<br>
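    <br>
    The read-ahead can be reproduced in isolation. A minimal sketch,
    assuming the standard library's xml.etree.ElementTree as a stand-in
    for lxml (its iterparse() reads the stream in fixed-size chunks) and
    a made-up input with two consecutive root elements:<br>

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical data: two root elements back to back in one stream.
data = b"<a><b>1</b></a><a><b>2</b></a>"
istr = io.BytesIO(data)

events = ET.iterparse(istr, events=("start", "end"))
event, element = next(events)  # consume only the very first event

# The parser has already pulled a whole chunk out of the stream, so the
# file position reflects what was read, not what was parsed.
position_after_first_event = istr.tell()
print(event, element.tag, position_after_first_event, len(data))
```

    The input here is far smaller than one read chunk, so the reported
    position equals len(data) after the very first event, and it says
    nothing about where the first root element ends.<br>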
    <br>
    So I'm lost here: how would it be possible to parse the OP's
    document with lxml?<br>
    <br>
    Best,<br>
    Burak<br>
  </body>
</html>