Hi,
I'm CC-ing the list, I hope you don't mind. I think your description is
abstract enough not to reveal anything about your application.
Robert Liebeskind wrote:
> The trace you received was from v2.2 of lxml but we continue to
> experience
> the same issue with v2.5. We use XPath extensively. We do not use
> XSLT.
I guess you meant 2.2beta1 and 2.1.5?
> 1. An etree is loaded from an xml file and the data displayed for the
> user.
> 2. The etree is modified as the result of user edits using a GUI
I assume that this happens inside one thread.
> 3. The etree is then copied using copy.deepcopy() to etree2
> 4. etree2 is passed via a queue to a thread in which it is further
> processed.
Try copying the tree inside the target thread instead of copying it in
another thread and passing the copy over. Trees inherit state from the
thread that built them. Also, using a tree inside a thread that did not
build it incurs some additional adaptation overhead.
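
As a rough sketch of what I mean by copying inside the target thread (the
names worker, inbox and results are made up for illustration, and the
ImportError fallback to the standard library's ElementTree is only there so
the snippet runs without lxml installed): the producer puts the original
root on the queue and the worker makes its own deep copy before touching
anything. This assumes the producer does not modify the tree while the
worker is copying it.

```python
import copy
import queue
import threading

try:
    from lxml import etree
except ImportError:
    # Fallback so the sketch also runs without lxml installed.
    import xml.etree.ElementTree as etree


def worker(inbox, results):
    shared_root = inbox.get()
    # Copy here, in the thread that will work on the tree, so the copy
    # is built by (and belongs to) this thread.
    local_root = copy.deepcopy(shared_root)
    del shared_root  # drop our reference to the foreign tree early

    local_root.set("processed", "yes")
    # Hand back plain bytes rather than a tree built in this thread.
    results.put(etree.tostring(local_root))


def run_example():
    root = etree.fromstring("<doc><item>1</item></doc>")
    inbox, results = queue.Queue(), queue.Queue()

    thread = threading.Thread(target=worker, args=(inbox, results))
    thread.start()
    inbox.put(root)  # pass the original; the worker copies it
    thread.join()
    return root, results.get()
```

The worker returns serialised bytes here so that no tree built in the
worker thread outlives it; whether that is needed in your application is a
separate question.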
> 5. etree2 is modified as a result of processing in its own thread.
> during this processing
> additional trees/elems are fetched from disk and used to modify/
> augment etree2.
> 6. etree2 is copied to etree3
> 7. etree3 is sent for additional processing in its own thread.
> 8. etree2 is copied to etree4
> 9. etree4 is sent for additional processing in its own thread.
Same thing for 6/7 and 8/9. Copying the tree from inside the target thread
will make things more stable. Even if multiple copying is not really
memory friendly, it's very fast in lxml, so as long as we are not talking
about documents of several megabytes, and as long as this really runs on
a multiprocessor machine, you should be fine even with a work-around
that copies the tree redundantly in both threads.
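
The redundant-copy work-around for steps 6 to 9 could look roughly like
this. Again, this is only a sketch: fan_out and process_copy are
hypothetical names, and the ElementTree fallback is just there to keep the
snippet runnable without lxml. The sending thread takes a snapshot so that
it can keep editing etree2, and each target thread copies that snapshot
once more before working on it.

```python
import copy
import threading

try:
    from lxml import etree
except ImportError:
    # Fallback so the sketch also runs without lxml installed.
    import xml.etree.ElementTree as etree


def process_copy(snapshot, label, out):
    # Second copy: made by the thread that will actually use the tree.
    local_root = copy.deepcopy(snapshot)
    local_root.set("handled-by", label)
    out[label] = etree.tostring(local_root)


def fan_out(etree2_root):
    out = {}
    threads = []
    for label in ("etree3", "etree4"):
        # First copy: a snapshot taken in the sending thread, so later
        # edits to etree2 cannot race with the worker's own copy.
        snapshot = copy.deepcopy(etree2_root)
        thread = threading.Thread(target=process_copy,
                                  args=(snapshot, label, out))
        thread.start()
        threads.append(thread)
    for thread in threads:
        thread.join()
    return out
```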
> at this point the initial thread is complete and tears down.
> the two additional spawned threads finish quickly and tear down as well.
> These processes will succeed quite often. They fail intermittently
> and result in a Windows Unhandled Exception.
lxml.etree uses a per-thread dictionary that holds names of tags and
attributes. That's one of the reasons why it's so fast and memory
friendly. In the stack trace you showed me, it seems that a tree is freed
in a different thread than the one that built it, but (for whatever
reason) some of its content is still linked to a dictionary of the original
thread. In this case, the tree cleanup cannot detect that the name is
stored in a dictionary and will free it manually. When the originating
thread goes down, either before or after the thread that freed the tree,
it will destroy the dictionary that stores the name, which results in a
double free.
Does that help for now?
Stefan