Re: [lxml-dev] Unable to solve a crash on Windows with LXML
Hi, I'm CC-ing the list, I hope you don't mind. I think your description is abstract enough not to reveal anything about your application.

Robert Liebeskind wrote:
I guess you meant 2.2beta1 and 2.1.5?
I assume that this happens inside one thread.
Try copying the tree inside the target thread instead of (as you do now) copying it inside another thread and passing it over. Trees inherit state from the thread that built them. Also, using a tree inside a thread that did not build it will result in some additional adaptation overhead.
Same thing for 6/7 and 8/9. Copying the tree from inside the target thread will make things more stable. Even if copying multiple times is not really memory friendly, it's very fast in lxml. So as long as we are not talking about documents of several megabytes, and as long as this really runs on a multiprocessor machine, you should be fine even with a work-around that copies the tree redundantly in both threads.
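A minimal sketch of the "copy inside the target thread" suggestion, for illustration only. The names `worker`, `shared_root` and `results` are made up for this example and are not taken from the application under discussion; the `ImportError` fallback to the standard library's ElementTree is only there so that the sketch runs where lxml is not installed.

```python
import copy
import threading

try:
    from lxml import etree
except ImportError:
    # Assumption: fall back to the stdlib API so the sketch still runs.
    import xml.etree.ElementTree as etree


def worker(shared_root, results):
    # The copy is made here, inside the thread that will use it,
    # rather than in the thread that built the original tree.
    local_root = copy.deepcopy(shared_root)
    local_root.set("seen", "yes")  # only the private copy is modified
    results.append(etree.tostring(local_root))


root = etree.fromstring("<doc><item>1</item></doc>")
results = []

thread = threading.Thread(target=worker, args=(root, results))
thread.start()
thread.join()
```

The original tree is only read by the worker, and all modifications happen on the thread-local copy.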
lxml.etree uses a per-thread dictionary that holds the names of tags and attributes. That's one of the reasons why it's so fast and memory friendly. In the stack trace you showed me, it seems that a tree is freed in a different thread than the one that built it, but (for whatever reason) some of its content is still linked to a dictionary of the original thread. In this case, the tree cleanup cannot detect that the name is stored in a dictionary and will free it manually. When the originating thread goes down, either before or after the thread that freed the tree, it will destroy the dictionary that stores the name, which results in a double free.

Does that help for now?

Stefan
Hi Stefan, Yes, this is helpful and I will make the adjustment you suggest. Actually I meant lxml 2.1.2 and lxml 2.1.5. Sorry for the confusion. Regards, Rob. On Jan 26, 2009, at 9:30 AM, Stefan Behnel wrote:
Hi Stefan, I have modified my code so that an etree never crosses threads. Now the etree is converted to text and then back to an etree in the new thread. This has resolved the issue. Do you know if lxml 2.2 has the same issue? Thanks for your help. Rob. On Jan 26, 2009, at 9:30 AM, Stefan Behnel wrote:
Robert Liebeskind wrote:
That's an extreme measure, but I doubt that it's noticeably slower than copying the tree. Serialising and parsing is *very* fast in lxml, so this becomes a safe and simple option.
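For illustration, the serialise-and-reparse approach could look roughly like the sketch below. It assumes a `queue.Queue` is used to hand the serialised bytes to the other thread; the names `consumer`, `inbox` and `results` are hypothetical, and the fallback import is only there so the sketch runs without lxml.

```python
import queue
import threading

try:
    from lxml import etree
except ImportError:
    # Assumption: fall back to the stdlib API so the sketch still runs.
    import xml.etree.ElementTree as etree


def consumer(inbox, results):
    data = inbox.get()             # plain bytes, not a tree
    root = etree.fromstring(data)  # the tree is built in this thread
    results.append(root.findtext("item"))


inbox = queue.Queue()
results = []

thread = threading.Thread(target=consumer, args=(inbox, results))
thread.start()

# Producer side: serialise the tree and pass only the bytes over.
root = etree.fromstring("<doc><item>1</item></doc>")
inbox.put(etree.tostring(root))

thread.join()
```

Only a byte string crosses the thread boundary, so each tree is created, used and freed within a single thread.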
Do you know if lxml 2.2 has the same issue?
There were no changes between 2.1 and 2.2 that would make me expect anything else. I've resolved a lot of similar issues in the past, so the conditions under which crashes occur have become more and more obscure. It would be nice if we could fix this problem for 2.2 final. Could you come up with a simple setup that mimics what you described as your application flow? That would allow me to play with it myself. There's also a set of threading-related tests in test_threading.py. IIRC, we are still missing one that tests a multi-threaded XML pipeline like the one you use.

Stefan
participants (2)
- Robert Liebeskind
- Stefan Behnel