Hello,
for a while now we have been seeing excessive memory usage in our
application. Yesterday a colleague and I were able to track down the
cause of the memory loss: a cache stores etree instances and uses
deepcopy() to hand out a copy of the tree. The cache is filled and
accessed from multiple threads, hence the need for deepcopy().
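The pattern looks roughly like this (a minimal sketch with made-up
names, not our actual code):

    import threading
    from copy import deepcopy
    from lxml import etree

    class EtreeCache(object):
        # hands out a private deepcopy per reader, so that no two
        # threads ever touch the same underlying tree
        def __init__(self):
            self._lock = threading.Lock()
            self._trees = {}

        def put(self, key, path):
            with self._lock:
                self._trees[key] = etree.parse(path)

        def get(self, key):
            with self._lock:
                return deepcopy(self._trees[key])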
The amount of lost memory depends on the number of threads and, up to
a point, on the number of copy operations in each thread. I've
attached a sample script to reproduce the issue. I've used libxml2's
debug function xmlMemUsed() to verify that the memory isn't lost
inside libxml2. I've also tested several thousand deepcopy() ops in a
single thread and several thousand etree.parse() ops in several
hundred threads; neither showed a similar memory loss. According to my
colleague Dirk Rothe, there is no visible memory loss on Windows.
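The attached script boils down to roughly the following (a condensed
sketch, not the attachment itself; RSS is read from /proc, so
Linux-only):

    import sys
    import threading
    from copy import deepcopy
    from lxml import etree

    def rss_mb():
        # current resident set size in MB (VmRSS is reported in kB)
        with open('/proc/self/status') as f:
            for line in f:
                if line.startswith('VmRSS:'):
                    return int(line.split()[1]) / 1024.0

    def worker(tree, n_copies):
        for _ in range(n_copies):
            deepcopy(tree)  # the copy is discarded right away

    if __name__ == '__main__':
        n_threads, n_copies = int(sys.argv[1]), int(sys.argv[2])
        tree = etree.parse('document.xml')  # any ~7 KB document
        print(rss_mb())
        threads = [threading.Thread(target=worker, args=(tree, n_copies))
                   for _ in range(n_threads)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        print(rss_mb())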
Example output with a 7 KB document
-----------------------------------
The first number is the RSS of the process in MB after the document has
been parsed. The second number is the RSS after all threads have stopped.
$ python etree_deepcopy.py 50 100
10.875
21.1796875
50 threads, 100 copy ops per thread
$ python etree_deepcopy.py 50 1000
10.875
21.1875
50 threads, 1000 copy ops per thread
$ python etree_deepcopy.py 100 100
10.87109375
30.86328125
100 threads, 100 copy ops per thread
$ python etree_deepcopy.py 100 200
10.875
31.19140625
100 threads, 200 copy ops per thread
$ python etree_deepcopy.py 100 300
10.875
31.2109375
100 threads, 300 copy ops per thread
$ python etree_deepcopy.py 200 100
10.875
40.46484375
200 threads, 100 copy ops per thread
Python:
2.7.3 (self-compiled with UCS-2)
Platform:
Ubuntu 12.04 AMD64
>>> etree.LXML_VERSION
(2, 3, 4, 0)
>>> etree.LIBXML_VERSION
(2, 7, 8)
>>> etree.LIBXSLT_VERSION
(1, 1, 26)
Christian
Hi,
while implementing a script that is supposed to extract a few things
from HTML pages, I've run into two problems with lxml.html. I'm not
sure whether these are actually bugs or whether I'm doing something
wrong, so I haven't created entries in the bug tracker yet.
Problem 1: Top-level comment has no parent
,----
| >>> html = "<!-- comment --><html><head><title>foo</title></head></html>"
| >>> tree = lxml.html.fromstring(html)
| >>> [tag.drop_tree() for tag in tree.xpath("//comment()")]
| Traceback (most recent call last):
| File "<stdin>", line 1, in <module>
| File "/usr/lib/python2.7/dist-packages/lxml/html/__init__.py", line 169,
| in drop_tree
| assert parent is not None
`----
This method of removing elements works for any comment or element,
just not when the comment is at the top level (which unfortunately
happens on exactly the pages that I'm processing).
Is that behaviour related to the problems mentioned in the FAQ entry
"Why can't I just delete parents or clear the root node in
iterparse()"? I have tried to use tree.remove(tag) as well, but then
I get a ValueError: Element is not a child of this node.
The reason behind removing the comment elements (and others) is that
I would otherwise get them returned by itertext().
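For now, a guard like the following at least avoids the assert (just a
sketch; it assumes getparent() returns None for top-level comments,
and it skips them rather than removing them):
,----
| import lxml.html
|
| html = "<!-- comment --><html><head><title>foo</title></head></html>"
| tree = lxml.html.fromstring(html)
|
| # drop only comments that have a parent element; a top-level comment
| # is a sibling of the root, so drop_tree() has nowhere to reattach
| # its tail text and would hit the assert
| for comment in tree.xpath("//comment()"):
|     if comment.getparent() is not None:
|         comment.drop_tree()
`----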
Problem 2: lxml.html.clean.Cleaner removes meta content
,----
| >>> html = "<html><head><meta name=\"keywords\" content=\"foo\"></head></html>"
| >>> cleaner = lxml.html.clean.Cleaner()
| >>> cleaner.page_structure = False
| >>> cleaner.meta = False
| >>> tree = lxml.html.fromstring(html)
| >>> cleaner(tree)
| >>> lxml.html.tostring(tree)
| '<html><head><meta name="keywords"></head></html>'
`----
To work around the above problem with the comments, I tried using the
cleaner to get rid of them (which works fine), but it then also strips
the content attributes of the meta tags. Is that intended?
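Since the only thing I actually need from the cleaner is the comment
removal, I could try a narrower configuration like the one below (a
sketch with every other option switched off; I haven't verified
whether it still touches the meta content attributes):
,----
| from lxml.html.clean import Cleaner
|
| # everything off except comment removal
| cleaner = Cleaner(
|     scripts=False, javascript=False, comments=True, style=False,
|     links=False, meta=False, page_structure=False,
|     processing_instructions=False, embedded=False, frames=False,
|     forms=False, annoying_tags=False, remove_unknown_tags=False,
|     safe_attrs_only=False,
| )
| cleaner(tree)
`----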
The version numbers in use (I've seen that slightly newer versions are
available, but the changelogs didn't mention anything that looked
related to these problems):
Python : sys.version_info(major=2, minor=7, micro=2, releaselevel='final', serial=0)
lxml.etree : (2, 3, 0, 0)
libxml used : (2, 7, 8)
libxml compiled : (2, 7, 8)
libxslt used : (1, 1, 26)
libxslt compiled : (1, 1, 26)
Thanks,
Adalbert