Iterparse memory problem
data:image/s3,"s3://crabby-images/35b60/35b60cdbceb8395f090710a13d2cf5f4eb75ddd2" alt=""
Hello,

I have been using lxml (3.4.3) to parse XML feeds from vendors. For example, here is one of the smaller files, which should be publicly available: http://www.eberry.cz/editor/image/eshop_products/feed_seznam_jyxo.xml

I am using urllib3 to get the response, which should be a file-like object, and I pass it straight to iterparse(). This works great memory-wise, as the whole file never has to be loaded into memory (some files can be huge). I am interested only in the SHOPITEM element, and I clear() each element once I am done with it.

I then tried the tag argument of iterparse() to get only the events relevant to this element. When I do that, memory usage spikes and it looks as if the whole file is being kept in memory. Any ideas what could cause this behavior?

Regards,
Dio
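A minimal sketch of the setup described in the question, assuming a hypothetical in-memory sample (FEED) in place of the urllib3 response; any file-like object can be passed to iterparse() the same way. It filters on tag="SHOPITEM", calls clear() on each item, and also returns the enclosing root so the tree left behind can be inspected. The function name parse_with_clear_only is illustrative only.

```python
import io
from lxml import etree

# Hypothetical sample feed standing in for the urllib3 response object.
FEED = (
    b"<SHOP>"
    + b"".join(
        b"<SHOPITEM><ITEM_ID>%d</ITEM_ID><PRODUCTNAME>product %d</PRODUCTNAME></SHOPITEM>" % (i, i)
        for i in range(1000)
    )
    + b"</SHOP>"
)


def parse_with_clear_only(source):
    """Collect ITEM_ID values, clearing each SHOPITEM afterwards (no pruning)."""
    ids = []
    root = None
    # Only "end" events for SHOPITEM elements are reported.
    for _event, elem in etree.iterparse(source, events=("end",), tag="SHOPITEM"):
        ids.append(elem.findtext("ITEM_ID"))  # read the data before clearing
        root = elem.getparent()               # the enclosing SHOP element
        elem.clear()                          # empties the element but leaves it attached
    return ids, root


ids, root = parse_with_clear_only(io.BytesIO(FEED))
print(len(ids), len(root))  # prints: 1000 1000
```

After the loop, the root still holds one (now empty) SHOPITEM child per item: clear() removes an element's content, but it does not detach the element from its parent, so the tree still grows with the number of items.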
data:image/s3,"s3://crabby-images/4cf20/4cf20edf9c3655e7f5c4e7d874c5fdf3b39d715f" alt=""
Dionyz Lazar wrote on 16.06.2015 at 12:24:
iterparse() builds up the tree while parsing, and it's up to you to clean up after it. Try intercepting a couple more tags that you are not interested in and just delete their children as well. Or delete the parts of the tree that are 'left' of the SHOPITEM element you're processing, i.e. all preceding siblings of the element itself and of its ancestors. Or try a mixture of both.

Stefan
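The suggestion to delete everything 'left' of the current element can be sketched as follows. This is an illustration, not code from the thread: the generator name iter_and_prune is hypothetical, and so is the nested GROUP wrapper in the sample, which is there only to exercise the ancestor step. After yielding an element, the generator clears it and then walks up the ancestor chain, deleting every preceding sibling of the element and of each ancestor.

```python
import io
from lxml import etree


def iter_and_prune(source, tag="SHOPITEM"):
    """Yield each <tag> element, then free it and everything parsed before it:
    its own content plus all preceding siblings of it and of its ancestors."""
    for _event, elem in etree.iterparse(source, events=("end",), tag=tag):
        yield elem  # the caller extracts what it needs before requesting the next one
        elem.clear()
        node, parent = elem, elem.getparent()
        while parent is not None:
            # Everything before `node` under `parent` is fully parsed and no
            # longer needed, so delete it from the front of the child list.
            while node.getprevious() is not None:
                del parent[0]
            node, parent = parent, parent.getparent()


# Hypothetical nested sample; real feeds may put SHOPITEM directly under SHOP.
SAMPLE = (
    b"<SHOP>"
    + b"".join(
        b"<GROUP>"
        + b"".join(b"<SHOPITEM><ITEM_ID>%d-%d</ITEM_ID></SHOPITEM>" % (g, i) for i in range(50))
        + b"</GROUP>"
        for g in range(20)
    )
    + b"</SHOP>"
)

ids = []
root = None
for item in iter_and_prune(io.BytesIO(SAMPLE)):
    ids.append(item.findtext("ITEM_ID"))
    root = item.getparent().getparent()  # the SHOP element in this sample
print(len(ids), len(root), len(root[0]))  # prints: 1000 1 1
```

Because each element is pruned as soon as the loop asks for the next one, the caller has to read what it needs inside the loop body. At the end of the run only the last GROUP, holding the last (emptied) SHOPITEM, remains under the root.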
data:image/s3,"s3://crabby-images/4cf20/4cf20edf9c3655e7f5c4e7d874c5fdf3b39d715f" alt=""
Dionyz Lazar schrieb am 16.06.2015 um 12:24:
iterparse() builds up the tree while parsing and it's up to you to clean up after it. Try intercepting on a couple of more tags that are not of interest and just delete their children as well. Or delete the parts of the tree that are 'left' of the SHOPITEM element that you're processing, i.e. all preceding siblings of the tag itself and its ancestors. Or try a mixture of both. Stefan
participants (2)
- Dionyz Lazar
- Stefan Behnel