Re: [lxml] Efficient incremental parsing using etree.iterparse
Ah wow, one of the hidden treasures of lxml.. :) Thanks!
On 11/21/2014 12:58 PM, Steven Vereecken wrote:
Hello,
It doesn't seem to be mentioned in the docs, but you *can* specify multiple tag names (just use a list of names instead of one string). I'm not really sure where I picked that up myself, maybe from this mention in the changelog (features added in 3.0alpha1) : "Tree iteration and iterparse() with a selective tag argument supports passing a set of tags. Tree nodes will be returned by the iterators if they match any of the tags."
greetings, Steven
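For readers of the archive, a minimal sketch of what the suggestion above might look like in practice, assuming lxml 3.0 or later. The sample document, its tag names, and the helper iter_selected are made up for illustration and are not from the original posts:

```python
from io import BytesIO

from lxml import etree

# Hypothetical sample document; the tag names are invented for this sketch.
SAMPLE = b"""<root>
  <record id="1"><name>a</name><noise>x</noise></record>
  <event id="2"><noise>y</noise></event>
  <other id="3"/>
  <record id="4"/>
</root>"""


def iter_selected(source, tags):
    """Yield (tag, attributes) for every element whose tag is in `tags`.

    `tags` is a sequence of tag names passed straight to the `tag`
    argument of iterparse(), so only matching elements trigger events.
    """
    for _event, elem in etree.iterparse(source, events=("end",), tag=tags):
        # Copy what we need before discarding the element's content.
        yield elem.tag, dict(elem.attrib)
        # Drop the element's children and text to keep memory use low
        # on very large inputs.
        elem.clear()


matches = list(iter_selected(BytesIO(SAMPLE), ("record", "event")))
print(matches)
```

The parser still reads the whole file, but Python-level events are only generated for the listed tags, which is the middle ground between "all tags" and "a single tag" asked about in the original question.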
2014-11-21 11:47 GMT+01:00 D.H.J. Takken <d.h.j.takken@xs4all.nl>:
Hello,
I need to process very large XML files as quickly as possible. The XML processing does not require processing of every single tag, so I was looking at the iterparse method.
Unfortunately, the iterparse method only allows one tag name to be specified for triggering events, while I need to do processing on two or three different tags. This would still be much more efficient than using the target parser method, because the XML data contains many more tags that do not require immediate processing.
So, it looks like I need something in between processing *all* tags and processing a single tag. Is there any way to do that?
Thanks for any hints!
_________________________________________________________________
Mailing list for the lxml Python XML toolkit - http://lxml.de/
lxml@lxml.de
https://mailman-mail5.webfaction.com/listinfo/lxml
On 11/21/2014 12:58 PM, Steven Vereecken wrote:
It doesn't seem to be mentioned in the docs, but you *can* specify multiple tag names (just use a list of names instead of one string).
It's at least mentioned in two places:
http://lxml.de/tutorial.html#tree-iteration
http://lxml.de/api.html#iteration
(was added in lxml 3.0, I just noticed that the second link says 2.4)
Stefan
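The tree-iteration counterpart described in those docs can be sketched as follows, again assuming lxml 3.0 or later; the element names are hypothetical:

```python
from lxml import etree

# Hypothetical in-memory document for illustration only.
root = etree.fromstring(
    "<root><child>1</child><another>2</another>"
    "<skip>3</skip><child>4</child></root>"
)

# iter() accepts several tag names; elements matching any of them are
# returned in document order.
selected = [(el.tag, el.text) for el in root.iter("child", "another")]
print(selected)
```

Unlike iterparse(), this works on a tree that is already fully parsed, so it helps with selective processing but not with memory use on very large files.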
Stefan Behnel wrote on 21.11.2014 at 18:11:
On 11/21/2014 12:58 PM, Steven Vereecken wrote:
It doesn't seem to be mentioned in the docs, but you *can* specify multiple tag names (just use a list of names instead of one string).
It's at least mentioned in two places:
... and I forgot to say that improvements for the docs (in the form of pull requests) are always welcome.
https://github.com/lxml/lxml/tree/master/doc
Stefan