Converting from Element to ObjectifiedElement

I'm new to using lxml and not that much of a Python expert, so sorry in advance if this question has an obvious answer: I'm mostly using lxml.objectify (to parse some XML in an HTTP response) but in some cases I need to use lxml.etree (also to parse an HTTP response) because in those cases the XML I'm dealing with is less predictable/regular. However, once I've done some initial processing on the non-Objectified version, I'd like to make an Objectified version of it. I could convert through a string and back, but I'm looking for something (that I assume would be) faster. Any suggestions? Thanks. Nat

Hi Nat,
I'm mostly using lxml.objectify (to parse some XML in an HTTP response) but in some cases I need to use lxml.etree (also to parse an HTTP response) because in those cases the XML I'm dealing with is less predictable/regular.
Can you elaborate a bit on what makes lxml.objectify less suitable for these cases?
However, once I've done some initial processing on the non-Objectified version, I'd like to make an Objectified version of it. I could convert through a string and back, but I'm looking for something (that I assume would be) faster.
Interesting question. I don't know any obvious conversion method but I'd say just go for serialization & re-parsing. It's an area where lxml usually shines speed-wise. Unless your performance/memory measurements tell you this is not the way to go for your use case, of course...
Holger
Landesbank Baden-Wuerttemberg Anstalt des oeffentlichen Rechts Hauptsitze: Stuttgart, Karlsruhe, Mannheim, Mainz HRA 12704 Amtsgericht Stuttgart

On .11.2015, at 14:52, Holger Joukl <Holger.Joukl@lbbw.de> wrote:
Interesting question. I don't know any obvious conversion method but I'd say just go for serialization & re-parsing. It's an area where lxml usually shines speed-wise.
I also reckon that converting to a string and parsing is the way to go. Converting to string is very fast, and the main overhead when parsing is the creation of the new object tree, which you'd have anyway. Note, as Holger hints, the total size of the tree you're working on may be your biggest worry: XML is a memory hog.
Charlie
--
Charlie Clark Managing Director Clark Consulting & Research German Office Kronenstr. 27a Düsseldorf D- 40217 Tel: +49-211-600-3657 Mobile: +49-178-782-6226
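The serialize-and-re-parse route suggested above might look like the following minimal sketch. The helper name to_objectified and the sample XML are made up for illustration; etree.tostring() and objectify.fromstring() are the lxml calls doing the actual work. Passing with_tail=False is an assumption about what is wanted here: it keeps any text that follows the element out of the serialized result.

```python
from lxml import etree, objectify


def to_objectified(element):
    """Serialize a plain etree element and re-parse it with lxml.objectify.

    Hypothetical helper, not part of lxml: it simply round-trips through bytes.
    """
    # with_tail=False: do not include text that follows the element's end tag.
    xml_bytes = etree.tostring(element, with_tail=False)
    return objectify.fromstring(xml_bytes)


# Illustrative input only; any plain etree element works the same way.
plain_root = etree.fromstring("<X><A>1</A><A>2</A><M>text</M></X>")
obj_root = to_objectified(plain_root)
```

The result is a new, independent tree: later changes to plain_root are not reflected in obj_root, and memory use roughly doubles while both trees are alive.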

Hi Holger,
Can you elaborate a bit on what makes lxml.objectify less suitable for these cases?
In most cases, the XML is being processed by code that knows exactly what the schema of the document is and hence using lxml.objectify is a perfect match. However, in a few cases I'm writing code that's trying to generically process XML across documents whose schema is structurally similar but differs in detail. E.g., it might be processing a document that looks like:
<X> <A/> <A/> <M/> <M/> <M/> <M/> </X>
or:
<X> <A/> <A/> <N/> <N/> <N/> <N/> </X>
I.e., both are rooted with <X>, start with some number of <A> sub-elements followed by some number of *either* <M> or <N> sub-elements. The processing code doesn't care about the structure of the <M> or <N> elements and in fact the documents it's processing could have any number of any single type of sub-element following the <A> sub-elements and not really care about their details. However, the code wants to return a list of objectified versions of those elements. Maybe there's a way to do all this entirely in "objectified mode", but I haven't figured it out.
Interesting question. I don't know any obvious conversion method but I'd say just go for serialization & re-parsing. It's an area where lxml usually shines speed-wise. Unless your performance/memory measurements tell you this is not the way to go for your use case, of course...
Yeah, I'm not too worried about the serialization/deserialization cost in my particular case. It just seemed a bit unclean and I was looking for something less unclean. Thanks. Nat

Hi Nat,
In most cases, the XML is being processed by code that knows exactly what the schema of the document is and hence using lxml.objectify is a perfect match. However, in a few cases I'm writing code that's trying to generically process XML across documents whose schema is structurally similar but differs in detail. E.g., it might be processing a document that looks like:
<X> <A/> <A/> <M/> <M/> <M/> <M/> </X>
or:
<X> <A/> <A/> <N/> <N/> <N/> <N/> </X>
I.e., both are rooted with <X>, start with some number of <A> sub-elements followed by some number of either <M> or <N> sub-elements. The processing code doesn't care about the structure of the <M> or <N> elements and in fact the documents it's processing could have any number of any single type of sub-element following the <A> sub-elements and not really care about their details. However, the code wants to return a list of objectified versions of those elements. Maybe there's a way to do all this entirely in "objectified mode", but I haven't figured it out.
Thanks for sharing. Note that X.A will give you access to all A children of X but it's not actually a list - rather a kind of "sequential view", i.e. objectify gives you the index operator to access the following identically-named siblings. So maybe it's feasible for you to simply iterate over the elements, which will present them in document order regardless of their name:
>>> from lxml import objectify
>>> root = objectify.fromstring("""<X>
... <A/>
... <A/>
... <M/>
... <M/>
... <M/>
... <M/>
... </X>""")
>>> root.getchildren()
[u'', u'', u'', u'', u'', u'']
>>> for elem in root.iterchildren():
...     print elem.tag, type(elem)
...
A <type 'lxml.objectify.StringElement'>
A <type 'lxml.objectify.StringElement'>
M <type 'lxml.objectify.StringElement'>
M <type 'lxml.objectify.StringElement'>
M <type 'lxml.objectify.StringElement'>
M <type 'lxml.objectify.StringElement'>
Which won't care at all about element names. If need be you can also iterate on restricted tag name(s) only, like:
>>> for elem in root.iterchildren('M', 'N'):
...     print elem.tag, type(elem)
...
M <type 'lxml.objectify.StringElement'>
M <type 'lxml.objectify.StringElement'>
M <type 'lxml.objectify.StringElement'>
M <type 'lxml.objectify.StringElement'>
>>> root = objectify.fromstring("""<X>
... <A/>
... <A/>
... <N/>
... <N/>
... <N/>
... <N/>
... </X>""")
>>> for elem in root.iterchildren('M', 'N'):
...     print elem.tag, type(elem)
...
N <type 'lxml.objectify.StringElement'>
N <type 'lxml.objectify.StringElement'>
N <type 'lxml.objectify.StringElement'>
N <type 'lxml.objectify.StringElement'>
Best regards,
Holger
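For readers on Python 3, the same idea can be sketched as a small script rather than an interactive session. The function name children_after_leading, its leading_tag parameter, and the sample document are assumptions made for illustration; only objectify.fromstring(), iterchildren(), indexing and len() on an objectified element come from lxml itself.

```python
from lxml import objectify

# Illustrative document; the <M> elements could just as well be <N> or anything else.
DOC = "<X><A>1</A><A>2</A><M>m1</M><M>m2</M><M>m3</M></X>"


def children_after_leading(root, leading_tag="A"):
    """Return root's children whose tag differs from leading_tag, in document order.

    Hypothetical helper: it never needs to know the names of the other children.
    """
    return [child for child in root.iterchildren() if child.tag != leading_tag]


root = objectify.fromstring(DOC)

# root.A is the first <A>; indexing and len() reach its same-named siblings.
second_a = root.A[1]
a_count = len(root.A)

# Everything that is not an <A>, still as objectified elements.
tail = children_after_leading(root)
tail_tags = [child.tag for child in tail]

# iterchildren() also accepts tag names to restrict the iteration.
only_m_or_n = list(root.iterchildren("M", "N"))
```

Because the tree is parsed with objectify from the start, the returned elements are already objectified and no conversion step is needed.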

Hi Holger,
The approach of using iterchildren() fit exactly my case and I've switched to using it. Thanks!
Nat
(In reply to Holger Joukl's message of Tue, Nov 10, 2015 at 10:33 AM.)
_________________________________________________________________ Mailing list for the lxml Python XML toolkit - http://lxml.de/ lxml@lxml.de https://mailman-mail5.webfaction.com/listinfo/lxml
participants (3)
- Charlie Clark
- Holger Joukl
- Nathaniel Mishkin