
Hi Nat,
In most cases, the XML is being processed by code that knows exactly what the schema of the document is, and hence using lxml.objectify is a perfect match. However, in a few cases I'm writing code that tries to process XML generically across documents whose schemas are structurally similar but differ in detail. E.g., it might be processing a document that looks like:
<X> <A/> <A/> <M/> <M/> <M/> <M/> </X>
or:
<X> <A/> <A/> <N/> <N/> <N/> <N/> </X>
I.e., both are rooted at <X> and start with some number of <A> sub-elements followed by some number of either <M> or <N> sub-elements. The processing code doesn't care about the structure of the <M> or <N> elements. In fact, the documents it processes could have any number of sub-elements of any single type following the <A> sub-elements, and the code wouldn't care about their details. However, it does need to return a list of objectified versions of those elements. Maybe there's a way to do all this entirely in "objectified mode", but I haven't figured it out.
Thanks for sharing. Note that X.A gives you access to all A children of X, but it's not actually a list; rather, it's a kind of "sequential view": objectify gives you the index operator to access the following identically named siblings. So maybe it's feasible for you to simply iterate over the children, which presents them in document order regardless of their name:
>>> from lxml import objectify
>>> root = objectify.fromstring("""<X>
... <A/>
... <A/>
... <M/>
... <M/>
... <M/>
... <M/>
... </X>""")
>>> root.getchildren()
[u'', u'', u'', u'', u'', u'']
>>> for elem in root.iterchildren():
...     print elem.tag, type(elem)
...
A <type 'lxml.objectify.StringElement'>
A <type 'lxml.objectify.StringElement'>
M <type 'lxml.objectify.StringElement'>
M <type 'lxml.objectify.StringElement'>
M <type 'lxml.objectify.StringElement'>
M <type 'lxml.objectify.StringElement'>
This doesn't care about element names at all. If need be, you can also restrict the iteration to specific tag name(s), like:
>>> for elem in root.iterchildren('M', 'N'):
...     print elem.tag, type(elem)
...
M <type 'lxml.objectify.StringElement'>
M <type 'lxml.objectify.StringElement'>
M <type 'lxml.objectify.StringElement'>
M <type 'lxml.objectify.StringElement'>
>>> root = objectify.fromstring("""<X>
... <A/>
... <A/>
... <N/>
... <N/>
... <N/>
... <N/>
... </X>""")
>>> for elem in root.iterchildren('M', 'N'):
...     print elem.tag, type(elem)
...
N <type 'lxml.objectify.StringElement'>
N <type 'lxml.objectify.StringElement'>
N <type 'lxml.objectify.StringElement'>
N <type 'lxml.objectify.StringElement'>
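Putting this together, here is a minimal sketch (Python 3, whereas the session above is Python 2) of the helper described in the question: it returns a plain Python list of the objectified elements that follow the <A> elements, whatever their tag. The function names trailing_children and trailing_children_by_name and the skip_tag parameter are hypothetical, and the sketch assumes non-namespaced tags. The second variant stays closer to "objectified mode" by using the sequential view of same-named siblings mentioned above, and it additionally assumes that all trailing elements share a single tag.

```python
from lxml import objectify


def trailing_children(root, skip_tag="A"):
    # Hypothetical helper: walk the children in document order and keep every
    # element whose tag is not skip_tag. The isinstance() check skips comments
    # and processing instructions, whose .tag is not a string.
    return [
        child
        for child in root.iterchildren()
        if isinstance(child.tag, str) and child.tag != skip_tag
    ]


def trailing_children_by_name(root, skip_tag="A"):
    # Hypothetical variant using objectify's "sequential view": find the tag of
    # the first child that is not skip_tag, then look it up as an attribute.
    # getattr(root, tag) returns the first such child; iterating over it yields
    # every child of root that has that same tag.
    for child in root.iterchildren():
        if isinstance(child.tag, str) and child.tag != skip_tag:
            return list(getattr(root, child.tag))
    return []


if __name__ == "__main__":
    for xml in ("<X><A/><A/><M/><M/><M/><M/></X>",
                "<X><A/><A/><N/><N/><N/><N/></X>"):
        root = objectify.fromstring(xml)
        print([elem.tag for elem in trailing_children(root)])
        # prints ['M', 'M', 'M', 'M'], then ['N', 'N', 'N', 'N']
```

The iterchildren() version is the more robust of the two: attribute lookup on an objectified element finds the element's own properties and methods first, so a child named, say, text or tag cannot be reached through getattr(root, "text").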
Best regards,
Holger

Landesbank Baden-Wuerttemberg
Institution under public law (Anstalt des oeffentlichen Rechts)
Headquarters: Stuttgart, Karlsruhe, Mannheim, Mainz
Commercial register: HRA 12704, Amtsgericht Stuttgart