Hi Holger,

The iterchildren() approach fits my case exactly, and I've switched to it. Thanks!

Nat

On Tue, Nov 10, 2015 at 10:33 AM Holger Joukl <Holger.Joukl@lbbw.de> wrote:
Hi Nat,

> In most cases, the XML is being processed by code that knows exactly
> what the schema of the document is and hence using lxml.objectify is
> a perfect match. However, in a few cases I'm writing code that's
> trying to generically process XML across documents whose schema is
> structurally similar but differs in detail. E.g., it might be
> processing a document that looks like:
>
> <X>
>   <A/>
>   <A/>
>   <M/>
>   <M/>
>   <M/>
>   <M/>
> </X>
>
> or:
>
> <X>
>   <A/>
>   <A/>
>   <N/>
>   <N/>
>   <N/>
>   <N/>
> </X>
>
> I.e., both are rooted with <X>, start with some number of <A> sub-
> elements followed by some number of either <M> or <N> sub-elements. The
> processing code doesn't care about the structure of the <M> or <N>
> elements; in fact, the documents it processes could have any number
> of sub-elements of any single type following the <A> sub-elements,
> and the code wouldn't care about their details. However, the code
> wants to return a list of objectified versions of those elements.
> Maybe there's a way to do all this entirely in "objectified mode",
> but I haven't figured it out.

Thanks for sharing.

Note that X.A gives you access to all A children of X, but it is not
actually a list. It is rather a kind of "sequential view": objectify gives
you the index operator to access the following identically named siblings.
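
To illustrate the "sequential view" behaviour, here is a minimal sketch
(written for Python 3; the text content "first" and "second" is made up
purely to tell the two A elements apart):

```python
from lxml import objectify

# Hypothetical sample document; the element text is illustrative only.
root = objectify.fromstring("<X><A>first</A><A>second</A><M/><M/></X>")

first = root.A       # attribute access yields the first <A> element
second = root.A[1]   # indexing moves to the next identically named sibling

print(isinstance(first, list))    # False: an objectify element, not a list
print(len(root.A))                # counts the <A> siblings: 2
print([a.text for a in root.A])   # ['first', 'second']

# Build a real list when list semantics are needed.
a_list = list(root.A)
```

Wrapping the view in list() is probably the simplest way to get a genuine
list of the same-named siblings.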

So it may be feasible for you to simply iterate over the child elements,
which are presented in document order regardless of their names:

>>> from lxml import objectify
>>> root = objectify.fromstring("""<X>
...   <A/>
...   <A/>
...   <M/>
...   <M/>
...   <M/>
...   <M/>
... </X>""")
>>> root.getchildren()
[u'', u'', u'', u'', u'', u'']
>>> for elem in root.iterchildren():
...     print elem.tag, type(elem)
...
A <type 'lxml.objectify.StringElement'>
A <type 'lxml.objectify.StringElement'>
M <type 'lxml.objectify.StringElement'>
M <type 'lxml.objectify.StringElement'>
M <type 'lxml.objectify.StringElement'>
M <type 'lxml.objectify.StringElement'>

This iteration does not depend on the element names at all.

If need be, you can also restrict the iteration to specific tag names:

>>> for elem in root.iterchildren('M', 'N'):
...     print elem.tag, type(elem)
...
M <type 'lxml.objectify.StringElement'>
M <type 'lxml.objectify.StringElement'>
M <type 'lxml.objectify.StringElement'>
M <type 'lxml.objectify.StringElement'>
>>> root = objectify.fromstring("""<X>
...   <A/>
...   <A/>
...   <N/>
...   <N/>
...   <N/>
...   <N/>
... </X>""")
>>> for elem in root.iterchildren('M', 'N'):
...     print elem.tag, type(elem)
...
N <type 'lxml.objectify.StringElement'>
N <type 'lxml.objectify.StringElement'>
N <type 'lxml.objectify.StringElement'>
N <type 'lxml.objectify.StringElement'>
>>>
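
For the original use case (returning a list of the objectified elements
that follow the A elements, whatever they are called), something along
these lines might work. It is only a sketch for Python 3: the helper name
trailing_children and its leading_tag parameter are hypothetical, and it
assumes the documents use no namespaces.

```python
from lxml import objectify

DOC_M = "<X><A/><A/><M/><M/><M/><M/></X>"
DOC_N = "<X><A/><A/><N/><N/><N/><N/></X>"


def trailing_children(root, leading_tag="A"):
    """Return root's child elements not named leading_tag, in document order.

    Hypothetical helper: it never needs to know the trailing tag name.
    """
    return [
        child
        for child in root.iterchildren()
        # Comments and processing instructions have a non-string tag; skip them.
        if isinstance(child.tag, str) and child.tag != leading_tag
    ]


for doc in (DOC_M, DOC_N):
    elems = trailing_children(objectify.fromstring(doc))
    print([elem.tag for elem in elems])
# prints ['M', 'M', 'M', 'M'] and then ['N', 'N', 'N', 'N']
```

Filtering out the known leading tag, rather than listing the tags to keep
as in iterchildren('M', 'N'), means the code does not have to be updated
when a document uses yet another trailing element name.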

Best regards,
Holger

