Hi Nat,
> In most cases, the XML is being processed by code that knows exactly
> what the schema of the document is and hence using lxml.objectify is
> a perfect match. However, in a few cases I'm writing code that's
> trying to generically process XML across documents whose schema is
> structurally similar but differs in detail. E.g., it might be
> processing a document that looks like:
>
> <X>
> <A/>
> <A/>
> <M/>
> <M/>
> <M/>
> <M/>
> </X>
>
> or:
>
> <X>
> <A/>
> <A/>
> <N/>
> <N/>
> <N/>
> <N/>
> </X>
>
> I.e., both are rooted at <X> and start with some number of <A>
> sub-elements followed by some number of either <M> or <N> sub-elements.
> The processing code doesn't care about the structure of the <M> or <N>
> elements; in fact, the documents it processes could have any number of
> any single type of sub-element following the <A> sub-elements, and the
> code doesn't need their details. However, it does need to return a list
> of objectified versions of those elements.
> Maybe there's a way to do all this entirely in "objectified mode",
> but I haven't figured it out.
Thanks for sharing.
Note that X.A gives you access to all A children of X, but it's not
actually a list. It's rather a kind of "sequential view": objectify gives
you the index operator to access the following identically named siblings.
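To make the "sequential view" concrete, here is a small sketch (the sample document and its "a1"/"a2" texts are made up for illustration). `root.A` behaves like the first <A> child, while len(), indexing and iteration walk over its identically named siblings:

```python
from lxml import objectify

root = objectify.fromstring("<X><A>a1</A><A>a2</A><M/><M/></X>")

first_a = root.A                  # attribute access yields the first <A> child
print(first_a.tag)                # A
print(len(root.A))                # 2: len() counts <A> and its same-named siblings
print(root.A[1].text)             # a2: the index walks forward through the <A> siblings
print([a.text for a in root.A])   # iteration visits only the <A> siblings, not <M>
print(isinstance(root.A, list))   # False: it is an element, not a list
```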
So it may be feasible for you to simply iterate over the elements,
which presents them in document order regardless of their name:
>>> root = objectify.fromstring("""<X>
... <A/>
... <A/>
... <M/>
... <M/>
... <M/>
... <M/>
... </X>""")
>>> root.getchildren()
[u'', u'', u'', u'', u'', u'']
>>> for elem in root.iterchildren():
...     print elem.tag, type(elem)
...
A <type 'lxml.objectify.StringElement'>
A <type 'lxml.objectify.StringElement'>
M <type 'lxml.objectify.StringElement'>
M <type 'lxml.objectify.StringElement'>
M <type 'lxml.objectify.StringElement'>
M <type 'lxml.objectify.StringElement'>
That way, element names don't matter at all.
If need be, you can also restrict the iteration to specific tag names:
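Applied to your case, a helper along these lines would return the objectified elements that follow the <A> children, whatever they are called. This is a minimal sketch: the name `tail_elements` and its `head_tag` parameter are hypothetical, and it assumes the leading children are all named A and the document contains no comments or processing instructions:

```python
from lxml import objectify

def tail_elements(root, head_tag="A"):
    """Hypothetical helper: return the children of root that are not
    named head_tag, in document order."""
    # iterchildren() yields every child regardless of its name, and each
    # child is already an objectified element, so no conversion is needed.
    return [child for child in root.iterchildren() if child.tag != head_tag]

doc_m = objectify.fromstring("<X><A/><A/><M/><M/><M/><M/></X>")
doc_n = objectify.fromstring("<X><A/><A/><N/><N/><N/><N/></X>")

for doc in (doc_m, doc_n):
    print([elem.tag for elem in tail_elements(doc)])
```

The same function handles both of your sample documents without knowing about M or N in advance.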
>>> for elem in root.iterchildren('M', 'N'):
...     print elem.tag, type(elem)
...
M <type 'lxml.objectify.StringElement'>
M <type 'lxml.objectify.StringElement'>
M <type 'lxml.objectify.StringElement'>
M <type 'lxml.objectify.StringElement'>
>>> root = objectify.fromstring("""<X>
... <A/>
... <A/>
... <N/>
... <N/>
... <N/>
... <N/>
... </X>""")
>>> for elem in root.iterchildren('M', 'N'):
...     print elem.tag, type(elem)
...
N <type 'lxml.objectify.StringElement'>
N <type 'lxml.objectify.StringElement'>
N <type 'lxml.objectify.StringElement'>
N <type 'lxml.objectify.StringElement'>
>>>
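If the candidate names are known up front, the filtering can be left to lxml and wrapped so that the caller gets a plain list back. A minimal sketch: the helper name `known_tail` and its default tag tuple are hypothetical, and passing several tag names to iterchildren() needs lxml 3.0 or later, as far as I know:

```python
from lxml import objectify

def known_tail(root, tags=("M", "N")):
    """Hypothetical helper: collect the children whose tag is one of
    the given names, in document order, as a plain list."""
    # iterchildren() accepts several tag names and yields only matching children.
    return list(root.iterchildren(*tags))

doc_m = objectify.fromstring("<X><A/><A/><M/><M/><M/><M/></X>")
doc_n = objectify.fromstring("<X><A/><A/><N/><N/><N/><N/></X>")

print([elem.tag for elem in known_tail(doc_m)])
print([elem.tag for elem in known_tail(doc_n)])
```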
Best regards,
Holger
Landesbank Baden-Wuerttemberg
Anstalt des oeffentlichen Rechts
Hauptsitze: Stuttgart, Karlsruhe, Mannheim, Mainz
HRA 12704
Amtsgericht Stuttgart
_________________________________________________________________
Mailing list for the lxml Python XML toolkit - http://lxml.de/
lxml@lxml.de
https://mailman-mail5.webfaction.com/listinfo/lxml