Hi Tobias,
> lxml.objectify lets us define custom types by means of objectify.PyType and ObjectifiedDataElement subclasses, e.g. as in [0]. It's nice how they map to XSD and automatically convert between the Python type and its XML representation.
>
> I understand how it works for scalar leaf nodes. But does objectify.PyType work for structured/nested types as well? Or is PyType the wrong tool and one should better head over to "Generating XML with custom classes" [1] and "Custom element class lookup" [2]? Or is it not possible at all?
PyType is not really suitable for structured types, since the __setattr__ mechanics and the PyType registry/lookup mechanisms basically put a text representation of the assigned value plus a type annotation attribute into the "underlying" XML tree. See https://lxml.de/objectify.html#how-data-types-are-matched and https://github.com/lxml/lxml/blob/ac829d561c0bf71fb8cc704305ffc18bd26c6abb/src/lxml/objectify.pyx#L491 for most of what there is to know about this.

That said, your options depend on how closely parsing from XML and setting objects in Python before serializing to XML should "mirror" each other for your use case.
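To make that mechanism concrete, here is a minimal sketch of what a plain scalar assignment leaves behind in the tree. The leaf names answer and label are made up for illustration; the namespace URI is the one objectify uses for its py:pytype annotation.

```python
from lxml import etree, objectify

# Namespace-qualified name of the py:pytype annotation attribute.
PYTYPE = "{http://codespeak.net/lxml/objectify/pytype}pytype"

root = objectify.Element("root")
root.answer = 42       # hypothetical leaf names, for illustration only
root.label = "spam"

# The tree itself only holds text plus a type hint, not the Python objects.
answer_text = root.answer.text
answer_hint = root.answer.get(PYTYPE)
label_hint = root.label.get(PYTYPE)

# On access, the hint selects a data element class that behaves like an int.
answer_is_int_element = isinstance(root.answer, objectify.IntElement)

print(etree.tostring(root, pretty_print=True).decode())
```

There is no slot in this scheme for a value that has children of its own, which is why a structured type does not fit the PyType approach.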
> Simple imaginary code:
>
> ---snip---
> from collections import namedtuple
> from lxml import etree, objectify
>
> # A structured python type
> MyStructuredThing = namedtuple('MyStructuredThing', 'a b')
>
> # some custom code and registration like with objectify.PyType here
> # ...
>
> root = objectify.Element("root")
> # magically take python type and construct tree + leaf elements automagically
> root.mystructuredthing = MyStructuredThing(1, 2)
> etree.tostring(root, pretty_print=True)
Since a namedtuple is still a tuple this would trigger special-cased sequence assignment (details here: https://github.com/lxml/lxml/blob/ac829d561c0bf71fb8cc704305ffc18bd26c6abb/src/lxml/objectify.pyx#L474 ):

>>> root = objectify.Element('root')
>>> root.x = (1, 2, 3)
>>> print(etree.tostring(root))
b'<root xmlns:py="http://codespeak.net/lxml/objectify/pytype" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" py:pytype="TREE"><x py:pytype="int">1</x><x py:pytype="int">2</x><x py:pytype="int">3</x></root>'

Of course, you could always override __setattr__ in a custom subclass and special-case your structured datatype:

>>> E = objectify.E
>>> class MyElement(objectify.ObjectifiedElement):
...     def __setattr__(self, name, value):
...         if isinstance(value, Structured):
...             value = E.structured(E.a(value.a), E.b(value.b))
...         objectify.ObjectifiedElement.__setattr__(self, name, value)
...
>>> root = MyElement('root')
>>> root.structured = Structured(a=1, b=2)
>>> print(etree.tostring(root))
b'<MyElement>root<structured xmlns:py="http://codespeak.net/lxml/objectify/pytype" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><a py:pytype="int">1</a><b py:pytype="int">2</b></structured></MyElement>'

That won't get you creation of Structured objects when parsing - you'd need custom element class lookup for such stuff, and basically an Element class (distinct from your namedtuple) that represents your structured datatype. Note how parsing from XML doesn't give you built-in Python datatypes but objectified representatives that behave (very much) like the built-ins. (See Advanced element class lookup here: https://lxml.de/objectify.html#how-data-types-are-matched)

I'd probably forgo all this and simply use the glorious E-Factory to create structured data in assignments where needed:
>>> root = objectify.Element('root')
>>> root.structured = E._(E.a(1), E.b(2))
>>> print(etree.tostring(root))
b'<root xmlns:py="http://codespeak.net/lxml/objectify/pytype" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" py:pytype="TREE"><structured><a py:pytype="int">1</a><b py:pytype="int">2</b></structured></root>'
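If the same shape is needed in several places, the E-Factory call could be wrapped in a small function. This is only a sketch: the helper name structured and the attribute names first and second are invented here, and it relies on the behaviour seen above, where the placeholder tag is replaced by the attribute name on assignment.

```python
from lxml import etree, objectify

E = objectify.E


def structured(a, b):
    """Hypothetical helper: build the subtree for one (a, b) pair."""
    # The placeholder tag "_" is replaced by the attribute name on assignment.
    return E._(E.a(a), E.b(b))


root = objectify.Element("root")
root.first = structured(1, 2)
root.second = structured(3, 4)

print(etree.tostring(root, pretty_print=True).decode())
```

The children stay ordinary objectified leaves, so root.first.a and root.second.b still compare equal to plain integers.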
(You can even build a vocabulary if you wish, like some mini-DSL, see: https://lxml.de/tutorial.html#the-e-factory)

I.e. create the structured ObjectifiedElement "from the outside", not implicitly in the assignment.

Best,
Holger
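P.S. For the parsing direction, a rough sketch of the custom element class lookup route might look like the following. Treat it as an illustration under assumptions rather than a recipe: the class names StructuredElement and StructuredLookup, the method to_structured and the tag name structured are all invented for this example, and the objectify lookup is simply used as the fallback for everything else.

```python
from collections import namedtuple

from lxml import etree, objectify

Structured = namedtuple("Structured", "a b")


class StructuredElement(objectify.ObjectifiedElement):
    """Hypothetical element class standing in for the structured type."""

    def to_structured(self):
        # Convert the objectified children into the plain namedtuple.
        return Structured(int(self.a), int(self.b))


class StructuredLookup(etree.CustomElementClassLookup):
    """Hypothetical lookup: pick StructuredElement by tag name."""

    def lookup(self, node_type, document, namespace, name):
        if node_type == "element" and name == "structured":
            return StructuredElement
        return None  # defer to the fallback lookup


# Everything that is not <structured> is handled by objectify as usual.
lookup = StructuredLookup(objectify.ObjectifyElementClassLookup())
parser = etree.XMLParser(remove_blank_text=True)
parser.set_element_class_lookup(lookup)

xml = "<root><structured><a>1</a><b>2</b></structured></root>"
root = objectify.fromstring(xml, parser)
value = root.structured.to_structured()
print(value)
```

Note that the element class and the namedtuple remain two separate things here; the conversion is an explicit method call, not something the parser does on its own.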