Hi,

Grazhdani Jonian wrote on 26.12.2017 at 19:46:
We have been using lxml.objectify for a while for our XML work in Python and I really appreciate its neat way of converting XML elements to Python objects and vice versa. That worked great until we needed to add support for an XML-based standard which extensively uses hyphens in its element names (specifically MDX from ASAM, https://www.asam.net/standards/detail/mdx/wiki/ ). The resulting lxml.objectify objects have a lot of attributes with hyphens, which makes their use in Python very cumbersome, since using hyphens in variable names is forbidden in Python and results in a syntax error.
Googling pointed me to using getattr() / setattr() instead of the dot notation, which I have done, also in combination with XPath expressions to avoid ridiculous getattr() chains. It does the job, but the resulting code is a nightmare to work with and maintain (see the example code in the postscript below).
I kindly ask those of you who have extensive experience with hyphenated elements to give me some guidance on the best way to proceed from here. The alternatives I see in front of me are:
1 Convert all element names to use another character (e.g. underscore) instead of the hyphen before objectify parsing, and then convert back before exporting.
2 Continue with getattr() / setattr()
3 Give up objectify
Obviously (and unfortunately) changing the standard is not an option. If you have any other approaches not listed above, I would be very happy if you could share them with me!
I am seriously considering giving alternative 1 a try and figured I would need to convert both the schema and the XML to make objectify work. Any tips on doing that in a smart way would also be welcome.
tel: +46 (0)8 553 53427
e-mail: jonian.grazhdani@scania.com
P.S. Example code with two implementations:
from lxml import objectify

NAMESPACE = "http://www.asam.net/schema/MDX/r1.3"
E = objectify.ElementMaker(
    annotate=False,
    namespace=NAMESPACE,
    nsmap={
        None: NAMESPACE,
        'xsd': 'http://www.w3.org/2001/XMLSchema',
        'xsi': 'http://www.w3.org/2001/XMLSchema-instance',
    })
# Implementation 1: dot notation, which results in a syntax error
self.sw_feature = E.SW-FEATURE(
    E.SHORT-NAME(self.short_name),
    E.CATEGORY("FCT"),
    E.SW-FEATURE-OWNED-ELEMENT-SETS(
        E.SW-FEATURE-OWNED-ELEMENT-SET(
            E.SW-FEATURE-ELEMENTS()
        )
    ),
    E.SW-FEATURE-INTERFACES(),
    ID=self.id
)
I recommend defining a separate module that holds all the tag names, see: http://lxml.de/objectify.html#tree-generation-with-the-e-factory And as an example of such a module: https://github.com/lxml/lxml/blob/master/src/lxml/html/builder.py That also prevents typos when using the E-factory, which would otherwise be copied into the generated XML unchecked.
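As a rough illustration of such a tag-name module, here is a minimal sketch. It assumes the MDX namespace and the tag names from the example code in this thread; the Python-side names (`SW_FEATURE`, `build_sw_feature`, and so on) are made up for the example and are not part of lxml or of MDX.

```python
from lxml import etree, objectify

NAMESPACE = "http://www.asam.net/schema/MDX/r1.3"
E = objectify.ElementMaker(annotate=False, namespace=NAMESPACE,
                           nsmap={None: NAMESPACE})

# One Python-friendly name per hyphenated tag (hypothetical names).
# The getattr() calls live here, in one place, instead of in user code.
SW_FEATURE = getattr(E, "SW-FEATURE")
SHORT_NAME = getattr(E, "SHORT-NAME")
CATEGORY = E.CATEGORY
SW_FEATURE_OWNED_ELEMENT_SETS = getattr(E, "SW-FEATURE-OWNED-ELEMENT-SETS")
SW_FEATURE_OWNED_ELEMENT_SET = getattr(E, "SW-FEATURE-OWNED-ELEMENT-SET")
SW_FEATURE_ELEMENTS = getattr(E, "SW-FEATURE-ELEMENTS")
SW_FEATURE_INTERFACES = getattr(E, "SW-FEATURE-INTERFACES")


def build_sw_feature(short_name, feature_id):
    """Build the SW-FEATURE skeleton from the example in this thread."""
    return SW_FEATURE(
        SHORT_NAME(short_name),
        CATEGORY("FCT"),
        SW_FEATURE_OWNED_ELEMENT_SETS(
            SW_FEATURE_OWNED_ELEMENT_SET(
                SW_FEATURE_ELEMENTS()
            )
        ),
        SW_FEATURE_INTERFACES(),
        ID=feature_id,
    )


feature = build_sw_feature("Engine", "f1")
print(etree.tostring(feature, pretty_print=True).decode())
```

A misspelled name such as `SW_FEATRUE` then fails with a NameError at the call site rather than ending up as an unknown tag in the output.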
if len(self.owned_elements['SW-VARIABLE-REFS']) > 0:
    self.sw_feature.SW-FEATURE-OWNED-ELEMENT-SETS.SW-FEATURE-OWNED-ELEMENT-SET.SW-FEATURE-ELEMENTS.SW-VARIABLE-REFS = E.SW-VARIABLE-REFS()
    self.sw_feature.SW-FEATURE-OWNED-ELEMENT-SETS.SW-FEATURE-OWNED-ELEMENT-SET.SW-FEATURE-ELEMENTS.SW-VARIABLE-REFS = self.owned_elements['SW-VARIABLE-REFS']
# Implementation 2: getattr / setattr with the occasional xpath
self.sw_feature = getattr(E, 'SW-FEATURE')(
    getattr(E, 'SHORT-NAME')(self.short_name),
    E.CATEGORY("FCT"),
    getattr(E, 'SW-FEATURE-OWNED-ELEMENT-SETS')(
        getattr(E, 'SW-FEATURE-OWNED-ELEMENT-SET')(
            getattr(E, 'SW-FEATURE-ELEMENTS')()
        )
    ),
    getattr(E, 'SW-FEATURE-INTERFACES')(),
    ID=self.id
)
mdx_sw_feature_elements = self.sw_feature.find(
    "./{ns}SW-FEATURE-OWNED-ELEMENT-SETS/{ns}SW-FEATURE-OWNED-ELEMENT-SET/{ns}SW-FEATURE-ELEMENTS".format(
        ns="{" + NAMESPACE + "}"))
if self.owned_elements['SW-VARIABLE-REFS']:
    setattr(mdx_sw_feature_elements, 'SW-VARIABLE-REFS', getattr(E, 'SW-VARIABLE-REFS')())
    mdx_sw_refs = getattr(mdx_sw_feature_elements, 'SW-VARIABLE-REFS')
    # set the object's children (e.g. SW-VARIABLE-REFS.SW-VARIABLE-REF)
    setattr(mdx_sw_refs, 'SW-VARIABLE-REF', self.owned_elements['SW-VARIABLE-REFS'])
When processing elements, the item access syntax (e.g. "root['sub-tag-name']") seems the best way to do it with objectify. For longer path expressions, you can also use predefined ObjectPath or XPath objects:
http://lxml.de/objectify.html#objectpath
http://lxml.de/xpathxslt.html#the-xpath-class
Give them a proper function-like name that makes it clear what they do, and that will help you avoid spelling them out in lengthy lines in your code.

Stefan
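To make the item access and predefined path objects mentioned above a bit more concrete, here is a small sketch. The XML snippet is a cut-down stand-in modelled on the example in this thread, not a valid MDX document, and the names `find_feature_elements` and `find_short_names` are invented for the illustration.

```python
from lxml import etree, objectify

NAMESPACE = "http://www.asam.net/schema/MDX/r1.3"

# Cut-down stand-in document, assumed for illustration only.
XML = """<SW-FEATURE xmlns="http://www.asam.net/schema/MDX/r1.3" ID="f1">
  <SHORT-NAME>Engine</SHORT-NAME>
  <SW-FEATURE-OWNED-ELEMENT-SETS>
    <SW-FEATURE-OWNED-ELEMENT-SET>
      <SW-FEATURE-ELEMENTS/>
    </SW-FEATURE-OWNED-ELEMENT-SET>
  </SW-FEATURE-OWNED-ELEMENT-SETS>
</SW-FEATURE>"""

sw_feature = objectify.fromstring(XML)

# Item access works where dot notation cannot be written.
short_name = sw_feature["SHORT-NAME"].text

# Predefined ObjectPath: the leading dot makes it relative to whatever
# element it is applied to; child names inherit the parent's namespace.
find_feature_elements = objectify.ObjectPath(
    ".SW-FEATURE-OWNED-ELEMENT-SETS"
    ".SW-FEATURE-OWNED-ELEMENT-SET"
    ".SW-FEATURE-ELEMENTS"
)

# Predefined XPath object; the "mdx" prefix is chosen here, not by MDX.
find_short_names = etree.XPath("mdx:SHORT-NAME",
                               namespaces={"mdx": NAMESPACE})

feature_elements = find_feature_elements(sw_feature)
short_names = find_short_names(sw_feature)
```

Both objects are compiled once and can be reused on any SW-FEATURE element, so the long path appears in only one place.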