lxml.objectify, enforcing a specific object tree from an XSD schema
Hello!

I am trying to use lxml.objectify with an XSD schema in my code, and the behaviour I am getting does not correspond to what I was hoping for after reading the documentation. The relevant part of the documentation is http://lxml.de/objectify.html#asserting-a-schema:

    When dealing with XML documents from different sources, you will often require them to follow a common schema. In lxml.objectify, this directly translates to enforcing a specific object tree, i.e. expected object attributes are ensured to be there and to have the expected type. This can easily be achieved through XML Schema validation at parse time.

After running the following example code (with Python 3.4):

    from io import StringIO
    from lxml import etree
    from lxml import objectify

    f = StringIO('''\
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="Root">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="Element" type="ElementType" minOccurs="0" maxOccurs="unbounded" />
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:complexType name="ElementType">
        <xs:sequence>
          <xs:element name="a" type="xs:int" minOccurs="0" maxOccurs="1" default="0"/>
          <xs:element name="b" type="xs:string" minOccurs="0" maxOccurs="1" default=""/>
        </xs:sequence>
        <xs:attribute name="id" type="xs:int" use="required" />
      </xs:complexType>
    </xs:schema>
    ''')
    schema = etree.XMLSchema(file=f, attribute_defaults=True)
    parser = objectify.makeparser(schema=schema, attribute_defaults=True)

    xml = '''\
    <Root>
      <Element id="0"><a>100</a><b>test with both elements</b></Element>
      <Element id="1"><a/><b>test with default a element</b></Element>
      <Element id="2"><b>test with missing a element</b></Element>
      <Element id="3"><a>2</a><!-- test with missing b element --></Element>
    </Root>'''

    root = objectify.fromstring(xml, parser)
    print(objectify.dump(root))

the dump I get is the following:

    Root = None [ObjectifiedElement]
        Element = None [ObjectifiedElement]
          * id = '0'
            a = 100 [IntElement]
            b = 'test with both elements' [StringElement]
        Element = None [ObjectifiedElement]
          * id = '1'
            a = '' [StringElement]
            b = 'test with default a element' [StringElement]
        Element = None [ObjectifiedElement]
          * id = '2'
            b = 'test with missing a element' [StringElement]
        Element = None [ObjectifiedElement]
          * id = '3'
            a = 2 [IntElement]
            comment = '' [StringElement]

What I was hoping for is the following:

1. I was hoping objectify would be enforcing the object tree specified in the schema for both elements and attributes (I realized after rereading the documentation paragraph above that it only promises to do so for attributes)
2. Objectify would be able to assign the default value specified in the schema for simple elements (which it does correctly for attributes)

resulting in the following dump:

    Root = None [ObjectifiedElement]
        Element = None [ObjectifiedElement]
          * id = '0'
            a = 100 [IntElement]
            b = 'test with both elements' [StringElement]
        Element = None [ObjectifiedElement]
          * id = '1'
            a = 0 [IntElement]
            b = 'test with default a element' [StringElement]
        Element = None [ObjectifiedElement]
          * id = '2'
            a = 0 [IntElement]
            b = 'test with missing a element' [StringElement]
        Element = None [ObjectifiedElement]
          * id = '3'
            a = 2 [IntElement]
            b = '' [StringElement]

Is there a way to achieve this with lxml.objectify? If not, could you point me to any other direction?

Thank you very much for your help!

Jonian
Hi,
> What I was hoping for is the following:
> 1. I was hoping objectify would be enforcing the object tree
> specified in the schema for both elements and attributes (I realized
> after rereading the documentation paragraph above that it only
> promises to do so for attributes)
According to http://xmlsoft.org/html/libxml-xmlschemas.html:
Enum xmlSchemaValidOption {
    XML_SCHEMA_VAL_VC_I_CREATE = 1 : Default/fixed: create an attribute
        node *or an element's text node on the instance.*
}
This is the option that is set by using attribute_defaults=True in lxml, so
in theory it should work for both missing attributes and empty elements.
(I don't know whether you need it in the XMLSchema constructor, in the
makeparser function, or in both.)
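A minimal sketch of the attribute case, which the dumps in this thread show working: the schema below is a hypothetical one-element schema (the names "root" and "lang" are made up for illustration), and attribute_defaults=True is passed to the XMLSchema constructor only.

```python
from lxml import etree, objectify

# Hypothetical schema: "root" carries an optional attribute "lang"
# whose schema default is "en".
SCHEMA = b"""\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="root">
    <xs:complexType>
      <xs:attribute name="lang" type="xs:string" default="en"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

# Ask for schema defaults to be applied during validation.
schema = etree.XMLSchema(etree.fromstring(SCHEMA), attribute_defaults=True)
parser = objectify.makeparser(schema=schema)

# The instance document omits the attribute entirely.
root = objectify.fromstring("<root/>", parser)
lang = root.get("lang")
print(lang)
```

If the default is applied at parse time, root.get("lang") returns the schema default even though the instance never mentions the attribute.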
> 2. Objectify would be able to assign the default value
> specified in the schema for simple elements (which it does correctly
> for attributes)
Note that, according to the XML Schema recommendation, minOccurs="0"
elements are not defaulted if they are missing; see e.g.
https://lists.w3.org/Archives/Public/xmlschema-dev/2001Apr/0059.html
for some background.
I.e. missing elements are never created through default="..."; a default
can only apply to an element that is present but empty.
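Since a missing element is not created by the validator, one option is to supply the default on the application side when reading the tree. The following is only a sketch under assumptions: the schema is a made-up two-element example, and child_or_default is a hypothetical helper, not part of lxml.

```python
from lxml import etree, objectify

# Hypothetical schema: "root" may contain an optional int "a"
# (default 0) followed by an optional string "b".
SCHEMA = b"""\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="a" type="xs:int" minOccurs="0" default="0"/>
        <xs:element name="b" type="xs:string" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

schema = etree.XMLSchema(etree.fromstring(SCHEMA), attribute_defaults=True)
parser = objectify.makeparser(schema=schema)


def child_or_default(parent, name, default):
    """Hypothetical helper: Python value of a simple child, or a default.

    Only meant for children that hold plain data (no sub-elements).
    """
    child = parent.find(name)
    return default if child is None else child.pyval


# <a> is absent here: the document is valid, but no <a> node is created.
without_a = objectify.fromstring("<root><b>x</b></root>", parser)
a_is_missing = without_a.find("a") is None
a_fallback = child_or_default(without_a, "a", 0)

# <a> is present here, so its own value is used.
with_a = objectify.fromstring("<root><a>7</a><b>x</b></root>", parser)
a_present = child_or_default(with_a, "a", 0)
```

The helper duplicates the default value outside the schema, which is the price of this approach; it does not read the default from the XSD.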
However in practice:
>>> from lxml import etree, objectify
>>>
>>> schemadoc = etree.fromstring("""
... <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
...   <xs:element name="root">
...     <xs:complexType>
...       <xs:sequence>
...         <xs:element name="elem" minOccurs="0" maxOccurs="unbounded">
...           <xs:complexType>
...             <xs:sequence>
...               <xs:element name="a" type="xs:int" minOccurs="0" maxOccurs="1" default="0"/>
...             </xs:sequence>
...           </xs:complexType>
...         </xs:element>
...       </xs:sequence>
...     </xs:complexType>
...   </xs:element>
... </xs:schema>
... """)
>>>
>>> schema = etree.XMLSchema(schemadoc, attribute_defaults=True)
>>> parser = objectify.makeparser(schema=schema)
>>>
>>> xml = """
... <root>
... <elem></elem>
... <elem><a/></elem>
... <elem><a></a></elem>
... <elem><a>42</a></elem>
... </root>
... """
>>>
>>> root = objectify.fromstring(xml, parser=parser)
>>>
>>> print(objectify.dump(root))
root = None [ObjectifiedElement]
    elem = u'' [StringElement]
    elem = None [ObjectifiedElement]
        a = u'' [StringElement]
    elem = None [ObjectifiedElement]
        a = u'' [StringElement]
    elem = None [ObjectifiedElement]
        a = 42 [IntElement]
>>>
I.e. the text content of an empty a is never defaulted to 0, contrary to
what we would expect.
Now, if I slightly modify the validating schema and add an attribute with
a default value to the root element, but leave the instance document
untouched:
>>> schemadoc = etree.fromstring("""
... <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
...   <xs:element name="root">
...     <xs:complexType>
...       <xs:sequence>
...         <xs:element name="elem" minOccurs="0" maxOccurs="unbounded">
...           <xs:complexType>
...             <xs:sequence>
...               <xs:element name="a" type="xs:int" minOccurs="0" maxOccurs="1" default="0"/>
...             </xs:sequence>
...           </xs:complexType>
...         </xs:element>
...       </xs:sequence>
...       <xs:attribute name="elem_attr" default="default value"/>
...     </xs:complexType>
...   </xs:element>
... </xs:schema>
... """)
>>>
>>> schema = etree.XMLSchema(schemadoc, attribute_defaults=True)
>>> parser = objectify.makeparser(schema=schema)
>>>
>>> xml = """
... <root>
... <elem></elem>
... <elem><a/></elem>
... <elem><a></a></elem>
... <elem><a>42</a></elem>
... </root>
... """
>>>
>>>
>>> root = objectify.fromstring(xml, parser=parser)
>>>
>>> print(objectify.dump(root))
root = None [ObjectifiedElement]
  * elem_attr = 'default value'
    elem = u'' [StringElement]
    elem = None [ObjectifiedElement]
        a = 0 [IntElement]
    elem = None [ObjectifiedElement]
        a = 0 [IntElement]
    elem = None [ObjectifiedElement]
        a = 42 [IntElement]
>>>
Hm. Now we see both the root attribute *and* the element text content get
defaulted. Strange; I suspect a bug in libxml2 (?).
But maybe
I'm using
LIBXML_COMPILED_VERSION: (2, 9, 1)
LIBXML_VERSION: (2, 9, 1)
LIBXSLT_COMPILED_VERSION: (1, 1, 28)
LIBXSLT_VERSION: (1, 1, 28)
LXML_VERSION: (3, 4, 1, 0)
on Python 2.7
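For comparing results across installations, the same version information can be collected with a few lines. This is a small sketch that only reads the version tuples lxml.etree exposes; the dictionary name is arbitrary.

```python
from lxml import etree

# Names of the version tuples exposed by lxml.etree.
VERSION_NAMES = (
    "LIBXML_COMPILED_VERSION",
    "LIBXML_VERSION",
    "LIBXSLT_COMPILED_VERSION",
    "LIBXSLT_VERSION",
    "LXML_VERSION",
)

versions = {name: getattr(etree, name) for name in VERSION_NAMES}

for name in VERSION_NAMES:
    print("%s: %s" % (name, versions[name]))
```

Including this output in a report makes it easier to tell whether a difference in defaulting behaviour comes from lxml itself or from the libxml2 build it runs against.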
Holger
Participants (2): Grazhdani Jonian, Holger Joukl