[lxml-dev] About objectify

Hello,

I wanted to give objectify a try, and while reading the documentation I saw that what I'd like to do involves using a schema for my XML file, and the xsi namespace to specify element types. Since I am discovering the particularities of objectify and the details of XML Schema at the same time, I am having some trouble handling the whole thing, and I'm not sure I understand everything in the documentation. The main problem is that if I make an error, I do not yet know whether I should correct my schema, my XML file or my Python code.

Does anyone have a simple example with a short XML file containing some elements, its schema, and the Python code parsing the file using objectify with the xsi namespace to do type association?

Thanks,
-- David.

Hi David,
> Does anyone have a simple example with a short XML file containing some elements, its schema, and the Python code parsing the file using objectify with the xsi namespace to do type association?

I'm not quite sure I understand what you are trying to achieve, but lxml.objectify does not necessarily need a schema:

    $ cat simpleInstance.xml
    <root>
      <s>A string, hopefully.</s>
    </root>

You can simply parse this using objectify:

    >>> from lxml import etree, objectify
    >>> root = objectify.parse("simpleInstance.xml").getroot()
    >>> print objectify.dump(root)
    root = None [ObjectifiedElement]
        s = 'A string, hopefully.' [StringElement]
    >>> print root.s
    A string, hopefully.
What you can do is validate this XML tree against a schema:

    $ cat simpleSchema.xsd
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
      <xsd:element name="root" type="RootType"/>
      <xsd:complexType name="RootType">
        <xsd:sequence>
          <xsd:element name="s" type="xsd:string"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:schema>

    >>> from lxml import etree, objectify
    >>> root = objectify.parse("simpleInstance.xml").getroot()
    >>> schema = etree.XMLSchema(objectify.parse("simpleSchema.xsd"))
    >>> schema.validate(root)
    True
No xsi:type information anywhere, so far.
What you currently cannot do is use the schema to *add* xsi:type attributes to the XML instance. Or, to put it another way, schema validation does not add any type information. No problem, though, if the XML already contains xsi:type information:

    >>> from lxml import etree, objectify
    >>> root = objectify.parse("simpleInstance2.xml").getroot()
    >>> print objectify.dump(root)
    root = None [ObjectifiedElement]
        s = 'A string, hopefully.' [StringElement]
          * xsi:type = 'xsd:normalizedString'
    >>> print root.s
    A string, hopefully.
    >>> schema = etree.XMLSchema(objectify.parse("simpleSchema.xsd"))
    >>> schema.validate(root)
    True
If xsi:type information is available, it will be used to determine the lxml.objectify type representation of an element. Consider

    >>> root = objectify.fromstring("<root><s>3</s></root>")
    >>> print objectify.dump(root)
    root = None [ObjectifiedElement]
        s = 3 [IntElement]

vs.

    >>> root = objectify.fromstring("""
    ... <root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    ... xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    ... <s xsi:type="xsd:string">3</s>
    ... </root>""")
    >>> print objectify.dump(root)
    root = None [ObjectifiedElement]
        s = '3' [StringElement]
          * xsi:type = 'xsd:string'
HTH,
Holger

On Tuesday, 22 January 2008 at 14:54 +0100, jholg@gmx.de wrote:
> Hi David,
>> Does anyone have a simple example with a short XML file containing some elements, its schema, and the Python code parsing the file using objectify with the xsi namespace to do type association?
> I'm not quite sure I understand what you are trying to achieve,
> but lxml.objectify does not necessarily need a schema:

Hello. Thanks for your answer. I should have written that I'd like to define a complex type, as described at the end of the following section:

http://codespeak.net/lxml/objectify.html#how-data-types-are-matched

In my case, writing a type checker would be impossible, or at least very difficult. I understand I can also use a py:pytype attribute, but I'd like to try the xsi namespace if possible.

-- David.

Hi,
> I should have written I'd like to define a complex type, as stated in the end of the following chapter.
> http://codespeak.net/lxml/objectify.html#how-data-types-are-matched

Hm, it is not really a complex type (in XML Schema terms), but rather a custom simple data type.

> In my case, writing a type checker would be impossible - or very difficult. I understand I can also use a py:pytype attribute, but I'd like to try the xsi namespace if possible.

Have you tried registering your custom type as an xmlSchemaType, as in:

    >>> my_strange_type.xmlSchemaTypes = ("myns:mytypename",)

Not sure if this will work though, as lxml.objectify currently expects xsi:type information to contain xsd:<typename> hints, i.e. the xsi:type information must come from the XML Schema namespace. At least the DataElement() factory will choke:

    >>> objectify.DataElement(3, _xsi="foo:bar")
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "lxml.objectify.pyx", line 1708, in lxml.objectify.DataElement
    ValueError: XSD types require the XSD namespace

Maybe you can achieve what you need by taking a look at the "Using custom element classes in lxml" section?

Good luck,
H.
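
The namespace requirement described above can be illustrated without lxml. What follows is a minimal, stdlib-only sketch and not lxml's implementation: the helper names `resolve_xsi_type` and `is_builtin_xsd_type`, the explicit prefix map, and the `urn:example:my-types` namespace are all assumptions made for illustration. It resolves the prefix of an xsi:type value against a prefix map and reports whether the type name lives in the XML Schema namespace.

```python
XSD_NS = "http://www.w3.org/2001/XMLSchema"


def resolve_xsi_type(value, nsmap):
    """Return (namespace URI, local name) for an xsi:type value.

    nsmap maps prefixes to namespace URIs; the key None, if present,
    holds the default namespace.
    """
    prefix, sep, local = value.partition(":")
    if not sep:
        # No prefix, e.g. "long": fall back to the default namespace.
        return nsmap.get(None), value
    if prefix not in nsmap:
        raise ValueError("undeclared prefix %r in xsi:type %r" % (prefix, value))
    return nsmap[prefix], local


def is_builtin_xsd_type(value, nsmap):
    """True if the type name resolves into the XML Schema namespace."""
    namespace, _local = resolve_xsi_type(value, nsmap)
    return namespace == XSD_NS


# Hypothetical prefix map, as it might be declared on a root element.
nsmap = {"xsd": XSD_NS, "myns": "urn:example:my-types"}

print(is_builtin_xsd_type("xsd:string", nsmap))
print(is_builtin_xsd_type("myns:mytypename", nsmap))
```

Under these assumptions `xsd:string` qualifies, `myns:mytypename` does not, and an undeclared prefix such as `foo` is rejected outright. lxml itself may be more lenient in places, for example about unprefixed type names.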

Hello,

On Tuesday, 22 January 2008 at 17:31 +0100, jholg@gmx.de wrote:
> Hi,
>> I should have written I'd like to define a complex type, as stated in the end of the following chapter.
>> http://codespeak.net/lxml/objectify.html#how-data-types-are-matched
> Hm, it is not really a complex type (in XML Schema terms), but rather a custom simple data type.
>> In my case, writing a type checker would be impossible - or very difficult. I understand I can also use a py:pytype attribute, but I'd like to try the xsi namespace if possible.
> Have you tried registering your custom type as an xmlSchemaType, as in:
> my_strange_type.xmlSchemaTypes = ("myns:mytypename",)
All this was a bit fuzzy to me. I have had time to dig deeper, and here is a very simple file I wrote to run some tests:

---%<------%<------%<------%<---
<?xml version="1.0" encoding="ISO-8859-15"?>
<site xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xsi:type='site'>
  <title xsi:type='title'>Achille 2.0</title>
  <value xsi:type='long'>2</value>
</site>
---%<------%<------%<------%<---

And below is the code I've written to parse it:

---%<------%<------%<------%<---
from lxml import etree
from lxml import objectify

class Configuration(objectify.ObjectifiedDataElement):
    pass

class MyString(objectify.ObjectifiedDataElement):
    pass

configuration_type = objectify.PyType('configuration', None, Configuration)
string_type = objectify.PyType('MyString', None, MyString)
configuration_type.xmlSchemaTypes = ('site',)
string_type.xmlSchemaTypes = ('title',)
configuration_type.register()
string_type.register()

lParser = etree.XMLParser(remove_blank_text=True)
lLookup = objectify.ObjectifyElementClassLookup()
lParser.setElementClassLookup(lLookup)

lFile = open('test.xml', 'r')
lTree = etree.parse(lFile, lParser)
print objectify.dump(lTree.getroot())
---%<------%<------%<------%<---

Here is the result:

$ python ./try_objectify.py
site = None [ObjectifiedElement]
  * xsi:type = 'site'
    title = Achille 2.0 [MyString]
      * xsi:type = 'title'
    value = 2L [LongElement]
      * xsi:type = 'long'

So I managed to get some success, but here are some remaining questions.

- Why is the root element still an ObjectifiedElement instance? It seems to me I applied the same rules for both of my defined types.
- Is there a way to specify the xsi:type in a schema? This question may sound stupid, but I'm still learning the XSD spec, and I wonder if objectify could rely entirely on the schema, without the need to add anything to the XML document itself.

> Good luck,

Thanks for your attention,
-- David Soulayrol <dsoulayrol@free.fr>

Hi,
> Here is the result:
>
> $ python ./try_objectify.py
> site = None [ObjectifiedElement]
>   * xsi:type = 'site'
>     title = Achille 2.0 [MyString]
>       * xsi:type = 'title'
>     value = 2L [LongElement]
>       * xsi:type = 'long'
>
> So I managed to get some success, but here are some remaining questions.
>
> - Why is the root element still an ObjectifiedElement instance ? It seems to me I applied the same rules for both of my defined types.

Basically, when lxml parses an XML file/string, the underlying libxml2 is used to build a DOM-like XML tree, i.e. a C data structure. On element access, lxml creates a proxy object to represent the node in Python. After you have finished working with the node and deleted your Python references to it, the proxy is free to be garbage-collected.

Now, objectify bases its element class lookup (i.e. which element class to use for the Python proxy representation) on certain rules:

1. if element has children => no data class
2. if element is defined as xsi:nil, return NoneElement class
3. check for Python type hint
4. check for XML Schema type hint
5. guess element class

Therefore, the objectify class lookup will *always* choose ObjectifiedElement if an element has children ("structural element"), as opposed to a "data element". You can override this behaviour by using a custom element class lookup (with ObjectifyElementClassLookup as the fallback) based on attributes:

    $ cat lxml_attributeBasedLookup.py
    from lxml import etree, objectify

    class Configuration(objectify.ObjectifiedElement):
        pass

    class MyString(objectify.ObjectifiedDataElement):
        pass

    # maps attribute values to element classes
    xsitype_class_mapping = {
        "site": Configuration,
        "title": MyString,
    }

    lookup = etree.AttributeBasedElementClassLookup(
        "{http://www.w3.org/2001/XMLSchema-instance}type",
        xsitype_class_mapping,
        objectify.ObjectifyElementClassLookup())

    parser = etree.XMLParser()
    parser.setElementClassLookup(lookup)
    objectify.setDefaultParser(parser)

    root = objectify.fromstring("""
    <site xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xsi:type='site'>
      <title xsi:type='title'>Achille 2.0</title>
      <value xsi:type='long'>2</value>
    </site>
    """)
    print objectify.dump(root)

    ######################

    $ python2.4 lxml_attributeBasedLookup.py
    site = None [Configuration]
      * xsi:type = 'site'
        title = Achille 2.0 [MyString]
          * xsi:type = 'title'
        value = 2L [LongElement]
          * xsi:type = 'long'
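
The order of the five lookup rules can be sketched in plain Python. This is only an illustration built on the standard library's ElementTree, not lxml's actual lookup code: the function names `classify` and `guess_kind`, the returned labels, and the extra `<count>` element are assumptions made for the example, and the py:pytype namespace URI is assumed to be the one lxml.objectify uses.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
# Assumed to be the namespace lxml.objectify uses for py:pytype hints.
PY = "http://codespeak.net/lxml/objectify/pytype"


def guess_kind(text):
    """Rule 5 (toy version): guess a type name from the element text."""
    if text is None or not text.strip():
        return "str"
    for name, convert in (("int", int), ("float", float)):
        try:
            convert(text)
        except ValueError:
            continue
        return name
    if text in ("true", "false"):
        return "bool"
    return "str"


def classify(element):
    """Apply the rules in order; return (type label, deciding rule)."""
    if len(element):                               # rule 1: has children
        return "tree", "has children"
    if element.get("{%s}nil" % XSI) == "true":     # rule 2: xsi:nil
        return "none", "xsi:nil"
    pytype = element.get("{%s}pytype" % PY)        # rule 3: Python type hint
    if pytype is not None:
        return pytype, "py:pytype hint"
    xsi_type = element.get("{%s}type" % XSI)       # rule 4: XML Schema type hint
    if xsi_type is not None:
        return xsi_type.split(":")[-1], "xsi:type hint"
    return guess_kind(element.text), "guessed"     # rule 5: guess


doc = ET.fromstring(
    "<site xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xsi:type='site'>"
    "<title xsi:type='title'>Achille 2.0</title>"
    "<value xsi:type='long'>2</value>"
    "<count>3</count>"  # extra element, not in the thread, to exercise rule 5
    "</site>"
)

for element in doc.iter():
    print(element.tag, classify(element))
```

In this sketch the root `site` element is decided by rule 1 before its xsi:type attribute is ever looked at, which mirrors why the root stays a structural element unless a different lookup is consulted first.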
> - Is there a way to specify the xsi:type in a schema sheet ? This question may sound stupid, but I'm still learning the XSD spec, and I wonder if objectify could rely entirely on the schema, without the need to add anything in the XML document itself.

You can define custom types in XML Schema; probably the best approach is to look at the XML Schema Primer first, or the excellent tutorials by Roger Costello (I think the site is xfront.com).

Currently, I think lxml.objectify restricts itself to supporting the "xsd" types as in http://www.w3.org/TR/xmlschema-2/ with regard to xsi:type values, e.g. forcing them to come from the schema namespace. You might be able to achieve what you need with what I've shown above, by placing your own lookup ahead of the objectify lookup in the lookup order.

For now, there is nothing like a "typifier" that takes an instance and a schema and adds type information from the schema to the instance document.

Cheers,
Holger

Hi,

jholg@gmx.de wrote:
>> - Is there a way to specify the xsi:type in a schema sheet ? This question may sound stupid, but I'm still learning the XSD spec, and I wonder if objectify could rely entirely on the schema, without the need to add anything in the XML document itself.
> For now, there is nothing like a "typifier" that takes an instance and a schema and adds type information from the schema to the instance document.

Someone should file a "wishlist" bug report on this, as it has been requested a couple of times.

There would be ways to approach this. One is the "generateDS" tool by Dave Kuhlman:

http://www.rexx.com/~dkuhlman/generateDS.html

We could try to extract the parser (or reimplement it with lxml) and run an annotation step instead of the class generation step. We could also look a bit deeper into the internal handling of XML Schema in libxml2; there could be something to start from.

Stefan
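
As a rough illustration of what such a "typifier" might do, here is a toy sketch using only the standard library. It is not based on generateDS or on libxml2's schema internals, and it covers only a tiny subset of XML Schema: it assumes the schema uses the `xsd` prefix for built-in types and that element names are unique across the schema, and it ignores nesting, custom types and namespaces in the instance. The function names `builtin_types_by_name` and `annotate` are invented for this example.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"
XSI = "http://www.w3.org/2001/XMLSchema-instance"

SCHEMA = """\
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="root" type="RootType"/>
  <xsd:complexType name="RootType">
    <xsd:sequence>
      <xsd:element name="s" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
"""


def builtin_types_by_name(schema_root):
    """Map element names to built-in type names such as 'xsd:string'.

    Toy assumption: built-in types are written with the 'xsd' prefix.
    """
    types = {}
    for decl in schema_root.iter("{%s}element" % XSD):
        name, type_name = decl.get("name"), decl.get("type", "")
        if name and type_name.startswith("xsd:"):
            types[name] = type_name
    return types


def annotate(instance_root, types):
    """Add xsi:type to every childless element whose name is in types."""
    for element in instance_root.iter():
        if len(element) == 0 and element.tag in types:
            element.set("{%s}type" % XSI, types[element.tag])
    return instance_root


types = builtin_types_by_name(ET.fromstring(SCHEMA))
root = annotate(ET.fromstring("<root><s>A string, hopefully.</s></root>"), types)
print(root.find("s").attrib)
```

A real implementation would have to follow the schema's content model rather than match on element names alone, and a serialised result would also need the `xsd` prefix declared on the instance document.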

On Wednesday, 23 January 2008 at 08:45 +0100, jholg@gmx.de wrote:
> Hi,
>> - Why is the root element still an ObjectifiedElement instance ? It seems to me I applied the same rules for both of my defined types.
> Now, objectify bases its element class lookup (i.e. which element class to use for the Python proxy representation) on certain rules:
> 1. if element has children => no data class
> 2. if element is defined as xsi:nil, return NoneElement class
> 3. check for Python type hint
> 4. check for XML Schema type hint
> 5. guess element class

Yes, it is in the documentation, which I should have read more attentively. Sorry for that.
> # maps attribute values to element classes
> xsitype_class_mapping = {
>     "site": Configuration,
>     "title": MyString,
> }
>
> lookup = etree.AttributeBasedElementClassLookup(
>     "{http://www.w3.org/2001/XMLSchema-instance}type",
>     xsitype_class_mapping,
>     objectify.ObjectifyElementClassLookup())
>
> parser = etree.XMLParser()
> parser.setElementClassLookup(lookup)
> objectify.setDefaultParser(parser)
>
> root = objectify.fromstring("""
> <site xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xsi:type='site'>
>   <title xsi:type='title'>Achille 2.0</title>
>   <value xsi:type='long'>2</value>
> </site>
> """)
> print objectify.dump(root)
This makes sense now.
>> - Is there a way to specify the xsi:type in a schema sheet ? This question may sound stupid, but I'm still learning the XSD spec, and I wonder if objectify could rely entirely on the schema, without the need to add anything in the XML document itself.
> For now, there is nothing like a "typifier" that takes an instance and a schema and adds type information from the schema to the instance document.
Thanks for all the help. -- David
participants (3): David Soulayrol, jholg@gmx.de, Stefan Behnel