[lxml-dev] Force element content to be string

Hi all!

I have a bit of XML which I want to parse via lxml.objectify.

    from lxml import objectify

    node = objectify.fromstring('''
    <Item>
      <ASIN>0747532745</ASIN>
      <ItemAttributes>
        <Author>Joanne K. Rowling</Author>
        <Manufacturer>Bloomsbury Publishing</Manufacturer>
        <ProductGroup>Book</ProductGroup>
        <Title>Harry Potter and the Philosopher's Stone</Title>
      </ItemAttributes>
    </Item>
    ''')

I have the following problem: node.ASIN is evaluated to integer value 747532745 but should be a string ('0747532745').

There is no way for me to influence the incoming XML, so any py:pytype magic or adding a schema is out of the question. Is there a way to ensure that ASIN elements are always evaluated to a string?

Cheers,
Seb.

--
Sebastian Rahlf
basti@redtoad.de

Sebastian Rahlf, 31.05.2010 14:51:
There is no way for me to influence the incoming XML, so any py:pytype magic or adding a schema is out of the question. Is there a way to ensure that ASIN elements are always evaluated to a string?
Just add the type attribute after parsing and remove it before serialisation.

Alternatively, you can register your own Element type for the ASIN tag. There should be something about that in the objectify docs.

Stefan
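For readers following along, the first suggestion might look roughly like the sketch below. It assumes the "type attribute" meant here is objectify's py:pytype annotation, which is available as objectify.PYTYPE_ATTRIBUTE. The helper name force_string and the XML constant are made up for illustration and do not come from the thread.

```python
from lxml import etree, objectify

# Shortened version of the document from the original question.
XML = """
<Item>
  <ASIN>0747532745</ASIN>
  <ItemAttributes>
    <Title>Harry Potter and the Philosopher's Stone</Title>
  </ItemAttributes>
</Item>
"""


def force_string(root, tag):
    """Hypothetical helper: annotate every <tag> element as a string."""
    for element in root.iter(tag):
        element.set(objectify.PYTYPE_ATTRIBUTE, "str")


node = objectify.fromstring(XML)
force_string(node, "ASIN")

# The element is looked up again on access, now honouring the annotation.
asin_value = str(node.ASIN)

# Strip the annotation (and its namespace declaration) before serialising,
# so that the output carries no trace of the added attribute.
objectify.deannotate(node, cleanup_namespaces=True)
serialised = etree.tostring(node)
```

The annotation lives in the tree only between parsing and serialisation, so the incoming XML itself is never touched.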

Thanks for the tip. I wrote my own class lookup

    from lxml import etree, objectify

    class MyLookup(etree.CustomElementClassLookup):
        def lookup(self, node_type, document, namespace, name):
            # Any other tag implicitly returns None, i.e. "no match".
            if name == 'ASIN':
                return objectify.StringElement

    lookup = MyLookup()
    parser = etree.XMLParser()
    parser.set_element_class_lookup(lookup)
    node = objectify.fromstring(xml, parser)  # xml: the document from my first mail

which now returns the right element type.

    node = objectify.fromstring(xml, parser)
    objectify.annotate(node)
    print(objectify.dump(node))

    Item = None [_Element]
      * py:pytype = 'str'
        ASIN = '0747532745' [StringElement]
          * py:pytype = 'int'
        DetailPageURL = 'http://www.amazon.de/Harry-...' [_Element]
          * py:pytype = 'str'
        ItemAttributes = None [_Element]
          * py:pytype = 'str'
            Author = 'Joanne K. Rowling' [_Element]
              * py:pytype = 'str'
            Manufacturer = 'Bloomsbury Publishing' [_Element]
              * py:pytype = 'str'
            ProductGroup = 'Book' [_Element]
              * py:pytype = 'str'
            Title = "Harry Potter and the Philosopher's Stone" [_Element]
              * py:pytype = 'str'

How do I make it fall back to objectify.ObjectifyElementClassLookup?

Seb.

To answer my own question: you can set a fallback lookup

    lookup.set_fallback(objectify.ObjectifyElementClassLookup())

As easy as pie!

Seb.

--
Sebastian Rahlf <basti@redtoad.de>
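Putting the pieces of this thread together, a complete version could look like the sketch below. The class name ASINLookup, the XML constant and the remove_blank_text option are illustrative choices and were not part of the original mails. The node_type check is an extra precaution so that only elements, not comments or processing instructions, are matched.

```python
from lxml import etree, objectify

# Shortened version of the document from the original question.
XML = """
<Item>
  <ASIN>0747532745</ASIN>
  <ItemAttributes>
    <Author>Joanne K. Rowling</Author>
    <Title>Harry Potter and the Philosopher's Stone</Title>
  </ItemAttributes>
</Item>
"""


class ASINLookup(etree.CustomElementClassLookup):
    """Force <ASIN> elements to be strings; leave everything else alone."""

    def lookup(self, node_type, document, namespace, name):
        if node_type == "element" and name == "ASIN":
            return objectify.StringElement
        return None  # no match: delegate to the fallback lookup


lookup = ASINLookup()
# Everything the custom lookup declines is handled by objectify's own lookup.
lookup.set_fallback(objectify.ObjectifyElementClassLookup())

parser = etree.XMLParser(remove_blank_text=True)
parser.set_element_class_lookup(lookup)

node = objectify.fromstring(XML, parser)
asin = node.ASIN
title = node.ItemAttributes.Title
```

With the fallback in place, the root and the other children should again come back as objectify classes rather than plain _Element, while ASIN keeps its leading zero.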
participants (2)
- Sebastian Rahlf
- Stefan Behnel