[lxml-dev] Force element content to be string

Hi all!

I have a bit of XML which I want to parse via lxml.objectify.

    from lxml import objectify
    node = objectify.fromstring('''
    <Item>
      <ASIN>0747532745</ASIN>
      <ItemAttributes>
        <Author>Joanne K. Rowling</Author>
        <Manufacturer>Bloomsbury Publishing</Manufacturer>
        <ProductGroup>Book</ProductGroup>
        <Title>Harry Potter and the Philosopher's Stone</Title>
      </ItemAttributes>
    </Item>
    ''')

I have the following problem: node.ASIN is evaluated to integer value 747532745 but should be a string ('0747532745'). There is no way for me to influence the incoming XML, so any py:pytype magic or adding a schema is out of the question.

Is there a way to ensure that ASIN elements are always evaluated to a string?

Cheers,
Seb.

--
Sebastian Rahlf
basti@redtoad.de
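For reference, a minimal sketch that reproduces the behaviour described above on a cut-down document. The variable names are illustrative only. It also shows that the unparsed element text stays available through `.text`, whatever type objectify infers for the element.

```python
from lxml import objectify

# Cut-down version of the document from the message above.
xml = "<Item><ASIN>0747532745</ASIN></Item>"
node = objectify.fromstring(xml)

# objectify guesses a Python type from the element's text content.
inferred_type = type(node.ASIN).__name__

# The raw text is untouched by type inference and keeps the leading zero.
raw_asin = node.ASIN.text

print(inferred_type, repr(raw_asin))
```

Reading `.text` is only a workaround at the call site; it does not change which element class objectify picks.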

Thanks for the tip. I wrote my own class lookup

    from lxml import etree

    class MyLookup(etree.CustomElementClassLookup):
        def lookup(self, node_type, document, namespace, name):
            if name == 'ASIN':
                return objectify.StringElement

    lookup = MyLookup()
    parser = etree.XMLParser()
    parser.set_element_class_lookup(lookup)
    node = objectify.fromstring(xml, parser)

which now returns the right element type.

    node = objectify.fromstring(xml, parser)
    objectify.annotate(node)
    print objectify.dump(node)

    Item = None [_Element]
      * py:pytype = 'str'
        ASIN = '0747532745' [StringElement]
          * py:pytype = 'int'
        DetailPageURL = 'http://www.amazon.de/Harry-...' [_Element]
          * py:pytype = 'str'
        ItemAttributes = None [_Element]
          * py:pytype = 'str'
            Author = 'Joanne K. Rowling' [_Element]
              * py:pytype = 'str'
            Manufacturer = 'Bloomsbury Publishing' [_Element]
              * py:pytype = 'str'
            ProductGroup = 'Book' [_Element]
              * py:pytype = 'str'
            Title = "Harry Potter and the Philosopher's Stone" [_Element]
              * py:pytype = 'str'

How do I make it fall back to objectify.ObjectifyElementClassLookup?

Seb.
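One possible way to get the fallback asked about here, as a minimal sketch: `etree.CustomElementClassLookup` takes an optional fallback lookup in its constructor, and returning `None` from `lookup()` hands the decision to that fallback. The class name `AsinLookup` and the shortened sample document are made up for illustration.

```python
from lxml import etree, objectify

class AsinLookup(etree.CustomElementClassLookup):
    """Force ASIN elements to be strings; defer everything else."""

    def lookup(self, node_type, document, namespace, name):
        if node_type == "element" and name == "ASIN":
            return objectify.StringElement
        # None means "no decision here", so the fallback lookup is asked.
        return None

# Pass objectify's default lookup as the fallback.
lookup = AsinLookup(objectify.ObjectifyElementClassLookup())
parser = etree.XMLParser(remove_blank_text=True)
parser.set_element_class_lookup(lookup)

# Shortened sample document (illustrative).
xml = """
<Item>
  <ASIN>0747532745</ASIN>
  <ItemAttributes>
    <Author>Joanne K. Rowling</Author>
  </ItemAttributes>
</Item>
"""
node = objectify.fromstring(xml, parser)

print(type(node).__name__)
print(type(node.ASIN).__name__, repr(node.ASIN.text))
print(type(node.ItemAttributes.Author).__name__)
```

With the fallback in place, elements other than ASIN should again come back as objectify's own element classes rather than plain `_Element`.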

participants (2)
- Sebastian Rahlf
- Stefan Behnel