Hi,

the mailing list would have been the better place to ask this.

Kevin D Smith, 25.12.2009 00:01:
I'm trying to use lxml to parse a document, apply CSS styles to the nodes, then walk through the document to render it to another format. The problem is that different ways of accessing the document return different instances of the nodes. I really need to work on the same instance no matter how I access it.
doc.find('body') returns <Element body at 101da2ef0>
CSSSelector('body')(doc) returns <Element body at 101da2ef0>
    def find_body(doc):
        for action, elem in etree.iterwalk(doc, events=('start',)):
            if elem.tag == 'body':
                return elem

returns <Element body at 101da2b48>
Using saxify(doc, handler) and processing styles in the event handlers gives the following:
CSSSelector('html, address, blockquote, body, dd, div, dl, dt, fieldset, form, frame, frameset, h1, h2, h3, h4, h5, h6, noframes, ol, p, ul, center, dir, hr, menu, pre')(self.doc) returns <Element body at 101da2ce8> {u'display': 'block'}
CSSSelector('body') returns <Element body at 101da2ae0>
As you can see, inside iterwalk and saxify the node instances are not the same ones that doc.find and CSSSelector return outside of them.
It's not because you are using different ways to get to the element, it's because you are throwing away the reference in between. This might help: http://codespeak.net/lxml/element_classes.html#background-on-element-proxies
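A minimal sketch of what the linked page describes, assuming lxml is installed. The tiny document and the helper name `body_via_iterwalk` are made up for illustration. While a Python reference to an element proxy is still alive, lxml hands back that same object, whichever API you use to reach the node:

```python
from lxml import etree

# Hypothetical stand-in document; the original poster's HTML is not shown.
doc = etree.fromstring("<html><body><p>text</p></body></html>")

def body_via_iterwalk(root):
    """Return the <body> element reached through etree.iterwalk()."""
    for _event, elem in etree.iterwalk(root, events=("start",)):
        if elem.tag == "body":
            return elem
    return None

# Keep a reference to the proxy returned by find() ...
body = doc.find("body")

# ... so other access paths return that very same object.
same_via_walk = body_via_iterwalk(doc)
same_via_xpath = doc.xpath("//body")[0]

print(body is same_via_walk)   # True
print(body is same_via_xpath)  # True
```

Once every reference to a proxy is dropped, lxml may create a fresh proxy for the same node on the next access, which is why the addresses in the question differ.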
Is there a way to guarantee that all of these methods will use the same nodes?
You didn't write /why/ you consider this a requirement - it likely isn't one. But if you really need this, you can cache the element instances like this:

    cache = list(root_element.iter())
    # do stuff with the elements in the tree
    del cache

As long as you don't add elements to the tree, this will ensure that you always get the same instances back. If you add elements, just add them to the cache as well.

Stefan
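The caching approach above, fleshed out as a small runnable sketch (assuming lxml is installed; the document and variable names are hypothetical). Holding every proxy in a list keeps them alive, so `find()`, `iterwalk()` and friends return the cached instances, and newly added elements are appended to the cache:

```python
from lxml import etree

# Hypothetical document standing in for the real input.
doc = etree.fromstring("<html><body><p>one</p><p>two</p></body></html>")

# Keep every element proxy alive: html, body, p, p (document order).
cache = list(doc.iter())

# Different access paths now yield the cached instances.
first_p = doc.find("body/p")
walked = [elem for _event, elem in etree.iterwalk(doc, events=("start",))]

same_p = first_p is cache[2]
same_all = all(a is b for a, b in zip(cache, walked))
print(same_p, same_all)  # True True

# Elements added later must be put into the cache as well.
new_p = etree.SubElement(doc.find("body"), "p")
cache.append(new_p)

# ... apply styles and render the tree here ...

del cache  # release the proxies once you are done
```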