Passing xpath results to ElementMaker
I'm writing an XML-transforming script with lxml, and I've run into two unexpected behaviors:

First, ElementMaker doesn't accept the string results of xpath expressions:

    >>> elem = etree.parse(io.StringIO('<root><node>text</node></root>'))
    >>> E.b(elem.xpath('string(node)'))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/opt/local/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/lxml/builder.py", line 220, in __call__
        raise TypeError("bad argument type: %r" % item)
    TypeError: bad argument type: 'text'
    >>> type(elem.xpath('string(node)'))
    <class 'lxml.etree._ElementUnicodeResult'>

This happens because https://github.com/lxml/lxml/blob/master/src/lxml/builder.py#L215 looks up the exact type of the argument in a dict, rather than doing something that can respect _ElementUnicodeResult's inheritance from str. Is this the right behavior for some reason I haven't thought of, or is it just an oversight? The workaround is straightforward (just pass the xpath result through str()), but it seems more verbose than should be necessary.

Second, ElementMaker doesn't accept lists at all:

    >>> etree.tostring(E.b(elem.xpath('node')))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/opt/local/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/lxml/builder.py", line 220, in __call__
        raise TypeError("bad argument type: %r" % item)
    TypeError: bad argument type: [<Element node at 0x106eb8d70>]

Accepting lists would also be useful for building HTML documents as E.section("Header text", function_returning_element_sequence(), "Footer text").

The workarounds for this are straightforward too:

    >>> etree.tostring(E.b(elem.xpath('node')[0]))
    b'<b><node>text</node></b>'
    >>> etree.tostring(E.b(*elem.xpath('node')))
    b'<b/>'

... oops. Looks like ElementMaker re-parents nodes passed to it rather than copying them when they already have a parent. That's a third surprise, although it makes sense if ElementMaker was only intended for building entirely new documents, rather than copying bits from existing documents.

This is with python-3.2.2, lxml-2.3.0, and libxml-2.7.8. Are these worth filing bugs/feature requests about?

Thanks,
Jeffrey
Jeffrey Yasskin, 31.12.2011 10:26:
> I'm writing an XML-transforming script with lxml, and I've run into two unexpected behaviors:
>
> First, ElementMaker doesn't accept the string results of xpath expressions:
>
>     >>> elem = etree.parse(io.StringIO('<root><node>text</node></root>'))
>     >>> E.b(elem.xpath('string(node)'))
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in <module>
>       File "/opt/local/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/lxml/builder.py", line 220, in __call__
>         raise TypeError("bad argument type: %r" % item)
>     TypeError: bad argument type: 'text'
>     >>> type(elem.xpath('string(node)'))
>     <class 'lxml.etree._ElementUnicodeResult'>
>
> This happens because https://github.com/lxml/lxml/blob/master/src/lxml/builder.py#L215 looks up the exact type of the argument in a dict, rather than doing something that can respect _ElementUnicodeResult's inheritance from str. Is this the right behavior for some reason I haven't thought of, or is it just an oversight? The workaround is straightforward (just pass the xpath result through str()), but it seems more verbose than should be necessary.
Agreed. Basically, testing for basestring/str subtypes after trying the type map should fix it. Feel free to post a pull request on github.
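The suggested fix can be sketched without lxml. This is only an illustration of "exact type lookup first, then a str-subtype fallback", not the actual builder code: XPathStringResult, TYPEMAP, add_text and find_handler are hypothetical names standing in for _ElementUnicodeResult and the builder's internal type map.

```python
class XPathStringResult(str):
    """Hypothetical stand-in for a str subclass such as _ElementUnicodeResult."""


def add_text(target, item):
    # Store a plain str so downstream code never sees the subclass.
    target.append(str(item))


# Hypothetical handler table keyed by exact type.
TYPEMAP = {str: add_text}


def find_handler(item):
    # 1. Exact type lookup: fast, but misses subclasses.
    handler = TYPEMAP.get(type(item))
    # 2. Fallback: accept any str subtype.
    if handler is None and isinstance(item, str):
        handler = TYPEMAP[str]
    if handler is None:
        raise TypeError("bad argument type: %r" % (item,))
    return handler


collected = []
item = XPathStringResult("text")
find_handler(item)(collected, item)
```

With only the exact lookup, the XPathStringResult instance would be rejected; with the fallback it is handled like an ordinary string, while unsupported types such as lists still raise TypeError.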
> Second, ElementMaker doesn't accept lists at all:
>
>     >>> etree.tostring(E.b(elem.xpath('node')))
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in <module>
>       File "/opt/local/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/lxml/builder.py", line 220, in __call__
>         raise TypeError("bad argument type: %r" % item)
>     TypeError: bad argument type: [<Element node at 0x106eb8d70>]
Not really a bug, rather a feature request. I can see an interest in supporting this, and I think the intention is clear enough when passing in a list. However, note that when the elements in the list come from an XPath result as in your case, users likely want to deepcopy() them first. In that case, there will be a preprocessing step on the list anyway, which likely won't happen as part of the call expression. Silently accepting arbitrary lists may make it harder for unaware users to spot and track down this problem because it will likely lead to more complex calling expressions in their code. OTOH, the '*' work-around is simple enough to figure out for users and won't work around the copy-or-not decision either, so it's not much of a regression on that front to support lists directly... Doesn't look like an obvious decision to me.
> Accepting lists would also be useful for building HTML documents as E.section("Header text", function_returning_element_sequence(), "Footer text").
Yes, the '*' unpacking work-around doesn't work when the elements in the list are to be followed by text:
    >>> def test(*args): pass
    >>> test(1,2, *[1,2,3], 4)
    SyntaxError: only named arguments may follow *expression
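Note that this restriction is specific to older Python versions such as the 3.2 used in this thread; since Python 3.5 (PEP 448), positional arguments may follow a `*expression`. For older versions, one possible workaround is to flatten the arguments before the call. The helper below is a sketch; flatten_args is a hypothetical name, not part of lxml.

```python
def flatten_args(*args):
    """Expand nested lists/tuples into one flat list of arguments."""
    flat = []
    for arg in args:
        if isinstance(arg, (list, tuple)):
            flat.extend(flatten_args(*arg))
        else:
            flat.append(arg)
    return flat


def test(*args):
    return args


# Works on any Python version: flatten first, then unpack once.
via_helper = test(*flatten_args(1, 2, [1, 2, 3], 4))

# Works on Python 3.5+ only (PEP 448).
via_unpacking = test(1, 2, *[1, 2, 3], 4)
```

Applied to the builder, the same idea would read E.section(*flatten_args("Header text", function_returning_element_sequence(), "Footer text")). It does not settle the copy-or-move question for elements that already have a parent.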
> The workarounds for this are straightforward too:
>
>     >>> etree.tostring(E.b(elem.xpath('node')[0]))
>     b'<b><node>text</node></b>'
>     >>> etree.tostring(E.b(*elem.xpath('node')))
>     b'<b/>'
>
> ... oops. Looks like ElementMaker re-parents nodes passed to it rather than copying them when they already have a parent. That's a third surprise, although it makes sense if ElementMaker was only intended for building entirely new documents, rather than copying bits from existing documents.
That's the primary use case, yes. Most elements are created and then put directly into the parent call, so deep-copying each element before inserting it into a new parent would just lead to a severe slow-down by copying subtrees over and over. If you want to reuse existing trees, you have to deepcopy() them explicitly (just like everywhere else in the API). I think that's simple enough to be acceptable.

Potentially, lxml may be able to determine at insertion time if the element was freshly created or if it exists in a different tree. The builder implementation in lxml.objectify is already somewhat smart about document handling, and similar tricks may enable auto-deepcopying at some point. I'm not sure if this is safe, though, so a bit of investigation would be required. Also, a change here would break code that relies on the move operation. So I think this is unlikely to happen. Explicit is better than implicit here.

Stefan
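For illustration, the explicit deepcopy() approach might look like the following. It is a minimal sketch that assumes lxml is installed and reuses the small document from the original question; the variable names are illustrative only.

```python
import copy

from lxml import etree
from lxml.builder import E

root = etree.fromstring('<root><node>text</node></root>')

# Copy each matched element first, so the source tree keeps its children.
copies = [copy.deepcopy(node) for node in root.xpath('node')]
wrapped_copy = E.b(*copies)
copied_xml = etree.tostring(wrapped_copy)
source_after_copy = etree.tostring(root)

# Passing the originals instead moves them out of the source tree.
wrapped_move = E.b(*root.xpath('node'))
moved_xml = etree.tostring(wrapped_move)
source_after_move = etree.tostring(root)
```

After the copying call the source tree still contains its node element; after the second call the element has been moved into the new b element and the source root is empty. This matches the b'<b/>' result in the original question, where the earlier call had already moved the node.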
participants (2)
- Jeffrey Yasskin
- Stefan Behnel