I'm writing an XML-transforming script with lxml, and I've run into
two unexpected behaviors:
First, ElementMaker doesn't accept the string results of xpath expressions:
>>> import io
>>> from lxml import etree
>>> from lxml.builder import E
>>> elem = etree.parse(io.StringIO('<root><node>text</node></root>'))
>>> E.b(elem.xpath('string(node)'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/lxml/builder.py", line 220, in __call__
    raise TypeError("bad argument type: %r" % item)
TypeError: bad argument type: 'text'
>>> type(elem.xpath('string(node)'))
<class 'lxml.etree._ElementUnicodeResult'>
This happens because
https://github.com/lxml/lxml/blob/master/src/lxml/builder.py#L215
looks up the exact type of the argument in a dict, rather than doing
something that can respect _ElementUnicodeResult's inheritance from
str. Is this the right behavior for some reason I haven't thought of,
or is it just an oversight? The workaround is straightforward—just
pass the xpath result through str()—but it seems more verbose than
should be necessary.
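To make the lookup difference concrete, here is a minimal sketch in plain Python. It is not lxml's actual code: XPathText and the handlers dict are made-up stand-ins for _ElementUnicodeResult and the builder's type map, and lookup_mro is only one way the lookup could respect inheritance.

```python
class XPathText(str):
    """Hypothetical stand-in for a str subclass such as _ElementUnicodeResult."""


# Handlers keyed by exact type, similar in spirit to a type-map dict.
handlers = {str: lambda item: "text handler"}


def lookup_exact(item):
    # Exact-type lookup: an instance of a str subclass is not found.
    return handlers.get(type(item))


def lookup_mro(item):
    # Walk the method resolution order so a subclass falls back to its base type.
    for base in type(item).__mro__:
        handler = handlers.get(base)
        if handler is not None:
            return handler
    return None


value = XPathText("text")
print(lookup_exact(value))               # None: the subclass is not a key
print(lookup_mro(value)(value))          # text handler
print(lookup_exact(str(value)) is None)  # False: str() yields an exact str
```

The last line is the str() workaround: converting the subclass instance produces a plain str, which the exact-type lookup does find.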
Second, ElementMaker doesn't accept lists at all:
>>> etree.tostring(E.b(elem.xpath('node')))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/lxml/builder.py", line 220, in __call__
    raise TypeError("bad argument type: %r" % item)
TypeError: bad argument type: [<Element node at 0x106eb8d70>]
Accepting lists would also be useful for building HTML documents as
E.section("Header text", function_returning_element_sequence(),
"Footer text"). The workarounds for this are straightforward too:
>>> etree.tostring(E.b(elem.xpath('node')[0]))
b'<b><node>text</node></b>'
>>> etree.tostring(E.b(*elem.xpath('node')))
b'<b/>'
... oops. Looks like ElementMaker re-parents nodes passed to it rather
than copying them when they already have a parent. That's a third
surprise, although it makes sense if ElementMaker was only intended
for building entirely new documents, rather than copying bits from
existing documents.
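For what it's worth, all three surprises can be worked around in one place with a small helper. This is only a sketch: flatten_children is a name I made up, not part of lxml, and it assumes that deep-copying an element that already has a parent is what you want.

```python
import copy
import io

from lxml import etree
from lxml.builder import E


def flatten_children(*items):
    """Yield ElementMaker arguments (hypothetical helper, not an lxml API)."""
    for item in items:
        if isinstance(item, (list, tuple)):
            # Expand sequences, e.g. the list returned by xpath('node').
            for sub in flatten_children(*item):
                yield sub
        elif isinstance(item, str):
            # Convert str subclasses (such as XPath string results) to plain str.
            yield str(item)
        elif etree.iselement(item) and item.getparent() is not None:
            # Copy rather than move an element out of its current parent.
            yield copy.deepcopy(item)
        else:
            yield item


tree = etree.parse(io.StringIO('<root><node>text</node></root>'))
section = E.section(*flatten_children("Header text", tree.xpath('node'), "Footer text"))

print(etree.tostring(section))
# b'<section>Header text<node>text</node>Footer text</section>'
print(etree.tostring(tree))
# b'<root><node>text</node></root>'  (the source document is left intact)
```

Elements without a parent are passed through unchanged, so freshly built elements are not copied needlessly.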
This is with python-3.2.2, lxml-2.3.0, and libxml-2.7.8. Are these
worth filing bugs/feature requests about?
Thanks,
Jeffrey