Namespaced attributes losing their namespace
In the test script below, I'm not getting the result I expect. Is this a bug in lxml? If not, what am I doing wrong? I'm using lxml 3.1.1 on win32, downloaded from PyPI.

What I get:

    <root xmlns="url1">
      <test attrib="value"/>
    </root>

I expect attrib to keep its namespace, but it is lost, as re-parsing the output shows:

    >>> lxml.etree.parse('test.xml').getroot()[0].attrib
    {'attrib': 'value'}

If I swap the append and the attribute setting, things work:

    <test xmlns:ns0="url1" ns0:attrib="value"/>

    >>> lxml.etree.parse('test.xml').getroot()[0].attrib
    {'{url1}attrib': 'value'}

If this is intended behaviour, I'll have to see how to fit that into my design, since my functions return elements ready for appending.

    from lxml import etree

    nsmap = {None: 'url1'}
    root = etree.Element('{url1}root', nsmap=nsmap)
    test = etree.Element('{url1}test')
    test.attrib['{url1}attrib'] = 'value'
    root.append(test)
    print etree.tostring(root, encoding='utf-8', pretty_print=True)
Tyler Spivey, 31.03.2013 03:14:
> In the below test script, I'm not getting the result I expect. Is this a bug in lxml? If not, what am I doing wrong?
> [...]
Your assumption about the default namespace is incorrect here.

http://www.w3.org/TR/REC-xml-names/#defaulting

Basically, attributes in the default namespace are badly defined. Don't use them. Instead, use an explicit namespace prefix for them, either by defining the namespace URI twice or by not using the default namespace at all.

Stefan
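The defaulting rule linked above can be checked directly. This is a minimal sketch that uses the standard library's xml.etree.ElementTree so it runs without lxml installed (lxml.etree reports parsed attribute names in the same {uri}name form). The prefix `p` is an arbitrary name chosen for illustration.

```python
import xml.etree.ElementTree as ET

# The default namespace applies to element names only, never to attributes.
unprefixed = ET.fromstring('<root xmlns="url1"><test attrib="value"/></root>')

# Declaring the same URI a second time, under a prefix, lets the attribute
# be placed in that namespace explicitly.
prefixed = ET.fromstring(
    '<root xmlns="url1" xmlns:p="url1"><test p:attrib="value"/></root>'
)

unprefixed_attrib = dict(unprefixed[0].attrib)
prefixed_attrib = dict(prefixed[0].attrib)

print(unprefixed[0].tag)   # {url1}test
print(unprefixed_attrib)   # {'attrib': 'value'}
print(prefixed_attrib)     # {'{url1}attrib': 'value'}
```

The element picks up the default namespace while the unprefixed attribute stays in no namespace, which is why the re-parsed output in the original report shows a plain `attrib` key.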
On 3/31/2013 9:29 AM, Stefan Behnel wrote:
> Your assumption about the default namespace is incorrect here.
> http://www.w3.org/TR/REC-xml-names/#defaulting
> Basically, attributes in the default namespace are badly defined. Don't use them. Instead, use an explicit namespace prefix for them, either by defining the namespace URI twice or by not using the default namespace at all.

I came up with a better example. It looks like I'm not understanding something, because this script should now return the attribute with its correct namespace because I've added a prefix for opf to metadata. I think I should get this output:

    <package xmlns="http://www.idpf.org/2007/opf">
      <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
        <dc:creator opf:role="aut">author name</dc:creator>
      </metadata>
    </package>

But I simply get:

    <package xmlns="http://www.idpf.org/2007/opf">
      <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:creator role="aut">author name</dc:creator>
      </metadata>
    </package>

    from lxml import etree

    OPF_NS = 'http://www.idpf.org/2007/opf'
    DC_NS = 'http://purl.org/dc/elements/1.1/'

    def generate_creator(name):
        creator = etree.Element('{%s}creator' % DC_NS)
        creator.attrib['{%s}role' % OPF_NS] = 'aut'
        creator.text = name
        return creator

    nsmap = {None: OPF_NS}
    # The root element's default namespace will be OPF.
    package = etree.Element('{%s}package' % OPF_NS, nsmap=nsmap)
    # opf should be an additional namespace prefix for OPF_NS here, in the metadata element
    metadata = etree.Element('{%s}metadata' % OPF_NS, nsmap={'opf': OPF_NS, 'dc': DC_NS})
    creator = generate_creator("author name")
    metadata.append(creator)
    package.append(metadata)
    print etree.tostring(package, encoding='utf-8', pretty_print=True)
_________________________________________________________________ Mailing list for the lxml Python XML toolkit - http://lxml.de/ lxml@lxml.de https://mailman-mail5.webfaction.com/listinfo/lxml
Tyler Spivey, 31.03.2013 23:29:
> I came up with a better example. It looks like I'm not understanding something, because this script should now return the attribute with its correct namespace because I've added a prefix for opf to metadata.
> [...]
Ok, let's follow your example step by step.

    >>> package = etree.Element('{%s}package' % OPF_NS, nsmap=nsmap)
    >>> print etree.tostring(package, encoding='utf-8', pretty_print=True)
    <package xmlns="http://www.idpf.org/2007/opf"/>

    >>> metadata = etree.Element('{%s}metadata' % OPF_NS, nsmap={'opf':
    ...     OPF_NS, 'dc': DC_NS})
    >>> print etree.tostring(metadata, encoding='utf-8', pretty_print=True)
    <opf:metadata xmlns:opf="http://www.idpf.org/2007/opf" xmlns:dc="http://purl.org/dc/elements/1.1/"/>

    >>> creator = generate_creator("author name")
    >>> print etree.tostring(creator, encoding='utf-8', pretty_print=True)
    <dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:ns0="http://www.idpf.org/2007/opf" ns0:role="aut">author name</dc:creator>

Ok so far. Now, when you start merging XML trees (note that all three trees are completely independent documents up to this point), lxml will start doing automatic namespace declaration cleanup on them, which means that it tries to delete redundant declarations. In the next step, this simply means that both namespace declarations on the "creator" tag go away, because the declarations on the "metadata" tag can take over:

    >>> metadata.append(creator)
    >>> print etree.tostring(metadata, encoding='utf-8', pretty_print=True)
    <opf:metadata xmlns:opf="http://www.idpf.org/2007/opf" xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:creator opf:role="aut">author name</dc:creator>
    </opf:metadata>

Again, all beautiful up to this point. Now, in the last step of your tree merge, it's actually the default namespace declaration of the "package" tag that takes over:

    >>> package.append(metadata)
    >>> print etree.tostring(package, encoding='utf-8', pretty_print=True)
    <package xmlns="http://www.idpf.org/2007/opf">
      <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:creator role="aut">author name</dc:creator>
      </metadata>
    </package>

    >>> print etree.tostring(metadata, encoding='utf-8', pretty_print=True)
    <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns="http://www.idpf.org/2007/opf">
      <dc:creator role="aut">author name</dc:creator>
    </metadata>
There is a bit of code in lxml's namespace cleanup algorithm that tries to prefer prefixed namespace declarations over the default namespace when looking up and creating namespace prefixes, for exactly the reason of attribute namespaces, but that doesn't apply here because we're in the namespace cleanup phase and the declarations already exist. The code for the cleanup is in proxy.pxi, starting in function moveNodeToDocument() and especially in _stripRedundantNamespaceDeclarations().

It might be possible to improve the situation by special-casing the default namespace in _stripRedundantNamespaceDeclarations() as well. Basically, it shouldn't consider a prefixed namespace declaration redundant with a default namespace declaration. Currently, it doesn't care if there's a prefix or not.

I can give it a try, if you can write test cases for it. Or you can give it a go yourself. The relevant test code is in test_etree.py, and the new tests should go below the test method "test_namespace_cleanup()". Note that there isn't currently a cleanup test for attribute namespaces, so please write some more, as needed.

Note that the test code needs to run from Py2.4 to Py3.4, inclusively (don't care about 3.0). The relevant Cython code translates to plain C anyway, so it doesn't have portability issues by itself. There's a Travis instance that tests at least 2.6-3.3, and a separate Jenkins instance that I use to test 2.4-3.4, so no problem if you fail to write portable code on the first try.

Stefan
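The proposed change can be modelled in a few lines of plain Python. This is only a toy sketch of the decision described above, not lxml's actual Cython code: the function `is_redundant` and its (prefix, uri) tuples are hypothetical names introduced here, and the model ignores details such as a prefix being rebound by an intermediate element. A prefix of None stands for the default namespace.

```python
def is_redundant(decl, ancestor_decls, special_case_default=True):
    """Return True if a declaration on a newly appended subtree may be dropped.

    decl and each entry of ancestor_decls are (prefix, uri) tuples; a prefix
    of None stands for the default namespace declaration (xmlns="...").
    This is a hypothetical model, not lxml's implementation.
    """
    prefix, uri = decl
    for anc_prefix, anc_uri in ancestor_decls:
        if anc_uri != uri:
            continue
        if special_case_default and prefix is not None and anc_prefix is None:
            # A default declaration cannot stand in for a prefixed one:
            # attributes need a prefix to be in a namespace at all.
            continue
        return True
    return False


OPF_NS = 'http://www.idpf.org/2007/opf'
package_decls = [(None, OPF_NS)]

# Behaviour described in the thread: the prefix is ignored, so opf is dropped.
old_result = is_redundant(('opf', OPF_NS), package_decls, special_case_default=False)
# Proposed behaviour: the prefixed declaration survives.
new_result = is_redundant(('opf', OPF_NS), package_decls)
# A repeated default declaration is redundant either way.
default_result = is_redundant((None, OPF_NS), package_decls)

print(old_result, new_result, default_result)  # True False True
```

The only difference between the two modes is the extra check on the ancestor's prefix, which matches the description "it shouldn't consider a prefixed namespace declaration redundant with a default namespace declaration".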
On 2013/03/31 23:50, Stefan Behnel wrote:
> It might be possible to improve the situation by special casing the default namespace also in _stripRedundantNamespaceDeclarations().
> [...]

I'm not skilled enough to edit the Cython source yet, but I tried to come up with a test. So far, I've come up with this, simplified to one sub-element. I'm not sure where the test prefix should go, since it's only used on sub. I put it in the root - if I append another element, say sub2, I wouldn't want xmlns:test serialized twice.

    diff -r 93eb49ee9e76 src/lxml/tests/test_etree.py
    --- a/src/lxml/tests/test_etree.py Mon Apr 01 15:50:43 2013 +0200
    +++ b/src/lxml/tests/test_etree.py Wed Apr 03 09:27:19 2013 -0700
    @@ -2165,6 +2165,17 @@
                 _bytes('<foo xmlns="F"><bar xmlns:ns="NS" xmlns="B"><ns:baz/></bar></foo>'),
                 self.etree.tostring(root))
     
    +    def test_append_attribute_namespace(self):
    +        etree = self.etree
    +
    +        root = etree.Element('{http://test/ns}root', nsmap={None: 'http://test/ns'})
    +        sub = etree.Element('{http://test/ns}sub', nsmap={'test': 'http://test/ns'})
    +        sub.attrib['{http://test/ns}attr'] = 'value'
    +        root.append(sub)
    +        self.assertEqual(
    +            _bytes('<root xmlns="http://test/ns" xmlns:test="http://test/ns"><sub test:attr="value"/></root>'),
    +            etree.tostring(root))
    +
         def test_element_nsmap(self):
             etree = self.etree
Tyler Spivey, 03.04.2013 18:36:
> I'm not skilled enough to edit the cython source yet

Think of it as Python code. That generally works quite well.

> but I tried to come up with a test. So far, I've came up with this, simplified to one sub-element. I'm not sure where the test prefix should go, since it's only used on sub. I put it in the root - if I append another element, say sub2, I wouldn't want xmlns:test serialized twice.
That won't work if an intermediate element already defines that prefix with a different namespace, though. The only safe way is to leave it where it was originally defined. However, it would be an improvement to the namespace cleanup code if namespace declarations were moved upwards in the tree. Maybe only when they are found in more than one subtree - moving declarations unconditionally may break stupid processors. The user might have had a reason to put them there explicitly.
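The shadowing problem described above can be seen with any namespace-aware parser. This is a minimal sketch using the standard library's xml.etree.ElementTree; the element name `mid` and the URI `http://other/ns` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Declaration kept on the element that uses it: the attribute is in http://test/ns.
local = ET.fromstring(
    '<root xmlns="http://test/ns">'
    '<mid xmlns:test="http://other/ns">'
    '<sub xmlns:test="http://test/ns" test:attr="value"/>'
    '</mid></root>'
)

# Same declaration moved up to the root: <mid> rebinds the prefix, so the
# attribute silently ends up in http://other/ns instead.
hoisted = ET.fromstring(
    '<root xmlns="http://test/ns" xmlns:test="http://test/ns">'
    '<mid xmlns:test="http://other/ns">'
    '<sub test:attr="value"/>'
    '</mid></root>'
)

local_attrib = dict(local[0][0].attrib)
hoisted_attrib = dict(hoisted[0][0].attrib)

print(local_attrib)    # {'{http://test/ns}attr': 'value'}
print(hoisted_attrib)  # {'{http://other/ns}attr': 'value'}
```

Both documents contain the same `test:attr` text, but the attribute resolves to different namespaces, so a cleanup step that moves declarations upward would have to check every intermediate element first.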
> diff -r 93eb49ee9e76 src/lxml/tests/test_etree.py
> --- a/src/lxml/tests/test_etree.py Mon Apr 01 15:50:43 2013 +0200
> +++ b/src/lxml/tests/test_etree.py Wed Apr 03 09:27:19 2013 -0700
> @@ -2165,6 +2165,17 @@
>              _bytes('<foo xmlns="F"><bar xmlns:ns="NS" xmlns="B"><ns:baz/></bar></foo>'),
>              self.etree.tostring(root))
>
> +    def test_append_attribute_namespace(self):
> +        etree = self.etree
> +
> +        root = etree.Element('{http://test/ns}root', nsmap={None: 'http://test/ns'})
> +        sub = etree.Element('{http://test/ns}sub', nsmap={'test': 'http://test/ns'})
> +        sub.attrib['{http://test/ns}attr'] = 'value'
You should verify that the attribute uses the expected prefix at this point. If you don't, you can't be sure that it's the next step that made the change.
> +        root.append(sub)
> +        self.assertEqual(
> +            _bytes('<root xmlns="http://test/ns" xmlns:test="http://test/ns"><sub test:attr="value"/></root>'),
> +            etree.tostring(root))
> +
>      def test_element_nsmap(self):
>          etree = self.etree
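One way to perform the check suggested above is to serialize `sub` on its own before the append. This is a sketch that assumes lxml is installed; the variable names `before` and `survived` are introduced here for illustration. It only asserts the state before the merge, since the result after `root.append(sub)` is exactly the behaviour under discussion and may differ between lxml versions.

```python
from lxml import etree

NS = 'http://test/ns'

root = etree.Element('{%s}root' % NS, nsmap={None: NS})
sub = etree.Element('{%s}sub' % NS, nsmap={'test': NS})
sub.attrib['{%s}attr' % NS] = 'value'

# Before merging: the standalone element should serialize the attribute
# with the prefix taken from its own nsmap.
before = etree.tostring(sub)
assert b'test:attr="value"' in before, before

root.append(sub)

# After merging: report (rather than assume) whether the namespace survives
# a serialize-and-reparse round trip.
reparsed = etree.fromstring(etree.tostring(root))
survived = ('{%s}attr' % NS) in reparsed[0].attrib
print(survived)
```

Checking the serialized form, rather than only `sub.attrib[...]`, is what distinguishes "the attribute was created with the wrong prefix" from "the append step changed it".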
Stefan
On 4/3/2013 10:26 PM, Stefan Behnel wrote:
> Tyler Spivey, 03.04.2013 18:36:
Here's, hopefully, a slightly better test.

    diff -r 452a2705665b -r 0ab3835ad21f src/lxml/tests/test_etree.py
    --- a/src/lxml/tests/test_etree.py Sun Apr 28 20:42:33 2013 +0200
    +++ b/src/lxml/tests/test_etree.py Wed May 01 15:58:45 2013 -0700
    @@ -2165,6 +2165,18 @@
                 _bytes('<foo xmlns="F"><bar xmlns:ns="NS" xmlns="B"><ns:baz/></bar></foo>'),
                 self.etree.tostring(root))
     
    +    def test_append_attribute_namespace(self):
    +        etree = self.etree
    +
    +        root = etree.Element('{http://test/ns}root', nsmap={None: 'http://test/ns'})
    +        sub = etree.Element('{http://test/ns}sub', nsmap={'test': 'http://test/ns'})
    +        sub.attrib['{http://test/ns}attr'] = 'value'
    +        self.assertEqual(sub.attrib['{http://test/ns}attr'], 'value')
    +        root.append(sub)
    +        self.assertEqual(
    +            _bytes('<root xmlns="http://test/ns"><sub xmlns:test="http://test/ns" test:attr="value"/></root>'),
    +            etree.tostring(root))
    +
         def test_element_nsmap(self):
             etree = self.etree
participants (2)
- Stefan Behnel
- Tyler Spivey