
Hi All,

I'm trying to use getpath() to get the absolute XPath for an element in a tree. This works fine when every namespace has a consistent prefix throughout the document. However, when the same namespace is bound to multiple prefixes in a document, I get the wrong path.

I wrote this little script to reproduce the error.

    #!/usr/bin/env python
    from io import BytesIO  # etree.tostring() returns bytes, so parse from a bytes buffer

    from lxml import etree


    def generate_sample_tree_with_multiple_prefixes():
        # Every child is in the same namespace but declares its own prefix.
        root = etree.Element('root', nsmap={'a': 'http://a.b/c'})
        for x in range(10):
            ns = "http://ns.org/ns/"
            prefix = 'l_%s' % x
            node = etree.SubElement(root, "{%s}l1" % ns, nsmap={prefix: ns})
            node.set('id', 'id_%s' % x)
        return etree.parse(BytesIO(etree.tostring(root)))


    def generate_sample_tree_with_single_prefixes():
        # Every child is in the same namespace and uses the same prefix.
        root = etree.Element('root', nsmap={'a': 'http://a.b/c'})
        for x in range(10):
            ns = "http://ns.org/ns/"
            prefix = 'ns'
            node = etree.SubElement(root, "{%s}l1" % ns, nsmap={prefix: ns})
            node.set('id', 'id_%s' % x)
        return etree.parse(BytesIO(etree.tostring(root)))


    def test_multiple():
        tree = generate_sample_tree_with_multiple_prefixes()
        r = tree.getroot()
        for l1 in r.iterchildren():
            path = tree.getpath(l1)
            ret = tree.xpath(path, namespaces=l1.nsmap)
            assert len(ret) == 1, "Multiple elements returned: %s" % ret
            assert l1 == ret[0], "It's not the same"


    def test_single():
        tree = generate_sample_tree_with_single_prefixes()
        r = tree.getroot()
        for l1 in r.iterchildren():
            path = tree.getpath(l1)
            ret = tree.xpath(path, namespaces=l1.nsmap)
            assert len(ret) == 1, "Multiple elements returned"
            assert l1 == ret[0], "It's not the same"


    if __name__ == '__main__':
        test_single()
        test_multiple()

thanks
--
-Ahmed

Ahmed, 26.11.2012 18:04:
The error output I get is that it finds more than one occurrence for test_multiple(). The tree it generates is

<root xmlns:a="http://a.b/c">
  <l_0:l1 xmlns:l_0="http://ns.org/ns/" id="id_0"/>
  <l_1:l1 xmlns:l_1="http://ns.org/ns/" id="id_1"/>
  <l_2:l1 xmlns:l_2="http://ns.org/ns/" id="id_2"/>
  <l_3:l1 xmlns:l_3="http://ns.org/ns/" id="id_3"/>
  <l_4:l1 xmlns:l_4="http://ns.org/ns/" id="id_4"/>
  <l_5:l1 xmlns:l_5="http://ns.org/ns/" id="id_5"/>
  <l_6:l1 xmlns:l_6="http://ns.org/ns/" id="id_6"/>
  <l_7:l1 xmlns:l_7="http://ns.org/ns/" id="id_7"/>
  <l_8:l1 xmlns:l_8="http://ns.org/ns/" id="id_8"/>
  <l_9:l1 xmlns:l_9="http://ns.org/ns/" id="id_9"/>
</root>

The path it generates is /root/l_0:l1, and that returns all children instead of just the one you want.

This looks like a bug to me - it fails to realise that the other tags have the same namespace and local name, even though they have different prefixes. However, it's not a bug in lxml but in libxml2, as that's where the XPath expression is being generated. Please report the problem on their side.

Thanks!

Stefan
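
To illustrate why a path such as /root/l_0:l1 cannot single out one child: a name test matches on namespace URI plus local name, and the prefix is only an alias for the URI. The sketch below is an illustration, not the code from the thread. It assumes the standard library's xml.etree.ElementTree in place of lxml so that it runs without extra dependencies, and it shortens the tree to three children.

```python
import xml.etree.ElementTree as ET

NS = "http://ns.org/ns/"

# Same shape as the tree in the thread, shortened to three children.
doc = (
    '<root xmlns:a="http://a.b/c">'
    '<l_0:l1 xmlns:l_0="http://ns.org/ns/" id="id_0"/>'
    '<l_1:l1 xmlns:l_1="http://ns.org/ns/" id="id_1"/>'
    '<l_2:l1 xmlns:l_2="http://ns.org/ns/" id="id_2"/>'
    '</root>'
)
root = ET.fromstring(doc)

# After parsing, every child has the same expanded name; the prefixes are gone.
tags = [child.tag for child in root]

# "l_0" here is just an alias for NS, so the step selects every sibling
# in that namespace, not only the one that declared the l_0 prefix.
matches = root.findall("l_0:l1", namespaces={"l_0": NS})
ids = [m.get("id") for m in matches]
print(tags)
print(ids)
```

Because all siblings share one expanded name, only a positional predicate (or a test on an attribute such as id) can tell them apart, which is what the generated path lacks.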

I tried to parse the same documents by using libxml2 directly in C, and it has the same issue. So I sent an email to the libxml2 mailing list about the issue:

https://mail.gnome.org/archives/xml/2012-November/msg00040.html

thanks,
-Ahmed

On Mon, Nov 26, 2012 at 12:28 PM, Stefan Behnel <stefan_ml@behnel.de> wrote:

--
-Ahmed
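
Until the path generation is fixed upstream, one possible workaround is to build a prefix-independent path by hand. The following is a minimal sketch, not something proposed in the thread: positional_path is a hypothetical helper, the generated prefixes (p0, p1, ...) are arbitrary, and it again assumes the standard library's xml.etree.ElementTree rather than lxml. It assigns one canonical prefix per namespace URI and adds a positional index counted among siblings with the same expanded name.

```python
import xml.etree.ElementTree as ET


def positional_path(root, target, prefix="p"):
    """Hypothetical helper: return (path, namespaces) locating target from root."""
    # ElementTree elements have no parent pointer, so build a lookup table.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    prefixes = {}  # namespace URI -> canonical prefix
    steps = []
    node = target
    while node is not root:
        parent = parent_of[node]
        # 1-based position among siblings with the same expanded name.
        same = [c for c in parent if c.tag == node.tag]
        index = next(i for i, c in enumerate(same) if c is node) + 1
        if node.tag.startswith("{"):
            uri, local = node.tag[1:].split("}", 1)
            pfx = prefixes.setdefault(uri, "%s%d" % (prefix, len(prefixes)))
            steps.append("%s:%s[%d]" % (pfx, local, index))
        else:
            steps.append("%s[%d]" % (node.tag, index))
        node = parent
    path = "./" + "/".join(reversed(steps)) if steps else "."
    namespaces = {pfx: uri for uri, pfx in prefixes.items()}
    return path, namespaces


doc = (
    '<root xmlns:a="http://a.b/c">'
    '<l_0:l1 xmlns:l_0="http://ns.org/ns/" id="id_0"/>'
    '<l_1:l1 xmlns:l_1="http://ns.org/ns/" id="id_1"/>'
    '<l_2:l1 xmlns:l_2="http://ns.org/ns/" id="id_2"/>'
    '</root>'
)
root = ET.fromstring(doc)
second = root[1]
path, namespaces = positional_path(root, second)
found = root.findall(path, namespaces)
print(path, namespaces, [e.get("id") for e in found])
```

The same idea should carry over to lxml using getparent() and tree.xpath(), though that variant is untested here.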
