I tried parsing the same documents directly with libxml2 in C, and it shows the same issue, so I sent an email to the libxml2 mailing list about it: https://mail.gnome.org/archives/xml/2012-November/msg00040.html

thanks,
-Ahmed

On Mon, Nov 26, 2012 at 12:28 PM, Stefan Behnel <stefan_ml@behnel.de> wrote:
Ahmed, 26.11.2012 18:04:
> I'm trying to use getpath() to get the absolute XPath for an element in a
> tree. This works fine when all namespaces have consistent prefixes in the
> document. However, when I deal with a document in which the same namespace
> can have multiple prefixes, I get the wrong path. I wrote this little script
> to reproduce the error.
>
>
> from lxml import etree
> from io import BytesIO  # etree.tostring() returns bytes
>
> def generate_sample_tree_with_multiple_prefixes():
>     # Every child uses the same namespace URI under a different prefix.
>     root = etree.Element('root', nsmap={'a': 'http://a.b/c'})
>     for x in range(10):
>         ns = "http://ns.org/ns/"
>         prefix = 'l_%s' % x
>         node = etree.SubElement(root, "{%s}l1" % ns, nsmap={prefix: ns})
>         node.set('id', 'id_%s' % x)
>     return etree.parse(BytesIO(etree.tostring(root)))
>
> def generate_sample_tree_with_single_prefixes():
>     # Every child uses the same namespace URI under the same prefix.
>     root = etree.Element('root', nsmap={'a': 'http://a.b/c'})
>     for x in range(10):
>         ns = "http://ns.org/ns/"
>         prefix = 'ns'
>         node = etree.SubElement(root, "{%s}l1" % ns, nsmap={prefix: ns})
>         node.set('id', 'id_%s' % x)
>     return etree.parse(BytesIO(etree.tostring(root)))
>
> def test_multiple():
>     tree = generate_sample_tree_with_multiple_prefixes()
>     r = tree.getroot()
>     for l1 in r.iterchildren():
>         path = tree.getpath(l1)
>         ret = tree.xpath(path, namespaces=l1.nsmap)
>         assert len(ret) == 1, "Multiple elements returned: %s" % ret
>         assert l1 == ret[0], "It's not the same"
>
> def test_single():
>     tree = generate_sample_tree_with_single_prefixes()
>     r = tree.getroot()
>     for l1 in r.iterchildren():
>         path = tree.getpath(l1)
>         ret = tree.xpath(path, namespaces=l1.nsmap)
>         assert len(ret) == 1, "Multiple elements returned"
>         assert l1 == ret[0], "It's not the same"
>
> if __name__ == '__main__':
>     test_single()
>     test_multiple()

The error output I get is that it finds more than one occurrence for
test_multiple(). The tree it generates is

<root xmlns:a="http://a.b/c">
  <l_0:l1 xmlns:l_0="http://ns.org/ns/" id="id_0"/>
  <l_1:l1 xmlns:l_1="http://ns.org/ns/" id="id_1"/>
  <l_2:l1 xmlns:l_2="http://ns.org/ns/" id="id_2"/>
  <l_3:l1 xmlns:l_3="http://ns.org/ns/" id="id_3"/>
  <l_4:l1 xmlns:l_4="http://ns.org/ns/" id="id_4"/>
  <l_5:l1 xmlns:l_5="http://ns.org/ns/" id="id_5"/>
  <l_6:l1 xmlns:l_6="http://ns.org/ns/" id="id_6"/>
  <l_7:l1 xmlns:l_7="http://ns.org/ns/" id="id_7"/>
  <l_8:l1 xmlns:l_8="http://ns.org/ns/" id="id_8"/>
  <l_9:l1 xmlns:l_9="http://ns.org/ns/" id="id_9"/>
</root>

The path it generates is

    /root/l_0:l1

and that returns all children instead of just the one you want.
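
To illustrate why that path is ambiguous, here is a minimal sketch (not part of the original script) with a cut-down, three-child version of the tree. The prefixes p0, p1 and p2 are made up for the example. XPath compares the namespace URI and local name, not the prefix, so a step written with one prefix matches every sibling in that namespace.

```python
from lxml import etree

# Three siblings in the same namespace, each declared under a different prefix.
xml = (
    b'<root>'
    b'<p0:l1 xmlns:p0="http://ns.org/ns/" id="id_0"/>'
    b'<p1:l1 xmlns:p1="http://ns.org/ns/" id="id_1"/>'
    b'<p2:l1 xmlns:p2="http://ns.org/ns/" id="id_2"/>'
    b'</root>'
)
root = etree.fromstring(xml)

# The prefix in the expression is only a lookup key for the namespace URI.
matches = root.xpath("/root/p0:l1", namespaces={"p0": "http://ns.org/ns/"})
ids = [el.get("id") for el in matches]
print(ids)  # ['id_0', 'id_1', 'id_2']
```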

This looks like a bug to me - it fails to realise that the other tags have
the same namespace and local name, even though they have different
prefixes. However, it's not a bug in lxml but in libxml2, as that's where
the XPath expression is being generated. Please report the problem on their
side.
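
Until this is addressed in libxml2, one possible workaround is to build the path by hand from element positions only, which avoids prefixes altogether. The following is just a sketch under that assumption: positional_path() is a hypothetical helper, not an lxml or libxml2 function, and it assumes Python 3, where element tags are str.

```python
from lxml import etree

def positional_path(element):
    """Return an XPath made only of positional steps, e.g. /*/*[3]."""
    steps = []
    node = element
    while node is not None:
        parent = node.getparent()
        if parent is None:
            # The root element is the only element child of the document.
            steps.append("*")
        else:
            # Count preceding element siblings; comments and processing
            # instructions have a non-string tag and are not matched by '*'.
            preceding = sum(
                1 for sib in node.itersiblings(preceding=True)
                if isinstance(sib.tag, str)
            )
            steps.append("*[%d]" % (preceding + 1))
        node = parent
    return "/" + "/".join(reversed(steps))

xml = (
    b'<root>'
    b'<p0:l1 xmlns:p0="http://ns.org/ns/" id="id_0"/>'
    b'<!-- a comment between siblings -->'
    b'<p1:l1 xmlns:p1="http://ns.org/ns/" id="id_1"/>'
    b'<p2:l1 xmlns:p2="http://ns.org/ns/" id="id_2"/>'
    b'</root>'
)
tree = etree.ElementTree(etree.fromstring(xml))
children = [c for c in tree.getroot() if isinstance(c.tag, str)]
paths = [positional_path(c) for c in children]
print(paths)  # ['/*/*[1]', '/*/*[2]', '/*/*[3]']
```

Each generated path selects exactly one element, and no namespace map is needed when evaluating it. The trade-off is that the path no longer shows element names, so it is less readable than what getpath() returns.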

Thanks!

Stefan

_________________________________________________________________
Mailing list for the lxml Python XML toolkit - http://lxml.de/
lxml@lxml.de
https://mailman-mail5.webfaction.com/listinfo/lxml


