Ahmed, 26.11.2012 18:04:
> I'm trying to use getpath() to get the absolute xpath for an element in a
> tree. This works fine when all namespaces have consistent prefixes in the
> document. However when I deal with a document in which the same namespace
> can have multiple prefixes I get wrong path. I wrote this little script to
> re-produce the error.
>
>
> from lxml import etree
> from StringIO import StringIO
>
> def generate_sample_tree_with_multiple_prefixes():
>     root = etree.Element('root', nsmap={'a': 'http://a.b/c'})
>     for x in range(10):
>         ns = "http://ns.org/ns/"
>         prefix = 'l_%s' % x
>         node = etree.SubElement(root, "{%s}l1" % ns, nsmap={prefix: ns})
>         node.set('id', 'id_%s' % x)
>     return etree.parse(StringIO(etree.tostring(root)))
>
> def generate_sample_tree_with_single_prefixes():
>     root = etree.Element('root', nsmap={'a': 'http://a.b/c'})
>     for x in range(10):
>         ns = "http://ns.org/ns/"
>         prefix = 'ns'
>         node = etree.SubElement(root, "{%s}l1" % ns, nsmap={prefix: ns})
>         node.set('id', 'id_%s' % x)
>     return etree.parse(StringIO(etree.tostring(root)))
>
> def test_multiple():
>     tree = generate_sample_tree_with_multiple_prefixes()
>     r = tree.getroot()
>     for l1 in r.iterchildren():
>         path = tree.getpath(l1)
>         ret = tree.xpath(path, namespaces=l1.nsmap)
>         assert len(ret) == 1, "Multiple elements returned: %s" % ret
>         assert l1 == ret[0], "It's not the same"
>
> def test_single():
>     tree = generate_sample_tree_with_single_prefixes()
>     r = tree.getroot()
>     for l1 in r.iterchildren():
>         path = tree.getpath(l1)
>         ret = tree.xpath(path, namespaces=l1.nsmap)
>         assert len(ret) == 1, "Multiple elements returned"
>         assert l1 == ret[0], "It's not the same"
>
> if __name__ == '__main__':
>     test_single()
>     test_multiple()

The error output I get is that it finds more than one occurrence for
test_multiple(). The tree it generates is
<root xmlns:a="http://a.b/c">
<l_0:l1 xmlns:l_0="http://ns.org/ns/" id="id_0"/>
<l_1:l1 xmlns:l_1="http://ns.org/ns/" id="id_1"/>
<l_2:l1 xmlns:l_2="http://ns.org/ns/" id="id_2"/>
<l_3:l1 xmlns:l_3="http://ns.org/ns/" id="id_3"/>
<l_4:l1 xmlns:l_4="http://ns.org/ns/" id="id_4"/>
<l_5:l1 xmlns:l_5="http://ns.org/ns/" id="id_5"/>
<l_6:l1 xmlns:l_6="http://ns.org/ns/" id="id_6"/>
<l_7:l1 xmlns:l_7="http://ns.org/ns/" id="id_7"/>
<l_8:l1 xmlns:l_8="http://ns.org/ns/" id="id_8"/>
<l_9:l1 xmlns:l_9="http://ns.org/ns/" id="id_9"/>
</root>
The path it generates is
/root/l_0:l1
and that returns all children instead of just the one you want.
This looks like a bug to me - it fails to realise that the other tags have
the same namespace and local name, even though they have different
prefixes. However, it's not a bug in lxml but in libxml2, as that's where
the XPath expression is being generated. Please report the problem on their
side.
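Until that is fixed, one possible workaround is to build the path yourself so
that it does not depend on prefixes at all. The following is only a minimal
sketch: prefix_free_path() is a hypothetical helper name, not part of lxml, and
it assumes that namespace URIs and local names contain no single quotes. Each
step matches on namespace-uri() and local-name() and adds the element's
position among its identically named siblings.

```python
from lxml import etree


def prefix_free_path(element):
    """Hypothetical helper (not an lxml API): absolute XPath without prefixes."""
    steps = []
    node = element
    while node is not None:
        qname = etree.QName(node)
        # 1-based position among preceding siblings with the same {ns}name tag
        index = 1 + sum(1 for _ in node.itersiblings(node.tag, preceding=True))
        # Assumes neither the local name nor the URI contains a single quote
        steps.append("*[local-name()='%s' and namespace-uri()='%s'][%d]"
                     % (qname.localname, qname.namespace or '', index))
        node = node.getparent()
    return '/' + '/'.join(reversed(steps))


# Small tree with the same namespace bound to a different prefix per child
ns = 'http://ns.org/ns/'
root = etree.Element('root', nsmap={'a': 'http://a.b/c'})
for x in range(3):
    child = etree.SubElement(root, '{%s}l1' % ns, nsmap={'l_%s' % x: ns})
    child.set('id', 'id_%s' % x)
tree = etree.ElementTree(root)

# No namespaces= mapping is needed, since the paths contain no prefixes
results = [tree.xpath(prefix_free_path(child)) for child in root]
```

The resulting paths are longer than what getpath() returns, but each one should
select exactly one element, whatever prefixes the document uses.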
Thanks!
Stefan