[lxml-dev] XPath error ?

Hello,
I've successfully compiled and installed lxml, but now I have XPath errors. To be precise, the error is that no nodes are returned. I'm using "//a" on "http://www.w3.org/". It seems to be a namespace error, but I haven't found how to pass the default namespace to the .xpath() function. Here is my code:
==================== 8< =======================
import lxml.etree
import urllib
import sys

def show_skel(element, ident = 0):
    print " "*ident, element.tag
    for node in element:
        show_skel(node, ident+4)

url = "http://www.w3.org"
expr_xpath = "//a"

f = lxml.etree.StringIO("".join(urllib.urlopen(url).readlines()))
doc = lxml.etree.parse(f)
r = doc.xpath(expr_xpath)

#show_skel(doc.getroot())

print len(r)
if (len(r) > 0):
    for node in r:
        print node.tag
==================== 8< =======================
Thanks in advance

Fabien SCHWOB wrote:
> I've successfully compiled and installed lxml, but now I have XPath errors. To be precise, the error is that no nodes are returned. I'm using "//a" on "http://www.w3.org/". It seems to be a namespace error, but I haven't found how to pass the default namespace to the .xpath() function.
You mean like described in doc/xpath.txt or doc/api.txt?
http://codespeak.net/lxml/xpath.html http://codespeak.net/lxml/api.html
The default namespace is None, but you can also use an XPath expression like "//xhtml:a" if you prefer.
Stefan

> You mean like described in doc/xpath.txt or doc/api.txt?
> http://codespeak.net/lxml/xpath.html http://codespeak.net/lxml/api.html
> The default namespace is None, but you can also use an XPath expression like "//xhtml:a" if you prefer.
I've seen these two pages. But what I would like to do is run an XPath query against the default namespace (as in <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US">). How can I do that?
Thanks

Fabien SCHWOB wrote:
>> You mean like described in doc/xpath.txt or doc/api.txt?
>> http://codespeak.net/lxml/xpath.html http://codespeak.net/lxml/api.html
>> The default namespace is None, but you can also use an XPath expression like "//xhtml:a" if you prefer.
> I've seen these two pages. But what I would like to do is run an XPath query against the default namespace (as in <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US">). How can I do that?
something.xpath("//xhtml:a", {'xhtml': 'http://www.w3.org/1999/xhtml'})
--Paul
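
For readers following along, here is a minimal sketch of Paul's suggestion. It assumes a reasonably recent lxml, where the prefix mapping is passed through the `namespaces` keyword argument rather than positionally as in the 2006 call above. The tiny inline document and the `XHTML_NS` constant are made up for illustration and stand in for the real w3.org page.

```python
from lxml import etree

# Hypothetical stand-in for the XHTML page fetched in the original post.
XHTML_NS = "http://www.w3.org/1999/xhtml"
doc = etree.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<body><a href="/one">one</a><a href="/two">two</a></body>'
    '</html>'
)

# An unprefixed name test only matches elements in no namespace,
# so nothing is found in a document with a default namespace.
unprefixed = doc.xpath("//a")

# Binding any non-empty prefix to the namespace URI makes the query match.
prefixed = doc.xpath("//xhtml:a", namespaces={"xhtml": XHTML_NS})

print(len(unprefixed), len(prefixed))  # 0 2
print(prefixed[0].tag)                 # {http://www.w3.org/1999/xhtml}a
```

The prefix name itself is arbitrary: it only has to match between the expression and the mapping, not the prefix (or lack of one) used in the document.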

Fabien SCHWOB wrote:
>> You mean like described in doc/xpath.txt or doc/api.txt?
>> http://codespeak.net/lxml/xpath.html http://codespeak.net/lxml/api.html
>> The default namespace is None, but you can also use an XPath expression like "//xhtml:a" if you prefer.
> I've seen these two pages. But what I would like to do is run an XPath query against the default namespace (as in <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US">). How can I do that?
Just as Paul told you. I was actually wrong, you can't register a default namespace prefix for XPath expressions. Sorry.
If you try to do this:
.>>> root = XML('<a xmlns="uri:test"><b/></a>')
.>>> root.xpath("//b", {None : "uri:test"})
you will receive a TypeError. That is somewhat unfortunate, but due to libxml2. I can't see a way to define the default namespace for an XPath expression in libxml2. This behaviour now also has a test case, since it's unlikely to change in the future. Thanks for pointing me at it.
Anyway, just go with a non-empty prefix.
Stefan
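
The behaviour Stefan describes can be checked with a short sketch like the one below. It assumes a current lxml release, where the mapping goes through the `namespaces` keyword; the `outcome` variable and the prefix `t` are only illustrative names.

```python
from lxml import etree

root = etree.XML('<a xmlns="uri:test"><b/></a>')

# Trying to register a default (None) prefix is rejected with a TypeError.
try:
    root.xpath("//b", namespaces={None: "uri:test"})
    outcome = "no error"
except TypeError:
    outcome = "TypeError"

# A non-empty prefix bound to the same URI works as expected.
matches = root.xpath("//t:b", namespaces={"t": "uri:test"})

print(outcome)       # TypeError
print(len(matches))  # 1
```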

> Just as Paul told you. I was actually wrong, you can't register a default namespace prefix for XPath expressions. Sorry.
> If you try to do this:
> .>>> root = XML('<a xmlns="uri:test"><b/></a>')
> .>>> root.xpath("//b", {None : "uri:test"})
> you will receive a TypeError. That is somewhat unfortunate, but due to libxml2. I can't see a way to define the default namespace for an XPath expression in libxml2. This behaviour now also has a test case, since it's unlikely to change in the future. Thanks for pointing me at it.
> Anyway, just go with a non-empty prefix.
The problem is that in the application I'm developing, I get the XPath expressions from the users, and I can't ask them to change their expressions to add a namespace prefix.
It's sad, but I think that for this project I will use Ruby. I will give Python a closer look for some personal projects.
Thanks for all the help, everyone.

Hi,
On Mon, 2006-03-13 at 15:28 +0100, Fabien SCHWOB wrote:
>> Just as Paul told you. I was actually wrong, you can't register a default namespace prefix for XPath expressions. Sorry.
>> If you try to do this:
>> .>>> root = XML('<a xmlns="uri:test"><b/></a>')
>> .>>> root.xpath("//b", {None : "uri:test"})
>> you will receive a TypeError. That is somewhat unfortunate, but due to libxml2. I can't see a way to define the default namespace for an XPath expression in libxml2. This behaviour now also has a test case, since it's unlikely to change in the future. Thanks for pointing me at it.
This behaviour is based on the XPath 1.0 spec. http://www.w3.org/TR/xpath#node-tests : "if the QName does not have a prefix, then the namespace URI is null (this is the same way attribute names are expanded)"
IIRC, this was changed in XPath 2.0.
[...]
Regards,
Kasimier
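
The XPath 1.0 rule Kasimier quotes can be seen in a small sketch, assuming a current lxml. The `local-name()` test at the end is a standard XPath 1.0 function that compares only the local part of the name; it is not something proposed in this thread, just a commonly used way to match elements regardless of namespace.

```python
from lxml import etree

# One <b> in no namespace and one <b> in the "uri:test" namespace.
root = etree.XML('<r><b/><b xmlns="uri:test"/></r>')

# Unprefixed name test: the namespace URI is null, so only the
# un-namespaced <b> matches.
plain = root.xpath("//b")

# local-name() ignores the namespace, so both elements match.
any_ns = root.xpath("//*[local-name()='b']")

print([el.tag for el in plain])   # ['b']
print([el.tag for el in any_ns])  # ['b', '{uri:test}b']
```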

Fabien SCHWOB wrote:
>> Just as Paul told you. I was actually wrong, you can't register a default namespace prefix for XPath expressions. Sorry.
>> If you try to do this:
>> .>>> root = XML('<a xmlns="uri:test"><b/></a>')
>> .>>> root.xpath("//b", {None : "uri:test"})
>> you will receive a TypeError. That is somewhat unfortunate, but due to libxml2. I can't see a way to define the default namespace for an XPath expression in libxml2. This behaviour now also has a test case, since it's unlikely to change in the future. Thanks for pointing me at it.
>> Anyway, just go with a non-empty prefix.
> The problem is that in the application I'm developing, I get the XPath expressions from the users, and I can't ask them to change their expressions to add a namespace prefix.
> It's sad, but I think that for this project I will use Ruby. I will give Python a closer look for some personal projects.
If Ruby's XML library implements XPath 1.0 then you'll have the same issue with Ruby. lxml in this respect follows the XPath standard (and any libxml2-based library will do so unless it takes special measures to break XPath 1.0 compliance).
If you really do not want this behavior, you might be able to get away with preprocessing the XML itself to strip all namespaces. Then the (non-XPath 1.0 compliant) expressions you desire will work.
As Kasimier mentioned in his followup, XPath 2.0 may do what you desire. I do not know if libxml2 will implement XPath 2.0 - if so it's certainly not anywhere close, I suspect.
Regards,
Martijn
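
A rough sketch of the preprocessing idea Martijn mentions, assuming a current lxml. The helper name `strip_namespaces` is hypothetical and not part of lxml; it only rewrites element tags (namespaced attributes are left alone), and it throws away information, so elements from different namespaces with the same local name become indistinguishable.

```python
from lxml import etree

def strip_namespaces(root):
    """Hypothetical helper: drop the namespace from every element tag in place."""
    for el in root.iter():
        # Comments and processing instructions have a non-string tag; skip them.
        if isinstance(el.tag, str) and el.tag.startswith("{"):
            el.tag = etree.QName(el.tag).localname
    # Remove namespace declarations that are no longer used.
    etree.cleanup_namespaces(root)
    return root

doc = etree.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<body><a href="/one">one</a><a href="/two">two</a></body>'
    '</html>'
)

before = len(doc.xpath("//a"))
strip_namespaces(doc)
after = len(doc.xpath("//a"))

print(before, after)  # 0 2
print(doc.tag)        # html
```

With the namespaces gone, unprefixed user-supplied expressions such as "//a" match, at the cost of no longer querying the document as it was actually written.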
participants (5): Fabien SCHWOB, Kasimier Buchcik, Martijn Faassen, Paul Everitt, Stefan Behnel