[lxml-dev] .base and docinfo.URL
Does .base inherit from docinfo.URL? It doesn't seem like it does. I tried changing .base_url to just return self.base, but if I do:
>>> from lxml.html import parse
>>> doc = parse('http://python.org').getroot()
>>> print doc.base
None
>>> doc.getroottree().docinfo.URL
'http://python.org'
Hi Ian,
I just checked the libxml2 source, and it actually behaves completely differently for HTML documents. There, it looks for <html><head><base href="..."> and takes that. It completely ignores the document URL for HTML.

I think it would be good to override that (directly in etree), so that it returns the document URL if the base search returns nothing. That way, it's consistent with the fallback in XML.

Stefan
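The lookup order described above (use the `<base href="...">` element if there is one, otherwise fall back to the document URL) can be sketched with the standard library alone. This is only an illustration of that order, not the libxml2 or lxml implementation: the names `_BaseHrefFinder` and `effective_base` are hypothetical, and for brevity the sketch accepts the first `<base>` element anywhere in the markup rather than only inside `<head>`.

```python
from html.parser import HTMLParser


class _BaseHrefFinder(HTMLParser):
    """Hypothetical helper: remembers the href of the first <base> element."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names; attrs is a list of (name, value) pairs.
        if tag == "base" and self.href is None:
            self.href = dict(attrs).get("href")


def effective_base(html_text, document_url):
    """Return the <base href> value if present, else the document URL."""
    finder = _BaseHrefFinder()
    finder.feed(html_text)
    finder.close()
    # Fallback step: no usable <base href> found, so use the document URL.
    return finder.href or document_url


with_base = '<html><head><base href="http://example.org/a/"></head></html>'
without_base = '<html><head><title>x</title></head></html>'

print(effective_base(with_base, 'http://python.org'))
print(effective_base(without_base, 'http://python.org'))
```

The first call prints the `href` of the `<base>` element; the second finds no `<base>` and prints the document URL, which is the fallback proposed for HTML documents.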
Hi,

fixed on the trunk.

Stefan
participants (2)
- Ian Bicking
- Stefan Behnel