Extract Title from HTML documents

Walter Dörwald walter at livinglogic.de
Fri Nov 5 10:59:43 CET 2004

Nickolay Kolev wrote:

> Hi all,
> I am looking for a way to extract the titles of HTML documents. I have 
> made an honest attempt at doing it, and it even works. Is there an 
> easier (faster / more efficient / clearer) way?

You might try XIST (http://www.livinglogic.de/Python/xist):

    from ll.xist import parsers, xfind
    from ll.xist.ns import html

    # Parse the document into an XIST tree
    e = parsers.parseFile("test.html", tidy=True)
    # Print the text of the first <title> element found in the tree
    print unicode(xfind.first(e//html.title))

This uses libxml2's HTML parser internally.
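
If installing an extra package is not an option, roughly the same result can be had from the standard library alone. The following is only a sketch, not part of XIST: it is written for Python 3, relies on html.parser.HTMLParser, and the names TitleParser and extract_title are made up for illustration. It collects the text of the first <title> element and collapses runs of whitespace.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.in_title = False   # True while we are between <title> and </title>
        self.done = False       # True once the first title has been closed
        self.parts = []         # text chunks seen inside the title

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <TITLE> also matches.
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def extract_title(markup):
    """Return the first title as a string, or None if no closed title exists."""
    parser = TitleParser()
    parser.feed(markup)
    parser.close()
    if not parser.done:
        return None
    # Join the chunks and collapse internal whitespace and newlines.
    return " ".join("".join(parser.parts).split())
```

For a file such as test.html, one would read its contents into a string (with whatever encoding the file uses) and pass that string to extract_title. Unlike the libxml2-based route, this sketch makes no attempt to repair badly broken markup; it simply reports None when it never sees a closed title.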

    Walter Dörwald
