Extract Title from HTML documents
Max M
maxm at mxm.dk
Fri Nov 5 03:18:40 EST 2004
Nickolay Kolev wrote:
> Hi all,
>
> I am looking for a way to extract the titles of HTML documents. I have
> made an honest attempt at doing it, and it even works. Is there an
> easier (faster / more efficient / clearer) way?
You only need one tag here, so using a regex is OK.
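import re

# Capture the text between <title ...> and </title> (case-insensitive, '.' matches newlines).
titlePattern = re.compile(r'<title[^>]*>(.*?)</title>', re.I | re.S)
match = titlePattern.search(source)
if match is None:
    result = ''
else:
    result = match.group(1)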
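As a self-contained illustration of the regex approach, here is a minimal sketch assuming Python 3. The helper name extract_title, the constant TITLE_RE and the sample document are made up for this example and are not from the original post. It also decodes entities and collapses whitespace, which may or may not be what you want.

```python
import html
import re

# Hypothetical names for illustration; not part of the original post.
TITLE_RE = re.compile(r'<title[^>]*>(.*?)</title>', re.I | re.S)

def extract_title(source):
    """Return the text of the first <title> element, or '' if none is found."""
    match = TITLE_RE.search(source)
    if match is None:
        return ''
    # Decode entities such as &amp; and collapse runs of whitespace.
    return ' '.join(html.unescape(match.group(1)).split())

# Made-up sample document: upper-case tag, an attribute, and a multi-line title.
sample = ('<html><head><TITLE lang="en">\n  Spam &amp; Eggs\n</TITLE></head>'
          '<body></body></html>')
print(extract_title(sample))
```

A document without a title tag simply yields an empty string, so the caller does not have to deal with None.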
If you need more than just the title I would definitely go with
BeautifulSoup.
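To give an idea of what "more than just the title" looks like with a real parser, here is a rough sketch assuming Python 3. Note that it uses the standard library's html.parser module rather than BeautifulSoup, so that it runs without installing anything. The class name TitleAndLinks and the sample page are made up for the example.

```python
from html.parser import HTMLParser

class TitleAndLinks(HTMLParser):
    """Hypothetical example class: collect the page title and all <a href> values."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title_parts = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if tag == 'title':
            self.in_title = True
        elif tag == 'a':
            href = dict(attrs).get('href')
            if href is not None:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == 'title':
            self.in_title = False

    def handle_data(self, data):
        # Only keep text that appears inside <title>...</title>.
        if self.in_title:
            self.title_parts.append(data)

    @property
    def title(self):
        return ''.join(self.title_parts).strip()

# Made-up sample page for illustration.
page = ('<html><head><title>Mad Science</title></head>'
        '<body><a href="/about">about</a> <a href="/contact">contact</a></body></html>')
parser = TitleAndLinks()
parser.feed(page)
parser.close()
print(parser.title, parser.links)
```

Once you want several kinds of elements from the same page, a parser like this (or BeautifulSoup) tends to stay readable where a growing pile of regexes does not.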
--
hilsen/regards Max M, Denmark
http://www.mxm.dk/
IT's Mad Science