Extract Title from HTML documents
Mike Meyer
mwm at mired.org
Fri Nov 5 03:29:22 EST 2004
Max M <maxm at mxm.dk> writes:
> Nickolay Kolev wrote:
>> Hi all,
>> I am looking for a way to extract the titles of HTML documents. I
>> have made an honest attempt at doing it, and it even works. Is there
>> an easier (faster / more efficient / clearer) way?
>
> You only need one tag here, so using a regex is ok.
>
> linkPattern = re.compile('((<title.*?>(.*?)</body>))', re.I|re.S)
                                               ^^^^
Shouldn't that be </title>?
        <mike
> match = linkPattern.search(source)
> if match is None:
>     result = ''
> else:
>     result = match.group(0)
>
> If you need more than just the title I would definitely go with
> BeautifulSoup.
>
> --
>
> hilsen/regards Max M, Denmark
>
> http://www.mxm.dk/
> IT's Mad Science
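
For reference, here is a minimal sketch of the quoted approach with the closing tag set to </title>. The names TITLE_PATTERN and extract_title are made up for illustration and appear nowhere in the thread. The sketch also assumes you want only the text between the tags, so it returns the captured group rather than group(0), which would include the tags themselves.

```python
import re

# Hypothetical names, for illustration only.
# Non-greedy match between <title ...> and </title>; re.I ignores tag case,
# re.S lets "." span newlines inside the title.
TITLE_PATTERN = re.compile(r'<title\b[^>]*>(.*?)</title>', re.I | re.S)


def extract_title(source):
    """Return the title text of an HTML string, or '' if none is found."""
    match = TITLE_PATTERN.search(source)
    if match is None:
        return ''
    # group(1) is the text between the tags; collapse runs of whitespace.
    return ' '.join(match.group(1).split())
```

Calling extract_title(source) on a page with no title element gives back the empty string, so the caller never has to deal with None. A regex like this is only reasonable for a single, simple tag; for anything more a real HTML parser is the safer choice.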
--
Mike Meyer <mwm at mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.