how to get the summarized text from a given URL?
__peter__ at web.de
Tue Mar 24 12:46:12 CET 2009
Rama Vadakattu wrote:
> Is there any python library to solve the below problem?
> For the below URL :
> Summarized text is :
> By Roy Mark With sales plummeting and its smart phones failing to woo
> new customers, Sony Ericsson follows its warning that first quarter
> sales will be disappointing with the announcement that Najmi Jarwala,
> president of Sony Ericsson USA and head of ...
> Usually summarized text is a 2 to 3 line description of the URL which
> we usually obtain by fetching that html page , examining the content
> and figuring out short description from that html markup.
> Are there any python libraries which give summarized text for a given
> url ?
BeautifulSoup makes it easy to access parts of a web page.
import urllib2
from BeautifulSoup import BeautifulSoup

# Fetch the page and print the content of its <meta name="description"> tag.
data = urllib2.urlopen("http://tinyurl.com/dzcwbg").read()
bs = BeautifulSoup(data)
print bs.find("meta", dict(name="description"))["content"]
> It is OK even if the library just gives the initial two lines of text
> from the given URL instead of a summarization.
The problem is how you identify the summary. Different web sites will put it
in different places using different markup.
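One way to cope with that variation is to try several likely locations in turn. The following is only a rough sketch, not a general solution. It is written for Python 3 and uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra installs. The names SummaryParser and summarize, the order of the candidates (meta description, then og:description, then the first non-empty paragraph), and the 200-character cut-off are all assumptions made for illustration.

```python
from html.parser import HTMLParser


class SummaryParser(HTMLParser):
    """Collect candidate summaries: <meta> descriptions and the first <p> text."""

    def __init__(self):
        super().__init__()
        self.meta = {}               # e.g. {"description": "..."}
        self.first_paragraph = None  # text of the first non-empty <p>
        self._in_p = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            # Sites use either name="..." or property="..." (assumed candidates).
            key = attrs.get("name") or attrs.get("property")
            content = (attrs.get("content") or "").strip()
            if key and content:
                self.meta.setdefault(key.lower(), content)
        elif tag == "p" and self.first_paragraph is None:
            self._in_p = True
            self._buf = []

    def handle_data(self, data):
        if self._in_p:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self._in_p = False
            # Collapse runs of whitespace into single spaces.
            text = " ".join("".join(self._buf).split())
            if text:
                self.first_paragraph = text


def summarize(html, max_chars=200):
    """Return a short description for an HTML string, or None if none is found."""
    parser = SummaryParser()
    parser.feed(html)
    for key in ("description", "og:description"):
        if key in parser.meta:
            return parser.meta[key]
    if parser.first_paragraph:
        return parser.first_paragraph[:max_chars]
    return None
```

To use it on a live page, fetch the HTML first (for example with urllib.request.urlopen(url).read() and a suitable decode) and pass the resulting string to summarize(). The first-paragraph fallback is crude: on many sites the first paragraph is navigation or a cookie notice, so the list of candidates will probably need tuning per site.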