[BangPypers] How to compare the relevancy between news headlines?

Vijay Ramachandran vijay750 at gmail.com
Mon Jun 20 05:54:32 CEST 2011


On Tue, Jun 14, 2011 at 12:07 PM, <bangpypers-request at python.org> wrote:

> On the news.google.co.in site, similar news stories are found and grouped
> together.
>
> For example, the following headlines from different online portals are
> grouped together:
>
> Jayalalithaa meets PM, DMK watches closely
> Jaya to meet PM today in New Delhi
> Jaya-PM meet, 'jittery' DMK watches on Times
>
> How can I do this in Python? I think NLTK is too large for me to learn
> and use. Is there a simpler, fun way to do it?
>

This breaks down into two fairly standard machine learning tasks.

First, you can use clustering
<http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/> to identify
classes; there are quite a few well-known algorithms, such as k-means.
Alternatively, you could choose your classes by hand. Then you need to
train a classifier
<http://en.wikipedia.org/wiki/Statistical_classification_%28machine_learning%29>
which will assign new articles to one of your classes.
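As a rough illustration of the clustering step, here is a minimal sketch using only the standard library. Note that it is not k-means: it swaps in a simpler threshold-based single-link grouping, where two headlines are linked if the Jaccard similarity of their word sets is at least a threshold, and each connected set of linked headlines becomes one group. The stop-word list, the 0.2 threshold and the function names (tokens, jaccard, group_headlines) are hypothetical choices for illustration, not tuned values.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "to", "in", "on", "of", "and", "for", "at"}


def tokens(headline):
    """Lower-case the headline and return its alphabetic words minus stop words."""
    words = re.findall(r"[a-z]+", headline.lower())
    return {w for w in words if w not in STOP_WORDS}


def jaccard(a, b):
    """Jaccard similarity of two word sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_headlines(headlines, threshold=0.2):
    """Single-link grouping: link any two headlines whose Jaccard similarity is
    >= threshold (a hypothetical default), then return the connected groups."""
    token_sets = [tokens(h) for h in headlines]
    parent = list(range(len(headlines)))  # union-find forest, one node per headline

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    # Compare every pair of headlines and merge the groups of similar ones.
    for i in range(len(headlines)):
        for j in range(i + 1, len(headlines)):
            if jaccard(token_sets[i], token_sets[j]) >= threshold:
                parent[find(i)] = find(j)

    # Collect headlines by the root of their group, in first-seen order.
    groups = {}
    for i, headline in enumerate(headlines):
        groups.setdefault(find(i), []).append(headline)
    return list(groups.values())


HEADLINES = [
    "Jayalalithaa meets PM, DMK watches closely",
    "Jaya to meet PM today in New Delhi",
    "Jaya-PM meet, 'jittery' DMK watches on Times",
    "Sensex falls 200 points on weak global cues",  # made-up unrelated headline
]

for group in group_headlines(HEADLINES):
    print(group)
```

Unlike k-means, this does not need the number of clusters up front, which suits news where the number of stories changes every day; the price is comparing every pair of headlines. Word forms like "meets" and "meet" are not matched here; a stemmer such as NLTK's nltk.stem.PorterStemmer would help with that.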

For both of these tasks, NLTK provides very nice, Pythonic tools. You can
also look for other Pythonic machine learning toolkits. If you need to do
any natural language processing, though, NLTK is well worth the time to
learn. It has excellent documentation, including a few books.
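For the classification step, a similarly small sketch (again standard library only, not NLTK) is a nearest-centroid classifier: each class is represented by the summed word counts of its example headlines, and a new headline is assigned to the class whose counts are most cosine-similar to its own. The class labels, the two "markets" headlines and the function names (bag_of_words, cosine, train, classify) are made up for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Word-count vector (a Counter) of the lower-cased alphabetic words."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())  # missing words count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def train(labelled):
    """Build one centroid per class by summing the word counts of its examples.

    `labelled` is a list of (headline, class_name) pairs.
    """
    centroids = {}
    for text, label in labelled:
        centroids.setdefault(label, Counter()).update(bag_of_words(text))
    return centroids


def classify(centroids, text):
    """Return the class whose centroid is most cosine-similar to `text`."""
    vec = bag_of_words(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical training data: the labels and the "markets" headlines are invented.
TRAINING = [
    ("Jayalalithaa meets PM, DMK watches closely", "politics"),
    ("Jaya to meet PM today in New Delhi", "politics"),
    ("Sensex falls 200 points on weak global cues", "markets"),
    ("Rupee gains against dollar as stocks rally", "markets"),
]

model = train(TRAINING)
print(classify(model, "Jaya-PM meet, 'jittery' DMK watches on Times"))
```

Summing rather than averaging the counts is fine because cosine similarity ignores vector length. Once you are ready for it, NLTK's own classifiers, such as nltk.NaiveBayesClassifier, are the more complete option.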

HTH,
Vijay
