[BangPypers] How to compare the relevancy between news headlines?

Venkatraman S venkat83 at gmail.com
Tue Jun 14 09:27:11 CEST 2011

On Tue, Jun 14, 2011 at 12:07 PM, Gopalakrishnan Subramani <
gopalakrishnan.subramani at gmail.com> wrote:

> Jayalalithaa meets PM, DMK watches closely
> Jaya to meet PM today in New Delhi
> Jaya-PM meet, 'jittery' DMK watches on Times
> How to do this in Python? I think the NLTK toolkit is too large for me to
> learn and use. Is there any other fun and simpler way to do that?

1) NLTK is pretty simple. You can do duplicate detection with it quite
easily; look for sample code.

2) Generate keywords from the content of each document and check how
strongly the keyword sets of two documents correlate.

3) For headlines alone: do substring matching? (But this would ignore the
semantics of the text, i.e. 'Jayalalitha was last seen in Kodagu estate'
and 'Real estate would get a boost under Jayalalitha' would be categorized
as the same.)
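
A rough sketch of option 2 using only the standard library, in case NLTK
feels too heavy. The stopword list, the function names and the choice of
Jaccard overlap as the "correlation" measure are all assumptions made for
illustration; they are not from NLTK or from the original question.

```python
import re

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"a", "an", "the", "to", "in", "on", "of", "under",
             "was", "would", "get", "and"}

def keywords(headline):
    """Lowercase the headline, split on non-letters, drop stopwords."""
    tokens = re.findall(r"[a-z]+", headline.lower())
    return {t for t in tokens if t not in STOPWORDS}

def jaccard(a, b):
    """Keyword overlap of two headlines: |A & B| / |A | B|."""
    ka, kb = keywords(a), keywords(b)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)

# Two related headlines from the question.
h1 = "Jayalalithaa meets PM, DMK watches closely"
h2 = "Jaya-PM meet, 'jittery' DMK watches on Times"

# The two unrelated headlines from point 3.
e1 = "Jayalalitha was last seen in KOdagu estate"
e2 = "Real estate would get a boost under Jayalalitha"

print(jaccard(h1, h2))
print(jaccard(e1, e2))
```

With this stopword list the related pair scores 0.3 and the unrelated
'estate' pair scores about 0.29, so plain keyword overlap has the same
weakness as substring matching: it cannot tell the two cases apart. It also
treats 'Jaya', 'Jayalalitha' and 'Jayalalithaa' as different words, so some
name normalization or stemming would probably be needed before the scores
become useful.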

