[Mailman-Developers] Fwd: Eta patha

Pratik Sarkar iampratiksarkar at gmail.com
Wed Apr 10 21:44:08 CEST 2013


Hi all,

 I am a second-year undergraduate student studying CS at Bengal Engineering
and Science University, India. This is the first time I am participating in
Google Summer of Code, and I am very interested in contributing to machine
learning or NLP related projects.

 I went through the Python projects for GSoC 2013 and found the
anti-spam/anti-abuse filter for Mailman very interesting. I have some ideas
of my own regarding the project. I want to integrate external NLP toolkits
such as LingPipe and LIBSVM. The text would be classified using an N-gram
language model and a Support Vector Machine (I would prefer LIBSVM), with
the "localized knowledge" serving as the training set for the classifier
to help identify abuse/spam. Adding a feedback procedure to the classifier
might also be helpful: the classifier could update itself with previously
classified spam/abuse, so the program could filter the latest spam/abuse
without any update on the programmer's part.
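
To make the idea above concrete, here is a minimal sketch of an N-gram
language-model classifier with a feedback step. It is only an illustration
under stated assumptions: it uses plain Python rather than LingPipe or
LIBSVM, it covers only the language-model half of the proposal (the SVM
stage is not shown), and the class name NgramSpamFilter, its methods, and
the "spam"/"ham" labels are hypothetical names chosen for this example.

```python
import math
from collections import Counter


class NgramSpamFilter:
    """Toy character n-gram language-model classifier (hypothetical names)."""

    LABELS = ("spam", "ham")

    def __init__(self, n=3):
        self.n = n
        # One n-gram count table per class: a tiny per-class language model.
        self.counts = {label: Counter() for label in self.LABELS}
        self.totals = {label: 0 for label in self.LABELS}
        self.vocab = set()

    def _ngrams(self, text):
        text = text.lower()
        return [text[i:i + self.n] for i in range(len(text) - self.n + 1)]

    def train(self, text, label):
        """Add one labelled message (the local training set) to the model."""
        grams = self._ngrams(text)
        self.counts[label].update(grams)
        self.totals[label] += len(grams)
        self.vocab.update(grams)

    def _log_likelihood(self, text, label):
        # Add-one smoothing so unseen n-grams do not give log(0).
        size = len(self.vocab) + 1
        denom = self.totals[label] + size
        table = self.counts[label]
        return sum(math.log((table[g] + 1) / denom) for g in self._ngrams(text))

    def classify(self, text):
        """Return the label whose language model scores the text highest."""
        return max(self.LABELS, key=lambda lbl: self._log_likelihood(text, lbl))

    def feedback(self, text, correct_label):
        """Feedback step: fold a reviewed message back into the model."""
        self.train(text, correct_label)
```

A list moderator's decision on a held message could be passed to
feedback(), so the counts keep up with new spam without a code change.
Whether this would fit Mailman's actual moderation flow is an assumption
of the sketch.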

About me:
1. About 5 years of programming experience, mainly in Java, Python, C and
C++, and a bit of PHP and LISP. I am a member of our college's software and
programming club.
2. I qualified in the Zonal Informatics Olympiad 2010-11. I regularly
participate in online coding competitions and hackathons.
3. I developed an email client in Python.
4. I have some prior experience in machine learning and natural language
processing. I participated in Twitminer 2013 (IISc Bangalore), where our
team (BEing) came 6th. In the last few months, I have been working on
sentiment analysis of tweets, based on the language models of the Java
LingPipe toolkit and on LIBSVM, to analyze the efficiency of classification.

Looking forward to hearing back from all of you!

Pratik Sarkar

