[scikit-learn] suggested classification algorithm

Thomas Evangelidis tevang3 at gmail.com
Mon Nov 14 06:14:06 EST 2016


Greetings,

I want to design a program that can deal with classification problems of
the same type, where the number of positive observations is small but the
number of negative observations is much larger. In concrete numbers, the
number of positive observations would usually range between 2 and 20, and
the number of negative observations would be at least 30 times larger. The
number of features could also be between 2 and 20, but that could be
reduced using feature selection and elimination algorithms. I've read in
the documentation that some algorithms, like the SVM, remain effective
when the number of dimensions is greater than the number of samples, but I
am not sure whether they are suitable for my case. Moreover, according to
this figure, Nearest Neighbors performs best and the RBF SVM second:

http://scikit-learn.org/stable/_images/sphx_glr_plot_classifier_comparison_001.png

However, I assume that Nearest Neighbors would not be effective in my case,
where the number of positive observations is very low. For these reasons, I
would like to know your expert opinion about which classification algorithm
I should try first.
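
For context, this is roughly the kind of setup I have in mind (just a
rough sketch with synthetic stand-in data; the RBF SVM with
class_weight='balanced' and the ROC AUC scoring are my own assumptions
about how to handle the imbalance, not a settled choice):

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import StratifiedKFold, cross_val_score

    # Synthetic stand-in for my data: ~10 positives, ~300 negatives, 10 features.
    rng = np.random.RandomState(0)
    X_pos = rng.normal(loc=1.0, size=(10, 10))
    X_neg = rng.normal(loc=0.0, size=(300, 10))
    X = np.vstack([X_pos, X_neg])
    y = np.concatenate([np.ones(10), np.zeros(300)])

    # RBF SVM; class_weight='balanced' reweights the classes to compensate
    # for the skew (my assumption, open to better suggestions).
    clf = SVC(kernel='rbf', class_weight='balanced')

    # Stratified CV so every fold keeps at least a couple of positives,
    # and ROC AUC instead of plain accuracy, which would be misleading here.
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    scores = cross_val_score(clf, X, y, cv=cv, scoring='roc_auc')
    print(scores.mean(), scores.std())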

Thanks in advance,
Thomas


-- 

======================================================================

Thomas Evangelidis

Research Specialist
CEITEC - Central European Institute of Technology
Masaryk University
Kamenice 5/A35/1S081,
62500 Brno, Czech Republic

email: tevang at pharm.uoa.gr

          tevang3 at gmail.com


website: https://sites.google.com/site/thomasevangelidishomepage/