[scikit-learn] Maximum Mutual Information value for continuous variables

Thomas Evangelidis tevang3 at gmail.com
Wed Nov 27 11:58:45 EST 2019


I am thinking of alternative ways to remove invariant scalar features
from my feature vectors before training MLPs. So far I have tried removing
columns with zero variance and columns with Pearson's R = 1.0 or R = -1.0. If I
remove columns with |R| < 1.0, the performance drops. However, R measures only
linear correlation. Now I am thinking of removing columns with high
Mutual Information, but first I need to normalize it. In the documentation,
under "Univariate Feature Selection", I found the function
mutual_info_regression.

I used this function to measure the dependence between columns (features),
but it sometimes returns values > 1.0. On the other hand, there is also
normalized_mutual_info_score,
which is bounded above by 1.0, but it is meant for categorical data (cluster
labels). So my question is: is there a way to compute normalized Mutual
Information for continuous variables, too?
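For context, one workaround I have seen suggested (not a scikit-learn
built-in for this use case, just a sketch) is to discretize the continuous
columns first and then apply normalized_mutual_info_score, which is bounded
in [0, 1], to the resulting bin labels. The function name `continuous_nmi`
and the bin count below are my own choices:

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.metrics import normalized_mutual_info_score

def continuous_nmi(x, y, n_bins=16):
    """Approximate normalized MI between two continuous 1-D arrays
    by binning each into n_bins equal-width bins and scoring the
    bin labels with normalized_mutual_info_score (range [0, 1])."""
    est = KBinsDiscretizer(n_bins=n_bins, encode="ordinal",
                           strategy="uniform")
    labels = est.fit_transform(np.column_stack([x, y])).astype(int)
    return normalized_mutual_info_score(labels[:, 0], labels[:, 1])

rng = np.random.RandomState(0)
a = rng.normal(size=1000)
b = a + 0.05 * rng.normal(size=1000)   # nearly a copy of a
c = rng.normal(size=1000)              # independent of a

print(continuous_nmi(a, b))  # high score -> near-duplicate column
print(continuous_nmi(a, c))  # low score  -> independent columns
```

The result depends on the binning (number of bins and strategy), and with
many bins relative to the sample size the score picks up a positive
finite-sample bias, so the thresholds for "redundant" would need tuning.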

Thanks in advance for any advice.



Dr. Thomas Evangelidis

Research Scientist

IOCB - Institute of Organic Chemistry and Biochemistry of the Czech Academy
of Sciences <https://www.uochb.cz/web/structure/31.html?lang=en>, Prague,
Czech Republic
CEITEC - Central European Institute of Technology
<https://www.ceitec.eu/>, Brno,
Czech Republic

email: tevang3 at gmail.com, Twitter: tevangelidis
<https://twitter.com/tevangelidis>, LinkedIn: Thomas Evangelidis

website: https://sites.google.com/site/thomasevangelidishomepage/