I am looking for alternative ways of removing invariant scalar features
from my feature vectors before training MLPs. So far I have tried removing
columns with zero variance and columns with Pearson's R=1.0 or R=-1.0. If I
also remove columns with |R|<1.0, the performance drops. However, R only
measures linear correlation. I am now thinking of trying to remove columns
with high Mutual Information, but first I need to normalize it. In the
documentation, under "Univariate Feature Selection", I found the function


I used this function to measure the correlation between columns (features),
but it sometimes returns values >1.0. On the other hand, there is also this


which is bounded above by 1.0 but is meant for categorical data (clusters).
So my question is: is there a way to compute normalized Mutual Information
for continuous variables, too?
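
To make concrete what I am after, here is a minimal sketch of one possible
workaround, assuming equal-width binning of each column is acceptable:
discretize both columns, compute the discrete MI from the binned entropies,
and divide by the geometric mean of the two entropies so that the score
cannot exceed 1.0. This is plain Python, not a scikit-learn function; the
names bin_indices, entropy and binned_nmi, and the default n_bins=10, are
my own illustrative assumptions, and the score depends on the bin count.

```python
import math
import random
from collections import Counter


def bin_indices(values, n_bins=10):
    """Map each continuous value to an equal-width bin index in [0, n_bins - 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: everything falls into a single bin
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clip it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def entropy(labels):
    """Shannon entropy (in nats) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def binned_nmi(x, y, n_bins=10):
    """Normalized MI of two continuous columns after equal-width binning.

    Uses I(X;Y) = H(X) + H(Y) - H(X,Y) on the binned values and divides by
    sqrt(H(X) * H(Y)). Returns 0.0 if either binned column is constant.
    """
    bx, by = bin_indices(x, n_bins), bin_indices(y, n_bins)
    hx, hy = entropy(bx), entropy(by)
    if hx == 0.0 or hy == 0.0:
        return 0.0
    hxy = entropy(list(zip(bx, by)))  # joint entropy of the paired bins
    mi = hx + hy - hxy
    # Clamp to guard against tiny floating-point overshoot.
    return max(0.0, min(1.0, mi / math.sqrt(hx * hy)))


if __name__ == "__main__":
    random.seed(0)
    x = [random.gauss(0.0, 1.0) for _ in range(2000)]
    y_linear = [2.0 * v + 1.0 for v in x]      # perfectly dependent on x
    y_square = [v * v for v in x]              # dependent, but not linearly
    y_noise = [random.gauss(0.0, 1.0) for _ in range(2000)]  # independent
    for name, y in [("linear", y_linear), ("square", y_square), ("noise", y_noise)]:
        print(name, round(binned_nmi(x, y), 3))
```

With such a score, columns could be dropped above a chosen threshold in the
same way as with |R|, although the binning makes it only an approximation
of the MI between the underlying continuous variables.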

Thanks in advance for any advice.



