[scikit-learn] mutual information for continuous variables with scikit-learn
Gael Varoquaux
gael.varoquaux at normalesup.org
Wed Feb 1 09:18:40 EST 2023
To estimate mutual information between continuous variables, have a look at the dedicated package:
https://pypi.org/project/mutual-info/
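
For comparison, scikit-learn itself has a nearest-neighbour based estimator that accepts continuous variables, sklearn.feature_selection.mutual_info_regression. The sketch below is only an illustration and is not taken from the package linked above. It compares that estimator and the histogram approach from the question against the closed-form value for a bivariate Gaussian, I = -0.5 * ln(1 - rho^2). The sample size, seed, bin counts and the helper name binned_mi are arbitrary assumptions made for the example.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression
from sklearn.metrics import mutual_info_score

# Assumed setup: correlated Gaussian pair with rho = 0.6, as in the question,
# but with more samples (5000 instead of 300) so the estimates are stabler.
rng = np.random.default_rng(0)
rho = 0.6
n_samples = 5000
cov = [[1.0, rho], [rho, 1.0]]
x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n_samples).T

# Closed-form mutual information of a bivariate Gaussian, in nats.
mi_exact = -0.5 * np.log(1.0 - rho**2)

# Nearest-neighbour estimate; X must be 2D, one column per feature.
mi_knn = mutual_info_regression(
    x.reshape(-1, 1), y, n_neighbors=3, random_state=0
)[0]


def binned_mi(x, y, bins):
    """Histogram-based estimate (hypothetical helper mirroring calc_MI)."""
    c_xy = np.histogram2d(x, y, bins)[0]
    return mutual_info_score(None, None, contingency=c_xy)


# The binned estimate depends on the number of bins chosen.
mi_binned = {bins: binned_mi(x, y, bins) for bins in (5, 20, 50)}

print(f"closed form : {mi_exact:.3f} nats")
print(f"kNN (k=3)   : {mi_knn:.3f} nats")
for bins, value in mi_binned.items():
    print(f"{bins:>2} bins     : {value:.3f} nats")
```

On data like this, the binned value tends to grow with the number of bins, while the nearest-neighbour estimate should land close to the closed-form value. All values are in nats.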
G
On Wed, Feb 01, 2023 at 02:32:03PM +0100, m m wrote:
> Hello,
> I have two continuous variables (heart rate samples over a period of time), and
> would like to compute mutual information between them as a measure of
> similarity.
> I've read some posts suggesting to use the mutual_info_score from scikit-learn
> but will this work for continuous variables? One stackoverflow answer suggested
> converting the data into probabilities with np.histogram2d() and passing the
> contingency table to the mutual_info_score.
> import numpy as np
> from sklearn.metrics import mutual_info_score
>
> def calc_MI(x, y, bins):
>     # 2D histogram of joint counts, used as the contingency table
>     c_xy = np.histogram2d(x, y, bins)[0]
>     mi = mutual_info_score(None, None, contingency=c_xy)
>     return mi
> # generate data
> L = np.linalg.cholesky( [[1.0, 0.60], [0.60, 1.0]])
> uncorrelated = np.random.standard_normal((2, 300))
> correlated = np.dot(L, uncorrelated)
> A = correlated[0]
> B = correlated[1]
> x = (A - np.mean(A)) / np.std(A)
> y = (B - np.mean(B)) / np.std(B)
> # calculate MI
> mi = calc_MI(x, y, 50)
> Is calc_MI a valid approach? I'm asking because I also read that when variables
> are continuous, then the sums in the formula for discrete data become
> integrals, but I'm not sure if this procedure is implemented in scikit-learn?
> Thanks!
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
--
Gael Varoquaux
Research Director, INRIA
http://gael-varoquaux.info http://twitter.com/GaelVaroquaux