# [scikit-learn] mutual information for continuous variables with scikit-learn

```Thanks Sole and Gael, I'll try both ways. Are the two methods fundamentally
different, or will they give me similar results?
Also, most MI analyses I've seen with continuous variables discretize
the data into arbitrary bins. Is this procedure actually valid?
I'd think that by discretizing continuous data we would lose important
variation in the data.

> For estimating mutual information on continuous variables, have a look at
> the corresponding package
> https://pypi.org/project/mutual-info/
> > Hello,
>
> > I have two continuous variables (heart rate samples over a period of time),
> > and would like to compute mutual information between them as a measure of
> > similarity.
>
> > I've read some posts suggesting to use the mutual_info_score from scikit-learn
> > but will this work for continuous variables? One stackoverflow answer suggested
> > converting the data into probabilities with np.histogram2d() and passing the
> > contingency table to the mutual_info_score.
> > import numpy as np
> > from sklearn.metrics import mutual_info_score
> > def calc_MI(x, y, bins):
> >     c_xy = np.histogram2d(x, y, bins)[0]
> >     mi = mutual_info_score(None, None, contingency=c_xy)
> >     return mi
> > # generate data
> > L = np.linalg.cholesky( [[1.0, 0.60], [0.60, 1.0]])
> > uncorrelated = np.random.standard_normal((2, 300))
> > correlated = np.dot(L, uncorrelated)
> > A = correlated[0]
> > B = correlated[1]
> > x = (A - np.mean(A)) / np.std(A)
> > y = (B - np.mean(B)) / np.std(B)
> > # calculate MI
> > mi = calc_MI(x, y, 50)
>
> > Is calc_MI a valid approach? I'm asking because I also read that when variables
> > are continuous, the sums in the formula for discrete data become
> > integrals, but I'm not sure whether this procedure is implemented in scikit-learn.
> > Thanks!
>
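
One way to probe the binning question raised in this thread is to compare the histogram estimate against a case where the true value is known. The sketch below is only an illustration under stated assumptions: it draws a bivariate normal with correlation 0.6 (as in the quoted example), for which the mutual information has the closed form -0.5 * ln(1 - rho^2), and sets the binned estimate at several bin counts next to scikit-learn's nearest-neighbor estimator `mutual_info_regression`. The helper names `binned_mi`, `knn_mi`, and `gaussian_mi` are hypothetical, and the nearest-neighbor estimator is swapped in here for comparison; it is not one of the approaches proposed in the thread.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression
from sklearn.metrics import mutual_info_score


def binned_mi(x, y, bins):
    """Plug-in MI estimate (nats) from a 2-D histogram, as in calc_MI."""
    c_xy = np.histogram2d(x, y, bins)[0]
    return mutual_info_score(None, None, contingency=c_xy)


def knn_mi(x, y, n_neighbors=3, seed=0):
    """Nearest-neighbor MI estimate (nats) between two 1-D continuous arrays."""
    x = np.asarray(x).reshape(-1, 1)  # scikit-learn expects a 2-D feature matrix
    return mutual_info_regression(
        x, y, n_neighbors=n_neighbors, random_state=seed
    )[0]


def gaussian_mi(rho):
    """Closed-form MI (nats) of a bivariate normal with correlation rho."""
    return -0.5 * np.log(1.0 - rho**2)


# Assumed test data: 300 samples from a bivariate normal with correlation 0.6.
rng = np.random.default_rng(0)
rho, n = 0.6, 300
cov = [[1.0, rho], [rho, 1.0]]
x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

print(f"analytic    : {gaussian_mi(rho):.3f}")
for bins in (5, 10, 50):
    print(f"{bins:2d} bins     : {binned_mi(x, y, bins):.3f}")
print(f"k-NN (k=3)  : {knn_mi(x, y):.3f}")
```

With only 300 samples, a 50 x 50 grid leaves most cells empty or holding a single point, so the binned estimate would be expected to depend strongly on the bin count. Both estimators report values in nats.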
