# [scikit-learn] Bayesian Gaussian Mixture

Tommaso Costanzo tommaso.costanzo01 at gmail.com
Fri Nov 25 14:34:48 EST 2016

I am having a problem with the "BayesianGaussianMixture" class, but I do
not know whether it comes from my limited knowledge of this type of
statistics or from the algorithm itself. I have a data set of around 1000
to 4000 observations (every feature is a spectrum of around 200 points), so
in the end I have n_samples = ~1000 and n_features = ~20. The good thing is
that I get the same results as with KMeans; however, "predict_proba"
returns only values of 0 or 1.

I have written a small function, reported below, that simulates my problem
with random data. The first part of the array (500 rows) has points with a
positive slope, while the second part (400 rows) has a negative slope, so
the two groups cross in the middle. What I have seen is that for a small
number of features I obtain reasonable probabilities, but if the number of
features increases (say to 50), then the probabilities become only 0 or 1.
Can someone help me interpret this result?

Here is the code I wrote, which generates the random data. I generally run
it with ncomponent=2 and nfeatures=5, 10, 50, or 100. I am not sure it will
work in every case, as it has not been thoroughly tested. I have also
attached it as a file!

##########################################################################
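import numpy as np
import matplotlib.pyplot as plt
from sklearn.mixture import GaussianMixture, BayesianGaussianMixture


def test_bgm(ncomponent, nfeatures):
    # First group: 500 samples whose mean rises from -1 towards 1 across
    # the features. linspace replaces arange so that the ramp always has
    # exactly nfeatures points, whatever the rounding of the step.
    temp = np.random.randn(500, nfeatures)
    temp = temp + np.linspace(-1, 1, nfeatures, endpoint=False)
    # Second group: 400 samples whose mean falls from 1 towards -1.
    temp1 = np.random.randn(400, nfeatures)
    temp1 = temp1 + np.linspace(1, -1, nfeatures, endpoint=False)
    X = np.vstack((temp, temp1))

    bgm = BayesianGaussianMixture(
        n_components=ncomponent,
        degrees_of_freedom_prior=nfeatures * 2).fit(X)
    bgm_proba = bgm.predict_proba(X)
    bgm_labels = bgm.predict(X)

    # Hard labels, shown as a 30 x 30 image (900 samples in total).
    plt.figure(-1)
    plt.imshow(bgm_labels.reshape(30, -1), origin='lower',
               interpolation='none')
    plt.colorbar()

    # Membership probability of each component, one figure per component.
    for i in np.arange(0, ncomponent):
        plt.figure(i)
        plt.imshow(bgm_proba[:, i].reshape(30, -1), origin='lower',
                   interpolation='none')
        plt.colorbar()

    plt.show()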
##############################################################################
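
One possible way to check whether the 0/1 probabilities come from the data
rather than from the fitting is to compute the membership probabilities
directly from the known generating model, without scikit-learn. The sketch
below is only an illustration under stated assumptions: two unit-variance
Gaussians with the same ramp means as in test_bgm, equal component weights,
and a hypothetical helper name (saturated_fraction) and threshold (0.99)
chosen for this example. It is not what BayesianGaussianMixture computes
internally.

```python
import numpy as np


def saturated_fraction(nfeatures, nsamples=500, threshold=0.99, seed=0):
    """Fraction of samples whose membership probability is above
    `threshold` or below 1 - `threshold` under the true two-Gaussian model.

    Assumptions (not from the original post): equal component weights,
    identity covariance, and the same number of samples in each group.
    """
    rng = np.random.default_rng(seed)
    # Same ramp means as in test_bgm: one rising, one falling.
    mu0 = np.linspace(-1.0, 1.0, nfeatures, endpoint=False)
    mu1 = -mu0
    X = np.vstack((rng.standard_normal((nsamples, nfeatures)) + mu0,
                   rng.standard_normal((nsamples, nfeatures)) + mu1))
    # Log-densities up to a constant shared by both components.
    log_p0 = -0.5 * ((X - mu0) ** 2).sum(axis=1)
    log_p1 = -0.5 * ((X - mu1) ** 2).sum(axis=1)
    # Probability of component 0 is the logistic function of the log-odds;
    # the tanh form avoids overflow for large log-odds.
    log_odds = log_p0 - log_p1
    p0 = 0.5 * (1.0 + np.tanh(0.5 * log_odds))
    return np.mean((p0 > threshold) | (p0 < 1.0 - threshold))


if __name__ == "__main__":
    for d in (5, 10, 50, 100):
        print(d, saturated_fraction(d))
```

The log-odds are a sum over the features, so each extra feature that
separates the two groups adds to them. Under these assumptions the printed
fraction should grow with nfeatures, which would suggest that the saturation
reflects how well separated the two groups are in high dimension.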

