[scikit-learn] Bayesian Gaussian Mixture

Tommaso Costanzo tommaso.costanzo01 at gmail.com
Fri Nov 25 14:34:48 EST 2016


Hi,

I am facing a problem with the "BayesianGaussianMixture" class, but I do
not know whether it is because of my poor knowledge of this type of
statistics or something related to the algorithm. I have a data set of
around 1000 to 4000 observations (every feature is a spectrum of around 200
points), so in the end I have n_samples = ~1000 and n_features = ~20. The
good thing is that I am getting the same results as KMeans; however,
"predict_proba" returns only values of 0 or 1.

I wrote a small function, reported below, that simulates my problem with
random data. The first 500 rows of the array have a positive slope across
the features, while the remaining 400 rows have a negative slope, so the two
groups cross in the middle. What I have seen is that for a small number of
features I obtain reasonable probabilities, but if the number of features
increases (say to 50), the probabilities become only 0 or 1.
Can someone help me interpret this result?
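The trend described above can be measured without plotting. The sketch below assumes a data generator equivalent to the one in the script (500 rising rows, 400 falling rows), written with np.linspace so that each offset vector has exactly nfeatures entries; the helper names make_crossing_data and hard_fraction, the fixed seed, and the 1e-3 tolerance are hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture


def make_crossing_data(nfeatures, rng):
    """500 rows whose mean rises from -1 towards 1 across the features,
    followed by 400 rows whose mean falls from 1 towards -1."""
    up = rng.randn(500, nfeatures) + np.linspace(-1, 1, nfeatures, endpoint=False)
    down = rng.randn(400, nfeatures) + np.linspace(1, -1, nfeatures, endpoint=False)
    return np.vstack((up, down))


def hard_fraction(nfeatures, ncomponent=2, seed=0, tol=1e-3):
    """Fraction of samples whose largest posterior probability exceeds 1 - tol."""
    rng = np.random.RandomState(seed)
    X = make_crossing_data(nfeatures, rng)
    bgm = BayesianGaussianMixture(n_components=ncomponent,
                                  degrees_of_freedom_prior=nfeatures * 2,
                                  random_state=seed).fit(X)
    proba = bgm.predict_proba(X)
    return float(np.mean(proba.max(axis=1) > 1 - tol))


if __name__ == "__main__":
    # Same feature counts as mentioned in the post
    for d in (5, 10, 50, 100):
        print(d, hard_fraction(d))
```

Printing one number per feature count makes it easier to see where the probabilities start to saturate than inspecting one image per component.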

Here is the code I wrote with the randomly generated numbers. I generally
run it with ncomponent=2 and nfeatures=5, 10, 50 or 100. I am not sure it
will work in every case, as it is not thoroughly tested. I have also
attached it as a file!

##########################################################################
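import numpy as np
from sklearn.mixture import GaussianMixture, BayesianGaussianMixture
import matplotlib.pyplot as plt


def test_bgm(ncomponent, nfeatures):
    # Group 1: 500 samples whose mean rises linearly from -1 towards 1 across
    # the features (linspace guarantees exactly nfeatures offsets)
    temp = np.random.randn(500, nfeatures)
    temp = temp + np.linspace(-1, 1, nfeatures, endpoint=False)
    # Group 2: 400 samples whose mean falls linearly from 1 towards -1
    temp1 = np.random.randn(400, nfeatures)
    temp1 = temp1 + np.linspace(1, -1, nfeatures, endpoint=False)
    X = np.vstack((temp, temp1))  # 900 samples in total

    bgm = BayesianGaussianMixture(n_components=ncomponent,
                                  degrees_of_freedom_prior=nfeatures * 2).fit(X)
    bgm_proba = bgm.predict_proba(X)  # shape (900, ncomponent)
    bgm_labels = bgm.predict(X)       # shape (900,)

    # Hard labels shown as a 30 x 30 image (900 = 30 * 30)
    plt.figure(-1)
    plt.imshow(bgm_labels.reshape(30, -1), origin='lower',
               interpolation='none')
    plt.colorbar()

    # One figure per component showing its posterior probability
    for i in np.arange(0, ncomponent):
        plt.figure(i)
        plt.imshow(bgm_proba[:, i].reshape(30, -1), origin='lower',
                   interpolation='none')
        plt.colorbar()

    plt.show()


if __name__ == "__main__":
    # e.g. ncomponent=2 with nfeatures=5, 10, 50 or 100
    test_bgm(2, 50)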
##############################################################################
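One possible way to look at the result is an oracle calculation, assuming the true means are known and both groups have identity covariance (this is not what BayesianGaussianMixture computes, since it estimates weights, means and covariances from the data). For two components N(mu1, I) and N(mu2, I) with weights pi1 and pi2, the log posterior odds of a point x are log(pi1/pi2) - ||x - mu1||^2/2 + ||x - mu2||^2/2. For a point drawn from group 1 these odds have mean log(pi1/pi2) + D/2 and variance D, where D = ||mu1 - mu2||^2; with the means used in the simulation D works out to (4/3)d + 8/(3d) for d features, so it grows linearly with the number of features. The helper names below are hypothetical.

```python
import numpy as np


def true_log_odds(X, mu1, mu2, pi1=500 / 900):
    """Exact log p(1|x) - log p(2|x) for the mixture pi1*N(mu1, I) + (1-pi1)*N(mu2, I)."""
    pi2 = 1.0 - pi1
    d1 = np.sum((X - mu1) ** 2, axis=1)
    d2 = np.sum((X - mu2) ** 2, axis=1)
    return np.log(pi1 / pi2) - 0.5 * d1 + 0.5 * d2


def oracle_hard_fraction(nfeatures, seed=0, tol=1e-3):
    """Return (squared mean distance D, fraction of samples with max posterior > 1 - tol)."""
    rng = np.random.RandomState(seed)
    mu1 = np.linspace(-1, 1, nfeatures, endpoint=False)
    mu2 = np.linspace(1, -1, nfeatures, endpoint=False)
    X = np.vstack((rng.randn(500, nfeatures) + mu1,
                   rng.randn(400, nfeatures) + mu2))
    lo = true_log_odds(X, mu1, mu2)
    # Posterior of component 1 = logistic(lo), computed in a numerically stable way
    p1 = np.exp(-np.logaddexp(0.0, -lo))
    pmax = np.maximum(p1, 1.0 - p1)
    return float(np.sum((mu1 - mu2) ** 2)), float(np.mean(pmax > 1 - tol))
```

Under these assumptions, only a minority of the posteriors are hard with 5 features, while with 100 features essentially all of them are, even with the exact model, which suggests that 0/1 probabilities can simply reflect well-separated groups in high dimension rather than a failure of the fit.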

Thank you in advance
Tommaso


-- 
Please do NOT send Microsoft Office Attachments:
http://www.gnu.org/philosophy/no-word-attachments.html
-------------- next part --------------
A non-text attachment was scrubbed...
Name: GaussianTest.py
Type: text/x-python
Size: 844 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/scikit-learn/attachments/20161125/3c3808cb/attachment-0001.py>

