[scikit-learn] Bayesian Gaussian Mixture

Tommaso Costanzo tommaso.costanzo01 at gmail.com
Sun Nov 27 11:47:38 EST 2016


Hi Jacob,

I have just changed my code from BayesianGaussianMixture to
GaussianMixture, and the results are the same. I have attached the picture
of the first component from runs with 2 components and 5, 10, and 50
nfeatures. In my short test function I expect points that could belong to
one component as well as the other, as is visible for a small number of
nfeatures, but probabilities of only 0 or 1 for nfeatures of 50 or more do
not sound correct. It seems to be related to the size of the model, and in
particular to the number of features. With BayesianGaussianMixture I have
seen that it is slightly better to increase the degrees of freedom to
2*nfeatures instead of the default nfeatures. However, this does not
change the result when nfeatures is 50 or more.
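To convince myself that this is a dimensionality effect and not a bug, here is a minimal sketch (not using scikit-learn; the separation `delta`, the sample count, and the seed are arbitrary choices of mine): for two unit-variance Gaussians whose means differ by `delta` in every dimension, the expected log-likelihood gap between the components grows linearly with the number of features, so the posterior probability, a sigmoid of that gap, saturates toward 0 or 1 in high dimensions.

```python
import numpy as np

def expected_posterior(n_features, delta=0.5, n_samples=2000, seed=0):
    """Average posterior P(component 0 | x) for points drawn from
    component 0, under two unit-variance Gaussians with equal priors
    whose means differ by `delta` in every dimension."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_samples, n_features))  # drawn from comp. 0
    # log N(x | 0, I) - log N(x | delta, I), summed over the features:
    # each feature adds ~delta**2/2 on average, so the gap grows with
    # n_features.
    log_gap = 0.5 * (((x - delta) ** 2).sum(axis=1) - (x ** 2).sum(axis=1))
    post0 = 1.0 / (1.0 + np.exp(-log_gap))            # sigmoid of the gap
    return post0.mean()

for d in (5, 10, 50):
    print(d, round(expected_posterior(d), 3))
```

With more features the averaged posterior for the true component moves toward 1, which matches the hard 0/1 `predict_proba` values I see for nfeatures of 50 or more.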

Thank you in advance
Tommaso

2016-11-25 21:32 GMT-05:00 Jacob Schreiber <jmschreiber91 at gmail.com>:

> Typically this means that the model is so confident in its predictions it
> does not believe it possible for the sample to come from the other
> component. Do you get the same results with a regular GaussianMixture?
>
> On Fri, Nov 25, 2016 at 11:34 AM, Tommaso Costanzo <
> tommaso.costanzo01 at gmail.com> wrote:
>
>> Hi,
>>
>> I am facing a problem with the "BayesianGaussianMixture" function, but
>> I do not know if it is because of my poor knowledge of this type of
>> statistics or if it is something related to the algorithm. I have a set
>> of data of around 1000 to 4000 observations (every observation is a
>> spectrum of around 200 points), so in the end I have n_samples = ~1000
>> and n_features = ~20. The good thing is that I am getting the same
>> results as KMeans; however, "predict_proba" returns values of only 0 or 1.
>>
>> I have written a small function to simulate my problem with random
>> data; it is reported below. The first half of the array has points with
>> a positive slope while the second half has a negative slope, so they
>> cross in the middle. What I have seen is that for a small number of
>> features I obtain good probabilities, but if the number of features
>> increases (say 50) then the probabilities become only 0 or 1.
>> Can someone help me interpret this result?
>>
>> Here is the code I wrote with the generated random numbers; I generally
>> run it with ncomponent=2 and nfeatures=5, 10, 50, or 100. I am not sure
>> it will work in every case as it is not thoroughly tested. I have also
>> attached it as a file!
>>
>> ##########################################################################
>> import numpy as np
>> from sklearn.mixture import GaussianMixture, BayesianGaussianMixture
>> import matplotlib.pyplot as plt
>>
>>
>> def test_bgm(ncomponent, nfeatures):
>>     # Two clouds of Gaussian noise with opposite linear trends across
>>     # the features, so they cross in the middle.
>>     temp = np.random.randn(500, nfeatures)
>>     temp = temp + np.arange(-1, 1, 2.0/nfeatures)
>>     temp1 = np.random.randn(400, nfeatures)
>>     temp1 = temp1 + np.arange(1, -1, -2.0/nfeatures)
>>     X = np.vstack((temp, temp1))
>>
>>     bgm = BayesianGaussianMixture(
>>         ncomponent, degrees_of_freedom_prior=nfeatures*2).fit(X)
>>     bgm_proba = bgm.predict_proba(X)
>>     bgm_labels = bgm.predict(X)
>>
>>     # Hard cluster assignments, one row per 30 samples.
>>     plt.figure(-1)
>>     plt.imshow(bgm_labels.reshape(30, -1), origin='lower',
>>                interpolation='none')
>>     plt.colorbar()
>>
>>     # Posterior probability map for each component.
>>     for i in np.arange(0, ncomponent):
>>         plt.figure(i)
>>         plt.imshow(bgm_proba[:, i].reshape(30, -1), origin='lower',
>>                    interpolation='none')
>>         plt.colorbar()
>>
>>     plt.show()
>> ##########################################################################
>>
>> Thank you in advance
>> Tommaso
>>
>>
>> --
>> Please do NOT send Microsoft Office Attachments:
>> http://www.gnu.org/philosophy/no-word-attachments.html
>>
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>>
>>
>


-------------- next part --------------
Attachments scrubbed by the archive (all image/png):
N_Features-5.png (23257 bytes):
<http://mail.python.org/pipermail/scikit-learn/attachments/20161127/ce58765e/attachment-0003.png>
N_Features-10.png (21773 bytes):
<http://mail.python.org/pipermail/scikit-learn/attachments/20161127/ce58765e/attachment-0004.png>
N_Features-50.png (18618 bytes):
<http://mail.python.org/pipermail/scikit-learn/attachments/20161127/ce58765e/attachment-0005.png>

