[scikit-learn] Bayesian Gaussian Mixture

Andreas Mueller t3kcit at gmail.com
Mon Nov 28 11:56:29 EST 2016


Hi Tommaso.
So what's the issue? The distributions are very distinct, so there is no 
confusion.
The higher the dimensionality, the further apart the centers are
(compare the distance between (-1, 1) and (1, -1) to the one between
(-1, -.5, 0, .5, 1) and (1, .5, 0, -.5, -1)).
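For example (a quick sketch; I'm using np.linspace to get the evenly
spaced centers above, which is close to what your np.arange calls
produce):

import numpy as np

for d in (2, 5, 50):
    m1 = np.linspace(-1, 1, d)   # e.g. (-1, -.5, 0, .5, 1) for d=5
    m2 = -m1                     # the mirrored center
    print(d, np.linalg.norm(m1 - m2))   # grows roughly like sqrt(d)

The distance between the two centers grows like sqrt(nfeatures), while
the noise in each dimension stays fixed at 1.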
I'm not sure what you mean by "the cross in the middle".
You create two fixed points, one at np.arange(-1,1, 2.0/nfeatures) and 
one at np.arange(1,-1, (-2.0/nfeatures)). In high dimensions, these 
points are very far apart.
Then you add standard normal noise to them, so this data is two perfect
Gaussians. In low dimensions the centers are "close together" relative
to the noise, so there is some confusion; in high dimensions they are
"far apart", so there is less confusion.
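You can also see numerically why predict_proba saturates: the
responsibility of a component is a softmax over per-component log
densities, and the log-density gap grows with the dimension. Here is a
rough sketch, assuming two unit-covariance components with equal
weights (not the exact fitted model):

import numpy as np

def resp_component1(x, mu1, mu2):
    # log N(x | mu, I) up to a constant shared by both components
    log_p1 = -0.5 * np.sum((x - mu1) ** 2)
    log_p2 = -0.5 * np.sum((x - mu2) ** 2)
    # posterior responsibility of component 1 (equal mixing weights)
    return 1.0 / (1.0 + np.exp(log_p2 - log_p1))

rng = np.random.RandomState(0)
for d in (5, 10, 50):
    mu1 = np.linspace(-1, 1, d)
    mu2 = -mu1
    x = mu1 + rng.randn(d)   # one sample drawn around mu1
    print(d, resp_component1(x, mu1, mu2))

By d=50 the log-density gap is tens of nats, so the probability of the
other component is around 1e-15 and the output is numerically 0 or 1.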

Hth,
Andy

On 11/27/2016 11:47 AM, Tommaso Costanzo wrote:
> Hi Jacob,
>
> I have just changed my code from BayesianGaussianMixture to
> GaussianMixture, and the results are the same. I attached here the
> picture of the first component when I ran the code with 5, 10, and
> 50 nfeatures and 2 components. In my short test function I expect
> points that could belong to one component as well as the other, as
> is visible for a small number of nfeatures, but probabilities of
> only 0 or 1 for nfeatures >= 50 do not sound correct. It seems to be
> related just to the size of the model, and in particular to the
> number of features. With the BayesianGaussianMixture I have seen
> that it is slightly better to increase the degrees of freedom to
> 2*nfeatures instead of the default nfeatures. However, this does not
> change the result when nfeatures is 50 or more.
>
> Thank you in advance
> Tommaso
>
> 2016-11-25 21:32 GMT-05:00 Jacob Schreiber <jmschreiber91 at gmail.com>:
>
>     Typically this means that the model is so confident in its
>     predictions that it does not consider it possible for the sample
>     to come from the other component. Do you get the same results
>     with a regular GaussianMixture?
>
>     On Fri, Nov 25, 2016 at 11:34 AM, Tommaso Costanzo
>     <tommaso.costanzo01 at gmail.com> wrote:
>
>         Hi,
>
>         I am facing a problem with the "BayesianGaussianMixture"
>         function, but I do not know if it is because of my poor
>         knowledge of this type of statistics or if it is something
>         related to the algorithm. I have a set of around 1000 to
>         4000 observations (every observation is a spectrum of
>         around 200 points), so in the end I have n_samples = ~1000
>         and n_features = ~20. The good thing is that I am getting
>         the same results as KMeans; however, "predict_proba" has
>         values of only 0 or 1.
>
>         I have written a small function to simulate my problem with
>         random data, reported below. The first half of the array has
>         points with a positive slope while the second half has a
>         negative slope, hence the cross in the middle. What I have
>         seen is that for a small number of features I obtain
>         reasonable probabilities, but if the number of features
>         increases (say 50) then the probabilities become only 0 or 1.
>         Can someone help me interpret this result?
>
>         Here is the code I wrote with the generated random numbers;
>         I'll generally run it with ncomponent=2 and nfeatures=5, 10,
>         50, or 100. I am not sure it will work in every case as it is
>         not very highly tested. I have also attached it as a file!
>
>         ##########################################################################
>         import numpy as np
>         from sklearn.mixture import GaussianMixture, BayesianGaussianMixture
>         import matplotlib.pyplot as plt
>
>         def test_bgm(ncomponent, nfeatures):
>             # First cluster: positive-slope ramp plus standard normal noise.
>             temp = np.random.randn(500, nfeatures)
>             temp = temp + np.arange(-1, 1, 2.0/nfeatures)
>             # Second cluster: negative-slope ramp plus standard normal noise.
>             temp1 = np.random.randn(400, nfeatures)
>             temp1 = temp1 + np.arange(1, -1, -2.0/nfeatures)
>             X = np.vstack((temp, temp1))
>
>             bgm = BayesianGaussianMixture(
>                 ncomponent, degrees_of_freedom_prior=nfeatures*2).fit(X)
>
>             bgm_proba = bgm.predict_proba(X)
>             bgm_labels = bgm.predict(X)
>
>             # Hard assignments, shown as a 30x30 image (900 samples).
>             plt.figure(-1)
>             plt.imshow(bgm_labels.reshape(30, -1), origin='lower',
>                        interpolation='none')
>             plt.colorbar()
>
>             # Membership probability of each component.
>             for i in np.arange(0, ncomponent):
>                 plt.figure(i)
>                 plt.imshow(bgm_proba[:, i].reshape(30, -1),
>                            origin='lower', interpolation='none')
>                 plt.colorbar()
>
>             plt.show()
>         ##############################################################################
>
>         Thank you in advance
>         Tommaso
