# [scikit-learn] Bayesian Gaussian Mixture

Tommaso Costanzo tommaso.costanzo01 at gmail.com
Wed Nov 30 12:17:15 EST 2016

```
Dear Andreas,

thank you so much for your answer; now I can see my mistake. What I am
trying to do is convince myself that the reason I get probabilities of only
0 and 1 when I analyze my data is that the data are well separated. So I was
trying to make some synthetic data where the probability is different from 0
or 1, but I did it in the wrong way. Does it sound correct if I make 300
samples of random numbers centered at 0 with STD 1, another 300 centered at
0.5, and then add some samples in between these two Gaussian distributions
(say between 0.15 and 0.35)? In this case I think that I should expect
probabilities different from 0 or 1 in the two components (when using 2
components).

Tommaso
On Nov 28, 2016 11:58 AM, "Andreas Mueller" <t3kcit at gmail.com> wrote:

> Hi Tommaso.
> So what's the issue? The distributions are very distinct, so there is no
> confusion.
> The higher the dimensionality, the further apart the points are (compare
> the distance between (-1, 1) and (1, -1) to the one between (-1, -.5, 0,
> .5, 1) and (1, .5, 0, -.5, -1)).
> I'm not sure what you mean by "the cross in the middle".
> You create two fixed points, one at np.arange(-1,1, 2.0/nfeatures) and one
> at np.arange(1,-1, (-2.0/nfeatures)). In high dimensions, these points are
> very far apart.
> Then you add standard normal noise to it. So this data is two perfect
> Gaussians. In low dimensions, they are "close together" so there is some
> confusion; in high dimensions, they are "far apart" so there is less
> confusion.
>
> Hth,
> Andy
>
> On 11/27/2016 11:47 AM, Tommaso Costanzo wrote:
>
> Hi Jacob,
>
> I have just changed my code from BayesianGaussianMixture to
> GaussianMixture, and the result is the same. I attached here the picture
> of the first component when I ran the code with 5, 10, and 50 nfeatures
> and 2 components. In my short test function I expect to have points that
> can belong to one component as well as the other, as is visible for a small
> number of nfeatures, but only 0 or 1 for nfeatures >50 does not sound
> correct. It seems that this is just related to the size of the model and in
> particular to the number of features. With the BayesianGaussianMixture I
> have seen that it is slightly better to increase the degrees of freedom to
> 2*nfeatures instead of the default nfeatures. However, this does not change
> the result when nfeatures is 50 or more.
>
> Tommaso
>
> 2016-11-25 21:32 GMT-05:00 Jacob Schreiber <jmschreiber91 at gmail.com>:
>
>> Typically this means that the model is so confident in its predictions it
>> does not believe it possible for the sample to come from the other
>> component. Do you get the same results with a regular GaussianMixture?
>>
>> On Fri, Nov 25, 2016 at 11:34 AM, Tommaso Costanzo <
>> tommaso.costanzo01 at gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am facing a problem with the "BayesianGaussianMixture" class,
>>> but I do not know if it is because of my poor knowledge of this type of
>>> statistics or if it is something related to the algorithm. I have a set of
>>> data of around 1000 to 4000 observations (every feature is a spectrum of
>>> around 200 points) so in the end I have n_samples = ~1000 and n_features =
>>> ~20. The good thing is that I am getting the same results as KMeans;
>>> however, "predict_proba" returns values of only 0 or 1.
>>>
>>> I have written a small function to simulate my problem with random data,
>>> reported below. The first half of the array has points with a
>>> positive slope while the second half has a negative slope, hence the cross
>>> in the middle. What I have seen is that for a small number of features I
>>> obtain good probabilities, but if the number of features increases (say 50)
>>> then the probabilities become only 0 or 1.
>>> Can someone help me interpret this result?
>>>
>>> Here is the code I wrote with the generated random numbers; I
>>> generally run it with ncomponent=2 and nfeatures=5 or 10 or 50 or 100. I am
>>> not sure if it will work in every case, as it is not very thoroughly
>>> tested. I have also attached it as a file!
>>>
>>> ##########################################################################
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from sklearn.mixture import GaussianMixture, BayesianGaussianMixture
>>>
>>>
>>> def test_bgm(ncomponent, nfeatures):
>>>     # 500 samples of standard normal noise around a rising ramp of means.
>>>     # linspace(..., endpoint=False) gives the same values as
>>>     # np.arange(-1, 1, 2.0/nfeatures) but always exactly nfeatures of them.
>>>     temp = np.random.randn(500, nfeatures)
>>>     temp = temp + np.linspace(-1, 1, nfeatures, endpoint=False)
>>>
>>>     # 400 samples around the mirrored (falling) ramp of means
>>>     temp1 = np.random.randn(400, nfeatures)
>>>     temp1 = temp1 + np.linspace(1, -1, nfeatures, endpoint=False)
>>>
>>>     X = np.vstack((temp, temp1))
>>>
>>>     bgm = BayesianGaussianMixture(
>>>         n_components=ncomponent,
>>>         degrees_of_freedom_prior=nfeatures * 2).fit(X)
>>>     bgm_proba = bgm.predict_proba(X)
>>>     bgm_labels = bgm.predict(X)
>>>
>>>     # Show the 900 predicted labels as a 30 x 30 image
>>>     plt.figure("labels")
>>>     plt.imshow(bgm_labels.reshape(30, -1), origin='lower',
>>>                interpolation='none')
>>>     plt.colorbar()
>>>
>>>     # One figure per component with its membership probabilities
>>>     for i in range(ncomponent):
>>>         plt.figure(i)
>>>         plt.imshow(bgm_proba[:, i].reshape(30, -1), origin='lower',
>>>                    interpolation='none')
>>>         plt.colorbar()
>>>
>>>     plt.show()
>>> ##########################################################################
>>>
>>> Tommaso
>>>
>>>
>>> --
>>> Please do NOT send Microsoft Office Attachments:
>>> http://www.gnu.org/philosophy/no-word-attachments.html
>>>
>>> _______________________________________________
>>> scikit-learn mailing list
>>> scikit-learn at python.org
>>> https://mail.python.org/mailman/listinfo/scikit-learn
```
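
Andreas's point about dimensionality can be checked numerically without fitting any model. The sketch below is not part of the original thread: the helper names `center_pair` and `ambiguous_fraction` are hypothetical, and it assumes the true generating parameters (the two fixed centers, identity covariance, and 500/400 mixing weights) in place of a fitted GaussianMixture. It reports how far apart the two centers are and what fraction of the samples get a membership probability strictly between 0.01 and 0.99.

```python
import numpy as np


def center_pair(nfeatures):
    """Return the two fixed centers used in the quoted test function."""
    up = np.linspace(-1.0, 1.0, nfeatures, endpoint=False)
    down = np.linspace(1.0, -1.0, nfeatures, endpoint=False)
    return up, down


def ambiguous_fraction(nfeatures, n_up=500, n_down=400, seed=0):
    """Fraction of samples whose true posterior lies strictly in (0.01, 0.99).

    Assumption: the posterior is computed from the known generating
    parameters (identity covariance, weights n_up : n_down), not from a
    fitted mixture model.
    """
    rng = np.random.default_rng(seed)
    up, down = center_pair(nfeatures)
    X = np.vstack((rng.standard_normal((n_up, nfeatures)) + up,
                   rng.standard_normal((n_down, nfeatures)) + down))

    # log(weight * N(x | center, I)), dropping constants shared by both terms
    log_up = np.log(n_up) - 0.5 * np.sum((X - up) ** 2, axis=1)
    log_down = np.log(n_down) - 0.5 * np.sum((X - down) ** 2, axis=1)

    # Posterior of the "up" component: logistic of the log-ratio, written
    # with tanh so that large log-ratios cannot overflow.
    p_up = 0.5 * (1.0 + np.tanh(0.5 * (log_up - log_down)))
    return np.mean((p_up > 0.01) & (p_up < 0.99))


if __name__ == "__main__":
    for nfeatures in (2, 5, 10, 50, 100):
        up, down = center_pair(nfeatures)
        print(nfeatures,
              round(float(np.linalg.norm(up - down)), 2),
              round(float(ambiguous_fraction(nfeatures)), 3))
```

Under these assumptions the distance between the centers grows roughly like the square root of nfeatures while the noise along the line joining them keeps unit variance, so the share of samples with an intermediate probability shrinks toward zero as nfeatures increases. That matches the behaviour described in the thread: graded probabilities for 5 or 10 features, and almost only 0 or 1 for 50 or more.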