<p dir="ltr">Dear Andreas,</p>
<p dir="ltr">thank you so much for your answer; now I can see my mistake. What I am trying to do is convince myself that the probabilities of only 0 and 1 I get when analyzing my data come from the data being well separated, so I was trying to make some synthetic data where the probability differs from 0 or 1, but I did it in the wrong way. Does it sound correct if I make 300 samples of random numbers centered at 0 with STD 1, another 300 centered at 0.5, and then add some samples in between these two Gaussian distributions (say between 0.15 and 0.35)? In that case I think I should expect probabilities different from 0 or 1 in the two components (when using 2 components).</p>
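A rough sketch of that setup (the count of in-between samples and the uniform draw for them are my assumptions, not fixed by the description above):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
a = rng.randn(300, 1)                        # 300 samples centered at 0, STD 1
b = rng.randn(300, 1) + 0.5                  # 300 samples centered at 0.5, STD 1
mid = rng.uniform(0.15, 0.35, size=(50, 1))  # some samples between the two means
X = np.vstack([a, b, mid])

gm = GaussianMixture(n_components=2, random_state=0).fit(X)
proba = gm.predict_proba(X)

# with this much overlap, many probabilities should be far from 0 and 1
print(((proba > 0.1) & (proba < 0.9)).mean())
```

With the means only 0.5 apart and STD 1, the two components overlap heavily, so intermediate probabilities should indeed appear.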
<p dir="ltr">Thank you in advance<br>
Tommaso</p>
<div class="gmail_quote">On Nov 28, 2016 11:58 AM, "Andreas Mueller" <<a href="mailto:t3kcit@gmail.com">t3kcit@gmail.com</a>> wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
Hi Tommaso.<br>
So what's the issue? The distributions are very distinct, so there
is no confusion.<br>
The higher the dimensionality, the further apart the points are
(compare the distance between (-1, 1) and (1, -1) to the one between
(-1, -.5, 0, .5, 1) and (1, .5, 0, -.5, -1)).<br>
I'm not sure what you mean by "the cross in the middle".<br>
You create two fixed points, one at np.arange(-1,1, 2.0/nfeatures)
and one at np.arange(1,-1, (-2.0/nfeatures)). In high dimensions,
these points are very far apart.<br>
Then you add standard normal noise to it. So this data is two
perfect Gaussians. In low dimensions, they are "close together" so
there is some confusion,<br>
in high dimensions, they are "far apart" so there is less confusion.<br>
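A quick way to check this (not from the original message) is to compute the distance between the two fixed points from the test function for a few values of nfeatures:

```python
import numpy as np

# distance between the two cluster centers used in the test function,
# as a function of the number of features
for nfeatures in (5, 10, 50):
    p = np.arange(-1, 1, 2.0 / nfeatures)
    q = np.arange(1, -1, -2.0 / nfeatures)
    print(nfeatures, np.linalg.norm(p - q))
```

The printed distance grows with nfeatures, which is why the overlap (and hence the confusion) disappears in high dimensions.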
<br>
Hth,<br>
Andy<br>
<br>
<div class="m_-3344882405610298014moz-cite-prefix">On 11/27/2016 11:47 AM, Tommaso
Costanzo wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>
<div>
<div>Hi Jacob,<br>
<br>
</div>
I have just changed my code from BayesianGaussianMixture to
GaussianMixture, and the result is the same. I attached
here the picture of the first component when I ran the
code with 5, 10, and 50 nfeatures and 2 components. In my
short test function I expect to have points that can be
in one component as well as another, as visible for a small
number of nfeatures, but only 0 and 1 for nfeatures >50 does not
sound correct. It seems to be related just to the size of
the model, and in particular to the number of features. With
BayesianGaussianMixture I have seen that it is slightly
better to increase the degrees of freedom to 2*nfeatures
instead of the default nfeatures. However, this does not
change the result when nfeatures is 50 or more.<br>
<br>
</div>
Thank you in advance<br>
</div>
Tommaso<br>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">2016-11-25 21:32 GMT-05:00 Jacob
Schreiber <span dir="ltr"><<a href="mailto:jmschreiber91@gmail.com" target="_blank">jmschreiber91@gmail.com</a>></span>:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">Typically this means that the model is so
confident in its predictions that it does not believe it
possible for the sample to come from the other component.
Do you get the same results with a regular
GaussianMixture? </div>
<div class="gmail_extra"><br>
<div class="gmail_quote">
<div>
<div class="m_-3344882405610298014h5">On Fri, Nov 25, 2016 at 11:34 AM,
Tommaso Costanzo <span dir="ltr"><<a href="mailto:tommaso.costanzo01@gmail.com" target="_blank">tommaso.costanzo01@gmail.com</a>></span>
wrote:<br>
</div>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>
<div class="m_-3344882405610298014h5">
<div dir="ltr">
<div>
<div>
<div>
<div>
<div>
<div>
<div>Hi,<br>
<br>
</div>
I am facing some problems with the
"BayesianGaussianMixture" function,
but I do not know if it is because
of my poor knowledge of this type of
statistics or if it is something
related to the algorithm. I have a set
of data of around 1000 to 4000
observations (every feature is a
spectrum of around 200 points), so in
the end I have n_samples = ~1000 and
n_features = ~20. The good thing is
that I am getting the same results
as KMeans; however,
"predict_proba" has values of only 0
or 1.<br>
</div>
<br>
</div>
I have written a small function to
simulate my problem with random data,
reported below. The first 1/2 of
the array has points with a positive
slope while the second 1/2 has a
negative slope, so they cross in the
middle. What I have seen is that for a
small number of features I obtain good
probabilities, but if the number of
features increases (say 50) then the
probabilities become only 0 or 1.<br>
</div>
Can someone help me interpret this
result?<br>
<br>
</div>
Here is the code I wrote with the generated
random numbers; I'll generally run it with
ncomponent=2 and nfeatures=5, 10, 50, or
100. I am not sure it will work in every
case, as it is not very highly tested. I have also
attached it as a file!<br>
<br>
##########################################################################<br>
import numpy as np<br>
from sklearn.mixture import GaussianMixture, BayesianGaussianMixture<br>
import matplotlib.pyplot as plt<br>
<br>
def test_bgm(ncomponent, nfeatures):<br>
    # first cluster: standard normal noise around a point with positive slope<br>
    temp = np.random.randn(500, nfeatures)<br>
    temp = temp + np.arange(-1, 1, 2.0 / nfeatures)<br>
    # second cluster: standard normal noise around a point with negative slope<br>
    temp1 = np.random.randn(400, nfeatures)<br>
    temp1 = temp1 + np.arange(1, -1, -2.0 / nfeatures)<br>
    X = np.vstack((temp, temp1))<br>
<br>
    bgm = BayesianGaussianMixture(ncomponent, degrees_of_freedom_prior=nfeatures * 2).fit(X)<br>
    bgm_proba = bgm.predict_proba(X)<br>
    bgm_labels = bgm.predict(X)<br>
<br>
    # hard cluster assignments<br>
    plt.figure(-1)<br>
    plt.imshow(bgm_labels.reshape(30, -1), origin='lower', interpolation='none')<br>
    plt.colorbar()<br>
<br>
    # per-component membership probabilities<br>
    for i in np.arange(0, ncomponent):<br>
        plt.figure(i)<br>
        plt.imshow(bgm_proba[:, i].reshape(30, -1), origin='lower', interpolation='none')<br>
        plt.colorbar()<br>
<br>
    plt.show()<br>
##########################################################################<br>
<br>
</div>
Thank you in advance<span class="m_-3344882405610298014m_-2484064570224270571HOEnZb"><font color="#888888"><br>
</font></span></div>
<span class="m_-3344882405610298014m_-2484064570224270571HOEnZb"><font color="#888888">Tommaso<br>
<br clear="all">
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div><br>
-- <br>
<div class="m_-3344882405610298014m_-2484064570224270571m_390746433550541163gmail_signature">
<div dir="ltr"><span></span><span>Please do NOT
send Microsoft
Office
Attachments:</span><br>
<div>
<a href="http://www.gnu.org/philosophy/no-word-attachments.html" target="_blank">http://www.gnu.org/philosophy/no-word-attachments.html</a></div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</font></span></div>
<br>
</div>
</div>
______________________________<wbr>_________________<br>
scikit-learn mailing list<br>
<a href="mailto:scikit-learn@python.org" target="_blank">scikit-learn@python.org</a><br>
<a href="https://mail.python.org/mailman/listinfo/scikit-learn" rel="noreferrer" target="_blank">https://mail.python.org/mailman/listinfo/scikit-learn</a><br>
<br>
</blockquote>
</div>
<br>
</div>
<br>
</blockquote>
</div>
<br>
<br clear="all">
<br>
</div>
<br>
<fieldset class="m_-3344882405610298014mimeAttachmentHeader"></fieldset>
<br>
</blockquote>
<br>
</div>
<br></blockquote></div>