<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
Hi Tommaso.<br>
So what's the issue? The distributions are very distinct, so there
is no confusion.<br>
The higher the dimensionality, the further apart the points are
(compare the distance between (-1, 1) and (1, -1) to the one between
(-1, -.5, 0, .5, 1) and (1, .5, 0, -.5, -1)).<br>
I'm not sure what you mean by "the cross in the middle".<br>
You create two fixed points, one at np.arange(-1,1, 2.0/nfeatures)
and one at np.arange(1,-1, (-2.0/nfeatures)). In high dimensions,
these points are very far apart.<br>
Then you add standard normal noise to each of them, so this data is
two perfect Gaussians. In low dimensions, they are "close together" so
there is some confusion; in high dimensions, they are "far apart" so
there is less confusion.<br>
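<br>
As a rough check of this argument, here is a minimal sketch (not part of
your script) that computes the distance between the two means and the
exact posterior for samples drawn around the first mean. It assumes
identity covariances and equal component weights, uses np.linspace
instead of np.arange to sidestep floating-point step issues, and the
helper names are only illustrative:<br>
<pre>
```python
import numpy as np


def component_means(nfeatures):
    """Means of the two clusters: a rising ramp and its mirror image."""
    mean_a = np.linspace(-1.0, 1.0, nfeatures, endpoint=False)
    return mean_a, -mean_a


def posterior_first(X, mean_a, mean_b):
    """P(first component | x) for unit-variance Gaussians with equal weights."""
    sq_a = np.sum((X - mean_a) ** 2, axis=1)
    sq_b = np.sum((X - mean_b) ** 2, axis=1)
    log_odds = 0.5 * (sq_b - sq_a)
    # Numerically stable logistic function of the log-odds.
    return np.exp(-np.logaddexp(0.0, -log_odds))


def summarize(nfeatures, rng, n_samples=500):
    """Return (distance between means, share of ambiguous samples)."""
    mean_a, mean_b = component_means(nfeatures)
    distance = np.linalg.norm(mean_a - mean_b)
    X = rng.standard_normal((n_samples, nfeatures)) + mean_a
    p = posterior_first(X, mean_a, mean_b)
    # "Ambiguous" here means both components keep at least 1% probability.
    ambiguous = np.mean(np.minimum(p, 1.0 - p) >= 0.01)
    return distance, ambiguous


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for nfeatures in (5, 10, 50, 100):
        distance, ambiguous = summarize(nfeatures, rng)
        print(f"nfeatures={nfeatures:3d}  distance={distance:5.2f}  "
              f"ambiguous={ambiguous:.3f}")
```
</pre>
Under these assumptions the log-odds of a sample from the first
component are normally distributed with mean d**2/2 and standard
deviation d, where d is the distance between the means, so the share of
ambiguous points shrinks quickly as nfeatures grows.<br>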
<br>
Hth,<br>
Andy<br>
<br>
<div class="moz-cite-prefix">On 11/27/2016 11:47 AM, Tommaso
Costanzo wrote:<br>
</div>
<blockquote
cite="mid:CAHMJyZfDzC_joCk9HkEfz+wi5JvX8mvT2O2Xt+8yUE3NWBtJ-w@mail.gmail.com"
type="cite">
<div dir="ltr">
<div>
<div>
<div>Hi Jacob,<br>
<br>
</div>
I have just changed my code from BayesianGaussianMixture to
GaussianMixture, and the result is the same. I have attached
the picture of the first component when I ran the
code with 5, 10, and 50 nfeatures and 2 components. In my
short test function I expect some points to be plausible
in either component, as is visible for a small
number of nfeatures, but probabilities of only 0 or 1 for
nfeatures >50 do not seem correct. It seems to be related just to
the size of the model, and in particular to the number of features.
With the BayesianGaussianMixture I have seen that it is slightly
better to increase the degrees of freedom to 2*nfeatures
instead of the default nfeatures. However, this does not
change the result when nfeatures is 50 or more.<br>
<br>
</div>
Thank you in advance<br>
</div>
Tommaso<br>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">2016-11-25 21:32 GMT-05:00 Jacob
Schreiber <span dir="ltr">&lt;<a moz-do-not-send="true"
href="mailto:jmschreiber91@gmail.com" target="_blank">jmschreiber91@gmail.com</a>&gt;</span>:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">Typically this means that the model is so
confident in its predictions it does not believe it
possible for the sample to come from the other component.
Do you get the same results with a regular
GaussianMixture? </div>
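<div dir="ltr">One way to check this on synthetic data is sketched
below. It assumes the same two-cluster setup as the test function
in the original message (mirrored means plus standard normal noise);
saturated_fraction is an illustrative helper name, not a scikit-learn
API, and the 0.99 threshold is an arbitrary choice:<br>
<pre>
```python
import numpy as np
from sklearn.mixture import GaussianMixture


def saturated_fraction(nfeatures, seed=0, threshold=0.99):
    """Fit a 2-component GaussianMixture and return the share of samples
    whose largest posterior probability is at least `threshold`."""
    rng = np.random.default_rng(seed)
    ramp = np.linspace(-1.0, 1.0, nfeatures, endpoint=False)
    # Two clusters with mirrored means and standard normal noise.
    X = np.vstack((
        rng.standard_normal((500, nfeatures)) + ramp,
        rng.standard_normal((400, nfeatures)) - ramp,
    ))
    gm = GaussianMixture(n_components=2, random_state=seed).fit(X)
    proba = gm.predict_proba(X)
    return float(np.mean(proba.max(axis=1) >= threshold))


if __name__ == "__main__":
    for nfeatures in (5, 10, 50):
        print(nfeatures, saturated_fraction(nfeatures))
```
</pre>
If the fraction approaches 1 as nfeatures grows, the hard 0/1
probabilities reflect well-separated components rather than a problem
specific to BayesianGaussianMixture.</div>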
<div class="gmail_extra"><br>
<div class="gmail_quote">
<div>
<div class="h5">On Fri, Nov 25, 2016 at 11:34 AM,
Tommaso Costanzo <span dir="ltr">&lt;<a
moz-do-not-send="true"
href="mailto:tommaso.costanzo01@gmail.com"
target="_blank">tommaso.costanzo01@gmail.com</a>&gt;</span>
wrote:<br>
</div>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>
<div class="h5">
<div dir="ltr">
<div>
<div>
<div>
<div>
<div>
<div>
<div>Hi,<br>
<br>
</div>
I am facing a problem with the
"BayesianGaussianMixture" function,
but I do not know if it is because
of my poor knowledge of this type of
statistics or if it is something
related to the algorithm. I have a set
of data of around 1000 to 4000
observations (every feature is a
spectrum of around 200 points), so in
the end I have n_samples = ~1000 and
n_features = ~20. The good thing is
that I am getting the same results
as KMeans; however,
"predict_proba" only takes values of 0
or 1.<br>
</div>
<br>
</div>
I have written a small function to
simulate my problem with random data,
which is reported below. The first half of
the array has points with a positive
slope while the second half has a
negative slope, so they cross in the
middle. What I have seen is that for a
small number of features I obtain good
probabilities, but if the number of
features increases (say 50) then the
probabilities become only 0 or 1.<br>
</div>
Can someone help me interpret this
result?<br>
<br>
</div>
Here is the code I wrote with the generated
random numbers. I generally run it with
ncomponent=2 and nfeatures=5 or 10 or 50 or
100. I am not sure it will work in every
case, as it is not very thoroughly tested. I have also
attached it as a file!<br>
<br>
##########################################################################<br>
import numpy as np<br>
from sklearn.mixture import GaussianMixture, BayesianGaussianMixture<br>
import matplotlib.pyplot as plt<br>
<br>
def test_bgm(ncomponent, nfeatures):<br>
&nbsp;&nbsp;&nbsp;&nbsp;# 500 samples of standard normal noise around a mean rising from -1<br>
&nbsp;&nbsp;&nbsp;&nbsp;temp = np.random.randn(500, nfeatures)<br>
&nbsp;&nbsp;&nbsp;&nbsp;temp = temp + np.arange(-1, 1, 2.0/nfeatures)<br>
&nbsp;&nbsp;&nbsp;&nbsp;# 400 samples around the mirrored mean falling from 1<br>
&nbsp;&nbsp;&nbsp;&nbsp;temp1 = np.random.randn(400, nfeatures)<br>
&nbsp;&nbsp;&nbsp;&nbsp;temp1 = temp1 + np.arange(1, -1, (-2.0/nfeatures))<br>
&nbsp;&nbsp;&nbsp;&nbsp;X = np.vstack((temp, temp1))<br>
<br>
&nbsp;&nbsp;&nbsp;&nbsp;# n_components is passed by keyword, as recent scikit-learn requires<br>
&nbsp;&nbsp;&nbsp;&nbsp;bgm = BayesianGaussianMixture(n_components=ncomponent, degrees_of_freedom_prior=nfeatures*2).fit(X)<br>
&nbsp;&nbsp;&nbsp;&nbsp;bgm_proba = bgm.predict_proba(X)<br>
&nbsp;&nbsp;&nbsp;&nbsp;bgm_labels = bgm.predict(X)<br>
<br>
&nbsp;&nbsp;&nbsp;&nbsp;# The 900 samples are shown as a 30 x 30 image<br>
&nbsp;&nbsp;&nbsp;&nbsp;plt.figure(-1)<br>
&nbsp;&nbsp;&nbsp;&nbsp;plt.imshow(bgm_labels.reshape(30,-1), origin='lower', interpolation='none')<br>
&nbsp;&nbsp;&nbsp;&nbsp;plt.colorbar()<br>
<br>
&nbsp;&nbsp;&nbsp;&nbsp;# One figure per component with its posterior probabilities<br>
&nbsp;&nbsp;&nbsp;&nbsp;for i in np.arange(0, ncomponent):<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;plt.figure(i)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;plt.imshow(bgm_proba[:,i].reshape(30,-1), origin='lower', interpolation='none')<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;plt.colorbar()<br>
<br>
&nbsp;&nbsp;&nbsp;&nbsp;plt.show()<br>
##########################################################################<br>
<br>
</div>
Thank you in advance<span
class="m_-2484064570224270571HOEnZb"><font
color="#888888"><br>
</font></span></div>
<span class="m_-2484064570224270571HOEnZb"><font
color="#888888">Tommaso<br>
<br clear="all">
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div><br>
-- <br>
<div
class="m_-2484064570224270571m_390746433550541163gmail_signature">
<div dir="ltr"><span
style="font-family:'lucida console','courier new',courier,monospace"></span><span
style="font-family:'lucida console','courier new',courier,monospace">Please do NOT
send Microsoft
Office
Attachments:</span><br
style="font-family:'lucida console','courier new',courier,monospace">
<div>
<a
moz-do-not-send="true"
style="font-family:'lucida console','courier new',courier,monospace"
href="http://www.gnu.org/philosophy/no-word-attachments.html"
target="_blank">http://www.gnu.org/philosophy/<wbr>no-word-attachments.html</a></div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</font></span></div>
<br>
</div>
</div>
______________________________<wbr>_________________<br>
scikit-learn mailing list<br>
<a moz-do-not-send="true"
href="mailto:scikit-learn@python.org"
target="_blank">scikit-learn@python.org</a><br>
<a moz-do-not-send="true"
href="https://mail.python.org/mailman/listinfo/scikit-learn"
rel="noreferrer" target="_blank">https://mail.python.org/mailma<wbr>n/listinfo/scikit-learn</a><br>
<br>
</blockquote>
</div>
<br>
</div>
<br>
<br>
</blockquote>
</div>
<br>
<br clear="all">
<br>
</div>
<br>
</blockquote>
<br>
</body>
</html>