<p dir="ltr">Dear Andreas,</p>
<p dir="ltr">thank you so much for your answer; now I can see my mistake. What I am trying to do is convince myself that the probabilities of only 0 and 1 I get when analyzing my data come from the data being well separated, so I was trying to make some synthetic data where the probability differs from 0 or 1, but I did it in the wrong way. Does it sound correct if I make 300 samples of random numbers centered at 0 with STD 1, another 300 centered at 0.5, and then add some samples in between these two Gaussian distributions (say between 0.15 and 0.35)? In that case I think I should expect probabilities different from 0 or 1 in the two components (when using 2 components).</p>
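A rough sketch of that setup (the count of in-between samples and the uniform draw for them are my assumptions, not fixed by the description above):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
a = rng.randn(300, 1)                        # 300 samples centered at 0, STD 1
b = rng.randn(300, 1) + 0.5                  # 300 samples centered at 0.5, STD 1
mid = rng.uniform(0.15, 0.35, size=(50, 1))  # some samples between the two means
X = np.vstack([a, b, mid])

gm = GaussianMixture(n_components=2, random_state=0).fit(X)
proba = gm.predict_proba(X)

# with this much overlap, many probabilities should be far from 0 and 1
print(((proba > 0.1) & (proba < 0.9)).mean())
```

With the means only 0.5 apart and STD 1, the two components overlap heavily, so intermediate probabilities should indeed appear.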
<p dir="ltr">Thank you in advance<br>
Tommaso</p>
<div class="gmail_quote">On Nov 28, 2016 11:58 AM, "Andreas Mueller" <<a href="mailto:t3kcit@gmail.com">t3kcit@gmail.com</a>> wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
Hi Tommaso.<br>
So what's the issue? The distributions are very distinct, so there
is no confusion.<br>
The higher the dimensionality, the further apart the points are
(compare the distance between (-1, 1) and (1, -1) to the one between
(-1, -.5, 0, .5, 1) and (1, .5, 0, -.5, -1)).<br>
I'm not sure what you mean by "the cross in the middle".<br>
You create two fixed points, one at np.arange(-1,1, 2.0/nfeatures)
and one at np.arange(1,-1, (-2.0/nfeatures)). In high dimensions,
these points are very far apart.<br>
Then you add standard normal noise to it. So this data is two
perfect Gaussians. In low dimensions, they are "close together" so
there is some confusion,<br>
in high dimensions, they are "far apart" so there is less confusion.<br>
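A quick way to check this (not from the original message) is to compute the distance between the two fixed points from the test function for a few values of nfeatures:

```python
import numpy as np

# distance between the two cluster centers used in the test function,
# as a function of the number of features
for nfeatures in (5, 10, 50):
    p = np.arange(-1, 1, 2.0 / nfeatures)
    q = np.arange(1, -1, -2.0 / nfeatures)
    print(nfeatures, np.linalg.norm(p - q))
```

The printed distance grows with nfeatures, which is why the overlap (and hence the confusion) disappears in high dimensions.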
<br>
Hth,<br>
Andy<br>
<br>
<div class="m_-3344882405610298014moz-cite-prefix">On 11/27/2016 11:47 AM, Tommaso
Costanzo wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>
<div>
<div>Hi Jacob,<br>
<br>
</div>
I have just changed my code from BayesianGaussianMixture to
GaussianMixture, and the result is the same. I attached
here the picture of the first component when I ran the
code with 5, 10, and 50 nfeatures and 2 components. In my
short test function I expect to have points that can be
in one component as well as another, as visible for a small
number of nfeatures, but only 0 and 1 for nfeatures >50 does not
sound correct. It seems to be related just to the size of
the model, and in particular to the number of features. With
BayesianGaussianMixture I have seen that it is slightly
better to increase the degrees of freedom to 2*nfeatures
instead of the default nfeatures. However, this does not
change the result when nfeatures is 50 or more.<br>
<br>
</div>
Thank you in advance<br>
</div>
Tommaso<br>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">2016-11-25 21:32 GMT-05:00 Jacob
Schreiber <span dir="ltr"><<a href="mailto:jmschreiber91@gmail.com" target="_blank">jmschreiber91@gmail.com</a>></span>:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">Typically this means that the model is so
confident in its predictions that it does not believe it
possible for the sample to come from the other component.
Do you get the same results with a regular
GaussianMixture? </div>
<div class="gmail_extra"><br>
<div class="gmail_quote">
<div>
<div class="m_-3344882405610298014h5">On Fri, Nov 25, 2016 at 11:34 AM,
Tommaso Costanzo <span dir="ltr"><<a href="mailto:tommaso.costanzo01@gmail.com" target="_blank">tommaso.costanzo01@gmail.com</a>></span>
wrote:<br>
</div>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>
<div class="m_-3344882405610298014h5">
<div dir="ltr">
<div>
<div>
<div>
<div>
<div>
<div>
<div>Hi,<br>
<br>
</div>
I am facing some problems with the
"BayesianGaussianMixture" function,
but I do not know if it is because
of my poor knowledge of this type of
statistics or if it is something
related to the algorithm. I have a set
of data of around 1000 to 4000
observations (every feature is a
spectrum of around 200 points), so in
the end I have n_samples = ~1000 and
n_features = ~20. The good thing is
that I am getting the same results
as KMeans; however,
"predict_proba" has values of only 0
or 1.<br>
</div>
<br>
</div>
I have written a small function to
simulate my problem with random data,
reported below. The first 1/2 of
the array has points with a positive
slope while the second 1/2 has a
negative slope, so they cross in the
middle. What I have seen is that for a
small number of features I obtain good
probabilities, but if the number of
features increases (say 50) then the
probabilities become only 0 or 1.<br>
</div>
Can someone help me interpret this
result?<br>
<br>
</div>
Here is the code I wrote with the generated
random numbers; I'll generally run it with
ncomponent=2 and nfeatures=5, 10, 50, or
100. I am not sure it will work in every
case, as it is not very highly tested. I have also
attached it as a file!<br>
<br>
##########################################################################<br>
import numpy as np<br>
from sklearn.mixture import GaussianMixture, BayesianGaussianMixture<br>
import matplotlib.pyplot as plt<br>
<br>
def test_bgm(ncomponent, nfeatures):<br>
    # first cluster: standard normal noise around a point with positive slope<br>
    temp = np.random.randn(500, nfeatures)<br>
    temp = temp + np.arange(-1, 1, 2.0 / nfeatures)<br>
    # second cluster: standard normal noise around a point with negative slope<br>
    temp1 = np.random.randn(400, nfeatures)<br>
    temp1 = temp1 + np.arange(1, -1, -2.0 / nfeatures)<br>
    X = np.vstack((temp, temp1))<br>
<br>
    bgm = BayesianGaussianMixture(ncomponent, degrees_of_freedom_prior=nfeatures * 2).fit(X)<br>
    bgm_proba = bgm.predict_proba(X)<br>
    bgm_labels = bgm.predict(X)<br>
<br>
    # hard cluster assignments<br>
    plt.figure(-1)<br>
    plt.imshow(bgm_labels.reshape(30, -1), origin='lower', interpolation='none')<br>
    plt.colorbar()<br>
<br>
    # per-component membership probabilities<br>
    for i in np.arange(0, ncomponent):<br>
        plt.figure(i)<br>
        plt.imshow(bgm_proba[:, i].reshape(30, -1), origin='lower', interpolation='none')<br>
        plt.colorbar()<br>
<br>
    plt.show()<br>
##########################################################################<br>
<br>
</div>
Thank you in advance<span class="m_-3344882405610298014m_-2484064570224270571HOEnZb"><font color="#888888"><br>
</font></span></div>
<span class="m_-3344882405610298014m_-2484064570224270571HOEnZb"><font color="#888888">Tommaso<br>
<br clear="all">
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div><br>
-- <br>
<div class="m_-3344882405610298014m_-2484064570224270571m_390746433550541163gmail_signature">
<div dir="ltr"><span></span><span>Please do NOT
send Microsoft
Office
Attachments:</span><br>
<div>
<a href="http://www.gnu.org/philosophy/no-word-attachments.html" target="_blank">http://www.gnu.org/philosophy/no-word-attachments.html</a></div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</font></span></div>
<br>
</div>
</div>
______________________________<wbr>_________________<br>
scikit-learn mailing list<br>
<a href="mailto:scikit-learn@python.org" target="_blank">scikit-learn@python.org</a><br>
<a href="https://mail.python.org/mailman/listinfo/scikit-learn" rel="noreferrer" target="_blank">https://mail.python.org/mailman/listinfo/scikit-learn</a><br>
<br>
</blockquote>
</div>
<br>
</div>
<br>
</blockquote>
</div>
<br>
<br clear="all">
<br>
</div>
<br>
<fieldset class="m_-3344882405610298014mimeAttachmentHeader"></fieldset>
<br>
</blockquote>
<br>
</div>
<br></blockquote></div>