<p dir="ltr">Dear Andreas,</p>
<p dir="ltr">Thank you so much for your answer; now I can see my mistake. I am trying to convince myself that the reason I get only probabilities of 0 and 1 when I analyze my data is that the data are well separated. So I was trying to make some synthetic data where the probability is different from 0 or 1, but I did it the wrong way. Does it sound correct if I make 300 samples of random numbers centered at 0 with STD 1, another 300 centered at 0.5, and then add some samples in between these two Gaussian distributions (say between 0.15 and 0.35)? In this case I think I should expect probabilities different from 0 or 1 for the two components (when using 2 components).</p>
<p dir="ltr">Thank you in advance<br>
Tommaso</p>
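<p dir="ltr">A minimal sketch of the synthetic data set described above, assuming 1-D data, 100 in-between samples drawn uniformly from [0.15, 0.35], and fixed seeds (none of these are specified in the message). It fits a 2-component GaussianMixture and reports how many samples are not assigned with near-certainty. Since the two centers are only 0.5 apart with STD 1, the two Gaussians already overlap heavily, so probabilities away from 0 and 1 would be expected even without the extra samples.</p>

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)  # fixed seed (assumption) so the run is repeatable

# Two 1-D Gaussians with standard deviation 1, centred at 0 and 0.5,
# plus extra samples drawn uniformly between 0.15 and 0.35.
# The count of 100 in-between samples is an assumption for illustration.
a = rng.normal(loc=0.0, scale=1.0, size=300)
b = rng.normal(loc=0.5, scale=1.0, size=300)
between = rng.uniform(0.15, 0.35, size=100)
X = np.concatenate([a, b, between]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
proba = gmm.predict_proba(X)

# Fraction of samples whose most likely component has probability < 0.99,
# i.e. samples that are not assigned with near-certainty.
soft = np.mean(proba.max(axis=1) < 0.99)
print(f"fraction of soft assignments: {soft:.2f}")
```

<p dir="ltr">The 0.99 threshold is arbitrary; moving the second center further from the first (for example to 5 instead of 0.5) is one way to watch the assignments drift toward 0 and 1.</p>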
<div class="gmail_quote">On Nov 28, 2016 11:58 AM, "Andreas Mueller" <<a href="mailto:t3kcit@gmail.com">t3kcit@gmail.com</a>> wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div bgcolor="#FFFFFF" text="#000000">
    Hi Tommaso.<br>
    So what's the issue? The distributions are very distinct, so there
    is no confusion.<br>
    The higher the dimensionality, the further apart the points are
    (compare the distance between (-1, 1) and (1, -1) to the one between
    (-1, -.5, 0, .5, 1) and (1, .5, 0, -.5, -1)).<br>
    I'm not sure what you mean by "the cross in the middle".<br>
    You create two fixed points, one at np.arange(-1,1, 2.0/nfeatures)
    and one at np.arange(1,-1, (-2.0/nfeatures)). In high dimensions,
    these points are very far apart.<br>
    Then you add standard normal noise to it. So this data is two
    perfect Gaussians. In low dimensions, they are "close together" so
    there is some confusion,<br>
    in high dimensions, they are "far apart" so there is less confusion.<br>
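    <br>
    As a rough sketch of this effect: the snippet below computes the distance between the two centers for several values of nfeatures, and the posterior probability of the first component evaluated at its own center. It assumes two unit-variance spherical Gaussians with equal weights (the test function actually uses 500 and 400 samples), and it builds the centers with np.linspace instead of np.arange, which is a substitution to avoid float-step length surprises. Under these assumptions the log-odds at the center equal half the squared distance.<br>

```python
import numpy as np

def centers(nfeatures):
    # Centre with positive slope: -1, -1 + 2/n, ..., 1 - 2/n
    # (same values as np.arange(-1, 1, 2.0/nfeatures), built with linspace)
    c1 = np.linspace(-1.0, 1.0, nfeatures, endpoint=False)
    # Centre with negative slope: 1, 1 - 2/n, ..., -1 + 2/n
    c2 = -c1
    return c1, c2

def posterior_at_c1(nfeatures):
    # For unit-variance spherical Gaussians with equal weights:
    # log p1(x) - log p2(x) = 0.5 * (|x - c2|^2 - |x - c1|^2),
    # which at x = c1 reduces to 0.5 * d^2.
    c1, c2 = centers(nfeatures)
    d = np.linalg.norm(c1 - c2)
    log_odds = 0.5 * d**2
    return d, 1.0 / (1.0 + np.exp(-log_odds))

results = {n: posterior_at_c1(n) for n in (5, 10, 50, 100)}
for n, (d, p) in results.items():
    print(f"nfeatures={n:3d}  distance={d:6.3f}  p(component 1 | c1)={p:.15f}")
```

    The distance grows roughly like the square root of nfeatures, so the log-odds grow roughly linearly with it, which is why the probabilities saturate quickly.<br>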
    <br>
    Hth,<br>
    Andy<br>
    <br>
    <div class="m_-3344882405610298014moz-cite-prefix">On 11/27/2016 11:47 AM, Tommaso
      Costanzo wrote:<br>
    </div>
    <blockquote type="cite">
      <div dir="ltr">
        <div>
          <div>
            <div>Hi Jacob,<br>
              <br>
            </div>
            I have just changed my code from BayesianGaussianMixture to
            GaussianMixture, and the result is the same. I attached
            here the picture of the first component when I ran the
            code with 5, 10, and 50 nfeatures and 2 components. In my
            short test function I expect to have points that can belong
            to one component as well as the other, as is visible for a
            small number of nfeatures, but probabilities of only 0 and 1
            for nfeatures >50 do not sound correct. It seems to be
            related just to the size of the model, and in particular to
            the number of features. With the BayesianGaussianMixture I
            have seen that it is slightly better to increase the degrees
            of freedom to 2*nfeatures instead of the default nfeatures.
            However, this does not change the result when nfeatures is
            50 or more.<br>
            <br>
          </div>
          Thank you in advance<br>
        </div>
        Tommaso<br>
      </div>
      <div class="gmail_extra"><br>
        <div class="gmail_quote">2016-11-25 21:32 GMT-05:00 Jacob
          Schreiber <span dir="ltr"><<a href="mailto:jmschreiber91@gmail.com" target="_blank">jmschreiber91@gmail.com</a>></span>:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div dir="ltr">Typically this means that the model is so
              confident in its predictions it does not believe it
              possible for the sample to come from the other component.
              Do you get the same results with a regular
              GaussianMixture? </div>
            <div class="gmail_extra"><br>
              <div class="gmail_quote">
                <div>
                  <div class="m_-3344882405610298014h5">On Fri, Nov 25, 2016 at 11:34 AM,
                    Tommaso Costanzo <span dir="ltr"><<a href="mailto:tommaso.costanzo01@gmail.com" target="_blank">tommaso.costanzo01@gmail.com</a>></span>
                    wrote:<br>
                  </div>
                </div>
                <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                  <div>
                    <div class="m_-3344882405610298014h5">
                      <div dir="ltr">
                        <div>
                          <div>
                            <div>
                              <div>
                                <div>
                                  <div>
                                    <div>Hi,<br>
                                      <br>
                                    </div>
                                    I am facing a problem with the
                                    "BayesianGaussianMixture" function,
                                    but I do not know if it is because
                                    of my poor knowledge of this type of
                                    statistics or if it is something
                                    related to the algorithm. I have a
                                    set of data of around 1000 to 4000
                                    observations (every feature is a
                                    spectrum of around 200 points), so in
                                    the end I have n_samples = ~1000 and
                                    n_features = ~20. The good thing is
                                    that I am getting the same results
                                    as KMeans; however,
                                    "predict_proba" returns only values
                                    of 0 or 1.<br>
                                  </div>
                                  <br>
                                </div>
                                I have written a small function to
                                simulate my problem with random data,
                                reported below. The first half of
                                the array has points with a positive
                                slope while the second half has a
                                negative slope, hence the cross in the
                                middle. What I have seen is that for a
                                small number of features I obtain good
                                probabilities, but if the number of
                                features increases (say 50) then the
                                probabilities become only 0 or 1.<br>
                              </div>
                              Can someone help me interpret this
                              result?<br>
                              <br>
                            </div>
                            Here is the code I wrote with the generated
                            random numbers. I generally run it with
                            ncomponent=2 and nfeatures=5 or 10 or 50 or
                            100. I am not sure it will work in every
                            case, as it is not thoroughly tested. I have
                            also attached it as a file!<br>
                            <br>
<pre>
############################################################
import numpy as np
from sklearn.mixture import GaussianMixture, BayesianGaussianMixture
import matplotlib.pyplot as plt

def test_bgm(ncomponent, nfeatures):
    # First 500 samples: standard normal noise around a centre with positive slope
    temp = np.random.randn(500, nfeatures)
    temp = temp + np.arange(-1, 1, 2.0/nfeatures)
    # Next 400 samples: standard normal noise around a centre with negative slope
    temp1 = np.random.randn(400, nfeatures)
    temp1 = temp1 + np.arange(1, -1, (-2.0/nfeatures))
    X = np.vstack((temp, temp1))

    # n_components is passed by keyword, as recent scikit-learn versions require
    bgm = BayesianGaussianMixture(n_components=ncomponent,
                                  degrees_of_freedom_prior=nfeatures*2).fit(X)
    bgm_proba = bgm.predict_proba(X)
    bgm_labels = bgm.predict(X)

    # Show the 900 predicted labels as a 30x30 image
    plt.figure(-1)
    plt.imshow(bgm_labels.reshape(30, -1), origin='lower', interpolation='none')
    plt.colorbar()

    # One figure per component with its membership probabilities
    for i in np.arange(0, ncomponent):
        plt.figure(i)
        plt.imshow(bgm_proba[:, i].reshape(30, -1), origin='lower', interpolation='none')
        plt.colorbar()

    plt.show()
############################################################
</pre>
                            <br>
                          </div>
                          Thank you in advance<span class="m_-3344882405610298014m_-2484064570224270571HOEnZb"><font color="#888888"><br>
                            </font></span></div>
                        <span class="m_-3344882405610298014m_-2484064570224270571HOEnZb"><font color="#888888">Tommaso<br>
                            <br clear="all">
                            <div>
                              <div>
                                <div>
                                  <div>
                                    <div>
                                      <div>
                                        <div>
                                          <div>
                                            <div>
                                              <div><br>
                                                -- <br>
                                                <div class="m_-3344882405610298014m_-2484064570224270571m_390746433550541163gmail_signature">
                                                  <div dir="ltr"><span></span><span>Please do NOT
                                                      send Microsoft
                                                      Office
                                                      Attachments:</span><br>
                                                    <div>
                                                      <a href="http://www.gnu.org/philosophy/no-word-attachments.html" target="_blank">http://www.gnu.org/philosophy/<wbr>no-word-attachments.html</a></div>
                                                  </div>
                                                </div>
                                              </div>
                                            </div>
                                          </div>
                                        </div>
                                      </div>
                                    </div>
                                  </div>
                                </div>
                              </div>
                            </div>
                          </font></span></div>
                      <br>
                    </div>
                  </div>
                  ______________________________<wbr>_________________<br>
                  scikit-learn mailing list<br>
                  <a href="mailto:scikit-learn@python.org" target="_blank">scikit-learn@python.org</a><br>
                  <a href="https://mail.python.org/mailman/listinfo/scikit-learn" rel="noreferrer" target="_blank">https://mail.python.org/mailma<wbr>n/listinfo/scikit-learn</a><br>
                  <br>
                </blockquote>
              </div>
              <br>
            </div>
            <br>
          </blockquote>
        </div>
        <br>
        <br clear="all">
        <br>
      </div>
      <br>
      <fieldset class="m_-3344882405610298014mimeAttachmentHeader"></fieldset>
      <br>
    </blockquote>
    <br>
  </div>

<br></blockquote></div>