<html>
  <head>
    <meta content="text/html; charset=windows-1252"
      http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    Hi Tommaso.<br>
    So what's the issue? The distributions are very distinct, so there
    is no confusion.<br>
    The higher the dimensionality, the further apart the points are
    (compare the distance between (-1, 1) and (1, -1) to the one between
    (-1, -.5, 0, .5, 1) and (1, .5, 0, -.5, -1)).<br>
    I'm not sure what you mean by "the cross in the middle".<br>
    You create two fixed points, one at np.arange(-1,1, 2.0/nfeatures)
    and one at np.arange(1,-1, (-2.0/nfeatures)). In high dimensions,
    these points are very far apart.<br>
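    To put numbers on that, here is a minimal sketch that rebuilds the
    two centers with the same np.arange calls and measures the Euclidean
    distance between them. The helper name mean_distance and the list of
    nfeatures values are only picked for illustration.<br>
    <pre wrap="">
```python
import numpy as np


def mean_distance(nfeatures):
    """Euclidean distance between the two fixed centers of the test data."""
    # Same two centers as in the test function from the original post.
    up = np.arange(-1, 1, 2.0 / nfeatures)
    down = np.arange(1, -1, -2.0 / nfeatures)
    return float(np.linalg.norm(up - down))


# The distance keeps growing as features are added.
for n in (2, 5, 10, 50, 100):
    print(n, round(mean_distance(n), 2))
```
</pre>
    The noise added to each coordinate stays at unit variance, so it is
    this growing distance that pulls the two clouds apart.<br>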
    Then you add standard normal noise to each of them, so this data is
    two perfect Gaussians. In low dimensions, they are "close together",
    so there is some confusion;<br>
    in high dimensions, they are "far apart", so there is less confusion.<br>
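    The same effect shows up without fitting anything. The sketch below
    assumes the true generating parameters are known (the two np.arange
    centers, identity covariances, weights 500/900 and 400/900) and
    computes the exact posterior for each sample. The helper name
    uncertain_fraction and the 0.01/0.99 band are my own choices for
    illustration, not something from scikit-learn.<br>
    <pre wrap="">
```python
import numpy as np


def uncertain_fraction(nfeatures, seed=0):
    """Share of samples whose true posterior lies strictly between 0.01 and 0.99."""
    rng = np.random.default_rng(seed)
    up = np.arange(-1, 1, 2.0 / nfeatures)
    down = np.arange(1, -1, -2.0 / nfeatures)
    X = np.vstack((rng.standard_normal((500, nfeatures)) + up,
                   rng.standard_normal((400, nfeatures)) + down))
    # Log posterior odds of "up" versus "down" for unit-variance Gaussians,
    # including the prior ratio 500/400 from the sample counts.
    log_odds = (0.5 * np.sum((X - down) ** 2, axis=1)
                - 0.5 * np.sum((X - up) ** 2, axis=1)
                + np.log(500.0 / 400.0))
    # Numerically stable logistic function: sigmoid(t) = 0.5 * (1 + tanh(t / 2)).
    posterior_up = 0.5 * (1.0 + np.tanh(0.5 * log_odds))
    return float(np.mean(np.less(np.abs(posterior_up - 0.5), 0.49)))


for n in (2, 5, 10, 50, 100):
    print(n, uncertain_fraction(n))
```
</pre>
    If even the true model is this certain at 50 or 100 features, a
    fitted mixture reporting only 0 and 1 is the expected behaviour
    rather than a bug.<br>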
    <br>
    Hth,<br>
    Andy<br>
    <br>
    <div class="moz-cite-prefix">On 11/27/2016 11:47 AM, Tommaso
      Costanzo wrote:<br>
    </div>
    <blockquote
cite="mid:CAHMJyZfDzC_joCk9HkEfz+wi5JvX8mvT2O2Xt+8yUE3NWBtJ-w@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div>
          <div>
            <div>Hi Jacob,<br>
              <br>
            </div>
            I have just changed my code from BayesianGaussianMixture to
            GaussianMixture, and the result is the same. I have attached
            the picture of the first component when I ran the code with
            5, 10, and 50 nfeatures and 2 components. In my short test
            function I expect to have points that can belong to one
            component as well as the other, as is visible for a small
            number of nfeatures, but only 0 or 1 for nfeatures &gt;50 does
            not sound correct. It seems to be related just to the size
            of the model, and in particular to the number of features.
            With the BayesianGaussianMixture I have seen that it is
            slightly better to increase the degrees of freedom to
            2*nfeatures instead of the default nfeatures. However, this
            does not change the result when nfeatures is 50 or more.<br>
            <br>
          </div>
          Thank you in advance<br>
        </div>
        Tommaso<br>
      </div>
      <div class="gmail_extra"><br>
        <div class="gmail_quote">2016-11-25 21:32 GMT-05:00 Jacob
          Schreiber <span dir="ltr">&lt;<a moz-do-not-send="true"
              href="mailto:jmschreiber91@gmail.com" target="_blank">jmschreiber91@gmail.com</a>&gt;</span>:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0
            .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div dir="ltr">Typically this means that the model is so
              confident in its predictions it does not believe it
              possible for the sample to come from the other component.
              Do you get the same results with a regular
              GaussianMixture? </div>
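            <div>A minimal sketch of that check with a regular
              GaussianMixture, on data generated the same way as in the
              test function quoted below. The helper name soft_fraction,
              the 0.99 cut-off and the fixed seed are assumptions made
              for illustration only.<br>
              <pre wrap="">
```python
import numpy as np
from sklearn.mixture import GaussianMixture


def soft_fraction(nfeatures, seed=0):
    """Share of samples whose most likely component gets less than 0.99."""
    rng = np.random.default_rng(seed)
    up = np.arange(-1, 1, 2.0 / nfeatures)
    down = np.arange(1, -1, -2.0 / nfeatures)
    X = np.vstack((rng.standard_normal((500, nfeatures)) + up,
                   rng.standard_normal((400, nfeatures)) + down))
    gm = GaussianMixture(n_components=2, random_state=seed).fit(X)
    # Highest responsibility per sample; values near 1 mean a hard assignment.
    top = gm.predict_proba(X).max(axis=1)
    return float(np.mean(np.less(top, 0.99)))


for n in (5, 10, 50):
    print(n, soft_fraction(n))
```
</pre>
            </div>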
            <div class="gmail_extra"><br>
              <div class="gmail_quote">
                <div>
                  <div class="h5">On Fri, Nov 25, 2016 at 11:34 AM,
                    Tommaso Costanzo <span dir="ltr">&lt;<a
                        moz-do-not-send="true"
                        href="mailto:tommaso.costanzo01@gmail.com"
                        target="_blank">tommaso.costanzo01@gmail.com</a>&gt;</span>
                    wrote:<br>
                  </div>
                </div>
                <blockquote class="gmail_quote" style="margin:0 0 0
                  .8ex;border-left:1px #ccc solid;padding-left:1ex">
                  <div>
                    <div class="h5">
                      <div dir="ltr">
                        <div>
                          <div>
                            <div>
                              <div>
                                <div>
                                  <div>
                                    <div>Hi,<br>
                                      <br>
                                    </div>
                                    I am facing a problem with the
                                    "BayesianGaussianMixture" class,
                                    but I do not know if it is because
                                    of my poor knowledge of this type of
                                    statistics or if it is something
                                    related to the algorithm. I have a
                                    set of data of around 1000 to 4000
                                    observations (every feature is a
                                    spectrum of around 200 points), so in
                                    the end I have n_samples = ~1000 and
                                    n_features = ~20. The good thing is
                                    that I am getting the same results
                                    as KMeans; however, "predict_proba"
                                    returns only values of 0 or 1.<br>
                                  </div>
                                  <br>
                                </div>
                                I have written a small function,
                                reported below, to simulate my problem
                                with random data. The first half of
                                the array has points with a positive
                                slope while the second half has a
                                negative slope, hence the cross in the
                                middle. What I have seen is that for a
                                small number of features I obtain good
                                probabilities, but if the number of
                                features increases (say 50) then the
                                probabilities become only 0 or 1.<br>
                              </div>
                              Can someone help me interpret this
                              result?<br>
                              <br>
                            </div>
                            Here is the code I wrote with the generated
                            random numbers. I generally run it with
                            ncomponent=2 and nfeatures=5 or 10 or 50 or
                            100. I am not sure it will work in every
                            case, as it is not very thoroughly tested. I
                            have also attached it as a file!<br>
                            <br>
                            ##########################################################################<br>
                            import numpy as np<br>
                            from sklearn.mixture import GaussianMixture, BayesianGaussianMixture<br>
                            import matplotlib.pyplot as plt<br>
                            <br>
                            def test_bgm(ncomponent, nfeatures):<br>
                            &nbsp;&nbsp;&nbsp; # 500 samples around a rising line, 400 around a falling one<br>
                            &nbsp;&nbsp;&nbsp; temp = np.random.randn(500,nfeatures)<br>
                            &nbsp;&nbsp;&nbsp; temp = temp + np.arange(-1,1, 2.0/nfeatures)<br>
                            &nbsp;&nbsp;&nbsp; temp1 = np.random.randn(400,nfeatures)<br>
                            &nbsp;&nbsp;&nbsp; temp1 = temp1 + np.arange(1,-1, (-2.0/nfeatures))<br>
                            &nbsp;&nbsp;&nbsp; X = np.vstack((temp, temp1))<br>
                            <br>
                            &nbsp;&nbsp;&nbsp; bgm = BayesianGaussianMixture(ncomponent,degrees_of_freedom_prior=nfeatures*2).fit(X)<br>
                            &nbsp;&nbsp;&nbsp; bgm_proba = bgm.predict_proba(X)<br>
                            &nbsp;&nbsp;&nbsp; bgm_labels = bgm.predict(X)<br>
                            <br>
                            &nbsp;&nbsp;&nbsp; # predicted label of each of the 900 samples, shown as a 30x30 image<br>
                            &nbsp;&nbsp;&nbsp; plt.figure(-1)<br>
                            &nbsp;&nbsp;&nbsp; plt.imshow(bgm_labels.reshape(30,-1), origin='lower', interpolation='none')<br>
                            &nbsp;&nbsp;&nbsp; plt.colorbar()<br>
                            <br>
                            &nbsp;&nbsp;&nbsp; # one probability image per component<br>
                            &nbsp;&nbsp;&nbsp; for i in np.arange(0,ncomponent):<br>
                            &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; plt.figure(i)<br>
                            &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; plt.imshow(bgm_proba[:,i].reshape(30,-1), origin='lower', interpolation='none')<br>
                            &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; plt.colorbar()<br>
                            <br>
                            &nbsp;&nbsp;&nbsp; plt.show()<br>
                            ##########################################################################<br>
                            <br>
                          </div>
                          Thank you in advance<span
                            class="m_-2484064570224270571HOEnZb"><font
                              color="#888888"><br>
                            </font></span></div>
                        <span class="m_-2484064570224270571HOEnZb"><font
                            color="#888888">Tommaso<br>
                            <br clear="all">
                            <div>
                              <div>
                                <div>
                                  <div>
                                    <div>
                                      <div>
                                        <div>
                                          <div>
                                            <div>
                                              <div><br>
                                                -- <br>
                                                <div
                                                  class="m_-2484064570224270571m_390746433550541163gmail_signature">
                                                  <div dir="ltr"><span
                                                      style="font-family:'lucida console','courier new',courier,monospace">Please do NOT
                                                      send Microsoft
                                                      Office
                                                      Attachments:</span><br
                                                      style="font-family:'lucida console','courier new',courier,monospace">
                                                    <div>
                                                      <a
                                                        moz-do-not-send="true"
                                                        style="font-family:'lucida console','courier new',courier,monospace"
                                                        href="http://www.gnu.org/philosophy/no-word-attachments.html"
                                                        target="_blank">http://www.gnu.org/philosophy/no-word-attachments.html</a></div>
                                                  </div>
                                                </div>
                                              </div>
                                            </div>
                                          </div>
                                        </div>
                                      </div>
                                    </div>
                                  </div>
                                </div>
                              </div>
                            </div>
                          </font></span></div>
                      <br>
                    </div>
                  </div>
                  ______________________________<wbr>_________________<br>
                  scikit-learn mailing list<br>
                  <a moz-do-not-send="true"
                    href="mailto:scikit-learn@python.org"
                    target="_blank">scikit-learn@python.org</a><br>
                  <a moz-do-not-send="true"
                    href="https://mail.python.org/mailman/listinfo/scikit-learn"
                    rel="noreferrer" target="_blank">https://mail.python.org/mailma<wbr>n/listinfo/scikit-learn</a><br>
                  <br>
                </blockquote>
              </div>
              <br>
            </div>
            <br>
          </blockquote>
        </div>
        <br>
      </div>
      <br>
    </blockquote>
    <br>
  </body>
</html>