<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <p>Should we have a "low memory"/batched version of the kneighbors_graph
      and radius_neighbors_graph functions? I assume<br>
      those instantiate the dense matrix right now.<br>
    </p>
    <br>
    <div class="moz-cite-prefix">On 05/13/2018 10:59 PM, Joel Nothman
      wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CAAkaFLWep1PZaY6e-eDQrtcZgUjLcdqU1+6t6EX+WrJRVM1yPg@mail.gmail.com">
      <div dir="ltr">This is quite a common issue with our
        implementation of DBSCAN, and improvements to documentation
        would be very, very welcome.
        <div><br>
        </div>
        <div>The high memory cost comes from computing the radius
          neighborhood of every point at once. If you use a distance
          metric that cannot be indexed with a KD-tree or Ball Tree,
          the full pairwise distance matrix of n^2 floats is held in
          memory even before the radius neighbors are computed.<br>
          <div><br>
          </div>
          <div>You have the following strategies available to you
            currently:</div>
        </div>
        <div><br>
        </div>
        <div>1. Calculate the radius neighborhoods using
          radius_neighbors_graph in chunks, so as to avoid all pairs
          being calculated and stored at once. This produces a sparse
          graph representation, which can be passed into DBSCAN with
          metric='precomputed'. (I've just seen that Sebastian
          suggested the same.)</div>
        <div>2. Reduce the number of samples in your dataset and
          represent (near-)duplicate points with sample_weight (i.e. two
          identical points would be merged but would have a
          sample_weight of 2).</div>
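        <div><br>
        </div>
        <div>A minimal sketch of strategy 1, assuming a dense array X
          and a metric that NearestNeighbors supports. The helper name
          chunked_radius_graph and the chunk_size default are
          illustrative and not part of scikit-learn:</div>
        <pre>
```python
import numpy as np
from scipy import sparse
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors


def chunked_radius_graph(X, eps, chunk_size=1000, metric="euclidean"):
    """Build a sparse eps-neighborhood distance graph one chunk of rows at a time.

    Hypothetical helper: the name and the chunk_size default are assumptions.
    """
    nn = NearestNeighbors(radius=eps, metric=metric).fit(X)
    blocks = []
    for start in range(0, X.shape[0], chunk_size):
        # Query only this chunk of rows against the fitted data; the block
        # stores distances to neighbors within eps and nothing else.
        block = nn.radius_neighbors_graph(X[start:start + chunk_size], mode="distance")
        blocks.append(block)
    # Stack the per-chunk blocks into one (n_samples, n_samples) sparse matrix.
    return sparse.vstack(blocks, format="csr")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5000, 3))
    graph = chunked_radius_graph(X, eps=0.3, chunk_size=500)
    # Use the same eps that was used to build the graph.
    labels = DBSCAN(eps=0.3, min_samples=5, metric="precomputed").fit(graph).labels_
    print(graph.shape, graph.nnz, np.unique(labels).size)
```
</pre>
        <div>With a brute-force metric, each chunk still needs roughly
          chunk_size times n_samples distances while it is being
          computed, so chunk_size is the knob that trades speed for
          peak memory.</div>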
        <div><br>
        </div>
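        <div>A rough sketch of strategy 2 for the simplest case, exact
          duplicate rows. The helper name dbscan_on_unique is
          hypothetical; merging near-duplicates would need an extra
          step (for example rounding or binning the features first),
          which is not shown here:</div>
        <pre>
```python
import numpy as np
from sklearn.cluster import DBSCAN


def dbscan_on_unique(X, eps, min_samples):
    """Cluster each distinct row once, weighted by how often it occurs.

    Hypothetical helper: only exact duplicate rows are merged.
    """
    # unique_rows: the distinct rows; inverse: position of each original
    # row in unique_rows; counts: multiplicity of each distinct row.
    unique_rows, inverse, counts = np.unique(
        X, axis=0, return_inverse=True, return_counts=True
    )
    model = DBSCAN(eps=eps, min_samples=min_samples)
    # A row that occurs k times counts as k samples in the core-point test.
    model.fit(unique_rows, sample_weight=counts)
    # Map the labels of the distinct rows back onto the original rows.
    return model.labels_[np.ravel(inverse)]


if __name__ == "__main__":
    base = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [20.0, 20.0]])
    X = np.repeat(base, [3, 2, 4, 1, 1], axis=0)
    print(dbscan_on_unique(X, eps=0.3, min_samples=4))
```
</pre>
        <div>The pairwise work then scales with the number of distinct
          rows rather than the number of samples, so the saving depends
          entirely on how many duplicates the data contains.</div>
        <div><br>
        </div>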
        <div>There is also a proposal to offer an alternative
          memory-efficient mode at <a
href="https://github.com/scikit-learn/scikit-learn/pull/6813"
            moz-do-not-send="true">https://github.com/scikit-learn/scikit-learn/pull/6813</a>.
          Feedback is welcome.</div>
        <div><br>
        </div>
        <div>Cheers,</div>
        <div><br>
        </div>
        <div>Joel</div>
        <div><br>
        </div>
        <div><br>
        </div>
      </div>
      <pre wrap="">_______________________________________________
scikit-learn mailing list
<a class="moz-txt-link-abbreviated" href="mailto:scikit-learn@python.org">scikit-learn@python.org</a>
<a class="moz-txt-link-freetext" href="https://mail.python.org/mailman/listinfo/scikit-learn">https://mail.python.org/mailman/listinfo/scikit-learn</a>
</pre>
    </blockquote>
    <br>
  </body>
</html>