<div dir="ltr"><div><div>Hi,<br>Thanks for those tips, Sebastian. That just saved my day.<br><br></div>Regards,<br></div>Rajkumar<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Feb 13, 2018 at 12:44 AM, Sebastian Raschka <span dir="ltr"><<a href="mailto:se.raschka@gmail.com" target="_blank">se.raschka@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>Hi,<br>
<br>
By default, the clustering classes in sklearn (e.g., DBSCAN) take a [num_examples, num_features] array as input, but some of them, DBSCAN included, also accept a distance matrix directly if you instantiate them with metric='precomputed':<br>
<br>
from sklearn.cluster import DBSCAN<br>
<br>
my_dbscan = DBSCAN(..., metric='precomputed')<br>
my_dbscan.fit(my_distance_matrix)<br>
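<br>
As a minimal sketch of the precomputed route: the toy data, eps=0.1, and min_samples=3 below are made-up values for illustration only, and would need tuning for a real 14000 x 14000 matrix.<br>

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics.pairwise import cosine_distances

# Toy data (made up for illustration): two groups of vectors
# pointing in clearly different directions.
rng = np.random.RandomState(0)
group_a = np.array([1.0, 0.0]) + 0.01 * rng.randn(5, 2)
group_b = np.array([0.0, 1.0]) + 0.01 * rng.randn(5, 2)
X = np.vstack([group_a, group_b])

# Square [num_examples, num_examples] cosine distance matrix.
distance_matrix = cosine_distances(X)

# eps and min_samples are placeholder values, not recommendations.
my_dbscan = DBSCAN(eps=0.1, min_samples=3, metric='precomputed')
labels = my_dbscan.fit_predict(distance_matrix)

print(labels)  # one cluster label per example; -1 would mark noise
```

With metric='precomputed', eps is interpreted on the scale of the distances you pass in, so for cosine distances it should lie somewhere between 0 and 2.<br>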
<br>
I'm not sure if it helps in this particular case (it depends on how many zero elements you have), but you can also use a sparse matrix in CSR format (<a href="https://docs.scipy.org/doc/scipy-1.0.0/reference/generated/scipy.sparse.csr_matrix.html" rel="noreferrer" target="_blank">https://docs.scipy.org/doc/<wbr>scipy-1.0.0/reference/<wbr>generated/scipy.sparse.csr_<wbr>matrix.html</a>).<br>
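<br>
To get a feeling for whether CSR is worthwhile, here is a small sketch that converts a dense array and checks how many entries are actually stored. The array is a made-up example; the lower the density, the more memory CSR tends to save.<br>

```python
import numpy as np
from scipy import sparse

# Made-up feature array with mostly zero entries.
X = np.array([[1.0, 0.0, 0.0, 2.0],
              [0.0, 0.0, 3.0, 0.0],
              [0.0, 4.0, 0.0, 0.0]])

X_sparse = sparse.csr_matrix(X)

# nnz is the number of explicitly stored (non-zero) entries.
density = X_sparse.nnz / (X_sparse.shape[0] * X_sparse.shape[1])
print(X_sparse.nnz, density)
```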
<br>
Also, you don't need to loop over the rows to compute the pairwise distances; you can compute them on the complete array in one call, e.g.,<br>
<br>
from sklearn.metrics.pairwise import cosine_distances<br>
from scipy import sparse<br>
<br>
distance_matrix = cosine_distances(sparse.csr_matrix(X))<br>
<br>
where X is your "[num_examples, num_features]" array.<br>
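<br>
As a quick sanity check, the sketch below compares the single call against an explicit double loop on a small random array (made-up data; scipy's cosine function is used here only as the per-pair reference). Note that cosine_distances returns a dense NumPy array even when the input is sparse.<br>

```python
import numpy as np
from scipy import sparse
from scipy.spatial.distance import cosine
from sklearn.metrics.pairwise import cosine_distances

# Made-up [num_examples, num_features] array for illustration.
rng = np.random.RandomState(42)
X = rng.rand(6, 4)

# One call on the complete array.
distance_matrix = cosine_distances(sparse.csr_matrix(X))

# The same matrix computed pair by pair, which is much slower
# for large inputs.
n = X.shape[0]
looped = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        looped[i, j] = cosine(X[i], X[j])

same = np.allclose(distance_matrix, looped)
print(same)
```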
<br>
Best,<br>
Sebastian<br>
<br>
<br>
> On Feb 12, 2018, at 1:10 PM, prince gosavi <<a href="mailto:princegosavi12@gmail.com">princegosavi12@gmail.com</a>> wrote:<br>
><br>
> I have generated a cosine distance matrix and would like to apply a clustering algorithm to it.<br>
> np.shape(distance_matrix)==(14000,14000)<br>
><br>
> I would like to know which clustering algorithm suits it best, and whether the data needs any further processing before a model can be applied.<br>
> Also, any performance tips would be welcome, as computing the matrix takes around 3-4 hours.<br>
> You can find my code here <a href="https://github.com/maxyodedara5/BE_Project/blob/master/main.ipynb" rel="noreferrer" target="_blank">https://github.com/<wbr>maxyodedara5/BE_Project/blob/<wbr>master/main.ipynb</a><br>
> Code for READ ONLY PURPOSE.<br>
> --<br>
> Regards<br>
> ______________________________<wbr>_________________<br>
> scikit-learn mailing list<br>
> <a href="mailto:scikit-learn@python.org">scikit-learn@python.org</a><br>
> <a href="https://mail.python.org/mailman/listinfo/scikit-learn" rel="noreferrer" target="_blank">https://mail.python.org/<wbr>mailman/listinfo/scikit-learn</a><br>
<br>
<br></blockquote></div><br><br clear="all"><br>-- <br><div class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div>Regards<br></div></div></div></div></div>
</div>