[scikit-learn] Applying clustering to cosine distance matrix
vaggi.federico at gmail.com
Mon Feb 12 16:49:46 EST 2018
As a caveat, a lot of clustering algorithms assume that the distances come
from a proper metric. If your distance is not a proper metric, then the
results might be meaningless (the narrative docs do a good job of
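
To make the caveat concrete for this thread: cosine distance does not satisfy
the triangle inequality. The sketch below shows a three-point counterexample.
The example vectors and the rescaling to an angular distance are my own
illustration, not something taken from the docs or from the code linked in the
thread.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_distances

# Three illustrative directions: a, b (halfway between a and c), and c.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])

D = cosine_distances(X)

# Triangle inequality would require d(a, c) <= d(a, b) + d(b, c).
# Here d(a, c) = 1 while d(a, b) + d(b, c) is roughly 0.59.
violates_triangle = D[0, 2] > D[0, 1] + D[1, 2]

# One possible workaround (an assumption, not a recommendation from the
# thread): convert to an angular distance, which does satisfy the triangle
# inequality. The clip guards against rounding slightly outside [-1, 1].
A = np.arccos(np.clip(1.0 - D, -1.0, 1.0)) / np.pi
angular_ok = A[0, 2] <= A[0, 1] + A[1, 2] + 1e-9
```

Whether this matters in practice depends on the algorithm, so treat it as a
check to run rather than a verdict.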
On Mon, 12 Feb 2018 at 13:30 prince gosavi <princegosavi12 at gmail.com> wrote:
> Thanks for those tips, Sebastian. That just saved my day.
> On Tue, Feb 13, 2018 at 12:44 AM, Sebastian Raschka <se.raschka at gmail.com>
>> By default, the clustering classes in sklearn (e.g., DBSCAN) take an
>> [num_examples, num_features] array as input, but you can also provide the
>> distance matrix directly, e.g., by instantiating the estimator with
>> my_dbscan = DBSCAN(..., metric='precomputed')
>> Not sure if it helps in that particular case (depending on how many zero
>> elements you have), but you can also use a sparse matrix in CSR format (
>> Also, you don't need to for-loop through the rows if you want to compute
>> the pairwise distances; you can simply do that on the complete array. E.g.,
>> from sklearn.metrics.pairwise import cosine_distances
>> from scipy import sparse
>> distance_matrix = cosine_distances(sparse.csr_matrix(X),
>> where X is your "[num_examples, num_features]" array.
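
A minimal sketch that puts the two suggestions quoted above together: compute
all pairwise cosine distances in one call, then hand the result to DBSCAN with
metric='precomputed'. The toy data and the eps and min_samples values are
assumptions chosen only so the example runs; they are not tuned for the
14000 x 14000 matrix discussed in this thread.

```python
import numpy as np
from scipy import sparse
from sklearn.cluster import DBSCAN
from sklearn.metrics.pairwise import cosine_distances

# Toy stand-in for the real [num_examples, num_features] array:
# two groups of points pointing in clearly different directions.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[5.0, 0.0, 0.0], scale=0.1, size=(20, 3)),
    rng.normal(loc=[0.0, 5.0, 0.0], scale=0.1, size=(20, 3)),
])

# All pairwise distances at once, without a Python loop over the rows.
distance_matrix = cosine_distances(sparse.csr_matrix(X))

# Guard against tiny negative values caused by floating-point rounding.
distance_matrix = np.clip(distance_matrix, 0.0, 2.0)

# eps and min_samples are illustrative values, not recommendations.
my_dbscan = DBSCAN(eps=0.1, min_samples=5, metric='precomputed')
labels = my_dbscan.fit_predict(distance_matrix)
```

With metric='precomputed', the array passed to fit_predict is read as
distances between samples, so it must be square, with one row and one column
per example.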
>> > On Feb 12, 2018, at 1:10 PM, prince gosavi <princegosavi12 at gmail.com>
>> > I have generated a cosine distance matrix and would like to apply a
>> > clustering algorithm to the given matrix.
>> > np.shape(distance_matrix)==(14000,14000)
>> > I would like to know which clustering algorithm suits it best, and whether
>> > the data needs any further processing before a model can be applied.
>> > Also any performance tip as the matrix takes around 3-4 hrs of
>> > You can find my code here
>> > Code for READ ONLY PURPOSE.
>> > --
>> > Regards
>> > _______________________________________________
>> > scikit-learn mailing list
>> > scikit-learn at python.org
>> > https://mail.python.org/mailman/listinfo/scikit-learn