[scikit-learn] Agglomerative Clustering without knowing number of clusters

Ariani A b.noushin7 at gmail.com
Thu Jul 13 19:21:41 EDT 2017


Dear Shane,
Sorry to bother you!
Are the "precomputed" metric and the distance matrix you are talking
about meant for "DBSCAN"?
Thanks,
Best.

On Thu, Jul 13, 2017 at 7:03 PM, Ariani A <b.noushin7 at gmail.com> wrote:

> Dear Shane,
> Thanks for your prompt answer.
> Do you mean that for DBSCAN there is no need to feed other parameters? Can
> I just call the function, or do I have to modify the code?
> P.S. I was not able to find the DBSCAN code on github.
> Looking forward to hearing from you.
> Best,
> -Noushin
>
> On Thu, Jul 13, 2017 at 5:38 PM, Shane Grigsby <shane.grigsby at colorado.edu
> > wrote:
>
>> Hi Ariani,
>> Yes, you can use a distance matrix -- I think what you want is
>> metric='precomputed', and then X would be your N by N distance matrix.
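>>
>> Something along these lines should work -- just a rough, untested
>> sketch; the eps and min_samples values are placeholders you would tune
>> for your own data, and the file name is made up:
>>
>>     import numpy as np
>>     from sklearn.cluster import DBSCAN
>>
>>     # D is your symmetric N x N matrix of pairwise distances,
>>     # loaded or computed however you like (this path is hypothetical)
>>     D = np.load('distances.npy')
>>
>>     # eps and min_samples are placeholder values -- tune them to your data
>>     db = DBSCAN(eps=0.5, min_samples=40, metric='precomputed')
>>     labels = db.fit_predict(D)  # label -1 marks noise points
>>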
>> Hope that helps,
>> ~Shane
>>
>>
>> On 07/13, Ariani A wrote:
>>
>>> Dear Shane,
>>> Thanks for your answer.
>>> Does DBSCAN work with a distance matrix? I have a distance matrix (a
>>> symmetric matrix containing pairwise distances). Can you help me? I
>>> did not find the DBSCAN code in that link.
>>> Best,
>>> -Ariani
>>>
>>> On Thu, Jul 6, 2017 at 12:32 PM, Shane Grigsby <
>>> shane.grigsby at colorado.edu>
>>> wrote:
>>>
>>>> This sounds like it may be a problem more amenable to either DBSCAN
>>>> or OPTICS. Neither algorithm requires a priori knowledge of the
>>>> number of clusters, and both let you specify a minimum number of
>>>> points for cluster membership. The OPTICS algorithm will also produce
>>>> a dendrogram that you can cut for sub-clusters if need be.
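>>>>
>>>> As a rough, untested sketch of what that looks like with DBSCAN (the
>>>> eps value is a placeholder to tune; note that min_samples is the
>>>> neighborhood size used to define core points, so it acts as an
>>>> approximate membership threshold rather than a strict minimum cluster
>>>> size):
>>>>
>>>>     from sklearn.cluster import DBSCAN
>>>>     from sklearn.datasets import make_blobs
>>>>
>>>>     # toy data just to illustrate the call; no number of clusters is
>>>>     # passed anywhere
>>>>     X, _ = make_blobs(n_samples=500, random_state=0)
>>>>
>>>>     db = DBSCAN(eps=0.5, min_samples=40)  # placeholder parameters
>>>>     labels = db.fit_predict(X)  # -1 marks points labelled as noise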
>>>>
>>>> DBSCAN is part of the stable release and has been for some time; OPTICS
>>>> is
>>>> pending as a pull request, but it's stable and you can try it if you
>>>> like:
>>>>
>>>> https://github.com/scikit-learn/scikit-learn/pull/1984
>>>>
>>>> Cheers,
>>>> Shane
>>>>
>>>>
>>>> On 06/30, Ariani A wrote:
>>>>
>>>>> I want to perform agglomerative clustering, but I have no idea of the
>>>>> number of clusters beforehand. However, I want every cluster to contain
>>>>> at least 40 data points. How can I apply this with sklearn's
>>>>> AgglomerativeClustering? Should I use a dendrogram and cut it somehow? I
>>>>> have no idea how to relate the dendrogram to this or how to cut it. Any
>>>>> help will be appreciated!
>>>>>
>>>>>
>>>> --
>>>> *PhD candidate & Research Assistant*
>>>> *Cooperative Institute for Research in Environmental Sciences (CIRES)*
>>>> *University of Colorado at Boulder*
>>
>>
>> --
>> *PhD candidate & Research Assistant*
>> *Cooperative Institute for Research in Environmental Sciences (CIRES)*
>> *University of Colorado at Boulder*