[scikit-learn] Accessing Clustering Feature Tree in Birch
Sema Atasever
s.atasever at gmail.com
Tue Oct 3 09:11:31 EDT 2017
Hi Roman,
Thank you for the detailed and informative answer.
On Mon, Oct 2, 2017 at 12:14 PM, Roman Yurchak <rth.yurchak at gmail.com>
wrote:
> Hello,
>
> sklearn.cluster.Birch follows the original BIRCH paper, which appears to be
> mostly focused on efficiently building the hierarchical clustering tree
> (and not so much on making the later analysis user friendly). The
> attributes exposed by Birch are those that could reasonably be exposed
> given the scikit-learn API constraints. However, one does have access to the
> full cluster hierarchy via Birch.root_.
>
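A minimal sketch of walking that hierarchy, assuming a recent scikit-learn. `root_` is a documented attribute of `Birch`, but `subclusters_`, `child_` and `n_samples_` belong to the private `_CFNode`/`_CFSubcluster` classes and may change between versions. The `walk` helper is hypothetical.

```python
import numpy as np
from sklearn.cluster import Birch

# Toy data; any (n_samples, n_features) array works here.
rng = np.random.RandomState(0)
X = rng.rand(200, 2)

# n_clusters=None skips the global clustering step, so the CF tree is the result.
birch = Birch(threshold=0.1, branching_factor=5, n_clusters=None).fit(X)


def walk(node, depth=0):
    """Hypothetical helper: depth-first traversal of the CF tree from a _CFNode.

    Yields (depth, subcluster) pairs. subclusters_, child_ and n_samples_ are
    internal attributes and are not part of the public scikit-learn API.
    """
    for sub in node.subclusters_:
        yield depth, sub
        if sub.child_ is not None:  # non-leaf subclusters point to a child node
            yield from walk(sub.child_, depth + 1)


for depth, sub in walk(birch.root_):
    print("  " * depth + f"subcluster with {sub.n_samples_} samples")
```

Subclusters with `child_ is None` are the leaf subclusters whose centroids make up `Birch.subcluster_centers_`.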
> As Joel said, traversing the tree is a standard CS problem, and there are
> probably a number of operations that could be done with it, depending
> on the application. For instance, for my use case, I found that
> re-constructing the Birch hierarchy using a custom container class for each
> subcluster was the easiest way to run subsequent analysis. A detailed
> example can be found here:
> http://freediscovery.io/doc/stable/python/examples/birch_cluster_hierarchy.html
> Alternatively, I wonder if converting the tree to a format readable by a
> specialized tree/graph library (e.g. networkx) could be useful for
> analysis.
>
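A sketch of the custom-container idea in plain Python. This is not the freediscovery implementation: the `Cluster` dataclass and the `to_container`/`edges` helpers are hypothetical, and the same private attributes (`subclusters_`, `child_`, `n_samples_`, `centroid_`) are assumed. Each subcluster is identified by its path of child indices from the root, and the resulting edge list of path tuples is the kind of input a graph library can consume.

```python
from dataclasses import dataclass, field

import numpy as np
from sklearn.cluster import Birch


@dataclass
class Cluster:
    """Hypothetical plain-Python container for one Birch subcluster."""
    path: tuple            # child indices from the root, e.g. (0, 2, 1)
    n_samples: int
    centroid: np.ndarray
    children: list = field(default_factory=list)


def to_container(node, path=()):
    """Rebuild the CF tree below a _CFNode as a list of Cluster objects."""
    out = []
    for i, sub in enumerate(node.subclusters_):
        c = Cluster(path + (i,), int(sub.n_samples_), sub.centroid_.copy())
        if sub.child_ is not None:
            c.children = to_container(sub.child_, c.path)
        out.append(c)
    return out


def edges(clusters):
    """Yield (parent_path, child_path) pairs for every parent/child link."""
    for c in clusters:
        for child in c.children:
            yield c.path, child.path
        yield from edges(c.children)


rng = np.random.RandomState(0)
X = rng.rand(200, 2)
birch = Birch(threshold=0.1, branching_factor=5, n_clusters=None).fit(X)

top = to_container(birch.root_)
edge_list = list(edges(top))
# If networkx is installed (not required here), networkx.DiGraph(edge_list)
# would build a directed graph whose nodes are the path tuples.
```

Copying the centroids decouples the containers from the fitted estimator, so later `partial_fit` calls cannot silently change them.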
> Generally, there are a number of places in scikit-learn where trees are used
> (Birch, AgglomerativeClustering, tree-based classifiers, etc.), but for now
> there is no way to export the constructed tree to a standard format
> (apart from sklearn.tree.export_graphviz). Not sure if this is realistically
> achievable, though.
>
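For Birch specifically, a hand-written exporter to the Graphviz DOT format is short. The sketch below uses a hypothetical helper `birch_to_dot`, not a scikit-learn function, and relies on the same private attributes as above. Each subcluster becomes a DOT node labelled with its sample count.

```python
import numpy as np
from sklearn.cluster import Birch


def birch_to_dot(birch):
    """Hypothetical helper: write the CF tree of a fitted Birch model as DOT text."""
    lines = ["digraph birch {", '  root [label="root"];']
    counter = 0

    def visit(node, parent_id):
        nonlocal counter
        for sub in node.subclusters_:
            node_id = f"s{counter}"
            counter += 1
            lines.append(f'  {node_id} [label="n={sub.n_samples_}"];')
            lines.append(f"  {parent_id} -> {node_id};")
            if sub.child_ is not None:  # recurse into non-leaf subclusters
                visit(sub.child_, node_id)

    visit(birch.root_, "root")
    lines.append("}")
    return "\n".join(lines)


rng = np.random.RandomState(0)
X = rng.rand(200, 2)
birch = Birch(threshold=0.1, branching_factor=5, n_clusters=None).fit(X)
dot_source = birch_to_dot(birch)
```

Once written to a file such as tree.dot, the text can be rendered with `dot -Tpng tree.dot -o tree.png`.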
> --
> Roman
>
> On 20/09/17 13:40, Sema Atasever wrote:
>
>> I need this information to use it in a scientific study and
>> I think that a function interface would make this easier.
>>
>> Thank you for your answer.
>>
>> On Sat, Sep 16, 2017 at 1:53 PM, Joel Nothman <joel.nothman at gmail.com> wrote:
>>
>> There is no such thing as "the data samples in this cluster". Because
>> Birch is an online algorithm, it loses any reference to the individual
>> samples that contributed to each node and stores only summary statistics
>> computed from them. Roman Yurchak has, however, offered a PR where, for
>> the non-online case, storage of the indices contributing to each node can
>> optionally be turned on:
>> https://github.com/scikit-learn/scikit-learn/pull/8808
>>
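To illustrate what is stored instead of the samples: each subcluster keeps a clustering feature, namely the number of samples N, the linear sum LS and the sum of squared norms SS, from which the centroid and radius follow. A sketch, assuming the internal attribute names `n_samples_`, `linear_sum_` and `squared_sum_` of current scikit-learn. The only per-sample information available after fitting comes from `Birch.predict`, which assigns each sample to its nearest leaf subcluster.

```python
import numpy as np
from sklearn.cluster import Birch

rng = np.random.RandomState(0)
X = rng.rand(200, 2)
birch = Birch(threshold=0.1, branching_factor=5, n_clusters=None).fit(X)

# A subcluster keeps only its clustering feature (N, LS, SS), not the samples
# (internal attributes, may change between versions).
sub = birch.root_.subclusters_[0]
N, LS, SS = sub.n_samples_, sub.linear_sum_, sub.squared_sum_

centroid = LS / N
# Radius from the summary statistics alone: R^2 = SS / N - ||LS / N||^2
radius = np.sqrt(max(SS / N - centroid @ centroid, 0.0))
print(N, centroid, radius)

# Per-sample information: index of the nearest leaf subcluster
# (with n_clusters=None, subcluster_labels_ is simply 0..n_subclusters-1).
nearest_leaf = birch.predict(X)
```

Because clustering features are additive, the linear sums of the root's subclusters add up to the sum of all samples seen during fitting.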
>> As for finding what is contained under any particular node,
>> traversing the tree is a fairly basic task from a computer science
>> perspective. Before we were to support something to make this much
>> easier, I think we'd need to be clear on what kinds of use cases we
>> were supporting. What do you hope to do with this information, and
>> what would a function interface look like that would make this much
>> easier?
>>
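A sketch of finding what is contained under a particular node. Since the samples themselves are not stored, the hypothetical `samples_under` helper only approximates membership: it assigns each sample to its nearest leaf-subcluster centroid (the same rule `Birch.predict` applies) and keeps the samples whose leaf lies in the subtree. This can differ from the subcluster a sample was actually absorbed into during fitting, because centroids move as more data arrives.

```python
import numpy as np
from sklearn.cluster import Birch


def leaves(node):
    """All leaf subclusters below a _CFNode, in depth-first order."""
    out = []
    for sub in node.subclusters_:
        if sub.child_ is None:
            out.append(sub)
        else:
            out.extend(leaves(sub.child_))
    return out


def samples_under(X, root, subcluster):
    """Hypothetical helper: indices of samples whose nearest leaf lies below `subcluster`."""
    all_leaves = leaves(root)
    centroids = np.array([s.centroid_ for s in all_leaves])
    # Squared Euclidean distance from every sample to every leaf centroid.
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    # Leaves in the subtree rooted at `subcluster` (itself, if it is a leaf).
    below = [subcluster] if subcluster.child_ is None else leaves(subcluster.child_)
    below_ids = {id(s) for s in below}
    keep = [j for j, s in enumerate(all_leaves) if id(s) in below_ids]
    return np.flatnonzero(np.isin(nearest, keep))


rng = np.random.RandomState(0)
X = rng.rand(200, 2)
birch = Birch(threshold=0.1, branching_factor=5, n_clusters=None).fit(X)

# Samples (approximately) contained under the first subcluster of the root.
idx = samples_under(X, birch.root_, birch.root_.subclusters_[0])
```

Applied to every subcluster of the root, the results partition the data set, since each sample has exactly one nearest leaf.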
>> Decimals aren't a practical option, as the branching factor may be
>> greater than 10; they are also hard to inspect and susceptible to
>> computational imprecision. You'd be better off with a list of tuples, but
>> what do you need it for that is not easy enough to do now?
>>
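A sketch of the tuple alternative: a dictionary keyed by the path of child indices from the root. The helpers `index_by_path` and `as_digits` are hypothetical, and the digit-string encoding is only an assumption about what a decimal scheme would look like; it shows why such a scheme breaks once a node can have more than 10 children.

```python
import numpy as np
from sklearn.cluster import Birch


def index_by_path(node, path=()):
    """Map each subcluster's path (tuple of child indices) to the subcluster."""
    table = {}
    for i, sub in enumerate(node.subclusters_):
        table[path + (i,)] = sub
        if sub.child_ is not None:
            table.update(index_by_path(sub.child_, path + (i,)))
    return table


def as_digits(path):
    """Naive one-digit-per-level encoding, e.g. (1, 2, 3) -> '123'."""
    return "".join(str(i) for i in path)


rng = np.random.RandomState(0)
X = rng.rand(200, 2)
birch = Birch(threshold=0.1, branching_factor=5, n_clusters=None).fit(X)

by_path = index_by_path(birch.root_)
first = by_path[(0,)]  # first subcluster of the root node

# With more than 10 children per node, digit strings stop being unique:
print(as_digits((1, 12)), as_digits((11, 2)))  # prints: 112 112
```

Tuples compare exactly, need no parsing, and `path[:-1]` gives the parent's key directly.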
>>
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>