[scikit-learn] Accessing Clustering Feature Tree in Birch

Joel Nothman joel.nothman at gmail.com
Sat Sep 16 06:53:52 EDT 2017


There is no such thing as "the data samples in this cluster". The point of
Birch being online is that it discards any reference to the individual
samples that contributed to each node, keeping only summary statistics
computed from them. Roman Yurchak has, however, submitted a PR that, for the
non-online case, can optionally store the indices of the samples
contributing to each node:
https://github.com/scikit-learn/scikit-learn/pull/8808
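To make the per-node statistics concrete: each subcluster keeps a clustering
feature, roughly a sample count, a per-feature linear sum and a sum of squared
norms (in scikit-learn's private `_CFSubcluster` these are, as far as I know,
`n_samples_`, `linear_sum_` and `squared_sum_`). The sketch below uses plain
NumPy and hypothetical helper names (`cf_entry`, `merge`,
`centroid_and_radius`); it only illustrates that the centroid and radius can
be recovered from these sums after merging, without keeping any samples.

```python
import numpy as np


def cf_entry(X):
    """Hypothetical helper: summarise samples as a clustering feature (N, LS, SS).

    N is the sample count, LS the per-feature linear sum and SS the sum of
    squared norms. Only these statistics are kept; the rows of X are not.
    """
    X = np.asarray(X, dtype=float)
    return len(X), X.sum(axis=0), float((X ** 2).sum())


def merge(cf_a, cf_b):
    """Clustering features are additive, which is what allows online updates."""
    return cf_a[0] + cf_b[0], cf_a[1] + cf_b[1], cf_a[2] + cf_b[2]


def centroid_and_radius(cf):
    """Recover centroid and RMS distance to the centroid from (N, LS, SS) alone."""
    n, ls, ss = cf
    centroid = ls / n
    # mean ||x - c||^2 = SS / N - ||c||^2; clip tiny negatives caused by rounding
    radius = np.sqrt(max(ss / n - centroid @ centroid, 0.0))
    return centroid, radius


rng = np.random.default_rng(0)
batch1, batch2 = rng.normal(size=(5, 3)), rng.normal(size=(7, 3))

# Summarise each batch separately, then merge: the batches themselves can be dropped.
cf = merge(cf_entry(batch1), cf_entry(batch2))
centroid, radius = centroid_and_radius(cf)

all_points = np.vstack([batch1, batch2])
print(np.allclose(centroid, all_points.mean(axis=0)))  # True
```

The point is that nothing in `cf` says which rows went into it, which is why
"the samples in this cluster" cannot be recovered after the fact.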

As for finding what is contained under any particular node, traversing the
tree is a fairly basic task from a computer science perspective. Before we
add anything to make this much easier, I think we'd need to be clear about
which use cases we are supporting. What do you hope to do with this
information, and what function interface would make it much easier?
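For what it's worth, a depth-first walk is only a few lines. This is a sketch,
not a supported API: it reads `Birch.root_` and the private attributes
`subclusters_`, `child_`, `n_samples_` and `centroid_` of scikit-learn's
internal CF node and subcluster classes, which could change between releases.
`iter_cf_tree` is a hypothetical helper; it yields each subcluster together
with its address as a tuple of subcluster indices from the root.

```python
import numpy as np
from sklearn.cluster import Birch


def iter_cf_tree(node, path=()):
    """Hypothetical helper: depth-first walk over a fitted Birch CF tree.

    Yields (path, subcluster) pairs, where path is a tuple of subcluster
    indices from the root, e.g. (0, 2) is the third subcluster of the node
    below the root's first subcluster.

    NOTE: relies on private scikit-learn internals (node.subclusters_ and
    subcluster.child_), which may change between versions.
    """
    for i, sub in enumerate(node.subclusters_):
        sub_path = path + (i,)
        yield sub_path, sub
        if sub.child_ is not None:  # internal subcluster: descend into its child node
            yield from iter_cf_tree(sub.child_, sub_path)


X = np.random.default_rng(0).normal(size=(500, 2))
brc = Birch(threshold=0.3, branching_factor=10, n_clusters=None).fit(X)

for path, sub in iter_cf_tree(brc.root_):
    if len(path) == 1:  # subclusters held directly by the root node
        print(path, sub.n_samples_, sub.centroid_)
```

As far as I can tell, the leaf subclusters (those whose `child_` is None) are
the ones whose centroids end up in `subcluster_centers_`.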

Decimals aren't a practical way to address nodes: the branching factor may
be greater than 10, they are hard to inspect, and they are susceptible to
floating-point imprecision. A list of tuples would be better, but what would
you use it for that isn't easy enough to do now?
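A small illustration of the decimal problem, in plain Python with a
hypothetical encoding where the digits after the point name children: with
more than 10 children per node (scikit-learn's default `branching_factor` is
50), child 10 and child 1 of the same node become the same float, while tuple
addresses stay distinct and make "is this node under that one?" a simple
prefix check. `is_descendant` is a hypothetical helper.

```python
# Hypothetical decimal encoding: "1.10" means child 10 of root subcluster 1,
# "1.1" means child 1 of root subcluster 1.
as_decimal_10 = float("1.10")
as_decimal_1 = float("1.1")
print(as_decimal_10 == as_decimal_1)  # True: the two addresses collide

# The same addresses as tuples of subcluster indices.
as_tuple_10 = (1, 10)
as_tuple_1 = (1, 1)
print(as_tuple_10 == as_tuple_1)  # False


def is_descendant(path, ancestor):
    """A node lies under `ancestor` if its address starts with the ancestor's."""
    return path[:len(ancestor)] == ancestor


print(is_descendant((1, 10, 3), (1, 10)))  # True
print(is_descendant((1, 1, 3), (1, 10)))   # False
```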