[scikit-learn] [semi-supervised learning] Using a pre-existing graph with LabelSpreading API
clay at woolam.org
Mon Dec 5 16:50:01 EST 2016
Heya, sorry for not responding sooner.
Running those algorithms is expensive (O(n^3), from memory), so
that's going to be a big limiting factor. And I worry that your graph may
be too big for these algorithms. The max_iter param is certainly available
for tuning, which trades off the accuracy of the result against run time. Totally speculating:
I don't think sparsifying would help too much with these implementations.
These both create fully connected graphs as part of the graph construction
step. I think sparsification would help a lot if you instead directly
simulated the particle movements through the graph, rather than using these implementations.
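
To make the sparsification point concrete, here is a rough sketch of running the label spreading update F <- alpha * S * F + (1 - alpha) * Y directly over an edge list, so each sweep costs on the order of (edges x classes) rather than going through a dense n x n matrix. This is not scikit-learn code: the function name spread_labels, its arguments, and the toy graph are all made up for illustration, and it assumes each undirected edge is listed once with no self-loops.

```python
from collections import defaultdict


def spread_labels(edges, seeds, n_classes, alpha=0.2, max_iter=30, tol=1e-3):
    """Hypothetical helper: label spreading over an undirected edge list.

    edges: iterable of (node_i, node_j, weight), each edge listed once.
    seeds: dict mapping a labeled node to its class index.
    Returns a dict mapping every node to its predicted class index.
    """
    edges = list(edges)

    # Weighted degree of every node.
    degree = defaultdict(float)
    for i, j, w in edges:
        degree[i] += w
        degree[j] += w

    # Symmetric normalization, done once per edge: w_ij / sqrt(d_i * d_j).
    norm_edges = [(i, j, w / (degree[i] * degree[j]) ** 0.5) for i, j, w in edges]

    # (1 - alpha) * Y: the clamped seed term added back in on every sweep.
    nodes = set(degree) | set(seeds)
    base = {n: [0.0] * n_classes for n in nodes}
    for node, label in seeds.items():
        base[node][label] = 1.0 - alpha

    scores = {n: list(v) for n, v in base.items()}
    for _ in range(max_iter):
        new = {n: list(v) for n, v in base.items()}
        # One pass over the edges pushes mass in both directions.
        for i, j, s in norm_edges:
            for c in range(n_classes):
                new[i][c] += alpha * s * scores[j][c]
                new[j][c] += alpha * s * scores[i][c]
        change = sum(abs(a - b) for n in nodes for a, b in zip(new[n], scores[n]))
        scores = new
        if change < tol:
            break

    # Highest-scoring class per node (ties fall back to the lowest index).
    return {n: max(range(n_classes), key=v.__getitem__) for n, v in scores.items()}


# Toy graph: two disconnected chains, one labeled node in each.
toy_edges = [(0, 1, 1.0), (1, 2, 1.0), (3, 4, 1.0), (4, 5, 1.0)]
labels = spread_labels(toy_edges, seeds={0: 0, 5: 1}, n_classes=2)
print(labels)
```

Pure Python like this would be slow at millions of nodes; the same update is probably better written as a sparse matrix product, but the cost per sweep scales with the number of edges in the same way.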
For #2, what if you subclassed the LabelSpreading class and overrode its
graph construction step to inject the graph that you set up? It may be a big hack.
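
A minimal sketch of that subclassing idea follows. It assumes the thing to override is the private _build_graph method that fit() calls to obtain the propagation matrix; since it is private, it may change between releases. The class name PrecomputedLabelSpreading, the graph parameter, and the toy data are hypothetical, and the graph is assumed to be symmetric with no self-loops.

```python
import numpy as np
from scipy import sparse
from sklearn.semi_supervised import LabelSpreading


class PrecomputedLabelSpreading(LabelSpreading):
    """Hypothetical subclass: spread labels over a caller-supplied graph."""

    def __init__(self, graph=None, alpha=0.2, max_iter=30, tol=1e-3):
        super().__init__(alpha=alpha, max_iter=max_iter, tol=tol)
        self.graph = graph  # symmetric (n, n) sparse weights, no self-loops

    def _build_graph(self):
        # Assumption: this private hook is what fit() uses as the propagation
        # matrix. Return D^-1/2 W D^-1/2 instead of a kernel computed on X.
        w = sparse.csr_matrix(self.graph, dtype=np.float64)
        degree = np.asarray(w.sum(axis=1)).ravel()
        degree[degree == 0] = 1.0  # avoid dividing by zero for isolated nodes
        d_inv_sqrt = sparse.diags(1.0 / np.sqrt(degree))
        return (d_inv_sqrt @ w @ d_inv_sqrt).tocsr()


# Toy edge list in "node_i node_j weight" form: two disconnected chains.
rows = np.array([0, 1, 3, 4])
cols = np.array([1, 2, 4, 5])
vals = np.ones(4)
half = sparse.coo_matrix((vals, (rows, cols)), shape=(6, 6))
graph = (half + half.T).tocsr()  # symmetrize the one-directional edge list

y = np.array([0, -1, -1, -1, -1, 1])  # -1 marks unlabeled nodes
X_dummy = np.zeros((6, 1))  # fit() still wants an X; the override ignores it

model = PrecomputedLabelSpreading(graph=graph).fit(X_dummy, y)
print(model.transduction_)
```

Only transduction_ and label_distributions_ are meaningful here. predict() would still try to apply the kernel to the dummy X, so this only covers labeling the nodes already in the graph.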
On Thu, Dec 1, 2016 at 7:33 PM, Delip Rao <deliprao at gmail.com> wrote:
> I have an existing graph dataset in the edge format:
> node_i node_j weight
> The number of nodes is around 3.6M, and the number of edges is around
> I also have some labeled data (around a dozen per class with 16 classes in
> total), so overall, a perfect setting for label propagation or its
> variants. In particular, I want to try the LabelSpreading implementation
> for the regularization. I looked at the documentation and can't find a way
> to plug in a pre-computed graph (or adjacency matrix). So two questions:
> 1. What are any scaling issues I should be aware of for a dataset of this
> size? I can try sparsifying the graph, but would love to learn any knobs I
> should be aware of.
> 2. How do I plug in an existing weighted graph with the current API? Happy
> to use any undocumented features.
> Thanks in advance!