[scikit-learn] [semi-supervised learning] Using a pre-existing graph with LabelSpreading API

Clay Woolam clay at woolam.org
Mon Dec 5 16:50:01 EST 2016


Heya, sorry for not responding sooner.

Running those algorithms is expensive (O(n^3), from memory), so that's
going to be a big limiting factor, and I worry that your graph may be too
big for them. The max_iter param is certainly available for tuning; it
trades off the accuracy of the result against run time. Totally
speculating: I don't think sparsifying would help much with these
implementations, since both create fully connected graphs as part of the
graph construction step. I think sparsification would help a lot if you
instead directly simulated the particle movements through the graph,
rather than using these exact solutions.
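
To make that last idea concrete, here is a rough sketch of one way to read
it: iterate the expected random-walk step on a sparse weight matrix, so that
each pass costs on the order of (edges x classes) instead of touching a dense
n x n matrix. This is not what scikit-learn does internally. The function
name propagate_on_sparse_graph, the clamping rule, and the toy edge list are
all assumptions made up for illustration, and it uses expected values rather
than sampling individual walks.

```python
import numpy as np
from scipy import sparse


def propagate_on_sparse_graph(W, y, n_iter=50):
    """Hypothetical helper: iterative label propagation on a sparse graph.

    W: sparse symmetric (n, n) weight matrix.
    y: integer labels of length n, with -1 marking unlabeled nodes.
    """
    W = sparse.csr_matrix(W, dtype=float)
    labeled = y != -1
    classes = np.unique(y[labeled])

    # Row-normalize W into random-walk transition probabilities P = D^-1 W.
    degree = np.asarray(W.sum(axis=1)).ravel()
    inv_degree = np.zeros_like(degree)
    inv_degree[degree > 0] = 1.0 / degree[degree > 0]
    P = sparse.diags(inv_degree) @ W

    # One-hot rows for labeled nodes, all-zero rows for unlabeled ones.
    Y = np.zeros((W.shape[0], len(classes)))
    for k, c in enumerate(classes):
        Y[y == c, k] = 1.0

    F = Y.copy()
    for _ in range(n_iter):
        F = P @ F                # each node averages its neighbours' beliefs
        F[labeled] = Y[labeled]  # clamp the known labels back in place
    return classes[F.argmax(axis=1)], F


# Toy edge list in "node_i node_j weight" form: two disconnected chains.
edges = np.array([[0, 1, 1.0], [1, 2, 1.0], [3, 4, 1.0], [4, 5, 1.0]])
i, j, w = edges[:, 0].astype(int), edges[:, 1].astype(int), edges[:, 2]
W = sparse.coo_matrix((w, (i, j)), shape=(6, 6))
W = W + W.T  # assumes each undirected edge is listed once

y = np.array([0, -1, -1, -1, -1, 1])  # one labeled node per chain
pred, F = propagate_on_sparse_graph(W, y)
```

With a graph this sparse, the per-iteration work is dominated by the sparse
product P @ F, which is where sparsifying the graph would actually pay off.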

For #2, what if you subclassed the LabelSpreading class and overrode
_build_graph
<https://github.com/scikit-learn/scikit-learn/blob/a5ab948/sklearn/semi_supervised/label_propagation.py#L449>
to inject the graph that you set up? May be a big hack.
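
A minimal sketch of what that subclass might look like, with heavy caveats:
_build_graph is a private method, so this depends on scikit-learn internals
that can change between versions, and I have not run it at the scale
described. The class name PrecomputedGraphLabelSpreading and its graph
parameter are hypothetical. The X passed to fit is only a placeholder to
satisfy input validation, and the D^-1/2 W D^-1/2 normalization is my
assumption about what the spreading iteration expects from _build_graph.

```python
import numpy as np
from scipy import sparse
from sklearn.semi_supervised import LabelSpreading


class PrecomputedGraphLabelSpreading(LabelSpreading):
    """Hypothetical subclass that uses a caller-supplied weight matrix."""

    def __init__(self, graph, alpha=0.2, max_iter=30, tol=1e-3):
        super().__init__(kernel="rbf", alpha=alpha, max_iter=max_iter, tol=tol)
        self.graph = graph

    def _build_graph(self):
        # Ignore self.X_ entirely and normalize the injected graph instead.
        W = sparse.csr_matrix(self.graph, dtype=float)
        W = sparse.csr_matrix(W - sparse.diags(W.diagonal()))  # drop self-loops
        degree = np.asarray(W.sum(axis=1)).ravel()
        inv_sqrt = np.zeros_like(degree)
        inv_sqrt[degree > 0] = 1.0 / np.sqrt(degree[degree > 0])
        D = sparse.diags(inv_sqrt)
        return (D @ W @ D).tocsr()  # symmetric normalization D^-1/2 W D^-1/2


# Toy graph: two disconnected chains, 0-1-2 and 3-4-5.
rows = np.array([0, 1, 3, 4])
cols = np.array([1, 2, 4, 5])
weights = np.ones(4)
W = sparse.coo_matrix((weights, (rows, cols)), shape=(6, 6))
W = W + W.T  # assumes each undirected edge is listed once

y = np.array([0, -1, -1, -1, -1, 1])  # -1 marks unlabeled nodes
X_placeholder = np.zeros((6, 1))      # features are never used by this subclass

model = PrecomputedGraphLabelSpreading(graph=W)
model.fit(X_placeholder, y)
labels = model.transduction_
```

Only the transductive result (transduction_ and label_distributions_) is
meaningful here; predict on new points would still go through the kernel on
the placeholder features, so it should not be relied on.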

On Thu, Dec 1, 2016 at 7:33 PM, Delip Rao <deliprao at gmail.com> wrote:

> Hello,
>
> I have an existing graph dataset in the edge format:
>
> node_i node_j weight
>
> The number of nodes is around 3.6M, and the number of edges is around
> 72M.
>
> I also have some labeled data (around a dozen per class with 16 classes in
> total), so overall, a perfect setting for label propagation or its
> variants. In particular, I want to try the LabelSpreading implementation
> for the regularization. I looked at the documentation and can't find a way
> to plug in a pre-computed graph (or adjacency matrix). So two questions:
>
> 1. Are there any scaling issues I should be aware of for a dataset of this
> size? I can try sparsifying the graph, but would love to learn about any
> knobs I should be aware of.
> 2. How do I plug in an existing weighted graph with the current API? Happy
> to use any undocumented features.
>
> Thanks in advance!
> Delip