Matrix-free version of connected_components
Hi all, I work with large sparse matrices which consists of several disconnected components and I only need one of these. So far I have succesfully been using scipy.sparse.csgraph.connected_components to dig out the component I needed. This algorithm does however require the entire sparse matrix as input, which sometimes is too large to fit in memory. For this reason I wondered whether there existed a matrix-free version of connected_components, or another algorithm achieving the same, where I would only need to provide a function calculating the sparse matrix-vector product. Any help, hints, or information would be greatly appreciated :) Cheers, Per
You can try spectral graph partioning, computing the Fiedler with arpack, that only needs matrix-vector product: http://mathworld.wolfram.com/FiedlerVector.html http://en.wikipedia.org/wiki/Algebraic_connectivity HTH, Gael On Wed, Aug 01, 2012 at 02:15:51PM +0200, Per Nielsen wrote:
Hi all,
I work with large sparse matrices which consists of several disconnected components and I only need one of these. So far I have succesfully been using scipy.sparse.csgraph.connected_components to dig out the component I needed. This algorithm does however require the entire sparse matrix as input, which sometimes is too large to fit in memory.
For this reason I wondered whether there existed a matrix-free version of connected_components, or another algorithm achieving the same, where I would only need to provide a function calculating the sparse matrix-vector product.
Any help, hints, or information would be greatly appreciated :)
Cheers, Per
-- Gael Varoquaux Researcher, INRIA Parietal Laboratoire de Neuro-Imagerie Assistee par Ordinateur NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France Phone: ++ 33-1-69-08-79-68 http://gael-varoquaux.info http://twitter.com/GaelVaroquaux
Thanks for the reply. This approach looks like it could do the job, although it appaers to be significantly more computationally demanding than connected_components, due to the repeated need to find the Fiedler vector of very large systems. But I will give it a go! :) Per On Wed, Aug 1, 2012 at 3:18 PM, Gael Varoquaux < gael.varoquaux@normalesup.org> wrote:
You can try spectral graph partioning, computing the Fiedler with arpack, that only needs matrix-vector product: http://mathworld.wolfram.com/FiedlerVector.html http://en.wikipedia.org/wiki/Algebraic_connectivity
HTH,
Gael
On Wed, Aug 01, 2012 at 02:15:51PM +0200, Per Nielsen wrote:
Hi all,
I work with large sparse matrices which consists of several disconnected components and I only need one of these. So far I have succesfully been using scipy.sparse.csgraph.connected_components to dig out the component I needed. This algorithm does however require the entire sparse matrix as input, which sometimes is too large to fit in memory.
For this reason I wondered whether there existed a matrix-free version of connected_components, or another algorithm achieving the same, where I would only need to provide a function calculating the sparse matrix-vector product.
Any help, hints, or information would be greatly appreciated :)
Cheers, Per
-- Gael Varoquaux Researcher, INRIA Parietal Laboratoire de Neuro-Imagerie Assistee par Ordinateur NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France Phone: ++ 33-1-69-08-79-68 http://gael-varoquaux.info http://twitter.com/GaelVaroquaux _______________________________________________ SciPy-User mailing list SciPy-User@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user
I am not sure if this is what you are looking for, but an alternative approach would be to transform the matrix into a graph, and then get the connected components using, for eample, networkx. You can find a complete description about the topic in: "Complex networks: Structure and dynamics" S. Boccaletti et al. (2006) Phys. Reports. Sorry that I cannot provide anything more concise. David. On Thu, Aug 2, 2012 at 8:07 AM, Per Nielsen <evilper@gmail.com> wrote:
Thanks for the reply.
This approach looks like it could do the job, although it appaers to be significantly more computationally demanding than connected_components, due to the repeated need to find the Fiedler vector of very large systems. But I will give it a go! :)
Per
On Wed, Aug 1, 2012 at 3:18 PM, Gael Varoquaux <gael.varoquaux@normalesup.org> wrote:
You can try spectral graph partioning, computing the Fiedler with arpack, that only needs matrix-vector product: http://mathworld.wolfram.com/FiedlerVector.html http://en.wikipedia.org/wiki/Algebraic_connectivity
HTH,
Gael
On Wed, Aug 01, 2012 at 02:15:51PM +0200, Per Nielsen wrote:
Hi all,
I work with large sparse matrices which consists of several disconnected components and I only need one of these. So far I have succesfully been using scipy.sparse.csgraph.connected_components to dig out the component I needed. This algorithm does however require the entire sparse matrix as input, which sometimes is too large to fit in memory.
For this reason I wondered whether there existed a matrix-free version of connected_components, or another algorithm achieving the same, where I would only need to provide a function calculating the sparse matrix-vector product.
Any help, hints, or information would be greatly appreciated :)
Cheers, Per
-- Gael Varoquaux Researcher, INRIA Parietal Laboratoire de Neuro-Imagerie Assistee par Ordinateur NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France Phone: ++ 33-1-69-08-79-68 http://gael-varoquaux.info http://twitter.com/GaelVaroquaux _______________________________________________ SciPy-User mailing list SciPy-User@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user
_______________________________________________ SciPy-User mailing list SciPy-User@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user
I have looked into the functions provided by the networkX package, however, as far as I have been able to understand both networkX and scipy employ similar methods for finding the connected components. Thus I still need the full sparse matrix (or graph which I think is similar in memory usage) in memory, which I would like to avoid. Thank you for your reply and the reference. Per On Thu, Aug 2, 2012 at 12:50 PM, Daπid <davidmenhur@gmail.com> wrote:
I am not sure if this is what you are looking for, but an alternative approach would be to transform the matrix into a graph, and then get the connected components using, for eample, networkx. You can find a complete description about the topic in: "Complex networks: Structure and dynamics" S. Boccaletti et al. (2006) Phys. Reports. Sorry that I cannot provide anything more concise.
David.
On Thu, Aug 2, 2012 at 8:07 AM, Per Nielsen <evilper@gmail.com> wrote:
Thanks for the reply.
This approach looks like it could do the job, although it appaers to be significantly more computationally demanding than connected_components, due to the repeated need to find the Fiedler vector of very large systems. But I will give it a go! :)
Per
On Wed, Aug 1, 2012 at 3:18 PM, Gael Varoquaux <gael.varoquaux@normalesup.org> wrote:
You can try spectral graph partioning, computing the Fiedler with
arpack,
that only needs matrix-vector product: http://mathworld.wolfram.com/FiedlerVector.html http://en.wikipedia.org/wiki/Algebraic_connectivity
HTH,
Gael
On Wed, Aug 01, 2012 at 02:15:51PM +0200, Per Nielsen wrote:
Hi all,
I work with large sparse matrices which consists of several disconnected components and I only need one of these. So far I have succesfully been using scipy.sparse.csgraph.connected_components to dig out the component I needed. This algorithm does however require the entire sparse matrix as input, which sometimes is too large to fit in memory.
For this reason I wondered whether there existed a matrix-free version of connected_components, or another algorithm achieving the same, where I would only need to provide a function calculating the sparse matrix-vector product.
Any help, hints, or information would be greatly appreciated :)
Cheers, Per
-- Gael Varoquaux Researcher, INRIA Parietal Laboratoire de Neuro-Imagerie Assistee par Ordinateur NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France Phone: ++ 33-1-69-08-79-68 http://gael-varoquaux.info http://twitter.com/GaelVaroquaux _______________________________________________ SciPy-User mailing list SciPy-User@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user
_______________________________________________ SciPy-User mailing list SciPy-User@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user
_______________________________________________ SciPy-User mailing list SciPy-User@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user
On Thu, Aug 02, 2012 at 02:15:07PM +0200, Per Nielsen wrote:
I have looked into the functions provided by the networkX package, however, as far as I have been able to understand both networkX and scipy employ similar methods for finding the connected components. Thus I still need the full sparse matrix (or graph which I think is similar in memory usage) in memory, which I would like to avoid.
But you can easily recode them to replace the indexing by a function call that would return the list of neighbors for a given node. That way you don't have to store the graph, you can generate it on the fly. G
You are completely right and the networkX code seems simple enough so that I can actually manage it. Now I just need to optimize my graph/matrix function as it calculates the entire graph/matrix MUCH faster than going all the rows/columns individually. Thanks. Per On Thu, Aug 2, 2012 at 2:36 PM, Gael Varoquaux < gael.varoquaux@normalesup.org> wrote:
On Thu, Aug 02, 2012 at 02:15:07PM +0200, Per Nielsen wrote:
I have looked into the functions provided by the networkX package, however, as far as I have been able to understand both networkX and scipy employ similar methods for finding the connected components. Thus I still need the full sparse matrix (or graph which I think is similar in memory usage) in memory, which I would like to avoid.
But you can easily recode them to replace the indexing by a function call that would return the list of neighbors for a given node. That way you don't have to store the graph, you can generate it on the fly.
G _______________________________________________ SciPy-User mailing list SciPy-User@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user
On Thu, Aug 2, 2012 at 6:15 AM, Per Nielsen <evilper@gmail.com> wrote:
I have looked into the functions provided by the networkX package, however, as far as I have been able to understand both networkX and scipy employ similar methods for finding the connected components. Thus I still need the full sparse matrix (or graph which I think is similar in memory usage) in memory, which I would like to avoid.
Thank you for your reply and the reference.
I'm not sure what your application is, but if you just need connected components and have an easy way to find neighbors, then unionfind will partition the set for you. Although the common version doesn't make it easy to extract them, I have an implementation that keeps the connected nodes in a circular list for just that application. <snip> Chuck
I'm not sure what your application is, but if you just need connected components and have an easy way to find neighbors, then unionfind will partition the set for you. Although the common version doesn't make it easy to extract them, I have an implementation that keeps the connected nodes in a circular list for just that application.
I would very like to have copy of your algorithm, it might be easier to modify than the networkX code as Gael suggested. Per
On Mon, Aug 6, 2012 at 3:33 AM, Per Nielsen <evilper@gmail.com> wrote:
I'm not sure what your application is, but if you just need connected components and have an easy way to find neighbors, then unionfind will partition the set for you. Although the common version doesn't make it easy to extract them, I have an implementation that keeps the connected nodes in a circular list for just that application.
I would very like to have copy of your algorithm, it might be easier to modify than the networkX code as Gael suggested.
Well, it's in Cython ;) It is set up to use integer indices as node labels and all components are extracted as lists of integers. It's also pretty old and needs documentation, so I'll do that and clean it up a bit. I'll be on travel for the next few days, so unless matters are pressing, I won't get to it until the weekend. Chuck
Ah hehe. I have had a hard time optimizing my graph neighbor extraction code, so right now it is actually to slow to be of any use. I will try to find a different approach and I dont want you spending any of your time cleaning code, but thank you very much for the offer :) Per On Tue, Aug 7, 2012 at 3:41 AM, Charles R Harris <charlesr.harris@gmail.com>wrote:
On Mon, Aug 6, 2012 at 3:33 AM, Per Nielsen <evilper@gmail.com> wrote:
I'm not sure what your application is, but if you just need connected components and have an easy way to find neighbors, then unionfind will partition the set for you. Although the common version doesn't make it easy to extract them, I have an implementation that keeps the connected nodes in a circular list for just that application.
I would very like to have copy of your algorithm, it might be easier to modify than the networkX code as Gael suggested.
Well, it's in Cython ;) It is set up to use integer indices as node labels and all components are extracted as lists of integers. It's also pretty old and needs documentation, so I'll do that and clean it up a bit. I'll be on travel for the next few days, so unless matters are pressing, I won't get to it until the weekend.
Chuck
_______________________________________________ SciPy-User mailing list SciPy-User@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user
participants (4)
-
Charles R Harris -
Daπid -
Gael Varoquaux -
Per Nielsen