[SciPy-User] MemoryError in scipy.sparse.csgraph.shortest_path
Giovanni Luca Ciampaglia
glciampagl at gmail.com
Tue May 21 14:49:34 EDT 2013
Hi,
I want to compute the shortest path distances in a sparse directed graph with 2M
nodes but scipy.csgraph.shortest_path immediately throws a MemoryError --
regardless of the chosen method. This is what I do:
> In [2]: import numpy as np
>
> In [3]: import scipy.sparse as sp
>
> In [4]: import scipy.sparse.csgraph as csg
>
> In [5]: a = np.load('adjacency_isa.npy')
>
> In [6]: adj = sp.coo_matrix((a['weight'], (a['row'], a['col'])), (2351254,)*2)
>
> In [7]: adj = adj.tocsr()
>
And this is what I get:
> In [8]: D = csg.shortest_path(adj, directed=True)
> ---------------------------------------------------------------------------
> MemoryError Traceback (most recent call last)
> <ipython-input-8-a8a3285366c7> in <module>()
> ----> 1 D = csg.shortest_path(adj, directed=True)
>
> /nfs/nfs4/home/gciampag/local/lib/python/scipy/sparse/csgraph/_shortest_path.so in
> scipy.sparse.csgraph._shortest_path.shortest_path
> (scipy/sparse/csgraph/_shortest_path.c:2117)()
>
> /nfs/nfs4/home/gciampag/local/lib/python/scipy/sparse/csgraph/_shortest_path.so in
> scipy.sparse.csgraph._shortest_path.dijkstra
> (scipy/sparse/csgraph/_shortest_path.c:3948)()
>
> MemoryError:
>
> In [9]: D = csg.dijkstra(adj, directed=True)
> ---------------------------------------------------------------------------
> MemoryError Traceback (most recent call last)
> <ipython-input-9-dd917b3d30d9> in <module>()
> ----> 1 D = csg.dijkstra(adj, directed=True)
>
> /nfs/nfs4/home/gciampag/local/lib/python/scipy/sparse/csgraph/_shortest_path.so in
> scipy.sparse.csgraph._shortest_path.dijkstra
> (scipy/sparse/csgraph/_shortest_path.c:3948)()
>
> MemoryError:
>
> In [10]: D = csg.floyd_warshall(adj, directed=True)
> ---------------------------------------------------------------------------
> MemoryError Traceback (most recent call last)
> <ipython-input-10-709122a21d70> in <module>()
> ----> 1 D = csg.floyd_warshall(adj, directed=True)
>
> /nfs/nfs4/home/gciampag/local/lib/python/scipy/sparse/csgraph/_shortest_path.so in
> scipy.sparse.csgraph._shortest_path.floyd_warshall
> (scipy/sparse/csgraph/_shortest_path.c:2457)()
>
> /nfs/nfs4/home/gciampag/local/lib/python/scipy/sparse/csgraph/_validation.pyc
> in validate_graph(csgraph, directed, dtype, csr_output, dense_output,
> copy_if_dense, copy_if_sparse, null_value_in, null_value_out, infinity_null,
> nan_null)
> 26 csgraph = csr_matrix(csgraph, dtype=DTYPE, copy=copy_if_sparse)
> 27 else:
> ---> 28 csgraph = csgraph_to_dense(csgraph, null_value=null_value_out)
> 29 elif np.ma.is_masked(csgraph):
> 30 if dense_output:
>
> /nfs/nfs4/home/gciampag/local/lib/python/scipy/sparse/csgraph/_tools.so in
> scipy.sparse.csgraph._tools.csgraph_to_dense
> (scipy/sparse/csgraph/_tools.c:2984)()
>
> MemoryError:
It seems that there are two distinct issues:
1. floyd_warshall() calls validate_graph with csr_output = False
(_shortest_path.pyx:218), causing the graph to be converted to dense. I believe
this a bug.
2. dijkstra creates a dense distance matrix (_shortest_path.pyx:409). I
understand that one cannot make any assumption about the connectivity of the
graph, and thus of the sparsity of the distance matrix itself; and of course I
can get around this calling dijkstra multiple times with a manageable chunk of
indices, and discarding the values that are equal to inf, but it would be
nonetheless nice if the code tried to do something similar, at least for the
cases when one knows that most of the distances will be inf.
Best,
--
Giovanni Luca Ciampaglia
Postdoctoral fellow
Center for Complex Networks and Systems Research
Indiana University
✎ 910 E 10th St ∙ Bloomington ∙ IN 47408
☞ http://cnets.indiana.edu/
✉ gciampag at indiana.edu
More information about the SciPy-User
mailing list