[SciPy-User] MemoryError in scipy.sparse.csgraph.shortest_path

Giovanni Luca Ciampaglia glciampagl at gmail.com
Tue May 21 14:49:34 EDT 2013


Hi,

I want to compute the shortest path distances in a sparse directed graph with 2M 
nodes but scipy.csgraph.shortest_path immediately throws a MemoryError -- 
regardless of the chosen method. This is what I do:

> In [2]: import numpy as np
>
> In [3]: import scipy.sparse as sp
>
> In [4]: import scipy.sparse.csgraph as csg
>
> In [5]: a = np.load('adjacency_isa.npy')
>
> In [6]: adj = sp.coo_matrix((a['weight'], (a['row'], a['col'])), (2351254,)*2)
>
> In [7]: adj = adj.tocsr()
>

And this is what I get:

> In [8]: D = csg.shortest_path(adj, directed=True)
> ---------------------------------------------------------------------------
> MemoryError Traceback (most recent call last)
> <ipython-input-8-a8a3285366c7> in <module>()
> ----> 1 D = csg.shortest_path(adj, directed=True)
>
> /nfs/nfs4/home/gciampag/local/lib/python/scipy/sparse/csgraph/_shortest_path.so in 
> scipy.sparse.csgraph._shortest_path.shortest_path 
> (scipy/sparse/csgraph/_shortest_path.c:2117)()
>
> /nfs/nfs4/home/gciampag/local/lib/python/scipy/sparse/csgraph/_shortest_path.so in 
> scipy.sparse.csgraph._shortest_path.dijkstra 
> (scipy/sparse/csgraph/_shortest_path.c:3948)()
>
> MemoryError:
>
> In [9]: D = csg.dijkstra(adj, directed=True)
> ---------------------------------------------------------------------------
> MemoryError Traceback (most recent call last)
> <ipython-input-9-dd917b3d30d9> in <module>()
> ----> 1 D = csg.dijkstra(adj, directed=True)
>
> /nfs/nfs4/home/gciampag/local/lib/python/scipy/sparse/csgraph/_shortest_path.so in 
> scipy.sparse.csgraph._shortest_path.dijkstra 
> (scipy/sparse/csgraph/_shortest_path.c:3948)()
>
> MemoryError:
>
> In [10]: D = csg.floyd_warshall(adj, directed=True)
> ---------------------------------------------------------------------------
> MemoryError Traceback (most recent call last)
> <ipython-input-10-709122a21d70> in <module>()
> ----> 1 D = csg.floyd_warshall(adj, directed=True)
>
> /nfs/nfs4/home/gciampag/local/lib/python/scipy/sparse/csgraph/_shortest_path.so in 
> scipy.sparse.csgraph._shortest_path.floyd_warshall 
> (scipy/sparse/csgraph/_shortest_path.c:2457)()
>
> /nfs/nfs4/home/gciampag/local/lib/python/scipy/sparse/csgraph/_validation.pyc 
> in validate_graph(csgraph, directed, dtype, csr_output, dense_output, 
> copy_if_dense, copy_if_sparse, null_value_in, null_value_out, infinity_null, 
> nan_null)
> 26 csgraph = csr_matrix(csgraph, dtype=DTYPE, copy=copy_if_sparse)
> 27 else:
> ---> 28 csgraph = csgraph_to_dense(csgraph, null_value=null_value_out)
> 29 elif np.ma.is_masked(csgraph):
> 30 if dense_output:
>
> /nfs/nfs4/home/gciampag/local/lib/python/scipy/sparse/csgraph/_tools.so in 
> scipy.sparse.csgraph._tools.csgraph_to_dense 
> (scipy/sparse/csgraph/_tools.c:2984)()
>
> MemoryError:


It seems that there are two distinct issues:

1. floyd_warshall() calls validate_graph with csr_output = False 
(_shortest_path.pyx:218), causing the graph to be converted to dense. I believe 
this a bug.
2. dijkstra creates a dense distance matrix (_shortest_path.pyx:409). I 
understand that one cannot make any assumption about the connectivity of the 
graph, and thus of the sparsity of the distance matrix itself; and of course I 
can get around this calling dijkstra multiple times with a manageable chunk of 
indices, and discarding the values that are equal to inf, but it would be 
nonetheless nice if the code tried to do something similar, at least for the 
cases when one knows that most of the distances will be inf.

Best,

-- 
Giovanni Luca Ciampaglia

Postdoctoral fellow
Center for Complex Networks and Systems Research
Indiana University

✎ 910 E 10th St ∙ Bloomington ∙ IN 47408
☞ http://cnets.indiana.edu/gciampag at indiana.edu




More information about the SciPy-User mailing list