Which graph library is best suited for large graphs?
Brian J Mingus
Brian.Mingus at Colorado.EDU
Sat Dec 12 00:35:00 CET 2009
On Fri, Dec 11, 2009 at 3:12 AM, Wolodja Wentland <
wentland at cl.uni-heidelberg.de> wrote:
> Hi all,
> I am writing a library for accessing Wikipedia data that includes a module
> for generating graphs from the link structure between articles and other
> pages (like categories).
> These graphs could easily contain several million nodes which are frequently
> linked. The graphs I am building right now have around 300,000 nodes
> with an average in/out degree of, say, 4, and already need around 1-2GB of
> memory. I use networkx to model the graphs and serialise them to files on
> disk (using adjacency list format, pickle and/or GraphML).
> The recent thread on including a graph library in the stdlib spurred my
> interest and introduced me to a number of libraries I have not seen
> before. I would like to reevaluate my choice of networkx and need some
> help in doing so.
> I really like the API of networkx, but I have no problem switching to
> another library (right now). I have the impression that graph-tool might
> be faster and have a smaller memory footprint than networkx, but am
> unsure about that.
> Which library would you choose? This decision is quite important for me,
> as the choice will influence my library's external interface. Or is
> there something like WSGI for graph libraries?
> kind regards
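(For reference, the kind of link-graph construction and GraphML round-trip described in the question can be sketched with networkx as below. The node names and file name are made up for illustration; the real graphs have ~300,000 nodes.)

```python
import networkx as nx

# Toy stand-in for a wiki link graph: directed edges from pages to the
# pages/categories they link to.
G = nx.DiGraph()
G.add_edges_from([
    ("Article_A", "Article_B"),
    ("Article_A", "Category:Foo"),
    ("Article_B", "Article_A"),
])

# Serialise to GraphML on disk, then read it back.
nx.write_graphml(G, "links.graphml")
H = nx.read_graphml("links.graphml")
assert H.number_of_nodes() == G.number_of_nodes()
assert H.number_of_edges() == G.number_of_edges()
```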
I once computed the PageRank of the English Wikipedia. I ended up using the
Boost Graph Library, of which there is a parallel implementation that runs
on clusters. I tried to do it in pure Python but failed, as the memory
requirements were so large. Boost and the parallel version both have Python
bindings.
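(As a small-scale illustration of the computation: networkx can compute PageRank directly on a toy graph. For a Wikipedia-sized graph this in-memory approach runs into the memory wall described above, which is where the Parallel BGL comes in. The graph here is hypothetical.)

```python
import networkx as nx

# PageRank over a toy directed link graph with the standard damping factor.
G = nx.DiGraph([("A", "B"), ("B", "C"), ("C", "A"), ("A", "C")])
ranks = nx.pagerank(G, alpha=0.85)

# The scores form a probability distribution over the nodes.
assert abs(sum(ranks.values()) - 1.0) < 1e-9
```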