very large graph
MRAB
google at mrabarnett.plus.com
Tue Jun 24 05:40:08 EDT 2008
On Jun 24, 1:26 am, chrispoliq... at gmail.com wrote:
> I need to represent the hyperlinks between a large number of HTML
> files as a graph. My non-directed graph will have about 63,000 nodes
> and probably close to 500,000 edges.
>
> I have looked into igraph (http://cneurocvs.rmki.kfki.hu/igraph/doc/
> python/index.html) and networkX (https://networkx.lanl.gov/wiki) for
> generating a file to store the graph, and I have also looked into
> Graphviz for visualization. I'm just not sure which modules are
> best. I need to be able to do the following:
>
> 1) The names of my nodes are not known ahead of time, so I will
> extract the title from all the HTML files to name the nodes prior to
> parsing the files for hyperlinks (edges).
>
> 2) Every file will be parsed for links and nondirectional connections
> will be drawn between the two nodes.
>
> 3) The files might link to each other so the graph package needs to
> be able to check to see if an edge between two nodes already exists,
> or at least not double draw connections between the two nodes when
> adding edges.
>
> I'm relatively new to graph theory so I would greatly appreciate any
> suggestions for filetypes. I imagine doing this as a python
> dictionary with a list for the edges and a node:list pairing is out of
> the question for such a large graph?
Perhaps a dictionary where the key is a node and the value is a set of
destination nodes?
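
A minimal sketch of that idea, assuming the node names are the page titles as plain strings and that no page links to itself. The helper names add_edge and has_edge and the sample titles are made up for illustration. Because each value is a set, adding the same link twice (or the reverse link) does not create a duplicate, and checking for an existing edge is a single membership test.

```python
from collections import defaultdict


def add_edge(graph, a, b):
    """Record an undirected edge; the sets silently ignore duplicates."""
    graph[a].add(b)
    graph[b].add(a)


def has_edge(graph, a, b):
    """Check for an edge without creating an entry for an unknown node."""
    return b in graph.get(a, ())


# Keys are nodes, values are sets of neighbouring nodes.
graph = defaultdict(set)

# Hypothetical page titles standing in for the real ones.
add_edge(graph, "Home", "About")
add_edge(graph, "About", "Home")    # reverse link: nothing new is stored
add_edge(graph, "Home", "Contact")

# Each undirected edge is stored once per endpoint, so halve the total.
# This count assumes there are no self-links.
edge_count = sum(len(neighbours) for neighbours in graph.values()) // 2
```

With roughly 63,000 nodes and 500,000 edges this comes to about a million set entries, which should fit in memory on an ordinary machine, though that is an estimate rather than something measured.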