very large graph

boggom at comcast.net
Thu Jun 26 13:54:34 CEST 2008


Drawing a large graph like this is not
very insightful by itself, and doing this well
is still an art form.
Many cool visualizations, all of them
very domain- and question-dependent,
can be found at
http://www.visualcomplexity.com/vc/

You can also search Flickr for network
and graph drawings.

Much of it is eyecandy though. What do you
really want to know?

Your graph is not overly huge for Python, provided
you do not want to compute nonlocal statistics
such as betweenness metrics. (If you have
a laptop running Windows with
less than 1 GB of RAM, then you have a really large graph.)
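
To make the local/nonlocal distinction concrete, here is a minimal
sketch using a current networkx release. The toy edge list, the page
names, and the values of k and seed are all made up for illustration.
Degrees need only one pass over the edges, while exact betweenness
runs a shortest-path search from every node, which is what hurts on a
big graph. Sampling k source nodes is one way to get an estimate.

```python
import networkx as nx

# Hypothetical toy link graph: each edge is (source page, linked page).
links = [
    ("index", "about"), ("index", "docs"), ("docs", "api"),
    ("docs", "faq"), ("api", "faq"), ("faq", "index"),
]
G = nx.DiGraph()
G.add_edges_from(links)

# Local statistics: a single pass over the edges, cheap on large graphs.
in_deg = dict(G.in_degree())
out_deg = dict(G.out_degree())

# Nonlocal statistic: exact betweenness does a shortest-path search
# from every node, so the cost grows roughly as nodes * edges.
exact = nx.betweenness_centrality(G)

# Estimate from k sampled source nodes (k and seed chosen arbitrarily).
approx = nx.betweenness_centrality(G, k=3, seed=1)

print(in_deg)
print(exact)
```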

Given that you do not know much about graph theory,
I would suggest that networkx applied to a
subset of your large website --- one of the
representative branches --- will allow
for the fastest way to figure out what you
want to do. Python provides access to many other
libraries for HTML mangling or even webpage
analysis. Imagine writing C code to ask questions
like: how many PDFs and images? how big are they?
when were these pages last updated? etc.
You can come up with 10 questions faster
than you can write C code, and this is where
Python has its great advantage.
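
As a rough illustration of how short such a question is in Python,
here is a sketch that counts PDFs and images on a single page using
only the standard library. The AssetCounter class and the sample page
are hypothetical. A real run would feed in files from your site, and
classifying by file extension is an assumption that may not fit every
site.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse
import posixpath


class AssetCounter(HTMLParser):
    """Tally linked PDFs, images, and other links on one page."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            self.counts["image"] += 1
        elif tag == "a" and attrs.get("href"):
            # Ignore query strings and case when checking the extension.
            path = urlparse(attrs["href"]).path
            ext = posixpath.splitext(path)[1].lower()
            self.counts["pdf" if ext == ".pdf" else "link"] += 1


# Hypothetical page content, standing in for a file read from the site.
page = """
<html><body>
<a href="/reports/2007.pdf">Report 2007</a>
<a href="/reports/2008.PDF?download=1">Report 2008</a>
<a href="/about.html">About</a>
<img src="/img/logo.png">
</body></html>
"""

counter = AssetCounter()
counter.feed(page)
print(dict(counter.counts))
```

Questions about file sizes or last-modified dates would follow the
same pattern, with os.stat on local files or the HTTP headers of
fetched pages.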

You can learn graph theory while using these libraries
on one of the smaller branches of your website tree,
even interactively by using IPython.
This provides at least a feeling of great power while
stumbling in the dark.

Many of your edges are probably repetitions
associated with navigation menus to provide
a standard look-and-feel. See how much of that
you can strip out and find the cross-links that took
effort to insert and manage. (I suggest
doing the analysis on a digraph rather than a graph,
even if you want to draw it as a graph.)
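
One possible way to strip the menu links, sketched with a current
networkx release: treat any page that nearly every other page links to
as a navigation target and drop the edges pointing at it. The site
dictionary, the page names, and the 0.8 threshold are all assumptions
for illustration. Your site may need a different rule, such as
matching on the menu's HTML container.

```python
import networkx as nx

# Hypothetical crawl result: page -> pages it links to.
nav = ["home", "contact"]  # menu links repeated on every page
site = {
    "home": nav + ["news", "papers"],
    "contact": nav,
    "news": nav + ["paper1"],
    "papers": nav + ["paper1"],
    "paper1": nav + ["news"],
}

G = nx.DiGraph()
G.add_edges_from(
    (src, dst) for src, targets in site.items() for dst in targets
    if src != dst  # skip self-links
)

# Assumed heuristic: a target linked from at least 80% of the other
# pages is probably part of the navigation menu.
cutoff = 0.8 * (G.number_of_nodes() - 1)
nav_targets = {page for page, deg in G.in_degree() if deg >= cutoff}

# Keep only the edges that do not point at a navigation target.
content = G.copy()
content.remove_edges_from(
    [(u, v) for u, v in G.edges() if v in nav_targets]
)

# Convert to an undirected graph only at drawing time. Reciprocal
# links such as news <-> paper1 collapse into a single edge here.
drawable = content.to_undirected()

print(sorted(nav_targets))
print(sorted(content.edges()))
```

Keeping the analysis on the DiGraph preserves who links to whom. The
undirected copy has fewer edges because each pair of reciprocal links
merges into one.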


For visualization, the new ubigraph is quite
fast compared to graphviz.
See the cool networkx + ubigraph
video at http://ubietylab.net/blog/


Ask on the networkx mailing list when stuck.




More information about the Python-list mailing list