[Python-ideas] Fwd: Graph class

Sun Dec 16 23:41:28 CET 2012

On 9 December 2012 20:31, Mark Adam <dreamingforward at gmail.com> wrote:
>
> On Sun, Dec 9, 2012 at 5:40 AM, Paul Moore <p.f.moore at gmail.com> wrote:
> > On 9 December 2012 01:29, Mark Adam <dreamingforward at gmail.com> wrote:
> >> All very interesting.  I'm going to suggest a sort of "meta-discussion"
> >> about why -- despite the power of graphs as a data structure -- such a
> >> feature has not stabilized into a workable solution for inclusion in a
> >> high-level language like Python.
> >>
> >> I identity the following points of "wavery":
> >>
> >> 1) the naming of methods (add_edge, vs add(1,2)):  aesthetic grounds,
> >> 2) what methods to include (degree + neighbors or the standard dict's
> >> __len__ + __getitem__):  API grounds
> >> 3) how much flexibility to be offered (directed, multi-graphs, edge weights
> >> with arbitrary labeling, etc.):  functionality grounds
> >> 4) what underlying data structure to use (sparse adjacency dicts, matrices,
> >> etc):  representation conflicts.

For all the reasons above I don't much see the utility of implementing
some kind of standard graph class. There are too many possibilities
for any one implementation to be generally applicable. I have
implemented graphs in Python many times and I very often find that the
detail of what I want to do leads me to create a different
implementation. Another consideration that you've not mentioned is the
occasional need for a OrderedGraph that keeps track of some kind of
order for its vertices.

> > 4) Whether the library requires some sort of "Vertex" type, or works
> > with arbitrary values, similarly whether there is a defined "Edge"
> > class or edges can be labelled, weighted, etc with arbitrary Python
> > values.
>
> This I put under #3 (functionality grounds) "edge weights with
> arbitrary labeling", Vertex's with abitrary values i think would be
> included.

Having implemented graphs a few times now, I have come to the
conclusion that it is a good idea to make the restriction that the
vertices should be hashable. Otherwise, how would you get O(1)
behaviour for methods like has_edge()?

> > 5) Granularity - if all I want is a depth-first search algorithm, why
> > pull in a dependency on 100 graph algorithms I'm not interested in?
>
> Hmm, I would call this "5) comprehensiveness: whether to include every
> graph algorithm known to mankind."

This is the one part of a graph library that is really useful.
Creating a class or a data structure that represents a graph in some
way is trivially easy. Creating trustworthy implementations of all the
graph-theoretic algorithms with the right kind of big-O behaviour is
not.

> > My feeling is that graphs are right on the borderline of a data
> > structure that is simple enough that people invent their own rather
> > than bother conforming to a "standard" model but complex enough that
> > it's worth using library functions rather than getting the details
> > wrong.

What details would you get wrong? I contend that it is very easy to
implement a graph without getting any of the details wrong. It is the
graph algorithms that are hard, not the data structure. Here's a
couple of examples:

G = {
    'A':{'B', 'C'},
    'B':{'A'},
    'C':{'B'}
}

M = [[0, 1, 1],
     [1, 0, 0],
     [0, 1, 0]]

You may want to wrap the above in some kind of class in which case
you'll end up with something like the following (from a private
project - modified a little before posting so it may not work now):

class Graph:

    def __init__(self, nodes, edges):
        self._nodes = frozenset(nodes)
        self._edges = defaultdict(set)
        for n1, n2 in edges:
            self._edges[n1].add(n2)

    @property
    def nodes(self):
        return iter(self._nodes)

    @property
    def edges(self):
        for n1 in self._nodes:
            for n2 in self.edges_node(n1):
                yield (n1, n2)

    def edges_node(self, node):
        return iter(self._edges[node])

    def has_edge(self, nfrom, nto):
        return nto in self._edges[nfrom]

    def __str__(self):
        return '\n'.join(self._iterdot())

    def _iterdot(self):
        yield 'digraph G {'
        for n in self.nodes:
            yield '    %s;' % n
        for nfrom, nto in self.edges:
            yield '    %s -> %s;' % (nfrom, nto)
        yield '}'

G2 = Graph('ABC', [('A', 'B'), ('A', 'C'), ('B', 'C'), ('C', 'A')])

The above class is unusual in the sense that it is a pure Graph class.
Normally I would simply be adding a few graphy methods onto a class
that represents a network of some kind.

The problem that I found with current support for graphs in Python is
not the lack of an appropriate data structure. Rather the problem is
that implementations of graph-theoretic algorithms (as in e.g.
pygraph) are tied to a specific Graph class that I didn't want to or
couldn't use in my own project. This means that to determine if you
have something that represents a strongly connected graph you first
need to create a separate redundant data structure and then pass that
into the algorithm.

What would be more useful than a new Graph class would be
implementations of graph algorithms that can easily be applied to any
representation of a graph. As an example, I can write a function for
determining if a graph is a DAG using a small subset of the possible
methods that a Graph class would have:

def is_dag(nodes, edges_node):
    '''Determine if a directed graph is acyclic

    nodes is an iterable yielding all vertices in the graph
    edges_node(node) is an iterable giving all nodes that node connects to
    '''
    visited = set()
    visiting = set()
    for node in nodes:
        if node not in visited:
            if has_backedge(node, edges_node, visited, visiting):
                return False
    else:
        return True

def has_backedge(node, edges_node, visited, visiting):
    '''Helper for is_dag()'''
    if node in visiting:
        return True
    visited.add(node)
    visiting.add(node)
    for childnode in edges_node(node):
        if has_backedge(childnode, edges_node, visited, visiting):
            return True
    visiting.remove(node)
    return False

This can be used with the Graph class or just as easily with the
dict-of-sets like so:

is_dag(G2.nodes, G2.edges_node)
is_dag(G, G.__getitem__)

It is possible to do something like this for all of the graph
algorithms and I think that a library like this would be more useful
than a new Graph type.