[Python-ideas] Graph class

Tue Dec 18 13:08:50 CET 2012

On 18 December 2012 09:06, Terry Reedy <tjreedy at udel.edu> wrote:
> On 12/17/2012 10:26 PM, Nick Coghlan wrote:
>> A graph library that focused on defining a good abstraction (and
>> adapters) that allowed graph algorithms to be written that worked with
>> multiple existing Python graph data structures could be quite interesting.
>
>
> I was just thinking that what is needed, at least as a first step, is a
> graph api, like the db api, that would allow the writing of algorithms to
> one api and adapters to various implementations. I expect to be writing some
> graph algorithms (in Python) in the next year and will try to keep that idea
> in mind and see if it makes any sense, versus just whipping up a
> implementation that fits the particular problem.

I'd be interested to use (and possibly to contribute to) a graph
library of this type on PyPI. I have some suggestions about the
appropriate level of abstraction below.

The graph algorithms that are the most useful can be written in terms
of two things:
1) An iterator over the nodes
2) A way to map each node into an iterator over its children (or partners)

It is also required to place some restriction on how the nodes can be
used. Descriptions of graph algorithms refer to marking/colouring the
nodes of a graph. If the nodes are instances of user defined classes,
then you can do this in a relatively literal sense by adding
attributes to the nodes, but this is fragile in the event of errors,
not thread-safe, etc.

Really though, the idea of marking the nodes just means that you need
an O(1) method for determining if a node has been marked or checking
the value that it was marked with. In Python this is easily done with
sets and dicts, which is not fragile and is thread-safe etc. (provided
the graph is not being mutated). This requires that the nodes be
hashable.

In the thread about deleting keys from a dict yesterday it occurred to
me (after MRAB's suggestion) that you can still apply the same methods
to non-hashable objects. That is, provided you have a situation where
node equality is determined by node identity you can just use id(node)
in each hash table. While this method works equally well for
user-defined class instances it does not work for immutable types
where, for example, two strings may be equal but have differing id()s.

One way to cover all cases is simply to provide a hashkey argument to
each algorithm that defaults to the identity function (lambda x: x),
but may be replaced by the id function in appropriate cases. This
means that all of the graph algorithms that I would want can be
implemented with a basic signature that goes like:

def strongly_connected(nodes, edgesfunc, hashkey=None):
    '''
    `nodes` is an iterable over the nodes of the graph.
    `edgesfunc(node)` is an iterable over the children of node.
    `hashkey` is an optional key function to apply when adding
    nodes to a hash-table. For mutable objects where identity
    is equality use `hashkey=id`.
    '''
    if hashkey is None:
        # Would be great to have operator.identity here
        hashkey = lambda x: x

There are some cases where optimisation is possible given additional
information. One example: I think it is possible to conclude that an
undirected graph contains at least one cycle if |E|>=|V|, so in this
case an optional hint parameter could give a shortcut for some graphs.
Generally, though, there are few algorithms where other quantities are
either required or are sufficient for all input graphs (exceptions to
the sufficient part of this rule are typically the relatively easy
algorithms like determining the mean degree).

Once you have algorithms that are implemented in this way it becomes
possible to piece them together as a concrete graph class, a mixin, an
ABC, a decorator that works like @functools.total_ordering or some
other class-based idiom. Crucially, though, unlike all of these class
based approaches, defining the algorithms firstly in a functional way
makes it easy to apply them to any data structure composed of
elementary types or of classes that you yourself cannot write or
subclass.

Oscar