[Python-ideas] Graph class

Tue Dec 18 19:21:42 CET 2012

On 18 December 2012 16:06, Terry Reedy <tjreedy at udel.edu> wrote:
> On 12/18/2012 9:24 AM, Yuval Greenfield wrote:
>> On Tue, Dec 18, 2012 at 2:08 PM, Oscar Benjamin
>> <oscar.j.benjamin at gmail.com
>> <mailto:oscar.j.benjamin at gmail.com>> wrote:
>>
>>     The graph algorithms that are the most useful can be written in terms
>>     of two things:
>>     1) An iterator over the nodes
>
>
> Or iterable if re-iteration is needed.

True. Although there aren't many cases where re-iteration is needed.
The main exception would be if you wanted to instantiate a new Graph
as a result of the algorithm. For example a transitive pruning
function could be written to accept a factory like

def transitive_prune(nodes, childfunc, factory):
    return factory(nodes, pruned_edges(nodes, childfunc))

in which case you need to be able to iterate once over the nodes for
the pruning algorithm and once to construct the new graph. In these
cases, the fact that you want to instantiate a new graph suggests that
your original graph was a concrete data structure so that it is
probably okay to require an iterable. To mutate the graph in place the
user would need to supply a function to remove edges:

def transitive_prune(nodes, childfunc, remove_edge):

>>     2) A way to map each node into an iterator over its children (or
>>     partners)
>
> A callable could be either an iterator class or a generator function.
>
>> Some graphs don't care for the nodes, all their information is in the
>> edges. That's why most graph frameworks have iter_edges and iter_nodes
>> functions.

This is true. Some algorithms would rather have this information.
There are also a few that can proceed just from a particular node
rather than needing an iterable over all nodes.

>> I'm not sure what's the clean way to represent the
>> optional directionality of edges though.

I would have said that each API entry point should state how it will
interpret the edges. Are there algorithms that simultaneously make
sense for directed and undirected graphs while needing to behave
differently in the two cases (in which case is it really the same
algorithm)?

> Oscar, I don't consider hashability an issue. General class instances are
> hashable by default. One can even consider such instances as hashable
> facades for unhashable dicts. Giving each instance a list attribute does the
> same for lists.

True, I've not found hashability to be a problem in practice.

> The more important question, it seems to me, is whether to represent nodes
> by counts and let the algorithm do its bookkeeping in private structures, or
> to represent them by externally defined instances that the algorithm
> mutates.

I don't think I understand: How would the "externally defined instances" work?

Do you mean that the caller of a function would supply functions like
mark(), is_marked(), set_colour(), get_colour() and so on? If that's
the case what would the advantages be? I can think of one: if desired
the algorithm could be made to store all of its computations in say a
database so that it would be very scalable. Though to me that seems
like quite a specialised case that would probably merit from
reimplementing the desired algorithm anyway. Otherwise I guess it's a
lot simpler/safer to implement everything in private data structures.

Oscar