
Meant this to go to the whole list. Sorry. On Sun, Dec 9, 2012 at 5:40 AM, Paul Moore <p.f.moore@gmail.com> wrote:
This I put under #3 (functionality grounds) "edge weights with arbitrary labeling", Vertex's with abitrary values i think would be included.
5) Granularity - if all I want is a depth-first search algorithm, why pull in a dependency on 100 graph algorithms I'm not interested in?
Hmm, I would call this "5) comprehensiveness: whether to include every graph algorithm known to mankind."
But this is also why (on both counts) it would be good to include it in the standard library. The *simplicity* of a graph makes everyone re-implement it, or (worse) work with some cruder work-around (like a dict of dicts but not at all clear you're dealing with an actual graph). But imagine if, for example, Subversion used a python graph class to track all branches and nodes in it's distributed revision control system. Then how easy it would be for third parties to make tools to view repos or other developers to come in and work with the dev team: they're already familiar with the standard graph class structure. And for the *complex enough* case, obviously it helps to have a standard library help you out and just provide the sophistication of a graph class for you. There are a lot of obvious uses for a graph, but if you don't know of it, a beginning programmer won't *think* of it and make some crude work-around. Mark

Mark Adam writes:
This is a fallacy. As has been pointed out, there is a variety of graphs, a large variety of computations to be done on and in them, and a huge variety in algorithms for dealing with those varied tasks. For a "standard" graph class to be useful enough to become the OOWTDI, it would need to deal with a large fraction of those aspects of graph theory. Even so, people would only really internalize the parts they need for the present task, forgetting or (worse) misremembering functionality that doesn't work for them right now. Corner cases would force many tasks to be done outside of the standard class. Differences in taste would surely result in a large number of API variants to reflect users' preferred syntaxes for representing graphs, and so on. I think making a "Graph" class that has a chance of becoming the OOWTDI is a big task. Not as big as SciPy, say, but then, SciPy isn't being proposed for stdlib inclusion, either. As usual for stdlib additions, I think this discussion would best be advanced not by "going all meta", but rather by proposing specific packages (either already available, perhaps on PyPI, or new ones -- but with actual code) for inclusion. The "meta" discussion should be conducted with specific reference to the advantages or shortcomings of those specific packages. N.B. A reasonably comprehensive package that has seen significant real-world use, and preferably has a primary distribution point of PyPI, would be the shortest path to inclusion.

On Sun, Dec 9, 2012 at 8:08 PM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
Yes, but the basic data structure concept is NOT in development, it is already well-developed. The remaining issues of API are wholly secondary. The usefulness of a graph is unquestioned and creates cross-functionality across a large number of possible interesting domains. Really its like having a car when everyone else is walking (...but should the car be a buick or a toyota? a four-door or two? -- is all besides the point) But your issue of proposing an actual implementation is well-taken rather than spend a lot of time arguing over it all. With any luck, I'll try to distill networkx with my work and put it all together. haha mark

Mark Adam <dreamingforward@...> writes:
In terms of use cases, you might be interested in potential users of any stdlib graph library. I'm working on distlib [1], which evolved out of distutils2 and uses graphs in a couple of places: 1. A dependency graph for distributions. This came directly from distutils2, though I've added a couple of bits to it such as topological sorting and determination of strongly-connected components. 2. A lightweight sequencer for build steps, added to avoid the approach in distutils/distutils2 which makes it harder than necessary to handle custom build steps. I didn't use the graph system used in point 1, as it was too specific, and I haven't had time to look at refactoring it. There's another potential use case in the area of packaging, though perhaps not in distlib itself: the idea of generating build artifacts based on their dependencies. Ideally, this would consider not only build artifacts and their dependencies, but also the builders themselves as part of the graph. Regards, Vinay Sajip [1] https://distlib.readthedocs.org/en/latest/

I think of graphs and trees as patterns, not data structures. -- --Guido van Rossum (on iPad)

On 12/16/2012 04:41 PM, Guido van Rossum wrote:
I think of graphs and trees as patterns, not data structures.
How do you draw line between what is data structure and what is pattern ? Do you have any ideas on how to represent "patterns" in python standard library ? By a set of samples ? By (a set of) classes realising the patterns ? By a set of functions working on existing structures which implement the pattern ? Duck-typing should lend itself well to this last approach. Do we currently have any modules in standard library which are more patterns and less data structures ? ----------------------- Hannu

On Mon, Dec 17, 2012 at 9:28 AM, Hannu Krosing <hannu@krosing.net> wrote:
A rough rule of thumb is that if it's harder to remember the configuration options in the API than it is to just write a purpose-specific function, it's probably better as a pattern that can be tweaked for a given use case than it is as an actual data structure. More generally, ABCs and magic methods are used to express patterns (like iteration), which may be implemented by various data structures. A graph library that focused on defining a good abstraction (and adapters) that allowed graph algorithms to be written that worked with multiple existing Python graph data structures could be quite interesting. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 12/17/2012 10:26 PM, Nick Coghlan wrote:
I was just thinking that what is needed, at least as a first step, is a graph api, like the db api, that would allow the writing of algorithms to one api and adapters to various implementations. I expect to be writing some graph algorithms (in Python) in the next year and will try to keep that idea in mind and see if it makes any sense, versus just whipping up a implementation that fits the particular problem. -- Terry Jan Reedy

On 18 December 2012 09:06, Terry Reedy <tjreedy@udel.edu> wrote:
I'd be interested to use (and possibly to contribute to) a graph library of this type on PyPI. I have some suggestions about the appropriate level of abstraction below. The graph algorithms that are the most useful can be written in terms of two things: 1) An iterator over the nodes 2) A way to map each node into an iterator over its children (or partners) It is also required to place some restriction on how the nodes can be used. Descriptions of graph algorithms refer to marking/colouring the nodes of a graph. If the nodes are instances of user defined classes, then you can do this in a relatively literal sense by adding attributes to the nodes, but this is fragile in the event of errors, not thread-safe, etc. Really though, the idea of marking the nodes just means that you need an O(1) method for determining if a node has been marked or checking the value that it was marked with. In Python this is easily done with sets and dicts, which is not fragile and is thread-safe etc. (provided the graph is not being mutated). This requires that the nodes be hashable. In the thread about deleting keys from a dict yesterday it occurred to me (after MRAB's suggestion) that you can still apply the same methods to non-hashable objects. That is, provided you have a situation where node equality is determined by node identity you can just use id(node) in each hash table. While this method works equally well for user-defined class instances it does not work for immutable types where, for example, two strings may be equal but have differing id()s. One way to cover all cases is simply to provide a hashkey argument to each algorithm that defaults to the identity function (lambda x: x), but may be replaced by the id function in appropriate cases. This means that all of the graph algorithms that I would want can be implemented with a basic signature that goes like: def strongly_connected(nodes, edgesfunc, hashkey=None): ''' `nodes` is an iterable over the nodes of the graph. `edgesfunc(node)` is an iterable over the children of node. `hashkey` is an optional key function to apply when adding nodes to a hash-table. For mutable objects where identity is equality use `hashkey=id`. ''' if hashkey is None: # Would be great to have operator.identity here hashkey = lambda x: x There are some cases where optimisation is possible given additional information. One example: I think it is possible to conclude that an undirected graph contains at least one cycle if |E|>=|V|, so in this case an optional hint parameter could give a shortcut for some graphs. Generally, though, there are few algorithms where other quantities are either required or are sufficient for all input graphs (exceptions to the sufficient part of this rule are typically the relatively easy algorithms like determining the mean degree). Once you have algorithms that are implemented in this way it becomes possible to piece them together as a concrete graph class, a mixin, an ABC, a decorator that works like @functools.total_ordering or some other class-based idiom. Crucially, though, unlike all of these class based approaches, defining the algorithms firstly in a functional way makes it easy to apply them to any data structure composed of elementary types or of classes that you yourself cannot write or subclass. Oscar

On Tue, Dec 18, 2012 at 2:08 PM, Oscar Benjamin <oscar.j.benjamin@gmail.com>wrote:
Some graphs don't care for the nodes, all their information is in the edges. That's why most graph frameworks have iter_edges and iter_nodes functions. I'm not sure what's the clean way to represent the optional directionality of edges though. Some example API's from networkx: http://networkx.lanl.gov/reference/classes.html http://networkx.lanl.gov/reference/classes.digraph.html

On 12/18/2012 9:24 AM, Yuval Greenfield wrote:
Or iterable if re-iteration is needed.
2) A way to map each node into an iterator over its children (or partners)
A callable could be either an iterator class or a generator function.
Thank you both the the 'thought food'. Defining things in terms of iterables and iterators instead of (for instance) sets is certainly the Python3 way. Oscar, I don't consider hashability an issue. General class instances are hashable by default. One can even consider such instances as hashable facades for unhashable dicts. Giving each instance a list attribute does the same for lists. The more important question, it seems to me, is whether to represent nodes by counts and let the algorithm do its bookkeeping in private structures, or to represent them by externally defined instances that the algorithm mutates. -- Terry Jan Reedy

On 18 December 2012 16:06, Terry Reedy <tjreedy@udel.edu> wrote:
True. Although there aren't many cases where re-iteration is needed. The main exception would be if you wanted to instantiate a new Graph as a result of the algorithm. For example a transitive pruning function could be written to accept a factory like def transitive_prune(nodes, childfunc, factory): return factory(nodes, pruned_edges(nodes, childfunc)) in which case you need to be able to iterate once over the nodes for the pruning algorithm and once to construct the new graph. In these cases, the fact that you want to instantiate a new graph suggests that your original graph was a concrete data structure so that it is probably okay to require an iterable. To mutate the graph in place the user would need to supply a function to remove edges: def transitive_prune(nodes, childfunc, remove_edge):
This is true. Some algorithms would rather have this information. There are also a few that can proceed just from a particular node rather than needing an iterable over all nodes.
I'm not sure what's the clean way to represent the optional directionality of edges though.
I would have said that each API entry point should state how it will interpret the edges. Are there algorithms that simultaneously make sense for directed and undirected graphs while needing to behave differently in the two cases (in which case is it really the same algorithm)?
True, I've not found hashability to be a problem in practice.
I don't think I understand: How would the "externally defined instances" work? Do you mean that the caller of a function would supply functions like mark(), is_marked(), set_colour(), get_colour() and so on? If that's the case what would the advantages be? I can think of one: if desired the algorithm could be made to store all of its computations in say a database so that it would be very scalable. Though to me that seems like quite a specialised case that would probably merit from reimplementing the desired algorithm anyway. Otherwise I guess it's a lot simpler/safer to implement everything in private data structures. Oscar

On 9 December 2012 20:31, Mark Adam <dreamingforward@gmail.com> wrote:
For all the reasons above I don't much see the utility of implementing some kind of standard graph class. There are too many possibilities for any one implementation to be generally applicable. I have implemented graphs in Python many times and I very often find that the detail of what I want to do leads me to create a different implementation. Another consideration that you've not mentioned is the occasional need for a OrderedGraph that keeps track of some kind of order for its vertices.
Having implemented graphs a few times now, I have come to the conclusion that it is a good idea to make the restriction that the vertices should be hashable. Otherwise, how would you get O(1) behaviour for methods like has_edge()?
This is the one part of a graph library that is really useful. Creating a class or a data structure that represents a graph in some way is trivially easy. Creating trustworthy implementations of all the graph-theoretic algorithms with the right kind of big-O behaviour is not.
What details would you get wrong? I contend that it is very easy to implement a graph without getting any of the details wrong. It is the graph algorithms that are hard, not the data structure. Here's a couple of examples: G = { 'A':{'B', 'C'}, 'B':{'A'}, 'C':{'B'} } M = [[0, 1, 1], [1, 0, 0], [0, 1, 0]] You may want to wrap the above in some kind of class in which case you'll end up with something like the following (from a private project - modified a little before posting so it may not work now): class Graph: def __init__(self, nodes, edges): self._nodes = frozenset(nodes) self._edges = defaultdict(set) for n1, n2 in edges: self._edges[n1].add(n2) @property def nodes(self): return iter(self._nodes) @property def edges(self): for n1 in self._nodes: for n2 in self.edges_node(n1): yield (n1, n2) def edges_node(self, node): return iter(self._edges[node]) def has_edge(self, nfrom, nto): return nto in self._edges[nfrom] def __str__(self): return '\n'.join(self._iterdot()) def _iterdot(self): yield 'digraph G {' for n in self.nodes: yield ' %s;' % n for nfrom, nto in self.edges: yield ' %s -> %s;' % (nfrom, nto) yield '}' G2 = Graph('ABC', [('A', 'B'), ('A', 'C'), ('B', 'C'), ('C', 'A')]) The above class is unusual in the sense that it is a pure Graph class. Normally I would simply be adding a few graphy methods onto a class that represents a network of some kind. The problem that I found with current support for graphs in Python is not the lack of an appropriate data structure. Rather the problem is that implementations of graph-theoretic algorithms (as in e.g. pygraph) are tied to a specific Graph class that I didn't want to or couldn't use in my own project. This means that to determine if you have something that represents a strongly connected graph you first need to create a separate redundant data structure and then pass that into the algorithm. What would be more useful than a new Graph class would be implementations of graph algorithms that can easily be applied to any representation of a graph. As an example, I can write a function for determining if a graph is a DAG using a small subset of the possible methods that a Graph class would have: def is_dag(nodes, edges_node): '''Determine if a directed graph is acyclic nodes is an iterable yielding all vertices in the graph edges_node(node) is an iterable giving all nodes that node connects to ''' visited = set() visiting = set() for node in nodes: if node not in visited: if has_backedge(node, edges_node, visited, visiting): return False else: return True def has_backedge(node, edges_node, visited, visiting): '''Helper for is_dag()''' if node in visiting: return True visited.add(node) visiting.add(node) for childnode in edges_node(node): if has_backedge(childnode, edges_node, visited, visiting): return True visiting.remove(node) return False This can be used with the Graph class or just as easily with the dict-of-sets like so: is_dag(G2.nodes, G2.edges_node) is_dag(G, G.__getitem__) It is possible to do something like this for all of the graph algorithms and I think that a library like this would be more useful than a new Graph type.

Mark Adam writes:
This is a fallacy. As has been pointed out, there is a variety of graphs, a large variety of computations to be done on and in them, and a huge variety in algorithms for dealing with those varied tasks. For a "standard" graph class to be useful enough to become the OOWTDI, it would need to deal with a large fraction of those aspects of graph theory. Even so, people would only really internalize the parts they need for the present task, forgetting or (worse) misremembering functionality that doesn't work for them right now. Corner cases would force many tasks to be done outside of the standard class. Differences in taste would surely result in a large number of API variants to reflect users' preferred syntaxes for representing graphs, and so on. I think making a "Graph" class that has a chance of becoming the OOWTDI is a big task. Not as big as SciPy, say, but then, SciPy isn't being proposed for stdlib inclusion, either. As usual for stdlib additions, I think this discussion would best be advanced not by "going all meta", but rather by proposing specific packages (either already available, perhaps on PyPI, or new ones -- but with actual code) for inclusion. The "meta" discussion should be conducted with specific reference to the advantages or shortcomings of those specific packages. N.B. A reasonably comprehensive package that has seen significant real-world use, and preferably has a primary distribution point of PyPI, would be the shortest path to inclusion.

On Sun, Dec 9, 2012 at 8:08 PM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
Yes, but the basic data structure concept is NOT in development, it is already well-developed. The remaining issues of API are wholly secondary. The usefulness of a graph is unquestioned and creates cross-functionality across a large number of possible interesting domains. Really its like having a car when everyone else is walking (...but should the car be a buick or a toyota? a four-door or two? -- is all besides the point) But your issue of proposing an actual implementation is well-taken rather than spend a lot of time arguing over it all. With any luck, I'll try to distill networkx with my work and put it all together. haha mark

Mark Adam <dreamingforward@...> writes:
In terms of use cases, you might be interested in potential users of any stdlib graph library. I'm working on distlib [1], which evolved out of distutils2 and uses graphs in a couple of places: 1. A dependency graph for distributions. This came directly from distutils2, though I've added a couple of bits to it such as topological sorting and determination of strongly-connected components. 2. A lightweight sequencer for build steps, added to avoid the approach in distutils/distutils2 which makes it harder than necessary to handle custom build steps. I didn't use the graph system used in point 1, as it was too specific, and I haven't had time to look at refactoring it. There's another potential use case in the area of packaging, though perhaps not in distlib itself: the idea of generating build artifacts based on their dependencies. Ideally, this would consider not only build artifacts and their dependencies, but also the builders themselves as part of the graph. Regards, Vinay Sajip [1] https://distlib.readthedocs.org/en/latest/

I think of graphs and trees as patterns, not data structures. -- --Guido van Rossum (on iPad)

On 12/16/2012 04:41 PM, Guido van Rossum wrote:
I think of graphs and trees as patterns, not data structures.
How do you draw line between what is data structure and what is pattern ? Do you have any ideas on how to represent "patterns" in python standard library ? By a set of samples ? By (a set of) classes realising the patterns ? By a set of functions working on existing structures which implement the pattern ? Duck-typing should lend itself well to this last approach. Do we currently have any modules in standard library which are more patterns and less data structures ? ----------------------- Hannu

On Mon, Dec 17, 2012 at 9:28 AM, Hannu Krosing <hannu@krosing.net> wrote:
A rough rule of thumb is that if it's harder to remember the configuration options in the API than it is to just write a purpose-specific function, it's probably better as a pattern that can be tweaked for a given use case than it is as an actual data structure. More generally, ABCs and magic methods are used to express patterns (like iteration), which may be implemented by various data structures. A graph library that focused on defining a good abstraction (and adapters) that allowed graph algorithms to be written that worked with multiple existing Python graph data structures could be quite interesting. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 12/17/2012 10:26 PM, Nick Coghlan wrote:
I was just thinking that what is needed, at least as a first step, is a graph api, like the db api, that would allow the writing of algorithms to one api and adapters to various implementations. I expect to be writing some graph algorithms (in Python) in the next year and will try to keep that idea in mind and see if it makes any sense, versus just whipping up a implementation that fits the particular problem. -- Terry Jan Reedy

On 18 December 2012 09:06, Terry Reedy <tjreedy@udel.edu> wrote:
I'd be interested to use (and possibly to contribute to) a graph library of this type on PyPI. I have some suggestions about the appropriate level of abstraction below. The graph algorithms that are the most useful can be written in terms of two things: 1) An iterator over the nodes 2) A way to map each node into an iterator over its children (or partners) It is also required to place some restriction on how the nodes can be used. Descriptions of graph algorithms refer to marking/colouring the nodes of a graph. If the nodes are instances of user defined classes, then you can do this in a relatively literal sense by adding attributes to the nodes, but this is fragile in the event of errors, not thread-safe, etc. Really though, the idea of marking the nodes just means that you need an O(1) method for determining if a node has been marked or checking the value that it was marked with. In Python this is easily done with sets and dicts, which is not fragile and is thread-safe etc. (provided the graph is not being mutated). This requires that the nodes be hashable. In the thread about deleting keys from a dict yesterday it occurred to me (after MRAB's suggestion) that you can still apply the same methods to non-hashable objects. That is, provided you have a situation where node equality is determined by node identity you can just use id(node) in each hash table. While this method works equally well for user-defined class instances it does not work for immutable types where, for example, two strings may be equal but have differing id()s. One way to cover all cases is simply to provide a hashkey argument to each algorithm that defaults to the identity function (lambda x: x), but may be replaced by the id function in appropriate cases. This means that all of the graph algorithms that I would want can be implemented with a basic signature that goes like: def strongly_connected(nodes, edgesfunc, hashkey=None): ''' `nodes` is an iterable over the nodes of the graph. `edgesfunc(node)` is an iterable over the children of node. `hashkey` is an optional key function to apply when adding nodes to a hash-table. For mutable objects where identity is equality use `hashkey=id`. ''' if hashkey is None: # Would be great to have operator.identity here hashkey = lambda x: x There are some cases where optimisation is possible given additional information. One example: I think it is possible to conclude that an undirected graph contains at least one cycle if |E|>=|V|, so in this case an optional hint parameter could give a shortcut for some graphs. Generally, though, there are few algorithms where other quantities are either required or are sufficient for all input graphs (exceptions to the sufficient part of this rule are typically the relatively easy algorithms like determining the mean degree). Once you have algorithms that are implemented in this way it becomes possible to piece them together as a concrete graph class, a mixin, an ABC, a decorator that works like @functools.total_ordering or some other class-based idiom. Crucially, though, unlike all of these class based approaches, defining the algorithms firstly in a functional way makes it easy to apply them to any data structure composed of elementary types or of classes that you yourself cannot write or subclass. Oscar

On Tue, Dec 18, 2012 at 2:08 PM, Oscar Benjamin <oscar.j.benjamin@gmail.com>wrote:
Some graphs don't care for the nodes, all their information is in the edges. That's why most graph frameworks have iter_edges and iter_nodes functions. I'm not sure what's the clean way to represent the optional directionality of edges though. Some example API's from networkx: http://networkx.lanl.gov/reference/classes.html http://networkx.lanl.gov/reference/classes.digraph.html

On 12/18/2012 9:24 AM, Yuval Greenfield wrote:
Or iterable if re-iteration is needed.
2) A way to map each node into an iterator over its children (or partners)
A callable could be either an iterator class or a generator function.
Thank you both the the 'thought food'. Defining things in terms of iterables and iterators instead of (for instance) sets is certainly the Python3 way. Oscar, I don't consider hashability an issue. General class instances are hashable by default. One can even consider such instances as hashable facades for unhashable dicts. Giving each instance a list attribute does the same for lists. The more important question, it seems to me, is whether to represent nodes by counts and let the algorithm do its bookkeeping in private structures, or to represent them by externally defined instances that the algorithm mutates. -- Terry Jan Reedy

On 18 December 2012 16:06, Terry Reedy <tjreedy@udel.edu> wrote:
True. Although there aren't many cases where re-iteration is needed. The main exception would be if you wanted to instantiate a new Graph as a result of the algorithm. For example a transitive pruning function could be written to accept a factory like def transitive_prune(nodes, childfunc, factory): return factory(nodes, pruned_edges(nodes, childfunc)) in which case you need to be able to iterate once over the nodes for the pruning algorithm and once to construct the new graph. In these cases, the fact that you want to instantiate a new graph suggests that your original graph was a concrete data structure so that it is probably okay to require an iterable. To mutate the graph in place the user would need to supply a function to remove edges: def transitive_prune(nodes, childfunc, remove_edge):
This is true. Some algorithms would rather have this information. There are also a few that can proceed just from a particular node rather than needing an iterable over all nodes.
I'm not sure what's the clean way to represent the optional directionality of edges though.
I would have said that each API entry point should state how it will interpret the edges. Are there algorithms that simultaneously make sense for directed and undirected graphs while needing to behave differently in the two cases (in which case is it really the same algorithm)?
True, I've not found hashability to be a problem in practice.
I don't think I understand: How would the "externally defined instances" work? Do you mean that the caller of a function would supply functions like mark(), is_marked(), set_colour(), get_colour() and so on? If that's the case what would the advantages be? I can think of one: if desired the algorithm could be made to store all of its computations in say a database so that it would be very scalable. Though to me that seems like quite a specialised case that would probably merit from reimplementing the desired algorithm anyway. Otherwise I guess it's a lot simpler/safer to implement everything in private data structures. Oscar

On 9 December 2012 20:31, Mark Adam <dreamingforward@gmail.com> wrote:
For all the reasons above I don't much see the utility of implementing some kind of standard graph class. There are too many possibilities for any one implementation to be generally applicable. I have implemented graphs in Python many times and I very often find that the detail of what I want to do leads me to create a different implementation. Another consideration that you've not mentioned is the occasional need for a OrderedGraph that keeps track of some kind of order for its vertices.
Having implemented graphs a few times now, I have come to the conclusion that it is a good idea to make the restriction that the vertices should be hashable. Otherwise, how would you get O(1) behaviour for methods like has_edge()?
This is the one part of a graph library that is really useful. Creating a class or a data structure that represents a graph in some way is trivially easy. Creating trustworthy implementations of all the graph-theoretic algorithms with the right kind of big-O behaviour is not.
What details would you get wrong? I contend that it is very easy to implement a graph without getting any of the details wrong. It is the graph algorithms that are hard, not the data structure. Here's a couple of examples: G = { 'A':{'B', 'C'}, 'B':{'A'}, 'C':{'B'} } M = [[0, 1, 1], [1, 0, 0], [0, 1, 0]] You may want to wrap the above in some kind of class in which case you'll end up with something like the following (from a private project - modified a little before posting so it may not work now): class Graph: def __init__(self, nodes, edges): self._nodes = frozenset(nodes) self._edges = defaultdict(set) for n1, n2 in edges: self._edges[n1].add(n2) @property def nodes(self): return iter(self._nodes) @property def edges(self): for n1 in self._nodes: for n2 in self.edges_node(n1): yield (n1, n2) def edges_node(self, node): return iter(self._edges[node]) def has_edge(self, nfrom, nto): return nto in self._edges[nfrom] def __str__(self): return '\n'.join(self._iterdot()) def _iterdot(self): yield 'digraph G {' for n in self.nodes: yield ' %s;' % n for nfrom, nto in self.edges: yield ' %s -> %s;' % (nfrom, nto) yield '}' G2 = Graph('ABC', [('A', 'B'), ('A', 'C'), ('B', 'C'), ('C', 'A')]) The above class is unusual in the sense that it is a pure Graph class. Normally I would simply be adding a few graphy methods onto a class that represents a network of some kind. The problem that I found with current support for graphs in Python is not the lack of an appropriate data structure. Rather the problem is that implementations of graph-theoretic algorithms (as in e.g. pygraph) are tied to a specific Graph class that I didn't want to or couldn't use in my own project. This means that to determine if you have something that represents a strongly connected graph you first need to create a separate redundant data structure and then pass that into the algorithm. What would be more useful than a new Graph class would be implementations of graph algorithms that can easily be applied to any representation of a graph. As an example, I can write a function for determining if a graph is a DAG using a small subset of the possible methods that a Graph class would have: def is_dag(nodes, edges_node): '''Determine if a directed graph is acyclic nodes is an iterable yielding all vertices in the graph edges_node(node) is an iterable giving all nodes that node connects to ''' visited = set() visiting = set() for node in nodes: if node not in visited: if has_backedge(node, edges_node, visited, visiting): return False else: return True def has_backedge(node, edges_node, visited, visiting): '''Helper for is_dag()''' if node in visiting: return True visited.add(node) visiting.add(node) for childnode in edges_node(node): if has_backedge(childnode, edges_node, visited, visiting): return True visiting.remove(node) return False This can be used with the Graph class or just as easily with the dict-of-sets like so: is_dag(G2.nodes, G2.edges_node) is_dag(G, G.__getitem__) It is possible to do something like this for all of the graph algorithms and I think that a library like this would be more useful than a new Graph type.
participants (9)
-
Guido van Rossum
-
Hannu Krosing
-
Mark Adam
-
Nick Coghlan
-
Oscar Benjamin
-
Stephen J. Turnbull
-
Terry Reedy
-
Vinay Sajip
-
Yuval Greenfield