Detecting a cycle in a graph
MRAB
python at mrabarnett.plus.com
Mon Jan 15 15:24:54 EST 2018
On 2018-01-15 06:15, Frank Millman wrote:
> "Christian Gollwitzer" wrote in message news:p3gh84$kfm$1 at dont-email.me...
>>
>> Am 14.01.18 um 22:04 schrieb Christian Gollwitzer:
>> > Am 14.01.18 um 09:30 schrieb Frank Millman:
>> >> I need to detect when a 'cycle' occurs - when a path loops back on
>> >> itself and revisits a node previously visited. I have figured out a way
>> >> to do this, but I have a problem.
>> >
>> > I don't know if that helps, but there is a classic graph theory
>> > algorithm called "Floyd's cycle detector". The idea is to have a pointer
>> > move along the graph and a second one which runs at double the speed. If
>> > they meet, you found a cycle. It is not straight-forward to come up with
>> > this algorithm, which even runs in constant storage. ISTR that there are
>> > some variants which can give you the split point of the cycle, too.
>>
>> And here is an algorithm to enumerate all cycles in a directed graph:
>>
>> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.335.5999&rep=rep1&type=pdf
>>
>> with an implementation in C++ here:
>>
>> https://github.com/hellogcc/circuit-finding-algorithm
>>
>
> I appreciate the input, Christian, but I am afraid both of those were over
> my head :-(
>
> I think/hope that a business process graph does not require such a complex
> solution. Here is my cycle-detecting algorithm.
>
> In BPMN terms, each node can have 0->* incoming connections, and 0->*
> outgoing connections.
>
> Any node with 0 incoming is deemed to start the process. Normally there is
> only one such node.
>
> Any node with 0 outgoing represents the end of an 'active path'. If there is
> more than one such node, all 'active' ones must reach the end before the
> process is finished. There is no formal definition of an 'active path', and
> I can think of a few corner cases which could prove problematic, but
> normally the meaning is clear.
>
> I start my cycle-detection with a node with 0 incoming connections.
>
> def find_cycle(node, path):
> for output in node.outputs:
> if output in path:
> print('Cycle found in', path)
> else:
> new_path = path[:]
> new_path.append(output)
> find_cycle(output, new_path)
>
> find_cycle(node, [node])
>
> This traverses every possible path in the graph. I think/hope that a typical
> business process will not grow so large as to cause a problem.
>
> If anyone can see a flaw in this, please let me know.
>
A couple of suggestions:
1. Instead of copying the list and appending, I'd do:
find_cycle(output, path + [output])
2. Lists are ordered, but searching them is O(n), so I'd pass a set too
to speed it up a bit:
def find_cycle(node, path, visited):
for output in node.outputs:
if output in visited:
print('Cycle found in', path)
else:
find_cycle(output, path + [output], visited | {output})
That will help as the paths get longer, although on a small graph,
it won't matter.
More information about the Python-list
mailing list