Detecting a cycle in a graph

Mon Jan 15 15:24:54 EST 2018

On 2018-01-15 06:15, Frank Millman wrote:
> "Christian Gollwitzer"  wrote in message news:p3gh84$kfm$1 at dont-email.me...
>>
>> Am 14.01.18 um 22:04 schrieb Christian Gollwitzer:
>> > Am 14.01.18 um 09:30 schrieb Frank Millman:
>> >> I need to detect when a 'cycle' occurs - when a path loops back on 
>> >> itself and revisits a node previously visited. I have figured out a way 
>> >> to do this, but I have a problem.
>> >
>> > I don't know if that helps, but there is a classic graph theory 
>> > algorithm called "Floyd's cycle detector". The idea is to have a pointer 
>> > move along the graph and a second one which runs at double the speed. If 
>> > they meet, you found a cycle. It is not straight-forward to come up with 
>> > this algorithm, which even runs in constant storage. ISTR that there are 
>> > some variants which can give you the split point of the cycle, too.
>>
>> And here is an algorithm to enumerate all cycles in a directed graph:
>>
>> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.335.5999&rep=rep1&type=pdf
>>
>> with an implementation in C++ here:
>>
>> https://github.com/hellogcc/circuit-finding-algorithm
>>
> 
> I appreciate the input, Christian, but I am afraid both of those were over
> my head :-(
> 
> I think/hope that a business process graph does not require such a complex
> solution. Here is my cycle-detecting algorithm.
> 
> In BPMN terms, each node can have 0->* incoming connections, and 0->*
> outgoing connections.
> 
> Any node with 0 incoming is deemed to start the process. Normally there is
> only one such node.
> 
> Any node with 0 outgoing represents the end of an 'active path'. If there is
> more than one such node, all 'active' ones must reach the end before the
> process is finished. There is no formal definition of an 'active path', and
> I can think of a few corner cases which could prove problematic, but
> normally the meaning is clear.
> 
> I start my cycle-detection with a node with 0 incoming connections.
> 
> def find_cycle(node, path):
>      for output in node.outputs:
>          if output in path:
>              print('Cycle found in', path)
>          else:
>              new_path = path[:]
>              new_path.append(output)
>              find_cycle(output, new_path)
> 
> find_cycle(node, [node])
> 
> This traverses every possible path in the graph. I think/hope that a typical
> business process will not grow so large as to cause a problem.
> 
> If anyone can see a flaw in this, please let me know.
> 
A couple of suggestions:

1. Instead of copying the list and appending, I'd do:

     find_cycle(output, path + [output])

2. Lists are ordered, but searching them is O(n), so I'd pass a set too 
to speed it up a bit:

     def find_cycle(node, path, visited):
         for output in node.outputs:
             if output in visited:
                 print('Cycle found in', path)
             else:
                 find_cycle(output, path + [output], visited | {output})

    That will help as the paths get longer, although on a small graph, 
it won't matter.