[New-bugs-announce] [issue47145] Improve graphlib.TopologicalSort by removing the prepare step

Mon Mar 28 12:20:34 EDT 2022

New submission from Larry Hastings <larry at hastings.org>:

I've maintained my own topological sort class for a while now.  The API I came up with (or was it Tim and I?) was adopted for graphlib.TopologicalSort().  Some of my method names are slightly different, but the APIs are so similar, I can do a little renaming and run Lib/test/test_graphlib using my implementation.  (For clarity I'm going to write the rest of this issue using the names from graphlib.)

Recently I made an improvement to my version: it no longer has the modal behavior where there's a "before prepare" phase where you add nodes and an "after prepare" phase where you can call get_ready and done.

Instead, in my new and improved version the graph is always maintained in a "ready" state.  You can call add, get_ready, and done in any order, as much as you like.  prepare doesn't do anything besides the cycle check--but read on!

This approach required tweaking some semantics behind the scenes.  My graph object now maintains an internal "dirty" flag.  It's set to True after any edges are added, and only gets cleared after checking for cycles.  prepare runs this cycle check, but get_ready *also* runs this cycle check.  (Though in both cases, only when the dirty flag is set, of course.)  In theory this means you no longer need the prepare call.  However, it's still useful, because you can call it to check for a CycleError.

So, on the one hand, this means that get_ready can throw a CycleError, which it never did before.  On the other hand, it won't throw for existing users of the library, because they never add edges after their prepare call.  So this semantic change shouldn't break existing users, assuming they're not misusing the existing API.

Another wrinkle: adding a dependency on a node that's already been handed out by get_ready (but hasn't been returned by done) is ignored.  Again, not something existing users of graphlib can do, so this shouldn't break code.

There's one final semantic change I made worth knowing about: when you return a node via done, the graph simply forgets about it.  This means you could add it a second time (!) and it could go for a complete ride through the graph again, wheeee!  This also means that when all nodes have been returned by done, the graph is essentially returned to its initial pristine state.  This too seems completely harmless, because with the existing library it's illegal to add nodes to a graph after calling prepare, so nobody's doing it, so it won't break any code.

I'm happy to contribute my version.  But I was lazy and didn't worry about speed or memory implementation; it's strewn with sets and things.  I figured I could always optimize it later.  But "later" could become "now" if we were interested in trying to merge this behavior into Python's graphlib.

Interested?

----------
assignee: larry
components: Library (Lib)
messages: 416182
nosy: eric.smith, larry, pablogsal, tim.peters
priority: normal
severity: normal
stage: needs patch
status: open
title: Improve graphlib.TopologicalSort by removing the prepare step
type: enhancement
versions: Python 3.11

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue47145>
_______________________________________