[Python-Dev] Re: Proto-PEP part 1: Forward declaration of classes

April 24, 2022

Hi Larry,

On Sat, Apr 23, 2022 at 1:53 AM Larry Hastings <larry@hastings.org> wrote:
> ...
> But rather than speculate further, perhaps someone who works on one of the static type analysis checkers will join the discussion and render an informed opinion about how easy or hard it would be to support "forward class" and "continue class".

I work on a Python static type checker.

I think a major issue with this proposal is that (in the
separate-modules case) it requires monkey-patching as an import side
effect, which is quite hard for both humans and static analysis tools
to reason effectively about.

Imagine we have a module `foo` that contains `forward class Bar`, a
separate module `foo.impl` that contains `continue class Bar: ...`,
and then a module `baz` that contains `import foo`. What type of
object is `foo.Bar` during the import of `baz`? Will it work for the
module body of `baz` to create a singleton instance `my_bar =
foo.Bar()`?

The answer is that we have no idea. `foo.Bar` might be a
non-instantiable "forward class declaration" (or proxy object, in your
second variation), or it might be a fully-constituted class. Which one
it is depends on accidents of import order anywhere else in the
codebase. If any other module happens to have imported `foo.impl`
before `baz` is imported, then `foo.Bar` will be the full class. If
nothing else has imported `foo.impl`, then it will be a
non-instantiable declaration/proxy. This question of import order
potentially involves any other module in the codebase, and the only
way to reliably answer it is to run the entire program; neither a
static type checker nor a reader of the code can reliably answer it in
the general case. It will be very easy to write a module `baz` that
does `import foo; my_bar = foo.Bar()` and have it semi-accidentally
work initially, then later break mysteriously due to a change in
imports in a seemingly unrelated part of the codebase, which causes
`baz` to now be imported before `foo.impl` is imported, instead of
after.
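
To make the import-order dependence concrete, here is a minimal
sketch that simulates the situation in today's Python. It does not
use the proposed syntax (which does not exist). The helper names
(`make_foo`, `import_foo_impl`, `import_baz`, `run_program`), the
`_complete` flag, and the choice of `TypeError` for instantiating an
incomplete class are all assumptions made for illustration.

```python
import types


def make_foo():
    """Build a stand-in for module `foo`, holding a declared-only Bar."""
    foo = types.ModuleType("foo")

    class Bar:
        _complete = False  # flipped by the simulated `continue class`

        def __new__(cls, *args, **kwargs):
            # Assumption: an incomplete forward class refuses instantiation.
            if not cls._complete:
                raise TypeError("Bar is only forward-declared")
            return super().__new__(cls)

    foo.Bar = Bar
    return foo


def import_foo_impl(foo):
    """Simulate importing `foo.impl`: its only job is to complete Bar."""
    foo.Bar.greet = lambda self: "hello"
    foo.Bar._complete = True


def import_baz(foo):
    """Simulate the module body of `baz`: `my_bar = foo.Bar()`."""
    return foo.Bar()


def run_program(impl_imported_first):
    """Run one simulated program with a given import order."""
    foo = make_foo()
    if impl_imported_first:
        import_foo_impl(foo)  # some unrelated module pulled in foo.impl
    try:
        import_baz(foo)
        return "baz imported fine"
    except TypeError as exc:
        return f"baz failed: {exc}"


print(run_program(impl_imported_first=True))
print(run_program(impl_imported_first=False))
```

The body of `import_baz` is identical in both runs; only what happened
earlier, elsewhere, differs.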

There is another big problem for static type checkers with this
hypothetical module `baz` that only imports `foo`. The type checker
cannot know the shape of the full class `Bar` unless it sees the right
`continue class Bar: ...` statement. When analyzing `baz`, it can't just go
wandering the filesystem aimlessly in hopes of encountering some
module with `continue class Bar: ...` in it, and hope that's the right one.
(Even worse considering it might be `continue class snodgrass: ...` or
anything else instead.) So this means a type checker must require that
any module that imports `Bar` MUST itself import `foo.impl` so the
type checker has a chance of understanding what `Bar` actually is.

This highlights an important difference between this proposal and
languages with real forward declarations. In, say, C++, a forward
declaration of a function or class contains the full interface of the
function or class, i.e. everything a type checker (or human reader)
would need to know in order to know how it can use the function or
class. In this proposal, that is not true; lots of critical
information about the _interface_ of the class (what methods and
attributes does it have, what are the signatures of its methods?) is
not knowable without also seeing the "implementation." This proposal
does not actually forward declare a class interface; all it declares
is the existence of the class (and its inheritance hierarchy). That's
not sufficient information for a type checker or a human reader to
make use of the class.

Taken together, this means that every single `import foo` in the
codebase would have to be accompanied by an `import foo.impl` right
next to it. In some cases (if `foo.Bar` is not used in module-level
code and we are working around a cycle) it might be safe for the
`import foo.impl` to be within an `if TYPE_CHECKING:` block; otherwise
it would need to be a real runtime import. But it must always be
there. So every single `import foo` in the codebase must now become
two or three lines rather than one.
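
As a rough sketch of what that boilerplate would look like at the top
of a module such as `baz`: the stand-in modules registered in
`sys.modules` below exist only so the snippet runs on its own, and the
`describe` function is a hypothetical example of annotation-only use.

```python
import sys
import types
from typing import TYPE_CHECKING

# Stand-ins so the sketch runs by itself; real code would have the modules
# `foo` and `foo.impl` on disk. Everything registered here is hypothetical.
_foo = types.ModuleType("foo")
_foo.__path__ = []  # makes the stand-in look like a package
_impl = types.ModuleType("foo.impl")
_foo.impl = _impl
sys.modules["foo"] = _foo
sys.modules["foo.impl"] = _impl

# Case 1: `foo.Bar` appears only in annotations or inside function bodies.
# The completing import can hide behind TYPE_CHECKING, so it never runs.
import foo
if TYPE_CHECKING:
    import foo.impl


def describe(bar: "foo.Bar") -> str:
    return f"got {bar!r}"


# Case 2: `foo.Bar` is used by module-level code, so the completing import
# must be a real runtime import, kept purely for its side effect.
import foo
import foo.impl
```

Either way, the second import names nothing that the importing module
actually uses.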

There are of course other well-known problems with import-time side
effects. All the imports of `foo.impl` in the codebase would exist
only for their side effect of "completing" `Bar`, not because anyone
actually uses a name defined in `foo.impl`. Linters would flag these
imports as unused, requiring extra cruft to silence the linter. Even
worse, these imports would tend to appear unused to human readers, who
might remove them and be confused why that breaks the program.
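
The following is a deliberately naive unused-import check, written
only to illustrate why such an import looks dead. It is not how any
particular linter works, and real tools may treat submodule imports
differently. The `dotted` and `unused_imports` helpers and the sample
source are assumptions for this sketch.

```python
import ast

SOURCE = """
import foo
import foo.impl

my_bar = foo.Bar()
"""


def dotted(node):
    """Return 'a.b.c' for a Name/Attribute chain, or None for anything else."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def unused_imports(source):
    """List imported dotted names that are never referenced in the source."""
    tree = ast.parse(source)
    imported = [
        alias.name
        for node in ast.walk(tree)
        if isinstance(node, ast.Import)
        for alias in node.names
    ]
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Name, ast.Attribute)):
            name = dotted(node)
            if name is not None:
                used.add(name)
    # An import counts as used if its full dotted path (or something
    # reached through it) appears somewhere in the code.
    return [
        name
        for name in imported
        if not any(u == name or u.startswith(name + ".") for u in used)
    ]


print(unused_imports(SOURCE))
```

Nothing in the sample source mentions `foo.impl` after the import
line, which is exactly why a reader would be tempted to delete it.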

All of these import side-effect problems can be resolved by
disallowing module separation and requiring `forward class` and
`continue class` to appear in the same module. But then the proposal
no longer helps with resolving inter-module cycles, only intra-module
ones.

Because of these issues (and others that have been mentioned), I don't
think this proposal is a good solution to forward references. I think
PEP 649, with some tricks that I've mentioned elsewhere to allow
introspecting annotations usefully even if not all names referenced in
them are defined yet at the time of introspection, is much less
invasive and more practical.

Carl

Carl Meyer