On Fri, Sep 25, 2020 at 4:59 PM Eric Traut <eric@traut.com> wrote:
PEP 561 indicates that “package maintainers who wish to support type checking of their code MUST add a marker file named py.typed…”. It doesn’t define what “support type checking” means or what expectations are implied. This has led to a situation where packages claim to support type checking but omit many type annotations. There’s currently no tooling that validates the level of “type completeness” for a package, so even well-intentioned package maintainers are unable to confirm that their packages are properly and completely annotated. This leads to situations where type checkers and language servers need to fall back on type inference, which is costly and gives inconsistent results across tools. Ideally, all py.typed packages would have their entire public interface completely annotated.

I’m working on a new feature in Pyright that allows package maintainers to determine whether any of the public symbols in their package are missing type annotations. To do this, I need to clearly define what constitutes a “public symbol”. In most cases, the rules are pretty straightforward and follow the naming guidelines set forth in PEP 8 and PEP 484. For example, symbols that begin with an underscore are excluded from the list of public symbols.

One area of ambiguity is related to import statements. PEP 484 indicates that within stub files, a symbol is not considered exported unless it is used within an import statement of the form `import x as y` or `from x import y as z` or `from x import *`. The problem is that this rule applies only to “.pyi” files and not to “.py” files. For packages that use inlined types, it’s ambiguous whether an import statement of the form `import x` or `from y import x` should treat `x` as a public symbol that is exported from that module.
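
To make the stub rule concrete, here is a minimal sketch of how a tool might list the names a module re-exports if the PEP 484 import rules were applied to its source. The function name `reexported_names` and the `require_redundant_alias` parameter are illustrative and not taken from Pyright. By default the sketch follows the reading given above (any `as` form counts); PEP 484's clarifying note restricts this to the redundant `X as X` form, which the flag enables. Only top-level import statements are examined, and a wildcard import is recorded as `"*"` because resolving it would require loading the target module.

```python
import ast


def reexported_names(source: str, require_redundant_alias: bool = False) -> set[str]:
    """Return names that top-level imports re-export under stub-style rules.

    Illustrative only: a real type checker also resolves wildcard imports
    and handles imports nested in conditionals.
    """
    exported: set[str] = set()
    for node in ast.parse(source).body:
        if not isinstance(node, (ast.Import, ast.ImportFrom)):
            continue
        for alias in node.names:
            if alias.name == "*":
                # `from x import *` always re-exports; record a placeholder.
                exported.add("*")
            elif alias.asname is None:
                # Plain `import x` / `from y import x`: not re-exported.
                continue
            elif require_redundant_alias and alias.asname != alias.name:
                # Stricter reading: only `X as X` counts.
                continue
            else:
                exported.add(alias.asname)
    return exported
```

With the default setting, `from a import c as d` is reported as re-exporting `d`; with `require_redundant_alias=True`, only imports such as `from a import e as e` are reported.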

One problem with the `from x import y as z` heuristic in .py files is that it's sometimes used either to avoid shadowing a similar name or to give a shorter, more obvious name.

For instance, I have code where I do `from . import builtins as debuiltins`. This is specifically to avoid confusion over the actual 'builtins' module in Python (and the naming of the module in my package is on purpose as it is meant to act as a reimplementation of part of the 'builtins' module). So this heuristic would break for me. I have also used `from x import y as z` in other cases like `from very.long.package.name import util as very_util` to disambiguate with other 'util' modules being imported.

I can think of a few solutions here:
1. For py.typed packages, type checkers should always apply PEP 484 import rules for “.py” files. If a symbol `x` is imported with an `import x` or `from y import x`, it is treated as “not public”, and any attempt to import it from another package will result in an error.
2. For py.typed packages, PEP 484 rules are _not_ applied for import statements. This maintains backward compatibility. Package maintainers can opt in to PEP 484 rules using some well-defined mechanism. For example, we could define a special flag “stub_import_rules” that can be added to a “py.typed” file. Type checkers could then conditionally use PEP 484 rules for imports.

Option 1 will likely break some assumptions for existing packages. Option 2 avoids that break, but it involves more complexity.
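
As a rough sketch of what the opt-in in option 2 could look like on the tool side, the snippet below checks a package directory for a "py.typed" file containing the proposed flag. Everything about the file layout here is an assumption: the proposal only names the flag “stub_import_rules”, so the one-flag-per-line format and the helper name `uses_stub_import_rules` are hypothetical.

```python
from pathlib import Path


def uses_stub_import_rules(package_dir: Path) -> bool:
    """Return True if py.typed opts the package in to stub-style import rules.

    Assumes (hypothetically) that py.typed holds one flag per line.
    """
    marker = package_dir / "py.typed"
    if not marker.is_file():
        # No marker file: the package does not claim type support at all.
        return False
    lines = marker.read_text(encoding="utf-8").splitlines()
    flags = {line.strip() for line in lines}
    return "stub_import_rules" in flags
```

An empty "py.typed" file, which is what most packages ship today, yields `False`, so existing packages would keep their current behavior.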

Any suggestions? Thoughts?

Guido already brought it up, but `__all__` is a good heuristic to go by when provided.
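
A minimal sketch of that heuristic, assuming `__all__` is assigned a literal list or tuple of strings at module top level. The helper name `declared_all` is made up for illustration. Dynamic constructions such as `__all__ += [...]` or a computed value are not handled and are treated as "not provided".

```python
import ast
from typing import Optional


def declared_all(source: str) -> Optional[list[str]]:
    """Return the names listed in a literal top-level `__all__`, else None."""
    for node in ast.parse(source).body:
        if not isinstance(node, ast.Assign):
            continue
        is_all = any(
            isinstance(target, ast.Name) and target.id == "__all__"
            for target in node.targets
        )
        if not is_all:
            continue
        try:
            # literal_eval accepts an AST node and rejects non-literal values.
            value = ast.literal_eval(node.value)
        except ValueError:
            return None
        if isinstance(value, (list, tuple)) and all(isinstance(v, str) for v in value):
            return list(value)
        return None
    return None
```

When this returns a list, a tool could treat exactly those names as the module's public symbols and fall back to other heuristics only when it returns `None`.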

I would also say that the typical `__init__.py` that re-exports is often nothing but comments, docstrings, and import statements. So if you can detect that a file is essentially just import statements, that would be another reasonable heuristic for deciding that those names are meant to be re-exported.
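
One way such a check might look, as a sketch under a strict reading of "essentially just import statements": every top-level statement must be an import or a bare string (a docstring), and at least one import must be present. Comments never appear in the AST, so they are ignored automatically. The name `is_import_only_module` is illustrative.

```python
import ast


def is_import_only_module(source: str) -> bool:
    """Return True if the module body is only docstrings and imports."""
    saw_import = False
    for node in ast.parse(source).body:
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            saw_import = True
        elif (
            isinstance(node, ast.Expr)
            and isinstance(node.value, ast.Constant)
            and isinstance(node.value.value, str)
        ):
            # A docstring or other bare string literal: allowed.
            continue
        else:
            # Any other statement means the file does more than re-export.
            return False
    return saw_import
```

A looser variant could also tolerate an `__all__` assignment or a `__version__` string, at the cost of more false positives.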



Eric Traut
Contributor to Pyright and Pylance
Typing-sig mailing list -- typing-sig@python.org