On Sat, Apr 10, 2021 at 9:08 PM Guido van Rossum <guido@python.org> wrote:

I owe you an answer here. For large projects with long lifetimes that expect their API to evolve, experience has taught us that any part of the API that *can* be used *will* be used, even if it was clearly not intended to be used. And once enough users have code that depends on reaching into some private part of a package and using a method or attribute that was available even though it was undocumented or even clearly marked as "internal" or "private", if an evolution of the package wants to remove that method or attribute (or rename it or change its type, signature or semantics/meaning), those users will complain that the package "broke their code" by making a "backwards incompatible change." For a casually maintained package that may not be a big deal, but for a package with serious maintainers this can prevent ordered evolution of the API, especially since a package's developers may not always be aware of how their internal attributes/methods are being used or perceived by users. So experienced package developers who are planning for the long term are always looking for ways to prevent such situations (because it is frustrating for both users and maintainers). Being able to lock down the exported symbols is just one avenue to prevent disappointment in the future.
 

That's also my experience with large code bases that have to be maintained over more than, say, 5 years.

OTOH, my teammates and I have run into numerous cases (in Python and Java) of using a library developed by a third party that does almost exactly what we want, but whose intended extension mechanisms (subclassing, hook points...) do not cover our particular use case. In this case, when you have to ship a feature now and have no time to wait for a hypothetical proper solution, Python allows monkey-patching or other ways of messing with the private parts of the library. We agree this is not the "proper" solution (i.e. it incurs quite a bit of technical debt), but at least it provides a solution. (The other option would be forking the library, which probably incurs even more technical debt.)

Now back to the initial question: my understanding is that __all__ only controls the behavior of "from something import *". Some people, and tools, interpret it more broadly as "everything in __all__ is the public API for the package, everything else is private (treat it as a warning)".
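A small sketch of that behavior, using a throwaway in-memory module (the name demo_mod and its two functions are made up for the illustration):

```python
import sys
import types

# Build a throwaway module in memory (hypothetical name "demo_mod").
demo_mod = types.ModuleType("demo_mod")
exec(
    "__all__ = ['public_func']\n"
    "def public_func():\n"
    "    return 'public'\n"
    "def helper():\n"
    "    return 'helper'\n",
    demo_mod.__dict__,
)
sys.modules["demo_mod"] = demo_mod

# A wildcard import only binds the names listed in __all__ ...
star_ns = {}
exec("from demo_mod import *", star_ns)
star_names = sorted(name for name in star_ns if not name.startswith("__"))

# ... but an explicit import of a name outside __all__ still succeeds.
explicit_ns = {}
exec("from demo_mod import helper", explicit_ns)
helper_result = explicit_ns["helper"]()
```

Here star_names only contains public_func, while the explicit import of helper goes through without any warning from the interpreter. Any "privacy" attached to __all__ is therefore a convention enforced by tools, not by the language.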

This is problematic for several reasons:

1) "import *" is frowned upon, as PEP 8 says:

Wildcard imports (from <module> import *) should be avoided, as they make it unclear which names are present in the namespace, confusing both readers and many automated tools. There is one defensible use case for a wildcard import, which is to republish an internal interface as part of a public API (for example, overwriting a pure Python implementation of an interface with the definitions from an optional accelerator module and exactly which definitions will be overwritten isn’t known in advance).

IMHO even the "defensible" use case described above is not defensible.

2) What's a "public API"? A particular library can have an internal API (with regular public functions and methods, etc.) but only willingly expose a subset of this API to its clients. __all__ doesn't help much here, as the tools I know of don't have this distinction in mind.

3) A library author can decide that the public API of their library is the part exposed by its top-level namespace, e.g.:

from flask import Flask, request, ...

where Flask, request, etc. are defined in subpackages of flask.

This, or similar cases, can be confusing for IDEs that provide a "quick fix" for importing names that are not yet defined in the currently open module. Sometimes it works (i.e. it correctly imports from the top-level namespace) and sometimes it doesn't (it imports from the module where the imported name is originally defined).

This can be a minor issue (i.e. cosmetic) but sometimes, the library author will refactor their package internally, while keeping the top-level "exports" identical, and if you don't import from the top-level package, you run into an error after a library upgrade.
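The upgrade scenario can be simulated with an in-memory package. Everything here is hypothetical: "fakelib" plays the library, version 1 defines Widget in fakelib.core, and version 2 moves it to fakelib.widgets while keeping the top-level export.

```python
import importlib
import sys
import types


def install_fakelib(version):
    """Register an in-memory 'fakelib' whose internal layout depends on version."""
    for name in [n for n in sys.modules if n == "fakelib" or n.startswith("fakelib.")]:
        del sys.modules[name]
    internal_name = "fakelib.core" if version == 1 else "fakelib.widgets"
    internal = types.ModuleType(internal_name)
    internal.Widget = type("Widget", (), {})
    package = types.ModuleType("fakelib")
    package.__path__ = []  # makes it a package with nothing on disk
    package.Widget = internal.Widget  # the top-level "export" stays identical
    sys.modules["fakelib"] = package
    sys.modules[internal_name] = internal


def can_import(module_name, attr):
    """Return True if module_name can be imported and exposes attr."""
    try:
        return hasattr(importlib.import_module(module_name), attr)
    except ImportError:
        return False


install_fakelib(version=1)
before = (can_import("fakelib", "Widget"), can_import("fakelib.core", "Widget"))

install_fakelib(version=2)  # internal refactoring, same top-level exports
after = (can_import("fakelib", "Widget"), can_import("fakelib.core", "Widget"))
```

Client code that imported Widget from the top-level package keeps working after the "upgrade", while code that imported it from fakelib.core (as an IDE quick fix might have suggested) now fails.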

I guess this issue also comes from the lack of a proper way to define (either as a language-level construct or as a convention among developers and tools authors) a proper notion of a "public API", and, once again, I don't believe __all__ helps much with this issue.


=> For these use cases, a simple solution could be devised that doesn't involve language-level changes but needs a wide consensus among both library authors and tool authors:

1) Mark the public API namespaces of a package with a special marker (for instance: __public_api__ = True).

2) Static and runtime tools could easily be devised that raise a warning when:

a) Such a marker is present in one or more modules of the package.

b) and a different (unmarked) module of that same package is imported from another package.

This is just a rough idea. Additional use cases could easily be added by adding other marker types.
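As an illustration only, a static check along these lines could look like the sketch below. The __public_api__ marker is the hypothetical convention proposed above, and the function names and the public_modules mapping are invented for this example; no existing tool implements this.

```python
import ast


def has_public_marker(module_source):
    """Return True if the module sets `__public_api__ = True` at top level."""
    for node in ast.parse(module_source).body:
        if (
            isinstance(node, ast.Assign)
            and isinstance(node.value, ast.Constant)
            and node.value.value is True
            and any(isinstance(t, ast.Name) and t.id == "__public_api__" for t in node.targets)
        ):
            return True
    return False


def find_private_imports(source, importing_package, public_modules):
    """Return sorted (lineno, module) pairs for imports that bypass a marked namespace.

    public_modules maps a top-level package name to the set of its modules
    that carry the (hypothetical) marker.
    """
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            targets = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            targets = [node.module]
        else:
            continue
        for target in targets:
            top = target.split(".")[0]
            marked = public_modules.get(top)
            # (a) the package uses the marker, and (b) an unmarked module of it
            # is imported from a different package.
            if marked and top != importing_package and target not in marked:
                findings.append((node.lineno, target))
    return sorted(findings)


# Illustrative client code, parsed as text only (nothing is actually imported).
client_source = (
    "from flask import Flask\n"
    "from flask.app import Flask as FlaskDirect\n"
    "import os.path\n"
)
findings = find_private_imports(client_source, "myapp", {"flask": {"flask"}})
```

With the top-level flask namespace assumed to be the only marked one, only the second import is reported. A static check like this cannot tell whether a name imported from a marked namespace is itself a submodule; a runtime tool would be needed for that.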

An alternative could be to use decorators (e.g. @public, as mentioned in another message), as long as we don't confuse "public = part of the public API of the library" with "public = not private to this particular module".

  S.


--
Stefane Fermigier - http://fermigier.com/ - http://twitter.com/sfermigier - http://linkedin.com/in/sfermigier
Founder & CEO, Abilian - Enterprise Social Software - http://www.abilian.com/
Chairman, National Council for Free & Open Source Software (CNLL) - http://cnll.fr/
Founder & Organiser, PyParis & PyData Paris - http://pyparis.org/ & http://pydata.fr/