
On Sun, 11 Apr 2021 at 10:25, Stéfane Fermigier <sf@fermigier.com> wrote:
On Sat, Apr 10, 2021 at 9:08 PM Guido van Rossum <guido@python.org> wrote:
I owe you an answer here. For large projects with long lifetimes that expect their API to evolve, experience has taught that any part of the API that *can* be used *will* be used, even if it was clearly not intended to be used. And once enough users have code that depends on reaching into some private part of a package and using a method or attribute that was available even though it was undocumented or even clearly marked as "internal" or "private", if an evolution of the package wants to remove that method or attribute (or rename it or change its type, signature or semantics/meaning), those users will complain that the package "broke their code" by making a "backwards incompatible change." For a casually maintained package that may not be a big deal, but for a package with serious maintainers this can prevent ordered evolution of the API, especially since a package's developers may not always be aware of how their internal attributes/methods are being used or perceived by users. So experienced package developers who are planning for the long term are always looking for ways to prevent such situations (because it is frustrating for both users and maintainers). Being able to lock down the exported symbols is just one avenue to prevent disappointment in the future.
<snip>
2) What's a "public API"? A particular library can have an internal API (with regular public functions and methods, etc.) but only willingly expose a subset of this API to its clients. __all__ doesn't help much here, as the tools I know of don't have this distinction in mind.
I don't think that this distinction is so important in practice. What matters is the boundary between different codebases. Within a codebase a team can agree whatever conventions they want because any time something is changed it can be changed everywhere at once in a single commit. What matters is communicating from project A to all downstream developers and users what can be expected to remain compatible across different versions of project A.
3) A library author can decide that the public API of their library is the part exposed by its top-level namespace, e.g.:
from flask import Flask, request, ...
where Flask, request, etc. are defined in subpackages of flask.
This approach really doesn't scale. For one thing imports in Python have a run-time performance cost and if everything is imported at top-level like this then it implies that the top level __init__.py has imported every module in the codebase. Also organising around sub-packages and modules is much nicer for organising documentation, module level docstrings etc.
I guess this issue also comes from the lack of a proper way to define (either as a language-level construct or as a convention among developers and tools authors) a proper notion of a "public API", and, once again, I don't believe __all__ helps much with this issue.
=> For these use cases, a simple solution could be devised, that doesn't involve language-level changes but needs a wide consensus among both library authors and tools authors:
1) Mark the package's public API namespaces with a special marker (for instance: __public_api__ = True).
2) Static and runtime tools could easily be devised that raise a warning when:
a) Such a marker is present in one or more modules of the package.
b) and code in a different package imports some other module of that package.
This is just a rough idea. Additional use cases could easily be added by adding other marker types.
An alternative could be to use decorators (e.g. @public, as mentioned in another message), as long as we don't confuse "public = part of the public API of the library" with "public = not private to this particular module".
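The marker proposal above can be sketched as a small static check. This is only an illustration under stated assumptions: the marker name __public_api__ comes from the proposal, while has_marker, imported_modules, check_client, and the dict-of-sources input format are hypothetical names invented here, not part of any existing tool.

```python
from __future__ import annotations

import ast

MARKER = "__public_api__"  # marker name taken from the proposal


def has_marker(source: str) -> bool:
    """Return True if the module assigns `__public_api__ = True` at top level."""
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Constant):
            if node.value.value is True and any(
                isinstance(t, ast.Name) and t.id == MARKER for t in node.targets
            ):
                return True
    return False


def imported_modules(source: str) -> set[str]:
    """Collect the absolute module names imported anywhere in `source`."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names.add(node.module)
    return names


def check_client(client_source: str, library: dict[str, str]) -> list[str]:
    """Warn about imports of unmarked modules of a library that uses the marker.

    `library` maps dotted module names to source text (hypothetical format).
    """
    public = {name for name, src in library.items() if has_marker(src)}
    if not public:  # condition (a) fails: no marker anywhere, nothing to enforce
        return []
    return [
        f"{name} is not part of the public API"
        for name in sorted(imported_modules(client_source))
        if name in library and name not in public  # condition (b)
    ]


library = {
    "projA": "__public_api__ = True\nfrom projA.core import run\n",
    "projA.core": "def run(): pass\n",
}
client = "from projA import run\nfrom projA.core import run as raw_run\n"
warnings_found = check_client(client, library)
print(warnings_found)  # ['projA.core is not part of the public API']
```

The check stays silent for libraries that never set the marker, which matches condition (a): the convention only takes effect once a library opts in.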
I don't think __all__ helps either but for different reasons. Python makes everything implicitly public. What is actually needed is a way to clearly mark the internal code (which is most of the code in a large project) as being obviously private. That's why I prefer the Jax approach of putting the implementation in a jax._src package. That's a clear sign to all about what is private, meaning that all contributors and users can clearly see what is internal to the codebase.

If someone submits a patch to project B that does "from projA._src.x.y import z" then it's clear to anyone reviewing that patch what it means. They can do that if they want on a consenting adults basis but there will be no ambiguity about the privateness of that API if a refactor in project A causes B to break.

The _src approach means that project A can freely refactor its internals including deleting, renaming, merging and splitting modules, making a module into a package etc, without worrying about leaving leftover dummy modules or deprecation warnings. Anyone reviewing a patch for project A can clearly see when it is and when it is not allowed to make these kinds of changes. Within projA._src it could be agreed that a leading underscore indicates something like "internal to the module" but *everything* in _src is clearly internal to the codebase.

Project A can organise everything outside of _src into modules according to what makes sense for users and for the organisation of the documentation rather than what makes sense for the implementation. The _src package can be cleanly separated from everything else in automatically generated documentation. It's much better if the top-level of the docs has clearly separate links for the public API and the internal development docs. This also makes it clear what sort of information should go in the different parts of the docs which would be very different for jax._src.x.y compared to jax.x.y.
In Jax every public module jax.x.y just seems to do "from jax._src.x.y import z, t" but it's also possible to actually put the top-level function there in the module like:

    """
    Big module docstring for jax.x.y. This is for someone doing help in the repl.
    The proper web docs are in an rst file somewhere else.
    """
    from jax._src.x.y import _do_stuff

    __all__ = ['do_stuff']

    def do_stuff(foo):
        """
        Big do_stuff docstring
        """
        return _do_stuff(foo)

That way you've made a module that is entirely about defining and documenting a public API. The do_stuff function shows as being from this module and no automated analysis/introspection tools would get confused about that. There could be a minimum of high-level code in do_stuff e.g. for dispatching to different low-level routines or checking arguments but anything more should go in _src.

Oscar
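For anyone who wants to try the layout, here is a minimal sketch that writes a throwaway package to a temporary directory and imports it. The package name projA is borrowed from the discussion above; the single-level module name x (instead of x.y) and the doubling behaviour of _do_stuff are assumptions made purely for illustration, and none of this is taken from Jax itself.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical layout: projA/_src/x.py holds the implementation,
# projA/x.py is the thin, documented public module.
files = {
    "projA/__init__.py": "",
    "projA/_src/__init__.py": "",
    "projA/_src/x.py": "def _do_stuff(foo):\n    return foo * 2\n",
    "projA/x.py": (
        '"""Public API module for projA.x."""\n'
        "from projA._src.x import _do_stuff\n"
        "\n"
        "__all__ = ['do_stuff']\n"
        "\n"
        "def do_stuff(foo):\n"
        '    """Big do_stuff docstring."""\n'
        "    return _do_stuff(foo)\n"
    ),
}

# Write the package to a temporary directory and make it importable.
root = Path(tempfile.mkdtemp())
for rel, text in files.items():
    path = root / rel
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text)

sys.path.insert(0, str(root))
importlib.invalidate_caches()
public = importlib.import_module("projA.x")

result = public.do_stuff(21)
print(result)                       # 42
print(public.do_stuff.__module__)   # projA.x
print(public._do_stuff.__module__)  # projA._src.x
```

The __module__ values show the point made above: the wrapper is reported as living in the public module, while the helper it calls is visibly under _src.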