Add an export keyword to better manage __all__
data:image/s3,"s3://crabby-images/8c947/8c947f33e5d0a75fb7078a7776957979b9be7206" alt=""
Hi,

I was refactoring some code today and ran into an issue that always bugs me with Python modules. It bugged me enough this time that I spent an hour banging out this potential proposal to add a new contextual keyword. Let me know what you think!

Theia

--------------------------------------------------------------------------------

A typical pattern for a Python module is to have an __init__.py that looks something like:

    from .foo import (
        A,
        B,
        C,
    )

    from .bar import (
        D,
        E,
    )

    def baz():
        pass

    __all__ = [
        "A",
        "B",
        "C",
        "D",
        "E",
        "baz",
    ]

This is annoying for a few reasons:

1. It requires name duplication.
   a. It's easy for the top-level imports to get out of sync with __all__, meaning that __all__, instead of being useful for documentation, is actively misleading.
   b. This encourages people to do `from .bar import *`, which screws up many linting tools like flake8, since they can't introspect the names, and also potentially allows definitions that have been deleted to accidentally persist in __all__.
2. Many symbol-renaming tools won't pick up on the names in __all__, as they're strings.

Prior art:
================================================================================

# Rust

Rust distinguishes between "use", which is a private import, "pub use", which is a globally public import, and "pub(crate) use", which is a library-internal import ("crate" is Rust's word for library).

# JavaScript

In JavaScript modules, there's an "export" keyword:

    export function foo() { ... }

And there's a pattern called the "barrel export" that looks similar to a Python import, but additionally exports the imported names:

    export * from "./foo"; // re-exports all of foo's definitions

Additionally, a module can be gathered and exported by name, but not in one line:

    import * as foo from "./foo";
    export { foo };

# Python decorators

People have written utility Python decorators that allow exporting a single function, such as this SO answer: https://stackoverflow.com/a/35710527/1159735

    import sys

    def export(fn):
        mod = sys.modules[fn.__module__]
        if hasattr(mod, '__all__'):
            mod.__all__.append(fn.__name__)
        else:
            mod.__all__ = [fn.__name__]
        return fn

which allows you to write:

    @export
    def foo():
        pass

    # __all__ == ["foo"]

but this doesn't allow re-exporting imported values.

# Python implicit behavior

Python already has a rule that, if __all__ isn't declared, all non-underscore-prefixed names are automatically exported. This is /ok/, but it's not very explicit (Zen): it's easy to accidentally write "import sys" instead of "import sys as _sys", so it makes doing the wrong thing the default state.

Proposal:
================================================================================

Add a contextual keyword "export" that has meaning in three places:

1. Preceding an "import" statement, which directs all names imported by that statement to be added to __all__:

    import sys
    export import .foo
    export import (
        A,
        B,
        C,
        D
    ) from .bar

    # __all__ == ["foo", "A", "B", "C", "D"]

2. Preceding a "def", "async def", or "class" keyword, directing that function or class's name to be added to __all__:

    def private(): pass
    export def foo(): pass
    export async def async_foo(): pass
    export class Foo: pass

    # __all__ == ["foo", "async_foo", "Foo"]

3. Preceding a bare name at top level, directing that name to be added to __all__:

    x = 1
    y = 2
    export y

    # __all__ == ["y"]

# Big Caveat

For this scheme to work, __all__ needs to not be auto-populated with names.
While the behavior is possibly surprising, I think the best way to handle this is to have __all__ not auto-populate if an "export" keyword appears in the file. While this is somewhat implicit behavior, it seems reasonable to me to expect that if a user uses "export", they are opting in to the new way of managing __all__. Likewise, I think manually assigning __all__ when using "export" should raise an error, as it would overwrite all previous exports and be very confusing.
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
Yeah, all the shenanigans with `__all__` make it clear that it's the wrong solution, and we should do something better.

Fortunately the PEG parser and its "soft keywords" feature (debuting for match/case in 3.10) make it much easier to do this.

I had thought about this and came up with similar syntax as you did (`export def` etc.) but instead of writing

```
y = 2
export y
```

That's okay, but maybe we can do better, like this?

```
export y = 2
```

This could also be combined with a type annotation, e.g.

```
export y: int = 2
```

I'm not sure about the import+export syntax you gave, maybe something like this instead?

```
import export foo
import foo, export bar, baz as export babaz
```

Hm, maybe your version is okay too -- just bikeshedding here. :-)

You write about auto-populating `__all__`. I am not aware of it ever auto-populating. What are you referring to here? (The behavior that in the absence of `__all__`, everything not starting with `_` is exported, is not auto-population -- it's a default behavior implemented by `import *`, not by the exporting module.)

I'm not sure that I would let `export` use the existing `__all__` machinery anyway. Maybe in a module that uses `export` there should be a different rule that disallows importing anything from it that isn't explicitly exported, regardless of what form of import is used (`__all__` *only* affects `import *`).

Maybe these ideas should be considered together with lazy import (another thread here).

On Fri, Mar 12, 2021 at 3:08 PM Theia Vogel <theia@vgel.me> wrote:
-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>
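A quick way to see the point that `__all__` only affects `import *`: the sketch below builds a throwaway module named `demo_all` in memory (an invented stand-in for a real file) and compares a star-import with ordinary name and attribute access.

```python
import sys
import types

# A throwaway in-memory module standing in for a real file on disk.
SOURCE = '''
import os

__all__ = ["public"]

def public():
    return "public"

def helper():
    return "helper"
'''

module = types.ModuleType("demo_all")
exec(SOURCE, module.__dict__)
sys.modules["demo_all"] = module  # make `import demo_all` find it

# Star-import: only the names listed in __all__ are copied.
star_ns = {}
exec("from demo_all import *", star_ns)
print(sorted(k for k in star_ns if k != "__builtins__"))  # ['public']

# Everything else is still reachable by name or attribute.
from demo_all import helper
import demo_all

print(helper())              # helper
print(demo_all.os.__name__)  # os
```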
data:image/s3,"s3://crabby-images/a3b9e/a3b9e3c01ce9004917ad5e7689530187eb3ae21c" alt=""
+1 here. As you point out, __all__ is only really required (or does anything) for "import *", and frankly, import * is rarely the right thing to do anyway. So I virtually never use import *, and never write an __all__.

But having a way to specify exactly what names are importable at all could make for a cleaner and more explicit API for a module.

I am curious how hard that would be to implement though. A module is a namespace (global): that namespace is used in the module itself, and is exactly what gets imported, yes? So would a whole extra layer of some sort be required to support this feature?

Also, if you did:

    import the_module

would all the names be available in the module's namespace, but some couldn't be imported specifically? That is:

    import the_module
    the_module.sys

would work, but

    from the_module import sys

would not work? That might be odd and confusing.

-CHB

Christopher Barker, PhD (Chris)
Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
data:image/s3,"s3://crabby-images/8c947/8c947f33e5d0a75fb7078a7776957979b9be7206" alt=""
That was a misconception on my part; I always thought it worked the other way around! Not sure why I thought that.

I'm open to the idea of not using __all__, if there's the possibility to have a better interface by breaking away from it. My worries with creating a new, more strict interface would be:

1. Breaking compatibility with old tools that look for __all__ specifically, though I'm not sure how big the set of "tools that dynamically introspect __all__" is. It's probably pretty small.
2. I do like that Python gives you the ability to import everything, even if the library author didn't intend for it. There are benefits to encapsulation of course, but I like having the option to fiddle with some library internals if I really need to. I wonder if there's a way to keep this ability but make it a little "louder", so it's clear you're doing something abnormal, like the underscore-prefixed private member convention or dangerouslySetInnerHTML in React.

As for `export x: int = 1` vs

```
x: int = 1
export x
```

I like both syntaxes pretty equally. My choice in the original post was mostly because I was unsure about exactly how flexible the soft keywords in the new parser are, and whether it would be tricky to insert them before an arbitrary assignment like that instead of making them their own statement. If that's not an issue, I'm in favor of either syntax :)
data:image/s3,"s3://crabby-images/8c947/8c947f33e5d0a75fb7078a7776957979b9be7206" alt=""
import the_module
the_module.sys
would work, but
from the_module import sys
would not work?
That might be odd and confusing.
Good point. I'm not familiar with CPython internals so I'm not sure how this would work on the implementation side, but I definitely think it would be important to not have an inconsistency here.
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
On Sun, Mar 14, 2021 at 7:11 AM Theia Vogel <theia@vgel.me> wrote:
You probably should design this in partnership with someone who's more familiar with those internals. I believe that familiarity with Python internals is pretty important if you want to be able to design a new feature. Without thinking about the implementation you might design something that's awkward to implement; but you might also not think of something that's easy to do in the implementation but not obvious from superficial usage. (There are two lines in the Zen of Python devoted to this. :-)

It is already possible to completely customize what happens when you write "import X". The returned object doesn't even strictly need to have a `__dict__` attribute. For `from X import Y` it first does `import X` (but doesn't bind X in your namespace) and then gives you `X.Y`, except if `Y` is `*`, in which case it looks in `X.__all__`, or if that's not found it looks in `X.__dict__` and imports all keys not starting with `_`. So to support `from X import *` you need either `__all__` or `__dict__`, else you get an import error (but neither of those is needed to support `from X import Y` or `import X; X.Y`).

So I think there is no technical reason why we couldn't make it so that a module that uses `export` returns a proxy that only lets you use attributes that are explicitly exported. If users disagree with this, they can negotiate directly with the module authors. It's not a principle of Python that users *must* be given access to implementation details, after all -- it was just more convenient to do it this way when the language was young. (For example, it's not possible to root around like this in C extensions -- these only export what the author intends to export.)

I do think that this means that existing libraries need to be careful when adding `export` statements, because it effectively becomes a backwards incompatibility. We could also implement a flag to weaken the strictness in this area (the flag would have to be set by the module, not by the importing code).
Such a proxy could also be used to implement lazy imports, to some extent. (I'm not sure how to do `from X import Y` lazily -- it would have to wrap `Y` in another proxy, and then it becomes awkward, e.g. if `Y` is supposed to represent a simple number or string -- we don't want any other part of the language or stdlib to need to become aware of such proxies.) Long answer short, yes, we can make it so that `the_module.sys` in your example above is forbidden -- if we want to.
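A minimal sketch of such a proxy, assuming a plain (non-package) module and an explicit list of exported names. `ExportOnlyModule` and `install` are invented for illustration, not an existing API; a few import-system dunders are passed through so that `import` keeps working. It also shows that Chris's two cases, `the_module.sys` and `from the_module import sys`, fail consistently.

```python
import sys
import types


class ExportOnlyModule:
    """Hypothetical read-only proxy exposing only a module's exported names."""

    __slots__ = ("_module", "_exported")
    # Metadata the import system may read; passed through unchanged.
    _METADATA = frozenset(
        {"__name__", "__spec__", "__loader__", "__package__", "__path__", "__file__"}
    )

    def __init__(self, module, exported):
        object.__setattr__(self, "_module", module)
        object.__setattr__(self, "_exported", tuple(exported))

    def __getattr__(self, name):
        # Called only when normal lookup fails, i.e. for every module-level name.
        if name == "__all__":
            return list(self._exported)  # consulted by `from mod import *`
        if name in self._exported or name in self._METADATA:
            return getattr(self._module, name)
        raise AttributeError(
            f"module {self._module.__name__!r} does not export {name!r}"
        )

    def __setattr__(self, name, value):
        raise AttributeError("export-only module proxy is read-only")


def install(name, source, exported):
    """Run `source` as module `name` and register a proxy in sys.modules."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    proxy = ExportOnlyModule(module, exported)
    sys.modules[name] = proxy
    return proxy


SOURCE = '''
import sys

def public():
    # Code inside the module still sees its full globals.
    return helper().upper()

def helper():
    return "internal"
'''

the_module = install("the_module", SOURCE, exported=["public"])
print(the_module.public())  # INTERNAL

try:
    the_module.sys
except AttributeError as exc:
    print(exc)  # module 'the_module' does not export 'sys'

try:
    from the_module import sys as _sys  # refused as well, as an ImportError
except ImportError:
    print("from-import refused")
```

Because the module's own functions look up globals in the real module dict, internal code is unaffected; only outside access is narrowed. The read-only `__setattr__` also blocks external monkey-patching, which is exactly the trade-off debated in the replies.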
data:image/s3,"s3://crabby-images/5aa14/5aa14ac3ad4a9294af9a19151eb1fccf924aa3fa" alt=""
On Sun, 14 Mar 2021 at 18:58, Guido van Rossum <guido@python.org> wrote:
If this is implemented, then please ensure that some mechanism is included (either keeping __dict__, or functions in inspect, or some other mechanism) to both get and set all attributes of a module irrespective of any export/__all__ controls.

I understand the perspective that says library authors should be able to control their API, for various reasons, but this has to be balanced against the fact that library authors are not perfect, and can either include bugs, or fail to consider all reasonable use-cases when designing their code. Historically, this has been done, by convention, through use of the "_" private specifier for module-level objects.

The value of being able to (in specific cases) reach into third-party code, and customize it to work for your specific situation should not be disregarded. I think every large codebase that I've worked with has had to monkey-patch a method deep within at least one third-party library at runtime, in production (this is also commonly used for testing purposes), at some point to work around either a limitation of the library, an incompatibility with another library, or to patch a bug. This also applies to module globals that are themselves imported modules: being able to rebind a name to reference a different module (within careful constraints) in a library module has saved me several times.

If external visibility of code units within libraries starts to be restricted so that this sort of patching isn't possible, then the cost of fixing some problems may start to become significantly greater.

I strongly believe that the access to members of modules should be kept as similar to the access of members of class instances as possible. Not only will this keep the model simpler, the same arguments and/or concerns apply when talking about classes as when talking about modules. Maybe this means that non-exported members of modules get name-mangled, as private (double-underscore) class attributes are, or maybe the '__dict__' member is always exported.
I think the idea of resolving the issues mentioned above by talking with third-party module authors directly seems like an unlikely solution. In my experience, this sort of 'keyhole surgery' patching is often against large, undersupported public libraries (think sqlalchemy, pandas, etc.), where two factors make it unlikely that these projects will be able to help directly:

1. If your project has unique or uncommon requirements that require this sort of patch, often library authors do not want to incur the maintenance burden of changing their carefully-designed code to support the use-cases, especially if they do not consider the requirements to be valid (even if they are in a different context to that being considered by the authors).
2. Release cycles of these projects can be long and complex. Maybe you're using a previous version of this library, and the current release is not compatible with your stack, so even if the library does release a fix, you'd not be able to use it without a major refactor/retest cycle internally.

I really like the "we're all adults here" approach to private/protected access that Python currently takes; please don't weaken this without a lot of serious consideration :)

Thanks
Steve
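The runtime patching described here works today because a module's functions look up their globals in the module's own `__dict__`, so rebinding a module attribute changes behaviour for every caller. A minimal sketch with an invented in-memory module `vendor_lib`, using `unittest.mock.patch.object`, which restores the original binding when the `with` block exits:

```python
import types
from unittest import mock

SOURCE = '''
def _transport(payload):
    # Imagine a buggy or inflexible internal helper deep inside a library.
    return "sent:" + payload

def send(payload):
    return _transport(payload)
'''

vendor_lib = types.ModuleType("vendor_lib")
exec(SOURCE, vendor_lib.__dict__)


def patched_transport(payload):
    # Our local work-around, swapped in for the library's private helper.
    return "patched:" + payload


print(vendor_lib.send("x"))  # sent:x
with mock.patch.object(vendor_lib, "_transport", patched_transport):
    print(vendor_lib.send("x"))  # patched:x
print(vendor_lib.send("x"))  # sent:x
```

Any export mechanism that makes module attributes unreadable or read-only from the outside would rule this pattern out, which is the cost being weighed in this thread.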
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
If you feel so strongly about it maybe we need to see some credentials. What's your background? (This may sound like an odd request, but reputation matters, and right now I have no idea who you are or why I should take your opinion seriously, no matter how eloquently it is stated.) Note that C extensions allow none of the introspection mechanisms you're proposing here (they have a `__dict__` but it is fully controlled by the author of the C code) and they thrive just fine. I am happy with giving module authors a way to tell the system "it's okay, users can reach in to access other attributes as well". But I think module authors should *also* be given a way to tell the system "please prevent users from accessing the private parts of this API no matter how hard they try". And using 'export' seems a reasonable way to enable the latter. (To be clear I am fine if there's a flag to override this default even when 'export' is used.) I know there are package authors out there who have this desire to restrict access (presumably because they've been burned when trying to evolve their API) and feel so strongly about it that they implement their own restrictive access controls (without resorting to writing C code). On Sun, Mar 14, 2021 at 12:42 PM Stestagg <stestagg@gmail.com> wrote:
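One way to sketch both positions together uses module `__class__` assignment (supported since Python 3.5): a `types.ModuleType` subclass that hides names not listed in the module's exports, unless the module itself opts out. The names `StrictExportModule`, `__exports__`, and `__export_strict__` are invented for this sketch. Dunder attributes, including `__dict__`, still pass through, so `vars(mod)` remains a deliberately loud escape hatch of the kind requested above.

```python
import sys
import types

_module_getattribute = types.ModuleType.__getattribute__


class StrictExportModule(types.ModuleType):
    """Hypothetical module type: outside code sees only names in __exports__."""

    def __getattribute__(self, name):
        # Dunders (__name__, __spec__, __dict__, ...) always pass through,
        # so vars(module) stays available as an explicit escape hatch.
        if name.startswith("__") and name.endswith("__"):
            return _module_getattribute(self, name)
        namespace = _module_getattribute(self, "__dict__")
        # The opt-out flag is read from the module itself, not the importer.
        strict = namespace.get("__export_strict__", True)
        if strict and name not in namespace.get("__exports__", ()):
            raise AttributeError(
                f"module {namespace['__name__']!r} does not export {name!r}"
            )
        return _module_getattribute(self, name)


def load(name, source):
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    # A real module would do this on itself, at the end of its own source:
    #     sys.modules[__name__].__class__ = StrictExportModule
    module.__class__ = StrictExportModule
    sys.modules[name] = module
    return module


STRICT_SOURCE = '''
__exports__ = ("public",)

def public():
    return _helper() + 1

def _helper():
    return 41
'''

strict = load("strict_mod", STRICT_SOURCE)
relaxed = load("relaxed_mod", STRICT_SOURCE + "\n__export_strict__ = False\n")

print(strict.public())            # 42
print(relaxed._helper())          # 41
print(vars(strict)["_helper"]())  # 41
```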
data:image/s3,"s3://crabby-images/5aa14/5aa14ac3ad4a9294af9a19151eb1fccf924aa3fa" alt=""
On Sun, Mar 14, 2021 at 8:22 PM Guido van Rossum <guido@python.org> wrote:
Hi Guido,

I have enormous respect for you, but have no interest in attempting to prove I'm a 'good-enough egg' to warrant your serious attention. The entire question makes me somewhat uncomfortable, and to illustrate why, I'm going to willfully misinterpret it to make my point:

My background is: White, middle-class British male, mid-30s, my family is descended from the illegitimate son of the governor of the Isle of Wight. I've dined with the Queen, and my last two schools were both over 400 years old. Does this qualify me for attention? What if I were a BAME teen woman living in Bangladesh?

Of course, sarcastic responses aside, I assume you meant technical background/credentials. Similar concerns still apply there: this python-ideas list is full of people with ideas, many of which are held strongly, that don't get their background questioned openly (although, as in most communities, there's evidence of this happening subtextually in places). In fact, when I previously pointed out a critical flaw in one of your proposals in another thread on this list, I wasn't questioned about my credentials then (I assume because you realised your mistake at that time). If you're still interested in my CV, it's easily findable online, along with other technical context about me that may be relevant to assessing my worthiness.

Regardless of the above, I'm not going to argue further about the proposal or my plea to keep some control/override. That's why I made my case as robustly as I could initially. It's a single data point; other people may reinforce it, or counter it, in the discussion. By broaching the subject, I was hoping to ensure that this aspect of the proposed change would be considered. If you decide I'm not worth it, then that's your call.

Regards
Steve
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
Hi Steve, I don't think I can explain adequately why I challenged you, so I'll just apologize. Your feedback (about the 'export' proposal, and about my challenge of your credentials) is duly noted. Sorry! --Guido On Sun, Mar 14, 2021 at 4:11 PM Stestagg <stestagg@gmail.com> wrote:
data:image/s3,"s3://crabby-images/dd81a/dd81a0b0c00ff19c165000e617f6182a8ea63313" alt=""
On 3/14/21 12:42 PM, Stestagg wrote:
The value of being able to (in specific cases) reach into third-party code, and customize it to work for your specific situation should not be disregarded.
I completely agree with this. One of the hallmarks of Python is the ability to query, introspect, and modify Python code. It helps with debugging, with experimenting, with fixing. Indeed, one of the few frustrating bits about Python is the inability to work with the C portions as easily as the Python portions (note: I am /not/ suggesting we do away with the C portions). What would be the benefits of locking down modules in this way? -- ~Ethan~
data:image/s3,"s3://crabby-images/efe10/efe107798b959240e12a33a55e62a713508452f0" alt=""
I like the idea of improving the way interfaces are exported in Python. I still don't know what the standard is today.

Some of my favourite projects like Jax have started tucking their source in a _src directory: https://github.com/google/jax/tree/master/jax/_src, and then importing the exported interface in their main project: https://github.com/google/jax/blob/master/jax/scipy/signal.py. As far as I know, this "private source" method is the nicest way to control the interface.

What I've been doing is importing * from my __init__.py (https://github.com/NeilGirdhar/tjax/blob/master/tjax/__init__.py) and then making sure every module has an __all__. This "init-all" method is problematic if you want to export a symbol with the same name as a folder. It also means that all of your modules and folders are exported names. If I started again, I'd probably go with the way the Jax team did it.

I agree that the proposed export keyword is nicer than __all__, and accomplishes what I've been doing more elegantly. However, it suffers from the same extraneous symbol problem. I would still choose the private source method over export.

I'm curious if anyone has ideas about how exporting an interface should be done today, or could be done tomorrow.

On Monday, March 15, 2021 at 9:48:11 AM UTC-4 Rob Cliffe via Python-ideas wrote:
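Whichever layout is used, a cheap guard against the drift described at the start of the thread (names in `__all__` that no longer exist) is a test that checks every listed name. A sketch; `stale_all_entries` is an invented helper, not an existing tool, and a test suite might loop it over every module of a package:

```python
import importlib
import types
from typing import List


def stale_all_entries(module: types.ModuleType) -> List[str]:
    """Return the names listed in module.__all__ that the module does not define."""
    declared = getattr(module, "__all__", None)
    if declared is None:
        return []  # nothing declared, so nothing can go stale
    return [name for name in declared if not hasattr(module, name)]


json_module = importlib.import_module("json")
print(stale_all_entries(json_module))  # []

demo = types.ModuleType("demo")
exec("__all__ = ['kept', 'deleted_long_ago']\ndef kept():\n    pass\n", demo.__dict__)
print(stale_all_entries(demo))  # ['deleted_long_ago']
```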
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
On Sun, Mar 14, 2021 at 10:55 PM Ethan Furman <ethan@stoneleaf.us> wrote:
I owe you an answer here. For large projects with long lifetimes that expect their API to evolve, experience has taught that any part of the API that *can* be used *will* be used, even if it was clearly not intended to be used. And once enough users have code that depends on reaching into some private part of a package and using a method or attribute that was available even though it was undocumented or even clearly marked as "internal" or "private", if an evolution of the package wants to remove that method or attribute (or rename it or change its type, signature or semantics/meaning), those users will complain that the package "broke their code" by making a "backwards incompatible change." For a casually maintained package that may not be a big deal, but for a package with serious maintainers this can prevent ordered evolution of the API, especially since a package's developers may not always be aware of how their internal attributes/methods are being used or perceived by users. So experienced package developers who are planning for the long term are always looking for ways to prevent such situations (because it is frustrating for both users and maintainers). Being able to lock down the exported symbols is just one avenue to prevent disappointment in the future.
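When a maintainer does need to rename or remove a module-level name, one existing tool for softening the break is a module-level `__getattr__` (PEP 562, Python 3.7+): the old name keeps working for a while but emits a `DeprecationWarning`. A sketch, with `old_name`, `new_name`, and the in-memory module `lib` invented for illustration:

```python
import types
import warnings

SOURCE = '''
import warnings

def new_name():
    return 42

# Hypothetical table of renamed attributes: old -> new.
_RENAMED = {"old_name": "new_name"}

def __getattr__(name):
    # PEP 562: only called when normal lookup on the module fails.
    if name in _RENAMED:
        warnings.warn(
            f"{name} is deprecated; use {_RENAMED[name]} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return globals()[_RENAMED[name]]
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
'''

lib = types.ModuleType("lib")
exec(SOURCE, lib.__dict__)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = lib.old_name()

print(value)                         # 42
print(caught[0].category.__name__)   # DeprecationWarning
```

This does not prevent reliance on internals in the first place; it only gives users a migration window once a name moves.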
data:image/s3,"s3://crabby-images/d9209/d9209bf5d3a65e4774057bb062dfa432fe6a311a" alt=""
I'd like to give a shoutout to a package I wrote called mkinit, which helps autogenerate explicit `__init__.py` files to mitigate some of the issues you mentioned (especially with respect to `from blah import *`). https://github.com/Erotemic/mkinit On Sat, Apr 10, 2021 at 3:08 PM Guido van Rossum <guido@python.org> wrote:
-- -Dr. Jon Crall (him)
data:image/s3,"s3://crabby-images/e34ce/e34ce984c2026b18e14713adfaa865106f309d5c" alt=""
On Sat, Apr 10, 2021 at 9:08 PM Guido van Rossum <guido@python.org> wrote:
That's also my experience with large code bases that have to be maintained over more than, say, 5 years.

OTOH, I or my teammates have run into numerous cases (in Python and Java) of using a library developed by a third party, which does almost exactly what we want but for which our particular use case is covered neither by the original implementation nor by its intended extension mechanisms (subclassing, hook points...). In this case, when you have to ship a feature now and have no time to wait for a hypothetical proper solution, Python allows monkey-patching or other types of messing with the private parts of the library, which we agree are not the "proper" solution (i.e. they incur quite a bit of technical debt) but at least can provide a solution. (The other solution would be forking the library, which probably incurs even more technical debt.)

Now back to the initial question: my understanding is that __all__ only controls the behavior of "from something import *". Some people, and tools, interpret it more broadly as "everything in __all__ is the public API for the package, everything else is private (treat it as a warning)". This is problematic for several reasons:

1) "import *" is frowned upon, as PEP 8 says:

*Wildcard imports (from <module> import *) should be avoided, as they make it unclear which names are present in the namespace, confusing both readers and many automated tools. There is one defensible use case for a wildcard import, which is to republish an internal interface as part of a public API (for example, overwriting a pure Python implementation of an interface with the definitions from an optional accelerator module and exactly which definitions will be overwritten isn’t known in advance).*

IMHO even the "defensible" use case described above is not defensible.

2) What's a "public API"? A particular library can have an internal API (with regular public functions and methods, etc.) but only willingly expose a subset of this API to its clients.
__all__ doesn't help much here, as the tools I know of don't have this distinction in mind.

3) A library author can decide that the public API of its library is the part exposed by its top-level namespace, e.g.:

    from flask import Flask, request, ...

where Flask, request, etc. are defined in subpackages of flask.

This, or similar cases, can be confusing for IDEs that provide a "quick fix" for importing stuff that is currently not defined in the current open module. Sometimes it works (i.e. it correctly imports from the top-level namespace) and sometimes it doesn't (it imports from the module where the imported name is originally defined). This can be a minor issue (i.e. cosmetic) but sometimes, the library author will refactor their package internally, while keeping the top-level "exports" identical, and if you don't import from the top-level package, you run into an error after a library upgrade.

I guess this issue also comes from the lack of a proper way to define (either as a language-level construct or as a convention among developers and tools authors) a proper notion of a "public API", and, once again, I don't believe __all__ helps much with this issue.

=> For these use cases, a simple solution could be devised, that doesn't involve language-level changes but needs a wide consensus among both library authors and tools authors:

1) Mark a package's public API namespaces with a special marker (for instance: __public_api__ = True).

2) Static and runtime tools could be easily devised that raise a warning when:
   a) such a marker is present in one or more modules of the package,
   b) and one imports another module of the same package from another package.

This is just a rough idea. Additional use cases could easily be added by adding other marker types.

An alternative could be to use decorators (e.g.
@public like mentioned in another message), as long as we don't confuse "public = part of the public API of the library" with "public = not private to this particular module". S. -- Stefane Fermigier - http://fermigier.com/ - http://twitter.com/sfermigier - http://linkedin.com/in/sfermigier Founder & CEO, Abilian - Enterprise Social Software - http://www.abilian.com/ Chairman, National Council for Free & Open Source Software (CNLL) - http://cnll.fr/ Founder & Organiser, PyParis & PyData Paris - http://pyparis.org/ & http://pydata.fr/
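The marker idea lends itself to a static check. A rough sketch using the standard `ast` module: the `__public_api__` marker comes from the message above, while `find_private_imports` and the example package names are made up (marking `flask` and `flask.json` as public here says nothing about Flask's real layout). For simplicity it assumes the checked source lives outside the marked packages, ignores relative imports, and does not resolve `from pkg import submodule`.

```python
import ast
from typing import List, Set, Tuple


def find_private_imports(source: str, public_modules: Set[str]) -> List[Tuple[int, str]]:
    """Flag imports that bypass a package's declared public API.

    public_modules holds dotted names of modules that set the hypothetical
    marker `__public_api__ = True`. Any other module inside a package that
    has at least one marked module is treated as internal.
    """
    packages = {name.split(".")[0] for name in public_modules}
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            targets = [node.module]
        elif isinstance(node, ast.Import):
            targets = [alias.name for alias in node.names]
        else:
            continue
        for target in targets:
            if target.split(".")[0] in packages and target not in public_modules:
                hits.append((node.lineno, target))
    return sorted(hits)


code = """\
from flask import Flask
from flask.app import Flask as AppFlask
import flask.json
import os
"""
print(find_private_imports(code, {"flask", "flask.json"}))  # [(2, 'flask.app')]
```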
data:image/s3,"s3://crabby-images/5f8b2/5f8b2ad1b2b61ef91eb396773cce6ee17c3a4eca" alt=""
On Sun, 11 Apr 2021 at 10:25, Stéfane Fermigier <sf@fermigier.com> wrote:
<snip>
2) What's a "public API" ? A particular library can have an internal API (with regular public functions and methods, etc,) but only willingly expose a subset of this API to its clients. __all__ doesn't help much here, as the tools I know of don't have this distinction in mind.
I don't think that this distinction is so important in practice. What matters is the boundary between different codebases. Within a codebase a team can agree whatever conventions they want because any time something is changed it can be changed everywhere at once in a single commit. What matters is communicating from project A to all downstream developers and users what can be expected to remain compatible across different versions of project A.
This approach really doesn't scale. For one thing imports in Python have a run-time performance cost and if everything is imported at top-level like this then it implies that the top level __init__.py has imported every module in the codebase. Also organising around sub-packages and modules is much nicer for organising documentation, module level docstrings etc.
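On the import-cost point, one existing mitigation is a module-level `__getattr__` (PEP 562, Python 3.7+), which lets a top-level `__init__` re-export names without importing every submodule up front. A sketch under stated assumptions: the package `pkg` is an in-memory stand-in, and the stdlib modules `json` and `hashlib` play the role of the package's own submodules (a real package would use relative names).

```python
import types

# Hypothetical top-level __init__ source for the stand-in package.
INIT_SOURCE = '''
import importlib

# Public name -> module that really defines it (stdlib stand-ins here).
_LAZY_EXPORTS = {"dumps": "json", "sha256": "hashlib"}
__all__ = sorted(_LAZY_EXPORTS)

def __getattr__(name):
    # PEP 562: only reached when `name` is not already in the module dict.
    if name not in _LAZY_EXPORTS:
        raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
    value = getattr(importlib.import_module(_LAZY_EXPORTS[name]), name)
    globals()[name] = value  # cache so later lookups skip __getattr__
    return value

def __dir__():
    return sorted(set(globals()) | set(_LAZY_EXPORTS))
'''

pkg = types.ModuleType("pkg")
exec(INIT_SOURCE, pkg.__dict__)

print("dumps" in vars(pkg))  # False: nothing loaded yet
print(pkg.dumps([1, 2]))     # [1, 2]
print("dumps" in vars(pkg))  # True: cached after first access
```

Type checkers and IDEs may not see names provided this way unless they are also declared statically, so this trades some tooling friendliness for startup time.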
I don't think __all__ helps either but for different reasons. Python makes everything implicitly public. What is actually needed is a way to clearly mark the internal code (which is most of the code in a large project) as being obviously private. That's why I prefer the Jax approach of putting the implementation in a jax._src package. That's a clear sign to all about what is private, meaning that all contributors and users can clearly see what is internal to the codebase. If someone submits a patch to project B that does "from projA._src.x.y import z" then it's clear to anyone reviewing that patch what it means. They can do that if they want on a consenting adults basis but there will be no ambiguity about the privateness of that API if a refactor in project A causes B to break.

The _src approach means that project A can freely refactor its internals including deleting, renaming, merging and splitting modules, making a module into a package etc, without worrying about leaving leftover dummy modules or deprecation warnings. Anyone reviewing a patch for project A can clearly see when it is and when it is not allowed to make these kinds of changes. Within projA._src it could be agreed that a leading underscore indicates something like "internal to the module" but *everything* in _src is clearly internal to the codebase.

Project A can organise everything outside of _src into modules according to what makes sense for users and for the organisation of the documentation rather than what makes sense for the implementation. The _src package can be cleanly separated from everything else in automatically generated documentation. It's much better if the top-level of the docs has clearly separate links for the public API and the internal development docs. This also makes it clear what sort of information should go in the different parts of the docs, which would be very different for jax._src.x.y compared to jax.x.y.
In Jax every public module jax.x.y just seems to do "from jax._src.x.y import z, t", but it's also possible to actually put the top-level function there in the module like:

    """
    Big module docstring for jax.x.y. This is for someone doing help in the repl.
    The proper web docs are in an rst file somewhere else.
    """
    from jax._src.x.y import _do_stuff

    __all__ = ['do_stuff']

    def do_stuff(foo):
        """
        Big do_stuff docstring
        """
        return _do_stuff(foo)

That way you've made a module that is entirely about defining and documenting a public API. The do_stuff function shows as being from this module and no automated analysis/introspection tools would get confused about that. There could be a minimum of high-level code in do_stuff, e.g. for dispatching to different low-level routines or checking arguments, but anything more should go in _src.

Oscar
data:image/s3,"s3://crabby-images/e34ce/e34ce984c2026b18e14713adfaa865106f309d5c" alt=""
On Sun, Apr 11, 2021 at 2:07 PM Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
What matters is the boundary between different codebases. Within a codebase
We agree that the most important API is the Public API, not the internal API (except for very large-scale libraries where this can become important). But Python mostly supports (via https://docs.python.org/3/tutorial/classes.html#tut-private or the _-prefix naming convention) the notion of internal APIs, not of a Public API.
It scales as long as it's expected that when you import one object from the library, you will use most of the library. In some cases (for instance SQLAlchemy, where you may want to use the Engine but not the ORM) the library author(s) have already thought about the issue and made several top-level namespaces. See for instance: https://github.com/sqlalchemy/sqlalchemy/blob/master/lib/sqlalchemy/__init__... https://github.com/sqlalchemy/sqlalchemy/blob/master/lib/sqlalchemy/orm/__in... etc.
For one thing imports in Python
I think we agree for the same reason actually.
We agree that there needs to be a way to mark what is part of the public API vs. what is the (library-)private implementation. Being explicit about what is public vs. being explicit about what is private are two complementary approaches (as long as we agree that public vs. private is a two-state choice, in this context).
That's why I prefer the Jax approach of putting the implementation in a jax._src package.
That's another approach, which reminds me of some Java projects I have worked on in the past. First time I've seen it in a Python project, though, vs. the other approach which I have found in several projects I work with on a daily basis. S. -- Stefane Fermigier - http://fermigier.com/ - http://twitter.com/sfermigier - http://linkedin.com/in/sfermigier Founder & CEO, Abilian - Enterprise Social Software - http://www.abilian.com/ Chairman, National Council for Free & Open Source Software (CNLL) - http://cnll.fr/ Founder & Organiser, PyParis & PyData Paris - http://pyparis.org/ & http://pydata.fr/
data:image/s3,"s3://crabby-images/ab456/ab456d7b185e9d28a958835d5e138015926e5808" alt=""
I don't think that trying to do this programmatically via some kind of language construct is a good approach. The public API of a library is the one which is documented as such. That's really all there is to it. Documentation is written explicitly by the author of a package, and natural language provides a lot more nuance than some programmatic mechanism. E.g. you may have situations where you have more than one level of documented APIs, e.g. a public stable one and an experimental one which is likely to change over time, or a high-level one and a low-level one, where more care has to be applied in using it. Or you have APIs which are deprecated but still public, and you know that the APIs will go away in a future version, so you should probably not start using them in new code and plan for a refactoring of existing code. Superimposing the extra meaning of __all__ = "public API", which has been done at times, is misleading, especially since using "from module import *", for which __all__ was added, is frowned upon unless needed in special use cases. You'd really only define __all__ if you expect that "from module import *" will be used, which you typically don't :-) ... "Package authors may also decide not to support it, if they don’t see a use for importing * from their package." https://docs.python.org/3/tutorial/modules.html?highlight=__all__#importing-... Aside: __all__ and namespace packages don't work well together either - they don't have __init__.py initializers. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Apr 12 2021)
Python Projects, Coaching and Support ... https://www.egenix.com/ Python Product Development ... https://consulting.egenix.com/
::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 https://www.egenix.com/company/contact/ https://www.malemburg.com/
data:image/s3,"s3://crabby-images/e34ce/e34ce984c2026b18e14713adfaa865106f309d5c" alt=""
On Mon, Apr 12, 2021 at 11:39 AM M.-A. Lemburg <mal@egenix.com> wrote:
I don't think that trying to do this programmatically via some kind of language construct is a good approach.
It's not the only approach, but I argue that it's useful. Two use cases:
- IDEs that will nudge the developer towards importing the "right" objects.
- Linters and runtime checkers that will issue warnings / errors and minimise the amount of human code auditing or reviewing.
The public API of a library is the one which is documented as such.
Right, except that in practice: 1) Many useful libraries are not documented or properly documented. 2) People don't read the docs (at least not always, and/or not in details).
Yes. This can be formalized by having more than one namespace for the Public API.
There are already conventions and tools for that, e.g. the @deprecated decorator (https://pypi.org/project/Deprecated/). S. -- Stefane Fermigier
data:image/s3,"s3://crabby-images/ab456/ab456d7b185e9d28a958835d5e138015926e5808" alt=""
On 12.04.2021 12:13, Stéfane Fermigier wrote:
When writing code against an API you typically read the documentation. Without knowing which APIs you need or how they are used, a list of public APIs won't really get you to running code :-)
In those cases, I'd argue that such libraries then do not really care for a specific public API either :-)
I just listed some variants. If you want to capture all nuances, you'd end up creating a mini-language specifically for defining the public APIs... then why not just write this down in the documentation rather than inventing a new DSL?
Sure, but again: this is just another aspect to consider. There can be plenty more, e.g. say some APIs are only available on specific platforms, or may only be used when certain external libraries are present or configurations are used. Or let's say that a specific API only works on CPython, but not in PyPy, or Jython. And the platform your IDE is running on may not have those requirements enabled or available. Should those "public API" attributes then include all variants or just the ones which do work on your platform ? (playing devil's advocate here) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Apr 12 2021)
data:image/s3,"s3://crabby-images/8e91b/8e91bd2597e9c25a0a8c3497599699707003a9e9" alt=""
On Mon, 12 Apr 2021 at 11:41, M.-A. Lemburg <mal@egenix.com> wrote:
Is the problem here that we're trying to apply a technical solution to what is, in practice, a people problem? I don't think I've ever seen (in any language) a system of declaring names as public/private/whatever that substituted well for writing (and reading!) good documentation... At best, hiding stuff makes people work a bit harder to write bad code :-) Paul
data:image/s3,"s3://crabby-images/83003/83003405cb3e437d91969f4da1e4d11958d94f27" alt=""
On 2021-04-12 03:13, Stéfane Fermigier wrote:
Then they're buggy. I'm not convinced by the argument that "in practice" people do things they shouldn't do and that therefore we should encourage them to do more of that.
2) People don't read the docs (at least not always, and/or not in details).
Insofar as someone relies on behavior other than that given in the docs, they are being foolish. Again, I'm not convinced by the argument that "in practice" people do foolish things and that therefore we should encourage them to do more of that. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
Wow, super helpful response. On Mon, Apr 12, 2021 at 1:26 PM Brendan Barnwell <brenbarn@brenbarn.net> wrote:
-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>
data:image/s3,"s3://crabby-images/5f8b2/5f8b2ad1b2b61ef91eb396773cce6ee17c3a4eca" alt=""
On Mon, 12 Apr 2021 at 21:27, Brendan Barnwell <brenbarn@brenbarn.net> wrote:
A library can be useful and buggy at the same time or it can be useful but have some parts that are buggy and less useful. Ideally if some code is (potentially) useful but buggy then someone would come along and make it less buggy. If one of the deficiencies of the code to be improved is that it does not have a clear distinction between public and internal API then that task can be made much more difficult.
Maybe the docs were not so clear or maybe the library just has a lot of users or maybe some combination of the two. Either way it's much better for everyone involved if the code can be improved or extended without breaking things for people who are already using it. Standardising on a simple practice that helps to make that happen is no bad thing. Oscar
data:image/s3,"s3://crabby-images/52acc/52acc955a1cbe88973a951325c7e9ca1c0c76e28" alt=""
So I think there are multiple behaviors that are being described here, and I think there is validity in being able to potentially patch/modify definitions when absolutely necessary, but I wonder if there's room here for something in-between. (Also, I find the 'import export' or 'export import' statements quite hard to read. Please don't combine 2 verbs in one statement.) Here's my preference (which admittedly is partially taken from NodeJS/Javascript):

    # Base examples
    export x = 2
    # Export a previous name under a new name (not quoted to emphasize it
    # must be a valid variable name, not an arbitrary string)
    export x as y
    export class MyClass:
        pass
    export DEFAULT = MyClass()
    DEFAULT.setup_with_defaults()  # exported entities named within this module are named entities

    # Exporting from other modules
    # This is just syntactic sugar similar to "from .my_submodule import *",
    # where in this case it's a pure forwarding reference. Debatable whether
    # or not the found names are assigned in the current local namespace.
    from .my_submodule export *
    from .my_other_submodule export ThisInterface as ThatInterface

That said, I'm very new here and understand there are likely multiple edge-cases I'm not considering. But I am a very interested stakeholder in the result :-) Cheers, -Jeff

On Mon, Apr 12, 2021 at 2:36 PM Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
data:image/s3,"s3://crabby-images/efe10/efe107798b959240e12a33a55e62a713508452f0" alt=""
This is a great discussion. However, there's one issue that seems to be getting lost. Consider this library structure:

    test/lemon
    test/lemon/__init__.py
    test/module.py

where test/module.py is:

    __all__ = ['lemon']

    def lemon():
        pass

and test/__init__.py is:

    from .module import *
    from .lemon import *

and test/lemon/__init__.py is empty.

What do you think test.lemon is? It's not the function because it's hidden by the eponymous package. Also, test.module is exported whether you wanted to export it or not. This is the problem with any mechanism like __all__ or export. I think the main motivation behind the Jax team tucking everything in _src is that it hides the package names and module names from the public interface. I think I'm going to start only using __all__ for collecting private symbols into packages under a _src package. For the external interface, I'm going to explicitly import from dummy modules. I'd love to know if something better exists, but I don't think the export keyword works because of the above problem. Best, Neil

On Mon, Apr 12, 2021 at 7:39 AM Joseph Martinot-Lagarde <contrebasse@gmail.com> wrote:
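The shadowing Neil describes can be reproduced with a throwaway package. This is a minimal, hypothetical reconstruction: it writes the same layout into a temporary directory under the made-up name `demo_pkg` (a package literally named `test` would collide with the standard library's `test` package), imports it, and checks what the `lemon` attribute ends up being. It relies on the import system's usual behavior of binding a freshly imported submodule as an attribute of its parent package.

```python
import importlib
import sys
import tempfile
import types
from pathlib import Path


def build_package(root: Path) -> None:
    """Write Neil's layout under the hypothetical package name 'demo_pkg'."""
    pkg = root / "demo_pkg"
    (pkg / "lemon").mkdir(parents=True)
    (pkg / "lemon" / "__init__.py").write_text("")  # empty subpackage
    (pkg / "module.py").write_text(
        "__all__ = ['lemon']\n"
        "\n"
        "def lemon():\n"
        "    pass\n"
    )
    (pkg / "__init__.py").write_text(
        "from .module import *\n"
        "from .lemon import *\n"
    )


with tempfile.TemporaryDirectory() as tmp:
    build_package(Path(tmp))
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()  # make sure the new directory is noticed
    try:
        demo_pkg = importlib.import_module("demo_pkg")
    finally:
        sys.path.remove(tmp)

# Importing demo_pkg.lemon rebinds the name 'lemon' on the parent package,
# so the subpackage replaces the function bound by `from .module import *`.
print(isinstance(demo_pkg.lemon, types.ModuleType))  # True
# The submodule is bound on the package too, whether meant to be public or not.
print(demo_pkg.module.__name__)  # demo_pkg.module
```

The function is still reachable as `demo_pkg.module.lemon`; it is only the package-level name that loses.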
data:image/s3,"s3://crabby-images/3c316/3c31677f0350484505fbc9b436d43c966f3627ad" alt=""
Theia Vogel wrote:
I like the @public decorator like https://public.readthedocs.io/en/latest/index.html (which came from https://bugs.python.org/issue22247). The fact that it's at the definition of a function (or constant) makes it quite enjoyable to use. Joseph
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
Yeah, all the shenanigans with `__all__` make it clear that it's the wrong solution, and we should do something better. Fortunately the PEG parser and its "soft keywords" feature (debuting for match/case in 3.10) makes it much easier to do this. I had thought about this and came up with similar syntax to yours (`export def` etc.). Where you write
```
y = 2
export y
```
that's okay, but maybe we can do better, like this?
```
export y = 2
```
This could also be combined with a type annotation, e.g.
```
export y: int = 2
```
I'm not sure about the import+export syntax you gave, maybe something like this instead?
```
import export foo
import foo, export bar, baz as export babaz
```
Hm, maybe your version is okay too -- just bikeshedding here. :-)

You write about auto-populating `__all__`. I am not aware of it ever auto-populating. What are you referring to here? (The behavior that in the absence of `__all__`, everything not starting with `_` is exported, is not auto-population -- it's a default behavior implemented by `import *`, not by the exporting module.) I'm not sure that I would let `export` use the existing `__all__` machinery anyway. Maybe in a module that uses `export` there should be a different rule that disallows importing anything from it that isn't explicitly exported, regardless of what form of import is used (`__all__` *only* affects `import *`). Maybe these ideas should be considered together with lazy import (another thread here).

On Fri, Mar 12, 2021 at 3:08 PM Theia Vogel <theia@vgel.me> wrote:
-- --Guido van Rossum (python.org/~guido)
data:image/s3,"s3://crabby-images/a3b9e/a3b9e3c01ce9004917ad5e7689530187eb3ae21c" alt=""
+1 here. As you point out, __all__ is only really required (or does anything) for "import *" -- and frankly, import * is rarely the right thing to do anyway. So I virtually never use import *, and never write an __all__. But having a way to specify exactly what names are importable at all could make for a cleaner and more explicit API for a module. I am curious how hard that would be to implement, though. A module is a namespace (global) -- that namespace is used in the module itself, and is exactly what gets imported, yes? So would a whole extra layer of some sort be required to support this feature? Also, if you did:

    import the_module

would all the names be available in the module's namespace, but some couldn't be imported specifically? That is:

    import the_module
    the_module.sys

would work, but

    from the_module import sys

would not work? That might be odd and confusing. -CHB

Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
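The point that `__all__` only affects `import *` can be checked with two throwaway in-memory modules. In this sketch the module names `with_all` and `without_all` and the helpers `make_module` and `star_import` are made up for illustration. It also shows the worry from the original post (without `__all__`, an accidental `import sys` is picked up by `import *`) and that today both `the_module.sys` and `from the_module import sys` work regardless of `__all__`.

```python
import sys
import types


def make_module(name: str, source: str) -> types.ModuleType:
    """Hypothetical helper: build an in-memory module and register it for import."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    sys.modules[name] = module
    return module


def star_import(module_name: str) -> list[str]:
    """Run `from <module_name> import *` in a fresh namespace and list what it bound."""
    namespace: dict = {}
    exec(f"from {module_name} import *", namespace)
    return sorted(name for name in namespace if name != "__builtins__")


make_module("with_all", "import sys\n__all__ = ['public']\ndef public(): pass\ndef hidden(): pass\n")
make_module("without_all", "import sys\nimport os as _os\ndef public(): pass\ndef _private(): pass\n")

print(star_import("with_all"))     # ['public']
print(star_import("without_all"))  # ['public', 'sys']  (the accidental import leaks)

# __all__ does not restrict explicit imports or attribute access:
from with_all import hidden, sys as leaked_sys
print(hidden is sys.modules["with_all"].hidden)  # True
print(leaked_sys is sys)                         # True
```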
data:image/s3,"s3://crabby-images/8c947/8c947f33e5d0a75fb7078a7776957979b9be7206" alt=""
That was a misconception on my part -- always thought it worked the other way around! Not sure why I thought that. I'm open to the idea of not using __all__, if there's the possibility to have a better interface by breaking away from it. My worries with creating a new, stricter interface would be:

1) breaking compatibility with old tools that look for __all__ specifically -- though I'm not sure how big the set of "tools that dynamically introspect __all__" is. It's probably pretty small.
2) I do like that Python gives you the ability to import everything, even if the library author didn't intend for it. There are benefits to encapsulation of course, but I like having the option to fiddle with some library internals if I really need to. I wonder if there's a way to keep this ability but make it a little "louder", so it's clear you're doing something abnormal -- like the underscore-prefixed private member convention or dangerouslySetInnerHTML in React.

As for `export x: int = 1` vs
```
x: int = 1
export x
```
I like both syntaxes pretty equally. My choice in the original post was mostly because I was unsure about exactly how flexible the soft keywords in the new parser are, and if it would be tricky to insert them before an arbitrary assignment like that instead of making them be their own statement. If that's not an issue, I'm in favor of either syntax :)
data:image/s3,"s3://crabby-images/8c947/8c947f33e5d0a75fb7078a7776957979b9be7206" alt=""
import the_module
the_module.sys
would work, but
from the_module import sys
would not work?
That might be odd and confusing.
Good point. I'm not familiar with CPython internals so I'm not sure how this would work on the implementation side, but I definitely think it would be important to not have an inconsistency here.
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
On Sun, Mar 14, 2021 at 7:11 AM Theia Vogel <theia@vgel.me> wrote:
You probably should design this in partnership with someone who's more familiar with those internals. I believe that familiarity with Python internals is pretty important if you want to be able to design a new feature. Without thinking about the implementation you might design something that's awkward to implement; but you might also not think of something that's easy to do in the implementation but not obvious from superficial usage. (There are two lines in the Zen of Python devoted to this. :-) It is already possible to completely customize what happens when you write "import X". The returned object doesn't even strictly need to have a `__dict__` attribute. For `from X import Y` it first does `import X` (but doesn't bind X in your namespace) and then gives you `X.Y`, except if `Y` is `*`, in which case it looks in `X.__all__`, or if that's not found it looks in `X.__dict__` and imports all keys not starting with `_`. So to support `from X import *` you need either `__all__` or `__dict__`, else you get an import error (but neither of those is needed to support `from X import Y` or `import X; X.Y`). So I think there is no technical reason why we couldn't make it so that a module that uses `export` returns a proxy that only lets you use attributes that are explicitly exported. If users disagree with this, they can negotiate directly with the module authors. It's not a principle of Python that users *must* be given access to implementation details, after all -- it was just more convenient to do it this way when the language was young. (For example, it's not possible to root around like this in C extensions -- these only export what the author intends to export.) I do think that this means that existing libraries need to be careful when adding `export` statements, because it effectively becomes a backwards incompatibility. We could also implement a flag to weaken the strictness in this area (the flag would have to be set by the module, not by the importing code). 
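Since `from X import Y` amounts to "`import X`, then `X.Y`", a restriction placed in attribute lookup refuses both spellings together, which avoids the inconsistency raised above. The sketch below is a hypothetical illustration, not a proposed implementation: `ExportOnlyModule`, `__exports__` and `make_restricted_module` are invented names, and it relies on assigning a `ModuleType` subclass to a module's `__class__` (supported since Python 3.5).

```python
import sys
import types


class ExportOnlyModule(types.ModuleType):
    """Hypothetical module type: outside code can only reach exported names."""

    def __getattribute__(self, name: str):
        exports = super().__getattribute__("__exports__")
        is_dunder = name.startswith("__") and name.endswith("__")
        if is_dunder or name in exports:
            return super().__getattribute__(name)
        raise AttributeError(f"module {self.__name__!r} does not export {name!r}")


def make_restricted_module(name: str, source: str, exports: set[str]) -> types.ModuleType:
    """Hypothetical helper: run `source` as module `name`, then lock it down."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)  # code inside the module uses its globals directly
    module.__exports__ = frozenset(exports)
    module.__all__ = sorted(exports)  # keep `import *` in line with the exports
    module.__class__ = ExportOnlyModule
    sys.modules[name] = module
    return module


the_module = make_restricted_module(
    "the_module",
    "import sys\ndef helper(): return 'internal'\ndef api(): return helper()\n",
    {"api"},
)

print(the_module.api())  # internal
try:
    the_module.sys
except AttributeError as exc:
    print("attribute access refused:", exc)
try:
    from the_module import sys as leaked_sys
except ImportError as exc:
    print("from-import refused:", exc)
```

Code inside the module keeps working because its functions look names up in the module's globals dictionary rather than through attribute access. Because dunder names pass through, `vars(the_module)` still exposes every binding, which is one possible form of an escape hatch; a real design would have to decide that deliberately.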
Such a proxy could also be used to implement lazy imports, to some extent. (I'm not sure how to do `from X import Y` lazily -- it would have to wrap `Y` in another proxy, and then it becomes awkward, e.g. if `Y` is supposed to represent a simple number or string -- we don't want any other part of the language or stdlib to need to become aware of such proxies.) Long answer short, yes, we can make it so that `the_module.sys` in your example above is forbidden -- if we want to. -- --Guido van Rossum (python.org/~guido)
data:image/s3,"s3://crabby-images/5aa14/5aa14ac3ad4a9294af9a19151eb1fccf924aa3fa" alt=""
On Sun, 14 Mar 2021 at 18:58, Guido van Rossum <guido@python.org> wrote:
If this is implemented, then please ensure that some mechanism is included (either keeping __dict__, or functions in inspect, or some other mechanism) to both get and set all attributes of a module irrespective of any export/__all__ controls. I understand the perspective that says library authors should be able to control their API, for various reasons, but this has to be balanced against the fact that library authors are not perfect, and can either include bugs, or fail to consider all reasonable use-cases when designing their code. Historically, this has been done, by convention, through use of the "_" private specifier for module-level objects. The value of being able to (in specific cases) reach into third-party code, and customize it to work for your specific situation should not be disregarded. I think every large codebase that I've worked with has had to monkey-patch a method deep within at least one third-party library at runtime, in production (this is also commonly used for testing purposes), at some point to work around either a limitation of the library, incompatibility with another library, or to patch a bug. This also applies to module-globals that are themselves imported modules: being able to rebind a name to reference a different module (within careful constraints) in a library module has saved me several times. If external visibility of code units within libraries starts to be restricted so that this sort of patching isn't possible, then the cost of fixing some problems may start to become significantly greater. I strongly believe that the access to members of modules should be kept as similar to the access of members of class instances as possible. Not only will this keep the model simpler, the same arguments and/or concerns apply when talking about classes as when talking about modules. Maybe this means that non-exported members of modules get name-mangled as private (double-underscore) class attributes do, or maybe the '__dict__' member is always exported.
I think the idea of resolving the issues mentioned above by talking with third-party module authors directly seems like an unlikely solution. In my experience, this sort of 'keyhole surgery' patching is often against large, undersupported public libraries (think sqlalchemy, pandas, etc.), where two factors make it unlikely that these projects will be able to help directly:

1. If your project has unique or uncommon requirements that require this sort of patch, often library authors do not want to incur the maintenance burden of changing their carefully-designed code to support the use-cases, especially if they do not consider the requirements to be valid (even if they are in a different context to that being considered by the authors).
2. Release cycles of these projects can be long and complex. Maybe you're using a previous version of this library, and the current release is not compatible with your stack, so even if the library does release a fix, you'd not be able to use it without a major refactor/retest cycle internally.

I really like the "we're all adults here" approach to private/protected access that python currently takes, please don't weaken this without a lot of serious consideration :) Thanks Steve
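The kind of "keyhole surgery" described above, rebinding a module-level global (here one that refers to another module) from outside, looks roughly like this today. This is a hypothetical sketch: `vendored_lib` and `FakeClock` stand in for a third-party module and a replacement; only `unittest.mock.patch.object` is a real standard-library API.

```python
import types
from unittest import mock

# Stand-in for a third-party module that did `import time` at module level.
vendored_lib = types.ModuleType("vendored_lib")
exec(
    "import time\n"
    "def timestamp():\n"
    "    return round(time.time())\n",
    vendored_lib.__dict__,
)


class FakeClock:
    """Replacement providing only the piece of `time` the library calls."""

    @staticmethod
    def time() -> float:
        return 1_000_000.0


# Rebind the library's module-level name `time` for the duration of the block.
with mock.patch.object(vendored_lib, "time", FakeClock):
    print(vendored_lib.timestamp())  # 1000000

# On exit, patch.object restores the original binding (the real time module).
print(vendored_lib.timestamp() > 1_000_000)  # True
```

Scoping the patch with a context manager keeps the rebinding local to the code that needs it, which is one way to keep this kind of workaround contained.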
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
If you feel so strongly about it maybe we need to see some credentials. What's your background? (This may sound like an odd request, but reputation matters, and right now I have no idea who you are or why I should take your opinion seriously, no matter how eloquently it is stated.) Note that C extensions allow none of the introspection mechanisms you're proposing here (they have a `__dict__` but it is fully controlled by the author of the C code) and they thrive just fine. I am happy with giving module authors a way to tell the system "it's okay, users can reach in to access other attributes as well". But I think module authors should *also* be given a way to tell the system "please prevent users from accessing the private parts of this API no matter how hard they try". And using 'export' seems a reasonable way to enable the latter. (To be clear I am fine if there's a flag to override this default even when 'export' is used.) I know there are package authors out there who have this desire to restrict access (presumably because they've been burned when trying to evolve their API) and feel so strongly about it that they implement their own restrictive access controls (without resorting to writing C code). On Sun, Mar 14, 2021 at 12:42 PM Stestagg <stestagg@gmail.com> wrote:
-- --Guido van Rossum (python.org/~guido)
data:image/s3,"s3://crabby-images/5aa14/5aa14ac3ad4a9294af9a19151eb1fccf924aa3fa" alt=""
On Sun, Mar 14, 2021 at 8:22 PM Guido van Rossum <guido@python.org> wrote:
Hi Guido, I have enormous respect for you, but have no interest in attempting to prove I'm a 'good-enough egg' to warrant your serious attention. The entire question makes me somewhat uncomfortable, and to illustrate why, I'm going to willfully misinterpret it to make my point: My background is: White, Middle-class British male, mid-30s, my family is descended from the illegitimate son of the governor of the Isle of Wight. I've dined with the Queen, and my last two schools were both over 400 years old. Does this qualify me for attention? What if I were a BAME teen woman living in Bangladesh? Of course, sarcastic responses aside, I assume you meant technical background/credentials. Similar concerns still apply there: this python-ideas list is full of people with ideas, many of which are held strongly, that don't get their background questioned openly (although, as in most communities, there's evidence of this happening subtextually in places). In fact, when I previously pointed out a critical flaw in one of your proposals in another thread on this list, I wasn't questioned about my credentials then (I assume because you realised your mistake at that time). If you're still interested in my CV, it's easily findable online, along with other technical context about me that may be relevant to assessing my worthiness. Regardless of the above, I'm not going to argue further about the proposal or my plea to keep some control/override. That's why I made my case as robustly as I could initially. It's a single data-point; other people may reinforce it, or counter it, in the discussion. By broaching the subject, I was hoping to ensure that this aspect of the proposed change would be considered; if you decide I'm not worth it, then that's your call. Regards Steve

Note that C extensions allow none of the introspection mechanisms you're
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
Hi Steve, I don't think I can explain adequately why I challenged you, so I'll just apologize. Your feedback (about the 'export' proposal, and about my challenge of your credentials) is duly noted. Sorry! --Guido On Sun, Mar 14, 2021 at 4:11 PM Stestagg <stestagg@gmail.com> wrote:
-- --Guido van Rossum (python.org/~guido)
data:image/s3,"s3://crabby-images/dd81a/dd81a0b0c00ff19c165000e617f6182a8ea63313" alt=""
On 3/14/21 12:42 PM, Stestagg wrote:
The value of being able to (in specific cases) reach into third-party code, and customize it to work for your specific situation should not be disregarded.
I completely agree with this. One of the hallmarks of Python is the ability to query, introspect, and modify Python code. It helps with debugging, with experimenting, with fixing. Indeed, one of the few frustrating bits about Python is the inability to work with the C portions as easily as the Python portions (note: I am /not/ suggesting we do away with the C portions). What would be the benefits of locking down modules in this way? -- ~Ethan~
data:image/s3,"s3://crabby-images/efe10/efe107798b959240e12a33a55e62a713508452f0" alt=""
I like the idea of improving the way interfaces are exported in Python. I still don't know what the standard is today. Some of my favourite projects like Jax have started tucking their source in a _src directory: https://github.com/google/jax/tree/master/jax/_src, and then importing the exported interface in their main project: https://github.com/google/jax/blob/master/jax/scipy/signal.py. As far as I know, this "private source" method is the nicest way to control the interface. What I've been doing is importing * from my __init__.py (https://github.com/NeilGirdhar/tjax/blob/master/tjax/__init__.py) and then making sure every module has an __all__. This "init-all" method is problematic if you want to export a symbol with the same name as a folder. It also means that all of your modules and folders are exported names. If I started again, I'd probably go with the way the Jax team did it. I agree that the proposed export keyword is nicer than __all__, and accomplishes what I've been doing more elegantly. However, it suffers from the same extraneous symbol problem. I would still choose the private source method over export. I'm curious if anyone has ideas about how exporting an interface should be done today, or could be done tomorrow. On Monday, March 15, 2021 at 9:48:11 AM UTC-4 Rob Cliffe via Python-ideas wrote:
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
On Sun, Mar 14, 2021 at 10:55 PM Ethan Furman <ethan@stoneleaf.us> wrote:
I owe you an answer here. For large projects with long lifetimes that expect their API to evolve, experience has taught that any part of the API that *can* be used *will* be used, even if it was clearly not intended to be used. And once enough users have code that depends on reaching into some private part of a package and using a method or attribute that was available even though it was undocumented or even clearly marked as "internal" or "private", if an evolution of the package wants to remove that method or attribute (or rename it or change its type, signature or semantics/meaning), those users will complain that the package "broke their code" by making a "backwards incompatible change." For a casually maintained package that may not be a big deal, but for a package with serious maintainers this can prevent ordered evolution of the API, especially since a package's developers may not always be aware of how their internal attributes/methods are being used or perceived by users. So experienced package developers who are planning for the long term are always looking for ways to prevent such situations (because it is frustrating for both users and maintainers). Being able to lock down the exported symbols is just one avenue to prevent disappointment in the future. -- --Guido van Rossum (python.org/~guido)
data:image/s3,"s3://crabby-images/d9209/d9209bf5d3a65e4774057bb062dfa432fe6a311a" alt=""
I'd like to give a shoutout to a package I wrote called mkinit, which helps autogenerate explicit `__init__.py` files to mitigate some of the issues you mentioned (especially with respect to `from blah import *`). https://github.com/Erotemic/mkinit On Sat, Apr 10, 2021 at 3:08 PM Guido van Rossum <guido@python.org> wrote:
-- -Dr. Jon Crall (him)
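To make the idea of generating explicit `__init__.py` imports concrete, here is a minimal sketch of the general approach. It is not mkinit's implementation or API; `public_names` and `explicit_import_block` are hypothetical helpers. It parses a submodule with `ast`, prefers a literal `__all__` if the module defines one, otherwise collects top-level public definitions, and renders an explicit import statement.

```python
import ast


def public_names(source: str) -> list:
    """Names a module exposes: its literal __all__ if present, else top-level
    non-underscore defs, classes and assignments (imports are skipped, so
    "import sys" is never re-exported)."""
    tree = ast.parse(source)
    names = []
    for node in tree.body:
        # An explicit literal __all__ = [...] wins over the heuristic below.
        if isinstance(node, ast.Assign) and any(
            isinstance(target, ast.Name) and target.id == "__all__"
            for target in node.targets
        ):
            return list(ast.literal_eval(node.value))
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.append(node.name)
        elif isinstance(node, ast.Assign):
            names.extend(t.id for t in node.targets if isinstance(t, ast.Name))
    return [name for name in names if not name.startswith("_")]


def explicit_import_block(submodule: str, source: str) -> str:
    """Hypothetical helper: render an explicit 'from .submodule import (...)' statement."""
    body = "".join(f"    {name},\n" for name in public_names(source))
    return f"from .{submodule} import (\n{body})\n"


bar_source = '''
import sys

def baz():
    pass

class Widget:
    pass

_cache = {}
VERSION = "1.0"
'''

# Prints an explicit import of baz, Widget and VERSION (not sys or _cache).
print(explicit_import_block("bar", bar_source))
```

Because the generated file lists every name literally, linters and renaming tools can see them, which is the property `from .bar import *` loses.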
data:image/s3,"s3://crabby-images/e34ce/e34ce984c2026b18e14713adfaa865106f309d5c" alt=""
On Sat, Apr 10, 2021 at 9:08 PM Guido van Rossum <guido@python.org> wrote:
That's also my experience with large code bases that have to be maintained over more than, say, 5 years.

OTOH, I or my teammates have run into numerous cases (in Python and Java) of using a third-party library that does almost exactly what we want, but whose original implementation does not cover our particular use case, even through its intended extension mechanisms (subclassing, hook points...). In this case, when you have to ship a feature now and have no time to wait for a hypothetical proper solution, Python allows monkey-patching or other ways of messing with the private parts of the library. We agree that this is not the "proper" solution (i.e. it incurs quite a bit of technical debt), but at least it provides a solution. (The other solution would be forking the library, which probably incurs even more technical debt.)

Now back to the initial question: my understanding is that __all__ only controls the behavior of "from something import *". Some people, and tools, interpret it more broadly as "everything in __all__ is the public API for the package, everything else is private (treat it as a warning)". This is problematic for several reasons:

1) "import *" is frowned upon, as PEP 8 says: *Wildcard imports (from <module> import *) should be avoided, as they make it unclear which names are present in the namespace, confusing both readers and many automated tools. There is one defensible use case for a wildcard import, which is to republish an internal interface as part of a public API (for example, overwriting a pure Python implementation of an interface with the definitions from an optional accelerator module and exactly which definitions will be overwritten isn’t known in advance).* IMHO even the "defensible" use case described above is not defensible.

2) What's a "public API"? A particular library can have an internal API (with regular public functions and methods, etc.) but only willingly expose a subset of this API to its clients. __all__ doesn't help much here, as the tools I know of don't have this distinction in mind.

3) A library author can decide that the public API of their library is the part exposed by its top-level namespace, e.g.:

    from flask import Flask, request, ...

where Flask, request, etc. are defined in submodules of flask. This, or similar cases, can be confusing for IDEs that provide a "quick fix" for importing names that are not defined in the currently open module. Sometimes it works (i.e. it correctly imports from the top-level namespace) and sometimes it doesn't (it imports from the module where the name is originally defined). This can be a minor (i.e. cosmetic) issue, but sometimes the library author will refactor their package internally while keeping the top-level "exports" identical, and if you didn't import from the top-level package, you run into an error after a library upgrade.

I guess this issue also comes from the lack of a proper way to define (either as a language-level construct or as a convention among developers and tool authors) a proper notion of a "public API", and, once again, I don't believe __all__ helps much with this issue.

=> For these use cases, a simple solution could be devised that doesn't involve language-level changes but needs a wide consensus among both library authors and tool authors:

1) Mark the public API namespaces of a package with a special marker (for instance: __public_api__ = True).

2) Static and runtime tools could easily be devised that raise a warning when:
   a) such a marker is present in one or more modules of the package,
   b) and code in another package imports a module of that package other than the marked one(s).

This is just a rough idea. Additional use cases could easily be added by adding other marker types. An alternative could be to use decorators (e.g. @public as mentioned in another message), as long as we don't confuse "public = part of the public API of the library" with "public = not private to this particular module".

S. -- Stefane Fermigier - http://fermigier.com/ - http://twitter.com/sfermigier - http://linkedin.com/in/sfermigier Founder & CEO, Abilian - Enterprise Social Software - http://www.abilian.com/ Chairman, National Council for Free & Open Source Software (CNLL) - http://cnll.fr/ Founder & Organiser, PyParis & PyData Paris - http://pyparis.org/ & http://pydata.fr/
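The warning rule in point 2 could be sketched as a pure function, assuming a static scan has already collected the dotted names of every module carrying the proposed `__public_api__ = True` marker. `import_warning` is a hypothetical name, not an existing tool, and the project names are made up.

```python
from __future__ import annotations

from typing import Optional


def top_level(module: str) -> str:
    """Top-level package of a dotted module name, e.g. 'projA' for 'projA._src.core'."""
    return module.split(".", 1)[0]


def import_warning(importer: str, target: str, public_modules: set[str]) -> Optional[str]:
    """Return a warning if `importer` bypasses the marked public API of `target`'s package.

    `public_modules` is assumed to hold every module marked with __public_api__ = True.
    """
    package = top_level(target)
    if top_level(importer) == package:
        return None  # imports inside the same package are never flagged
    marked = {m for m in public_modules if top_level(m) == package}
    if not marked or target in marked:
        return None  # unmarked packages opt out; marked modules are the public API
    return f"{importer} imports {target}, which is not part of {package}'s public API"


print(import_warning("projB.views", "projA._src.core", {"projA", "projA.api"}))
```

A package that marks nothing is never flagged, so the convention could be adopted gradually.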
data:image/s3,"s3://crabby-images/5f8b2/5f8b2ad1b2b61ef91eb396773cce6ee17c3a4eca" alt=""
On Sun, 11 Apr 2021 at 10:25, Stéfane Fermigier <sf@fermigier.com> wrote:
<snip>
2) What's a "public API" ? A particular library can have an internal API (with regular public functions and methods, etc,) but only willingly expose a subset of this API to its clients. __all__ doesn't help much here, as the tools I know of don't have this distinction in mind.
I don't think that this distinction is so important in practice. What matters is the boundary between different codebases. Within a codebase a team can agree whatever conventions they want because any time something is changed it can be changed everywhere at once in a single commit. What matters is communicating from project A to all downstream developers and users what can be expected to remain compatible across different versions of project A.
This approach really doesn't scale. For one thing imports in Python have a run-time performance cost and if everything is imported at top-level like this then it implies that the top level __init__.py has imported every module in the codebase. Also organising around sub-packages and modules is much nicer for organising documentation, module level docstrings etc.
I don't think __all__ helps either but for different reasons. Python makes everything implicitly public. What is actually needed is a way to clearly mark the internal code (which is most of the code in a large project) as being obviously private. That's why I prefer the Jax approach of putting the implementation in a jax._src package. That's a clear sign to all about what is private meaning that all contributors and users can clearly see what is internal to the codebase. If someone submits a patch to project B that does from "projA._src.x.y import z" then it's clear to anyone reviewing that patch what it means. They can do that if they want on a consenting adults basis but there will be no ambiguity about the privateness of that API if a refactor in project A causes B to break. The _src approach means that project A can freely refactor its internals including deleting, renaming, merging and splitting modules, making a module into a package etc, without worrying about leaving left over dummy modules or deprecation warnings. Anyone reviewing a patch for project A can clearly see when it is and when it is not allowed to make these kinds of changes. Within projA._src it could be agreed that a leading underscore indicates something like "internal to the module" but *everything* in _src is clearly internal to the codebase. Project A can organise everything outside of _src into modules according to what makes sense for users and for the organisation of the documentation rather than what makes sense for the implementation. The _src package can be cleanly separated from everything else in automatically generated documentation. It's much better if the top-level of the docs has clearly separate links for the public API and the internal development docs. This also makes it clear what sort of information should go in the different parts of the docs which would be very different for jax._src.x.y compared to jax.x.y. 
In Jax every public module jax.x.y just seems to do "from jax._src.x.y import z, t" but it's also possible to actually put the top-level function there in the module like: """ Big module docstring for jax.x.y. This is for someone doing help in the repl. The proper web docs are in an rst file somewhere else. """ from jax._src.x.y import _do_stuff __all__ = ['do_stuff'] def do_stuff(foo): """ Big do_stuff docstring """ return _do_stuff(foo) That way you've made a module that is entirely about defining and documenting a public API. The do_stuff function shows as being from this module and no automated analysis/introspection tools would get confused about that. There could be a minimum of high-level code in do_stuff e.g. for dispatching to different low-level routines or checking arguments but anything more should go in _src. Oscar
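A runnable version of the "public facade module over _src" layout, as a sketch under assumptions: the package name `mylib`, the modules `mylib._src.impl` and `mylib.api`, and the doubling logic are all made up for illustration. It writes the package to a temporary directory, imports the public module, and checks where `do_stuff` reports itself as defined.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package: the implementation lives in _src, the API module wraps it.
FILES = {
    "mylib/__init__.py": "",
    "mylib/_src/__init__.py": "",
    "mylib/_src/impl.py": (
        "def _do_stuff(foo):\n"
        "    return foo * 2\n"
    ),
    "mylib/api.py": (
        '"""Public API docstring for mylib.api (what help() shows)."""\n'
        "from mylib._src.impl import _do_stuff\n"
        "\n"
        "__all__ = ['do_stuff']\n"
        "\n"
        "def do_stuff(foo):\n"
        '    """Documented public entry point."""\n'
        "    return _do_stuff(foo)\n"
    ),
}

with tempfile.TemporaryDirectory() as tmp:
    for rel_path, text in FILES.items():
        path = Path(tmp, rel_path)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()  # make sure the new directory is picked up
    try:
        api = importlib.import_module("mylib.api")
        result = api.do_stuff(21)
        defined_in = api.do_stuff.__module__
    finally:
        sys.path.remove(tmp)

print(result)      # 42
print(defined_in)  # mylib.api
```

Introspection attributes `do_stuff` to `mylib.api`, so the implementation under `_src` can be reorganised without changing what users see.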
data:image/s3,"s3://crabby-images/e34ce/e34ce984c2026b18e14713adfaa865106f309d5c" alt=""
On Sun, Apr 11, 2021 at 2:07 PM Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
What matters is the boundary between different codebases. Within a codebase
We agree that the most important API is the Public API, not the internal API (except for very large scale libraries where this can become important). But Python mostly supports (via https://docs.python.org/3/tutorial/classes.html#tut-private or the _-prefix naming convention) the notion of internal APIs, not of Public API.
It scales as long as it's expected that when you import one object from the library, you will use most of the library. In some cases (for instance: SQLAlchemy, where you may want to use the Engine but not the ORM) the library author(s) have already thought about the issue and made several top-level namespaces. See for instance: https://github.com/sqlalchemy/sqlalchemy/blob/master/lib/sqlalchemy/__init__... https://github.com/sqlalchemy/sqlalchemy/blob/master/lib/sqlalchemy/orm/__in... etc. For one thing imports in Python
I think we agree for the same reason actually.
We agree that there needs to be a way to mark what is part of the public API vs. what is the (library-)private implementation. Being explicit about what is public vs. being explicit what is private are two complementary approaches (as long as we agree that public vs. private is a two-state choice, in this context).
That's why I prefer the Jax approach of putting the implementation in a jax._src package.
That's another approach, which reminds me of some Java projects I have worked on in the past. First time I've seen it in a Python project, though, vs. the other approach which I have found in several projects I work with on a daily basis. S.
data:image/s3,"s3://crabby-images/ab456/ab456d7b185e9d28a958835d5e138015926e5808" alt=""
I don't think that trying to do this programmatically via some kind of language construct is a good approach.

The public API of a library is the one which is documented as such. That's really all there is to it. Documentation is written explicitly by the author of a package, and natural language provides a lot more nuance than any programmatic mechanism. E.g. you may have situations where you have more than one level of documented APIs, e.g. a public stable one and an experimental one which is likely to change over time, or a high-level one and a low-level one, where more care has to be applied in using it. Or you have APIs which are deprecated but still public, and you know that they will go away in a future version, so you should probably not start using them in new code and should plan for a refactoring of existing code.

Superimposing the extra meaning of __all__ = "public API", which has been done at times, is misleading, esp. since using "from module import *", for which __all__ was added, is frowned upon unless needed in special use cases. You'd really only define __all__ if you expect that "from module import *" will be used, which you typically don't :-) ... "Package authors may also decide not to support it, if they don’t see a use for importing * from their package." https://docs.python.org/3/tutorial/modules.html?highlight=__all__#importing-...

Aside: __all__ and namespace packages don't work well together either: namespace packages don't have __init__.py initializers.

-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Apr 12 2021)
Python Projects, Coaching and Support ... https://www.egenix.com/ Python Product Development ... https://consulting.egenix.com/
::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 https://www.egenix.com/company/contact/ https://www.malemburg.com/
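A small sketch of the point that __all__ only governs star imports: it builds a throwaway in-memory module (the name `all_demo` is made up), registers it in `sys.modules`, and compares what `from all_demo import *` binds with what an explicit import can still reach.

```python
import sys
import types

SOURCE = '''
__all__ = ["public_func"]

def public_func():
    return "public"

def helper():
    return "not in __all__"
'''

# Build the module in memory and register it so import statements can find it.
mod = types.ModuleType("all_demo")
exec(SOURCE, mod.__dict__)
sys.modules["all_demo"] = mod

# A star import only binds the names listed in __all__.
ns = {}
exec("from all_demo import *", ns)
star_names = sorted(name for name in ns if not name.startswith("__"))
print(star_names)  # ['public_func']

# An explicit import ignores __all__ entirely.
from all_demo import helper
print(helper())  # not in __all__
```

Nothing stops a user from importing `helper` directly, which is why __all__ alone is a weak statement about what is public.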
data:image/s3,"s3://crabby-images/e34ce/e34ce984c2026b18e14713adfaa865106f309d5c" alt=""
On Mon, Apr 12, 2021 at 11:39 AM M.-A. Lemburg <mal@egenix.com> wrote:
I don't think that trying to do this programmatically via some kind of language construct is a good approach.
It's not the only approach, but I argue that it's useful. Two use cases: - IDEs that will nudge the developer towards importing the "right" objects. - Linters and runtime checkers that will issue warnings / errors and minimise the amount of human code auditing or reviewing.
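As one hypothetical linter rule that fits the `_src` convention discussed earlier in the thread, the sketch below flags absolute imports whose module path contains a single-underscore component (such as `projA._src.x.y`). `private_imports` is a made-up name, and treating every underscore-prefixed path component as private is an assumption; a real tool would also need to know which package it is linting, approximated here by `own_package`.

```python
import ast


def private_imports(source: str, own_package: str = "") -> list:
    """Return (line, module) pairs for absolute imports that reach into private modules.

    A module path counts as private if any component starts with a single
    underscore (e.g. projA._src.x.y). Imports of `own_package` are allowed.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            modules = [node.module]
        else:
            continue  # relative imports and non-import nodes are ignored
        for module in modules:
            parts = module.split(".")
            if parts[0] == own_package:
                continue
            # Dunder components such as __future__ are not treated as private.
            if any(p.startswith("_") and not p.startswith("__") for p in parts):
                hits.append((node.lineno, module))
    return hits


patch = """\
import os
from projA._src.x.y import z
import projA.api
from . import _local
"""
print(private_imports(patch))  # [(2, 'projA._src.x.y')]
```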
The public API of a library is the one which is documented as such.
Right, except that in practice: 1) Many useful libraries are not documented or properly documented. 2) People don't read the docs (at least not always, and/or not in details).
Yes. This can be formalized by having more than one namespace for the Public API.
There are already conventions and tools for that, e.g. the @deprecated decorator (https://pypi.org/project/Deprecated/). S.
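The Deprecated package linked above provides its own decorator. The following is only a minimal standard-library sketch of the underlying idea, not that package's API (`deprecated_api` is a hypothetical name): wrap a function so that each call emits a `DeprecationWarning` attributed to the caller.

```python
import functools
import warnings


def deprecated_api(reason: str):
    """Hypothetical decorator factory: emit a DeprecationWarning on every call."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # stacklevel=2 attributes the warning to the caller, not to this wrapper.
            warnings.warn(f"{fn.__name__} is deprecated: {reason}",
                          DeprecationWarning, stacklevel=2)
            return fn(*args, **kwargs)
        return wrapper
    return decorator


@deprecated_api("use new_parse() instead")
def old_parse(text: str) -> list:
    return text.split(",")


with warnings.catch_warnings(record=True) as caught:
    # DeprecationWarning is ignored by default unless triggered from __main__.
    warnings.simplefilter("always")
    parts = old_parse("a,b")

print(parts)              # ['a', 'b']
print(caught[0].message)  # old_parse is deprecated: use new_parse() instead
```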
data:image/s3,"s3://crabby-images/ab456/ab456d7b185e9d28a958835d5e138015926e5808" alt=""
On 12.04.2021 12:13, Stéfane Fermigier wrote:
When writing code against an API you typically read the documentation. Without knowing which APIs you need or how they are used, a list of public APIs won't really get you to running code :-)
In those cases, I'd argue that such libraries then do not really care for a specific public API either :-)
I just listed some variants. If you want to capture all nuances, you'd end up creating a mini-language specifically for defining the public APIs... then why not just write this down in the documentation rather than inventing a new DSL?
Sure, but again: this is just another aspect to consider. There can be plenty more, e.g. some APIs may only be available on specific platforms, or may only be usable when certain external libraries are present or certain configurations are used. Or let's say that a specific API only works on CPython, but not on PyPy or Jython. And the platform your IDE is running on may not have those requirements enabled or available. Should those "public API" attributes then include all variants, or just the ones which do work on your platform? (playing devil's advocate here) -- Marc-Andre Lemburg
data:image/s3,"s3://crabby-images/8e91b/8e91bd2597e9c25a0a8c3497599699707003a9e9" alt=""
On Mon, 12 Apr 2021 at 11:41, M.-A. Lemburg <mal@egenix.com> wrote:
Is the problem here that we're trying to apply a technical solution to what is, in practice, a people problem? I don't think I've ever seen (in any language) a system of declaring names as public/private/whatever that substituted well for writing (and reading!) good documentation... At best, hiding stuff makes people work a bit harder to write bad code :-) Paul
data:image/s3,"s3://crabby-images/83003/83003405cb3e437d91969f4da1e4d11958d94f27" alt=""
On 2021-04-12 03:13, Stéfane Fermigier wrote:
Then they're buggy. I'm not convinced by the argument that "in practice" people do things they shouldn't do and that therefore we should encourage them to do more of that.
2) People don't read the docs (at least not always, and/or not in details).
Insofar as someone relies on behavior other than that given in the docs, they are being foolish. Again, I'm not convinced by the argument that "in practice" people do foolish things and that therefore we should encourage them to do more of that. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
Wow, super helpful response. On Mon, Apr 12, 2021 at 1:26 PM Brendan Barnwell <brenbarn@brenbarn.net> wrote:
-- --Guido van Rossum (python.org/~guido)
data:image/s3,"s3://crabby-images/5f8b2/5f8b2ad1b2b61ef91eb396773cce6ee17c3a4eca" alt=""
On Mon, 12 Apr 2021 at 21:27, Brendan Barnwell <brenbarn@brenbarn.net> wrote:
A library can be useful and buggy at the same time or it can be useful but have some parts that are buggy and less useful. Ideally if some code is (potentially) useful but buggy then someone would come along and make it less buggy. If one of the deficiencies of the code to be improved is that it does not have a clear distinction between public and internal API then that task can be made much more difficult.
Maybe the docs were not so clear or maybe the library just has a lot of users or maybe some combination of the two. Either way it's much better for everyone involved if the code can be improved or extended without breaking things for people who are already using it. Standardising on simple practice that helps to make that happen is no bad thing. Oscar
data:image/s3,"s3://crabby-images/52acc/52acc955a1cbe88973a951325c7e9ca1c0c76e28" alt=""
So I think there are multiple behaviors being described here, and I think there is validity in being able to potentially patch/modify definitions when absolutely necessary, but I wonder if there's room here for something in between. (Also, I find the 'import export' or 'export import' statements quite hard to read. Please don't combine 2 verbs in one statement.)

Here's my preference (which admittedly is partially taken from Node.js/JavaScript):

    # Base examples
    export x = 2
    # Export a previous name under a new name (not quoted, to emphasize that
    # it must be a valid variable name, not an arbitrary string)
    export x as y

    export class MyClass:
        pass

    export DEFAULT = MyClass()
    DEFAULT.setup_with_defaults()  # exported entities named within this module are named entities

    # Exporting from other modules
    # This is just syntactic sugar similar to "from .my_submodule import *",
    # where in this case it's a pure forwarding reference. Debatable whether
    # or not the found names are assigned to in the current local namespace.
    from .my_submodule export *
    from .my_other_submodule export ThisInterface as ThatInterface

That said, I'm very new here and understand there are likely multiple edge cases I'm not considering. But I am a very interested stakeholder in the result :-)

Cheers,
-Jeff

On Mon, Apr 12, 2021 at 2:36 PM Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
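For comparison, the rename-and-export form `from .my_other_submodule export ThisInterface as ThatInterface` can be approximated in today's Python. The sketch below uses a hypothetical helper, `reexport`, that is not part of any library. The submodule is built in memory as a stand-in, and the helper binds an attribute under a new name and records that name in `__all__`.

```python
from __future__ import annotations

import types
from typing import Any


def reexport(namespace: dict[str, Any], source: types.ModuleType, **renames: str) -> None:
    """Hypothetical helper: bind source.<old> as <new> in namespace and list <new> in __all__.

    Intended use inside a package's __init__.py:
        reexport(globals(), my_other_submodule, ThatInterface="ThisInterface")
    """
    exported = namespace.setdefault("__all__", [])
    for new_name, old_name in renames.items():
        namespace[new_name] = getattr(source, old_name)
        if new_name not in exported:
            exported.append(new_name)


# Demo with an in-memory stand-in for .my_other_submodule.
submodule = types.ModuleType("my_other_submodule")


class ThisInterface:
    pass


submodule.ThisInterface = ThisInterface
package_ns: dict[str, Any] = {}
reexport(package_ns, submodule, ThatInterface="ThisInterface")
print(package_ns["__all__"])  # ['ThatInterface']
```

The old name is still a string here, so the symbol-renaming problem raised in the original proposal remains. That is one argument for dedicated syntax over a helper.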
data:image/s3,"s3://crabby-images/efe10/efe107798b959240e12a33a55e62a713508452f0" alt=""
This is a great discussion. However, there's one issue that seems to be getting lost. Consider this library structure:

    test/lemon
    test/lemon/__init__.py
    test/module.py

where test/module.py is:

    __all__ = ['lemon']

    def lemon():
        pass

and test/__init__.py is:

    from .module import *
    from .lemon import *

and test/lemon/__init__.py is empty.

What do you think test.lemon is? It's not the function, because it's hidden by the eponymous package. Also, test.module is exported whether you wanted to export it or not. This is the problem with any mechanism like __all__ or export. I think the main motivation behind the Jax team tucking everything in _src is that it hides the package names and module names from the public interface.

I think I'm going to start using __all__ only for collecting private symbols into packages under a _src package. For the external interface, I'm going to explicitly import from dummy modules. I'd love to know if something better exists, but I don't think the export keyword works because of the above problem.

Best,
Neil

On Mon, Apr 12, 2021 at 7:39 AM Joseph Martinot-Lagarde <contrebasse@gmail.com> wrote:
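Neil's lemon example can be reproduced with a short script. The package is renamed here to `demo_pkg`, an assumption made only to avoid clashing with the standard library's own `test` package. Importing the subpackage `demo_pkg.lemon` sets the attribute `lemon` on the parent package, which overwrites the function bound by the first star import. The submodule `module` also ends up as a package attribute.

```python
import importlib
import sys
import tempfile
import types
from pathlib import Path


def build_package(root: Path) -> None:
    """Write Neil's layout to disk under the name demo_pkg."""
    pkg = root / "demo_pkg"
    (pkg / "lemon").mkdir(parents=True)
    (pkg / "lemon" / "__init__.py").write_text("")  # empty subpackage
    (pkg / "module.py").write_text(
        "__all__ = ['lemon']\n"
        "\n"
        "def lemon():\n"
        "    pass\n"
    )
    (pkg / "__init__.py").write_text(
        "from .module import *\n"
        "from .lemon import *\n"
    )


with tempfile.TemporaryDirectory() as tmp:
    build_package(Path(tmp))
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        demo_pkg = importlib.import_module("demo_pkg")
        lemon_is_module = isinstance(demo_pkg.lemon, types.ModuleType)
        module_leaked = hasattr(demo_pkg, "module")
    finally:
        sys.path.remove(tmp)

print(lemon_is_module)  # True: the subpackage shadows the function
print(module_leaked)    # True: the submodule is a package attribute too
```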
data:image/s3,"s3://crabby-images/3c316/3c31677f0350484505fbc9b436d43c966f3627ad" alt=""
Theia Vogel wrote:
I like the @public decorator like https://public.readthedocs.io/en/latest/index.html (which came from https://bugs.python.org/issue22247). The fact that it's at the definition of a function (or constant) makes it quite enjoyable to use. Joseph
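For illustration only, here is a hypothetical `public` sketch that works both as a decorator and for constants. It is not the implementation or exact API of the package linked above. Decorated functions and classes are recorded in their defining module's `__all__` via `__module__`. Constants have no `__module__`, so `public(NAME=value)` falls back to the caller's globals through `sys._getframe`, which is a CPython implementation detail.

```python
import sys
import types
from typing import Any


def public(obj: Any = None, **constants: Any) -> Any:
    """Hypothetical @public sketch: record names in the defining module's __all__.

    Use as a decorator on functions/classes, or call as public(NAME=value) at
    module level to define and export a constant.
    """
    if obj is not None:
        # Functions and classes remember which module defined them.
        namespace = vars(sys.modules[obj.__module__])
        names = [obj.__name__]
    else:
        # Constants carry no __module__, so use the caller's globals.
        # sys._getframe is a CPython implementation detail.
        namespace = sys._getframe(1).f_globals
        namespace.update(constants)
        names = list(constants)
    exported = namespace.setdefault("__all__", [])
    for name in names:
        if name not in exported:  # avoid duplicates on repeated calls
            exported.append(name)
    return obj


# Demo: a throwaway module named "public_demo" whose code uses the decorator.
demo = types.ModuleType("public_demo")
demo.public = public
sys.modules["public_demo"] = demo
exec(
    "@public\n"
    "def foo():\n"
    "    pass\n"
    "\n"
    "@public\n"
    "class Bar:\n"
    "    pass\n"
    "\n"
    "public(SEVEN=7)\n",
    demo.__dict__,
)
print(demo.__all__)  # ['foo', 'Bar', 'SEVEN']
```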
participants (15)
-
Brendan Barnwell
-
Christopher Barker
-
Ethan Furman
-
Guido van Rossum
-
Jeff Edwards
-
Jonathan Crall
-
Joseph Martinot-Lagarde
-
M.-A. Lemburg
-
Neil Girdhar
-
Oscar Benjamin
-
Paul Moore
-
Rob Cliffe
-
Stestagg
-
Stéfane Fermigier
-
Theia Vogel