
I thought that a name in a module is part of the public interface if:
* It doesn't start with an underscore and the module does not have __all__.
* It is included in the module's __all__ list.
* It is explicitly documented as part of the public interface.
help() uses more complex rules, but it trusts __all__ if it is defined. But it seems there are different views on this.
* Raymond suggested adding an underscore to the two dozen names in the calendar module that are not included in __all__. https://bugs.python.org/issue28292#msg347758 I do not like this idea, because it looks like code churn and makes the code less readable.
* Gregory suggests documenting codecs.escape_decode() even though it is not included in __all__. https://bugs.python.org/issue30588 I do not like this idea, because this function has always been internal: its only purpose was implementing the "string-escape" codec, which was removed in Python 3 (for reasons). In Python 3 it is only used to support the old pickle protocol 0.
Could we strictly define what is considered a public module interface in Python?
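The first two rules can be checked mechanically; the third (explicit documentation) cannot. As an illustration only, here is a minimal sketch of a hypothetical `is_public` helper (not an existing stdlib function, and simpler than what help() does) that applies rules 1 and 2:

```python
import types


def is_public(module: types.ModuleType, name: str) -> bool:
    """Hypothetical helper applying rules 1 and 2 above.

    Rule 3 ("explicitly documented as public") cannot be checked from code.
    """
    exported = getattr(module, "__all__", None)
    if exported is not None:
        # Rule 2: the module defines __all__, so trust it.
        return name in exported
    # Rule 1: no __all__, so fall back to the leading-underscore convention.
    return not name.startswith("_")


# Usage with a throwaway in-memory module.
demo = types.ModuleType("demo")
demo.helper = lambda: None
demo._internal = lambda: None
print(is_public(demo, "helper"), is_public(demo, "_internal"))  # True False

demo.__all__ = ["helper"]
demo.leftover = 42  # non-underscored, but not listed in __all__
print(is_public(demo, "leftover"))  # False
```

Once __all__ exists, a non-underscored name such as `leftover` stops counting as public, which is exactly the situation the calendar discussion is about.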

On Jul 13, 2019, at 1:56 PM, Serhiy Storchaka <storchaka@gmail.com> wrote:
Could we strictly define what is considered a public module interface in Python?
The RealDefinition™ is that whatever we include in the docs is public, otherwise not. Beyond that, there is a question of how users can deduce what is public when they run "import somemodule; print(dir(somemodule))". In some modules, we've been careful to use both __all__ and an underscore prefix to indicate private variables and helper functions (collections and random for example). IMO, when a module has shown that care, future maintainers should stick with that practice. The calendar module is an example of where that care was taken for many years and then a recent patch went against that practice. This came to my attention when an end-user questioned which functions were for internal use only and posted their question on Twitter. On the tracker, I then made a simple request to restore the module's convention but you seem steadfastly resistant to the suggestion. When we do have evidence of user confusion (as in the case with the calendar module), we should just fix it. IMO, it would be an undue burden on the user to have to check every method in dir() against the contents of __all__ to determine what is public (see below). Also, as a maintainer of the module, I would not have found it obvious whether the functions were public or not. The non-public functions look just like the public ones. It's true that the practices across the standard library have historically been loose and varied (__all__ wasn't always used and wasn't always kept up-to-date, some modules took care with private underscore names and some didn't). To me this has mostly worked out fine and didn't require a strict rule for all modules everywhere. IMO, there is no need to sweep through the library and change long-standing policies on existing modules. Raymond ----------------------------------

On Jul 13, 2019, at 19:09, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
EIBTI <wink> Shameless plug: https://public.readthedocs.io/en/latest/ -Barry

On 16.07.19 00:32, Barry Warsaw wrote:
Hey, what a fantastic little module! I'll hurry and use that a lot! Especially the builtins idea is really great :-P Cheers - Chris
p.s.: How about adding @private as well? There are cases where I would like to do the opposite:
    __all__ = dir()
    @private
    def _some_private_func_1(...): ...
    ...
    @private
    def _some_private_func_n(...): ...
not-too-seriously yours - Chris -- Christian Tismer :^) tismer@stackless.com Software Consulting : http://www.stackless.com/ Karl-Liebknecht-Str. 121 : https://github.com/PySide 14482 Potsdam : GPG key -> 0xFB7BEE0E phone +49 173 24 18 776 fax +49 (30) 700143-0023

Raymond Hettinger wrote:
I agree with Raymond that if the calendar module was following the leading underscore practice (which we should probably encourage all new modules to follow for consistency going forward) then I think the module should be updated to keep the practice going. -Brett

PEP 8 would concur, whatever the current preferred style was. Under "Naming Conventions": """New modules and packages (including third party frameworks) should be written to these standards, but where an existing library has a different style, internal consistency is preferred.""" The requirement for internal consistency (essential for readability in code of any size) alone justifies Raymond's wish to update it. On Wed, Jul 17, 2019 at 1:32 AM Brett Cannon <brett@python.org> wrote:

Brett Cannon wrote:
Rather than handling it on a case-by-case basis, would it be reasonable to establish a universal standard across the stdlib for defining which module members are public, and apply it to older modules as well? I think that it would prove to be quite beneficial to create an explicit definition of what is considered public. If we don't, there is likely to be further confusion on this topic, particularly from users. There would be some overhead cost associated with ensuring that every non-public function is preceded by an underscore, but adding public functions to __all__ could safely be automated with something like this (https://bugs.python.org/issue29446#msg287049), placed at the end of the module:
    import types
    __all__ = [name for name, obj in globals().items()
               if not name.startswith('_') and not isinstance(obj, types.ModuleType)]
or a bit more lazily:
    __all__ = [name for name in globals() if not name.startswith('_')]
Personally, I think the benefit of avoiding confusion on this issue and providing consistency to users would far outweigh the cost of implementing it.
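The two comprehensions differ in how they treat imported modules. A minimal sketch comparing them on a hypothetical module body; the helpers `strict_all` and `lazy_all` are illustrative names, and the namespace dict stands in for what `globals()` would return at the very end of the module:

```python
import types

# Hypothetical module body: imports, one public function, one private helper.
SOURCE = """
import os
import sys

def public_func():
    pass

def _private_helper():
    pass
"""


def strict_all(namespace):
    # First variant: drop underscored names and imported modules.
    return [name for name, obj in namespace.items()
            if not name.startswith('_') and not isinstance(obj, types.ModuleType)]


def lazy_all(namespace):
    # "Lazier" variant: drop underscored names only.
    return [name for name in namespace if not name.startswith('_')]


ns = {"__name__": "demo"}
exec(SOURCE, ns)  # ns now plays the role of the module's globals()
print(strict_all(ns))  # ['public_func']
print(lazy_all(ns))    # ['os', 'sys', 'public_func']
```

The strict variant would also drop a submodule that is meant to be public, such as os.path, while the lazy variant star-exports `os` and `sys`.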

20.07.19 09:03, Kyle Stanley пише:
__all__ is not needed if we can make all public names non-underscored and all non-public names underscored. The problem in issue29446 is that we can't do this in the case of tkinter. We can't add an underscore to "wantobjects", because this name is part of the public interface, but we also do not want it to be imported by a star import. So we need an __all__ which includes all "normal" public names except "wantobjects".
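The tkinter case can be reproduced in miniature. This sketch uses a hypothetical in-memory module `fake_tk` (not real tkinter) in which `wantobjects` is public but left out of __all__:

```python
import importlib
import sys
import types

# Hypothetical stand-in for tkinter: `wantobjects` is public (documented,
# no underscore) but deliberately omitted from __all__.
fake_tk = types.ModuleType("fake_tk")
exec(
    "__all__ = ['Widget']\n"
    "wantobjects = 1\n"
    "class Widget:\n"
    "    pass\n",
    fake_tk.__dict__,
)
sys.modules["fake_tk"] = fake_tk

star_ns = {}
exec("from fake_tk import *", star_ns)
print("Widget" in star_ns)       # True
print("wantobjects" in star_ns)  # False: not pulled in by the star import

# ...but it remains an ordinary, reachable module attribute.
print(importlib.import_module("fake_tk").wantobjects)  # 1
```

Here __all__ controls only the star import, not what users may legitimately access.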

On Sat, Jul 20, 2019 at 06:03:39AM -0000, Kyle Stanley wrote:
No, I don't think so. That would require code churn to "fix" modules which aren't currently broken, and may never be. It also requires meeting a standard that doesn't universally apply:
1. __all__ is optional, not mandatory.
2. __all__ is a list of names to import during star imports, not a list of public names; while there is some overlap, we should not assume that the two will always match.
3. Imported modules are considered private, regardless of whether they are named with a leading underscore or not;
4. Unless they are considered public, such as os.path.
5. While I don't know of any top-level examples of this, there are cases in the std lib where single-underscore names are considered public, such as the namedtuple interface. So in principle at least, a module might include a single-underscore name in its __all__.
6. Dunder names are not private, and could appear in __all__.
And you've just broken about a million scripts and applications that use os.path. As well as any modules which export public dunder names, for example sys.__stdout__ and friends, since your test for a private name may be overzealous. -- Steven
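Points 2, 5 and 6 concern what a star import actually does, independent of what is public. A minimal sketch using throwaway in-memory modules; `no_all`, `with_all` and both helpers are hypothetical:

```python
import sys
import types


def make_module(name, source):
    """Create an in-memory module from source and register it for import."""
    mod = types.ModuleType(name)
    exec(source, mod.__dict__)
    sys.modules[name] = mod
    return mod


def star_import(name):
    """Return the names that `from <name> import *` binds (minus __builtins__)."""
    ns = {}
    exec(f"from {name} import *", ns)
    return sorted(set(ns) - {"__builtins__"})


BODY = "def api(): pass\ndef _replace(): pass\n__version__ = '1.0'\n"

# Without __all__, every name with a leading underscore is skipped,
# including dunders such as __version__.
make_module("no_all", BODY)
print(star_import("no_all"))    # ['api']

# With __all__, exactly the listed names are bound, underscores or not.
make_module("with_all", "__all__ = ['api', '_replace', '__version__']\n" + BODY)
print(star_import("with_all"))  # ['__version__', '_replace', 'api']
```
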

17.07.19 03:26, Brett Cannon пише:
I agree with Raymond that if the calendar module was following the leading underscore practice (which we should probably encourage all new modules to follow for consistency going forward) then I think the module should be updated to keep the practice going.
But it was not.

14.07.19 05:09, Raymond Hettinger пише:
Run "help(somemodule)" or read the module documentation. dir() is not the proper tool for getting the public interface. https://docs.python.org/3/library/functions.html#dir says: "If the object is a module object, the list contains the names of the module’s attributes." It says nothing about which names are public.
In some modules, we've been careful to use both __all__ and to use an underscore prefix to indicate private variables and helper functions (collections and random for example). IMO, when a module has shown that care, future maintainers should stick with that practice.
Either we establish the rule that all non-public names must be underscored and do a mass renaming throughout the whole stdlib, or we allow non-underscored names for internal things and leave the sources in peace. Note also that underscored names can be part of the public interface (for example namedtuple._replace).
The calendar module is an example of where that care was taken for many years and then a recent patch went against that practice. This came to my attention when an end-user questioned which functions were for internal use only and posted their question on Twitter. On the tracker, I then made a simple request to restore the module's convention but you seem steadfastly resistant to the suggestion.
There was never such a convention. Before those changes there were non-underscored non-public members in the module. In Python 3.6:
When we do have evidence of user confusion (as in the case with the calendar module), we should just fix it.
The main source of user confusion is not reading the documentation. Recent examples: https://bugs.python.org/issue37620, https://bugs.python.org/issue37623.
IMO, it would be an undue burden on the user to have to check every method in dir() against the contents of __all__ to determine what is public (see below).
Just do not use dir() for this. It returns the list of attributes of the object. Use __all__ or help().
Also, as a maintainer of the module, I would not have found it obvious whether the functions were public or not. The non-public functions look just like the public ones.
As you said, public names are explicitly documented.
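To see the disagreement concretely, the following compares dir() with __all__ for the calendar module. It only inspects the running interpreter, and the result differs across Python versions, so no expected output is shown:

```python
import calendar

# Attributes that dir() reports and that look public (no leading underscore)
# but are absent from calendar.__all__. The exact list varies by Python
# version; these are the names a dir()-based reader cannot classify.
unlisted = [
    name for name in dir(calendar)
    if not name.startswith("_") and name not in calendar.__all__
]
print(unlisted)

# `from calendar import *` binds exactly the names in calendar.__all__.
print(len(calendar.__all__), "names exported")
```
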

Serhiy Storchaka wrote:
Personally, I would be the most in favor of doing a mass renaming through stdlib, at least for any public facing modules (if they don't start with an underscore, as that already implies the entire module is internal). Otherwise, I have a feeling similar issues will be brought up repeatedly by confused end-users. This change would also follow the guideline of "Explicit is better than implicit" by explicitly defining any function in a public-facing module as private or public through the existence or lack of an underscore. There would be some cost associated with implementing this change, but it would definitely be worthwhile if it settled the public vs private misunderstandings.

Brett Cannon wrote:
Good point, this would probably have to be a gradual change if it did happen, rather than all at once. If it were to happen with a proper deprecation cycle and clear communication, I think it would result in significantly less confusion overall and provide a higher degree of consistency across stdlib in the long term.

On Mon, Jul 22, 2019 at 10:02:12PM -0000, Kyle Stanley wrote about renaming non-public names with a leading underscore:
You say "significantly less confusion" but I don't think there's much confusion in the first place, so any benefit will be a marginal improvement in clarity, not solving a serious problem. In all my years of using Python, I can only recall one major incident where confusion between public/non-public naming conventions caused a serious problem -- and the details are fuzzy. One of the std lib modules had imported a function for internal use, and then removed it, breaking code that wrongly treated it as part of the public API. So it clearly *does* happen that people misuse private implementation details, but frankly that's going to happen even if we named them all "_private_implementation_detail_dont_use_this_you_have_been_warned" *wink* So in my opinion, "fixing" the std lib to prefix all non-public names with a leading underscore is going to have only a minor benefit. To that we have to counter with the costs:
1. Somebody has to do the work, review the changes, etc., and that takes time and energy that might be better spent on other things.
2. Most abuses of non-public names are harmless; but by deprecating and then changing the name, we guarantee that we'll break people's code, even if it otherwise would have happily worked for years. (Even if it is *strictly speaking* wrong and bad for people to use non-public names in their code, "no harm, no foul" applies: we shouldn't break their code just to make a point.)
3. Leading underscores add a cost when reading code: against the clarity of "it's private" we have the physical cost of reading underscores and the question of how to pronounce them.
4. And there is the significant cost when writing code. Almost all imports will have to be written like these:
    import sys as _sys
    from math import sin as _sin, cos as _cos
as well as the mental discipline to ensure that every non-public name in the module is prefixed with an underscore.
Let me be frank: you want to shift responsibility of recognising public APIs from the user of the code to the author of the code. Regardless of whether it is worthwhile or not, that's a real cost for developers. That cost is why we have the rule "unless explicitly documented public, all imports are private even if not prefixed with an underscore". All these costs strongly suggest to me that we shouldn't try to "fix" the entire std lib. Let's fix issues when and as they come up, on a case-by-case basis, rather than pre-emptively churning a lot of code. -- Steven
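Cost 4 is easy to see in practice. In a module without __all__, every plain import becomes a star-exported, public-looking name, and the underscore aliases are what prevent that. A small sketch; the module name `scratch` and the helper `star_names` are hypothetical:

```python
import sys
import types


def star_names(source):
    """Execute `source` as a throwaway module named `scratch` (no __all__)
    and return what `from scratch import *` would bind."""
    mod = types.ModuleType("scratch")
    exec(source, mod.__dict__)
    sys.modules["scratch"] = mod
    ns = {}
    exec("from scratch import *", ns)
    return sorted(set(ns) - {"__builtins__"})


# Plain imports leak into star imports (and look public in dir()).
print(star_names("import sys\nfrom math import sin, cos\ndef area(): pass\n"))
# ['area', 'cos', 'sin', 'sys']

# Underscore aliases, as in the message, keep them out.
print(star_names("import sys as _sys\nfrom math import sin as _sin, cos as _cos\n"
                 "def area(): pass\n"))
# ['area']
```
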

Upon further consideration and reading your response, I'm starting to think that the proposal to perform a mass renaming across stdlib might have been a bit too drastic, even if it was done over a longer period of time. Thanks for the detailed explanation of the costs, that significantly improved my understanding of the situation. My primary motivation was to provide more explicit declaration of public vs private, not only for the purpose of shifting the responsibility to the authors, but also to shift the liability of using private members to the user. From my perspective, if the communication is 100% clear that a particular function is not public, the developers are able to make changes to it more easily without being as concerned about the impact it will have on users. Nothing prevents the users from using it anyway, but if a change that occurs to a private function breaks their functionality, it's completely on them. With the current system, users can potentially make the argument that they weren't certain that the function or module in question was private. Being concerned about breaking the functionality for users on non-public functions seems to entirely defeat the purpose of them. I also dislike the idea of adding the underscores, or otherwise dealing with it, on a case-by-case basis, due to the inconsistency it would create across stdlib. In some cases the inconsistency might be necessary, but I'd rather avoid it if possible. Also, is the rule "unless explicitly documented public, all imports are private even if not prefixed with an underscore" officially stated anywhere, or is it mostly implied? Personally, I think that it should be explicitly stated in a public manner if it's the methodology being followed. A solid alternative proposal would also be Barry's public decorator proposal: https://public.readthedocs.io/en/latest/. I remember him saying that it was largely rejected by the community when it was proposed, but I'm not certain as to why.
It would be far easier to implement something like this than it would be to rename all of the non-public functions.

On Tue, 23 Jul 2019 at 04:58, Kyle Stanley <aeros167@gmail.com> wrote:
My primary motivation was to provide more explicit declaration of public vs private, not only for the purpose of shifting the responsibility to the authors, but also to shift the liability of using private members to the user.
My view is that the current somewhat ad hoc, "consenting adults" approach has served us well for many years now. There have been a few cases where we've needed to address specific points of confusion, but mostly things have worked fine. With Python's increased popularity, there has been an influx of new users with less familiarity with Python's easy-going attitude, and consequently an increase in pressure for more "definite", or "explicit" rules. While it's great to see newcomers arrive with new ideas, and it's important to make their learning experience as pleasant as possible, we should also make sure that we don't lose the aspects of Python that *made* it popular in the process. And to my mind, that easy-going, "assume the users know what they are doing" attitude is a key part of Python's appeal. So I'm -1 on any global change of this nature, particularly if it is motivated by broad, general ideas of tightening up rules or making contracts more explicit rather than a specific issue. The key point about making changes on a "case by case" basis is *not* about doing bits of the fix when needed, but about having clear, practical issues that need addressing, to guide the decision on what particular fix is appropriate in any given situation. Paul

FWIW, I actually like the idea - though not strongly enough to really campaign for it. My reasoning is that I think that both the current "consenting adults" policy and possibly more importantly the fact that we are implicitly supporting private interfaces by our reluctance to change them has harmed the ecosystem of Python interpreters. Because the lines between implementation details and deliberate functionality are very fuzzy, alternate implementations need to go out of their way to be something like "bug compatible" with CPython. Of course, there are all kinds of other psychological and practical reasons that are preventing a flourishing ecosystem of alternative Python implementations, but I do think that we could stand to be more strict about reliance on implementation details as a way of standing up for people who don't have the resources or market position to push people to write their code in a way that's compatible with multiple implementations. I'll note that I am basically neutral on the idea of consistency across the codebase as a goal - it would be nice but there are too many inconsistencies even in the public portion of the API for us to ever actually achieve it, so I don't think it's critical. The main reason I like the idea is that I /do/ think that there are a lot of people who use "does it start with an underscore" as their only heuristic for whether or not something is private (particularly since that is obvious to assess no matter how you access the function/method/attribute/class, whereas `__all__` is extra work and many people don't know its significance). Yes, they are just as wrong as people who we would be breaking by sweeping changes to the private interface, but the rename would prevent more /accidental/ reliance on implementation details. On 7/23/19 3:27 AM, Paul Moore wrote:

On 22Jul2019 2051, Kyle Stanley wrote:
Also, is the rule "unless explicitly documented public, all imports are private even if not prefixed with an underscore" officially stated anywhere, or is it mostly implied? Personally, I think that it should be explicitly stated in a public manner if it's the methodology being followed.
I'm not sure if it is, which probably means it isn't. But I agree this should be the rule as it implicitly gives us the minimal public API upon definition and it is very easy to add in anything else that ought to have been there.
A solid alternative proposal would also be Barry's public decorator proposal: https://public.readthedocs.io/en/latest/. I remember him saying that it was largely rejected by the community when it was proposed, but I'm not certain as to why. It would be far easier to implement something like this than it would be to rename all of the non-public functions.
The @public decorator is basically:
    def public(fn):
        __all__.append(fn.__name__)
        return fn
It's trivial, but it adds a runtime overhead that is also trivially avoided by putting the name in __all__ manually. And once it's public API, we shouldn't be making it too easy to rename the function anyway ;) Cheers, Steve
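The one-liner only works when it is defined in the same module whose `__all__` it updates. A slightly more general sketch looks the module up through `obj.__module__`. This is an assumption for illustration, not necessarily how Barry's `public` package is implemented; the module name `shapes` is hypothetical:

```python
import sys
import types


def public(obj):
    """Hypothetical @public: add obj's name to the __all__ of the module that
    defines it, creating __all__ if necessary."""
    module_dict = sys.modules[obj.__module__].__dict__
    exported = module_dict.setdefault("__all__", [])
    if obj.__name__ not in exported:
        exported.append(obj.__name__)
    return obj


# Demo: a throwaway module that uses the decorator.
shapes = types.ModuleType("shapes")
sys.modules["shapes"] = shapes
shapes.public = public
exec(
    "@public\n"
    "def area(r):\n"
    "    return 3.14159 * r * r\n"
    "\n"
    "def _helper():\n"
    "    pass\n",
    shapes.__dict__,
)
print(shapes.__all__)  # ['area']
```

Each decorated definition costs one extra function call at import time, which is the overhead under discussion.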

On 07/23/2019 08:44 AM, Steve Dower wrote:
The run-time overhead added by executing @public is trivially trivial. ;) But your argument about the too-easy change of a public API strikes home. I think a safer @public would be one that verifies the function is in `__all__` and raises if it is not. -- ~Ethan~
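Ethan's "safer" variant keeps `__all__` as the single list of exports and makes the decorator a consistency check. A minimal sketch; the name `checked_public`, the module name `geometry` and the choice of `ValueError` are all assumptions for illustration:

```python
import sys
import types


def checked_public(obj):
    """Hypothetical "safer" @public: only verify that obj's name is already
    listed in its module's __all__, and raise if it is not."""
    exported = sys.modules[obj.__module__].__dict__.get("__all__", ())
    if obj.__name__ not in exported:
        # ValueError is an arbitrary choice for this sketch.
        raise ValueError(f"{obj.__name__!r} is marked public but missing from __all__")
    return obj


geometry = types.ModuleType("geometry")
sys.modules["geometry"] = geometry
geometry.checked_public = checked_public
exec(
    "__all__ = ['area']\n"
    "@checked_public\n"
    "def area(r):\n"
    "    return 3.14159 * r * r\n",
    geometry.__dict__,
)

caught = None
try:
    # A function renamed or added without updating __all__.
    exec("@checked_public\ndef perimeter(r):\n    return 2 * 3.14159 * r\n",
         geometry.__dict__)
except ValueError as exc:
    caught = exc
print(caught)  # 'perimeter' is marked public but missing from __all__
```
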

On Jul 23, 2019, at 09:20, Ethan Furman <ethan@stoneleaf.us> wrote:
That actually defeats the purpose of @public IMHO. There should be exactly one source of truth, and that ought to be the @public decorator. From https://public.readthedocs.io/en/latest/ """ __all__ has two problems. First, it separates the declaration of a name’s public export semantics from the implementation of that name. Usually the __all__ is put at the top of the module, although this isn’t required, and in some cases it’s actively prohibited. So when you’re looking at the definition of a function or class in a module, you have to search for the __all__ definition to know whether the function or class is intended for public consumption. This leads to the second problem, which is that it’s too easy for the __all__ to get out of sync with the module’s contents. Often a function or class is renamed, removed, or added without the __all__ being updated. Then it’s difficult to know what the module author’s intent was, and it can lead to an exception when a string appearing in __all__ doesn’t match an existing name in the module. Some tools like Sphinx will complain when names that appear in __all__ don’t appear in the module. All of this points to the root problem; it should be easy to keep __all__ in sync! """ Think of it this way: __all__ is an implementation detail, and @public is the API for extending it. Cheers, -Barry
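The out-of-sync problem described in the quoted text can at least be detected. A minimal sketch of a check a test suite might run; the helper name `missing_exports` and the module `stale` are hypothetical:

```python
import types


def missing_exports(module):
    """Return names listed in module.__all__ that the module does not define.

    Any such name makes `from module import *` raise AttributeError."""
    return [name for name in getattr(module, "__all__", ()) if not hasattr(module, name)]


# Hypothetical module where `old_name` was renamed but __all__ was not updated.
stale = types.ModuleType("stale")
exec("__all__ = ['old_name', 'helper']\n"
     "def new_name(): pass\n"
     "def helper(): pass\n", stale.__dict__)
print(missing_exports(stale))  # ['old_name']
```
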

On 07/23/2019 10:21 AM, Barry Warsaw wrote:
On Jul 23, 2019, at 09:20, Ethan Furman wrote:
On 07/23/2019 08:44 AM, Steve Dower wrote:
In other words, we should be smart enough to not change the name of the function preceded by an `@public` ? Yeah, I can see that argument, too. ;-) -- ~Ethan~

Barry Warsaw wrote:
IMO, this seems to be the best part of the @public decorator, at least from a general user's perspective. Manually having to update __all__ anytime something is added, removed, or has its name modified adds an extra step that is easy to forget. Also, those coming from other languages would be far more likely to recognize the significance of @public; the primary purpose of the decorator is quite clear based on its name alone. Barry Warsaw wrote:
My package has a C version. If public() were a builtin (which I’ve implemented) it wouldn’t have that much import time overhead.
Is there a way we could estimate the approximate difference in overhead this would add using benchmarks? If it's incredibly trivial, I'd say it's worthwhile in order to make the public API members more explicitly declared, and provide a significant QoL improvement for maintaining __all__. While we should strive to optimize the language as much as possible, as far as I'm aware, Python's main priority has been readability and convenience rather than pure performance.

On 23Jul2019 1128, Kyle Stanley wrote:
Even if the performance impact is zero, commits that span the entire codebase for not-very-impactful changes have a negative impact on readability (for example, someone will suddenly become responsible for every single module as far as some blame tools are concerned - including github's suggested reviewers). I'm inclined to think this one would be primarily negative. We already maintain separate documentation from the source code, and this is the canonical reference for what is public or not. Until we make a new policy for __all__ to be the canonical reference, touching every file to use it is premature (let alone adding a builtin). So I apologise for mentioning that people care about import performance. Let's ignore them/that issue for now and worry instead about making sure people (including us!) know what the canonical reference for public/internal is. Cheers, Steve

Steve Dower wrote:
Good point, the discussion about __all__, adding the @public decorator, and worrying about performance impacts might be jumping too far ahead. For now, if most of the core devs are in agreement with the current unwritten rule of "unless explicitly documented public, all imports are private even if not prefixed with an underscore", I think the first priority should be to document it officially somewhere. That way, other developers and any potential confused users can be referred to it. It might not be the best long term solution, but it would require no code changes to be made and provide a written canonical reference for differentiating public vs private. I'm not certain as to where the most appropriate location for the rule would be, let me know if anyone has any suggestions.

On Wed, 24 Jul 2019 at 06:32, Kyle Stanley <aeros167@gmail.com> wrote:
It's not an unwritten rule, as it already has its own subsection in PEP 8: https://www.python.org/dev/peps/pep-0008/#public-and-internal-interfaces The main question in this thread is what to do about standard library modules that were written before those documented guidelines were put in place, and hence have multiple internal APIs that lack the leading underscore (and I don't think that's a question with a generic answer). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Nick Coghlan wrote:
Oh I see, thanks for the clarification. I've read over most of PEP 8 a few times at this point, but somehow I missed this part: "All undocumented interfaces should be assumed to be internal". Apologies for that. Personally, I think the stdlib modules which were written before the rule should be gradually updated, with each one being its own issue with the respective experts for each of the modules carefully monitoring the changes. It would also be appropriate to provide any user attempting to import a module that is going to be prepended with an underscore with warnings, and at least a couple of versions to update their code.

Kyle Stanley wrote:
Clarification: When I mentioned prepending a module with an underscore, I meant for functions and classes within the module, not the module itself. It might be difficult to implement this in a way which does not cause an excessive number of warnings, but I think it's definitely worthwhile to aim towards having a fully consistent standard for differentiating public and private interfaces across all of stdlib.
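One way such a warning could work for functions within a module is a module-level `__getattr__` (PEP 562, Python 3.7+), which runs only when normal attribute lookup fails. A minimal sketch; the module `cal_demo`, the function names and the warning text are all hypothetical:

```python
import types
import warnings

# Hypothetical module source: `weekday_helper` was renamed to `_weekday_helper`,
# and the old name keeps working, with a warning, during a deprecation period.
SOURCE = '''
import warnings

def _weekday_helper(day):
    return day % 7

_RENAMED = {"weekday_helper": "_weekday_helper"}

def __getattr__(name):
    # PEP 562: only called when normal module attribute lookup fails.
    if name in _RENAMED:
        new = _RENAMED[name]
        warnings.warn(f"{name} is deprecated and now private as {new}",
                      DeprecationWarning, stacklevel=2)
        return globals()[new]
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
'''

mod = types.ModuleType("cal_demo")
exec(SOURCE, mod.__dict__)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = mod.weekday_helper(9)  # old name still works, but warns

print(result)                       # 2
print(caught[0].category.__name__)  # DeprecationWarning
```

Because the hook fires only on a failed lookup, code that already uses the new name pays no cost. Only accesses through the old name produce a warning.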

On Jul 23, 2019, at 12:02, Steve Dower <steve.dower@python.org> wrote:
Even if the performance impact is zero, commits that span the entire codebase for not-very-impactful changes have a negative impact on readability (for example, someone will suddenly become responsible for every single module as far as some blame tools are concerned - including github's suggested reviewers). I'm inclined to think this one would be primarily negative.
If we were to adopt @public, its inclusion in the stdlib would follow the precedent we already have for non-functional changes (e.g. whitespace, code cleanup, etc.). It definitely shouldn’t be done willy nilly but if the opportunity arises, e.g. because someone is already fixing bugs or modernizing a module, then it would be fair game to add @public decorators. Of course, you can’t do that if it’s not available. :)
We already maintain separate documentation from the source code, and this is the canonical reference for what is public or not. Until we make a new policy for __all__ to be the canonical reference, touching every file to use it is premature (let alone adding a builtin).
Agreed, sort of. We’ve had lots of cases of grey areas though, where the documentation doesn’t match the source. The question always becomes whether the source or the documentation is the source of truth. For any individual case, we don’t always come down on the same side of that question.
So I apologise for mentioning that people care about import performance. Let's ignore them/that issue for now and worry instead about making sure people (including us!) know what the canonical reference for public/internal is.
+1 -Barry

My apologies for not having read this very large thread before posting, but hopefully this small note won't be adding too much fuel to the fire: Earlier this year I created an extremely small project called "publication" (https://pypi.org/project/publication/, https://github.com/glyph/publication) which attempts to harmonize the lofty ideal of "only the names explicitly mentioned in __all__ and in the documentation!" with the realpolitik of "anything I can reach where I don't have to type a leading underscore is fair game". It simply removes the ability to externally invoke non-__all__-exported names without importing an explicitly named "._private" namespace. It does not add any new syntactic idiom like a @public decorator (despite the aesthetic benefits of doing something like that) so that existing IDEs, type checkers, refactoring tools, code browsers etc can use the existing __all__ idiom and not break. It intentionally doesn't try hard to hide the implementation; it's still Python and if you demonstrate that you know what you're doing you're welcome to all the fiddly internals, it just makes sure you know that that's what you're getting. While I am perhaps infamously a stdlib contrarian ;-) this project is a single module with extremely straightforward, stable semantics which I would definitely not mind being copy/pasted into the stdlib wholesale, either under a different (private, experimental) name, or even under its current one if folks like it. I'd be very pleased if this could solve the issue for the calendar module. Thanks for reading, -glyph
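The general idea of exposing only `__all__` while keeping internals reachable through an explicitly named handle can be sketched with plain module objects. This is not the `publication` package's API or implementation. It is a hypothetical illustration: the function name, the module `calendar_like`, and the use of a `_private` attribute rather than an importable submodule are all assumptions.

```python
import types


def make_public_facade(impl):
    """Hypothetical sketch of the idea: return a module object exposing only
    impl.__all__, plus an explicitly named `_private` handle for anyone who
    opts in to the internals."""
    facade = types.ModuleType(impl.__name__, impl.__doc__)
    for name in impl.__all__:
        setattr(facade, name, getattr(impl, name))
    facade.__all__ = list(impl.__all__)
    facade._private = impl  # internals stay reachable, but only on purpose
    return facade


impl = types.ModuleType("calendar_like")
exec("__all__ = ['month_name']\n"
     "month_name = ['', 'January']\n"
     "week_helper = len\n", impl.__dict__)

pub = make_public_facade(impl)
print(hasattr(pub, "week_helper"))      # False
print(pub._private.week_helper is len)  # True
```

In practice such a facade would also have to be installed in place of the original module (for example in `sys.modules`); that step is omitted here.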

On Jul 13, 2019, at 1:56 PM, Serhiy Storchaka <storchaka@gmail.com> wrote:
Could we strictly define what is considered a public module interface in Python?
The RealDefinition™ is that whatever we include in the docs is public, otherwise not. Beyond that, there is a question of how users can deduce what is public when they run "import somemodule; print(dir(some module))". In some modules, we've been careful to use both __all__ and to use an underscore prefix to indicate private variables and helper functions (collections and random for example). IMO, when a module has shown that care, future maintainers should stick with that practice. The calendar module is an example of where that care was taken for many years and then a recent patch went against that practice. This came to my attention when an end-user questioned which functions were for internal use only and posted their question on Twitter. On the tracker, I then made a simple request to restore the module's convention but you seem steadfastly resistant to the suggestion. When we do have evidence of user confusion (as in the case with the calendar module), we should just fix it. IMO, it would be an undue burden on the user to have to check every method in dir() against the contents of __all__ to determine what is public (see below). Also, as a maintainer of the module, I would not have found it obvious whether the functions were public or not. The non-public functions look just like the public ones. It's true that the practices across the standard library have historically been loose and varied (__all__ wasn't always used and wasn't always kept up-to-date, some modules took care with private underscore names and some didn't). To me this has mostly worked out fine and didn't require a strict rule for all modules everywhere. IMO, there is no need to sweep through the library and change long-standing policies on existing modules. Raymond ----------------------------------

On Jul 13, 2019, at 19:09, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
EIBTI <wink> Shameless plug: https://public.readthedocs.io/en/latest/ -Barry

On 16.07.19 00:32, Barry Warsaw wrote:
Hey, what a fantastic little module! I'll hurry and use that a lot! Especially the builtins idea is really great :-P Cheers - Chris

p.s.: How about adding @private as well? There are cases where I would like to do the opposite:

    __all__ = dir()

    @private
    def _some_private_func_1(...): ...
    ...
    @private
    def _some_private_func_n(...): ...

not-too-seriously yours - Chris -- Christian Tismer :^) tismer@stackless.com

Raymond Hettinger wrote:
I agree with Raymond that if the calendar module was following the leading underscore practice (which we should probably encourage all new modules to follow for consistency going forward) then I think the module should be updated to keep the practice going. -Brett

PEP 8 would concur, whatever the current preferred style was. Under "Naming Conventions": """New modules and packages (including third party frameworks) should be written to these standards, but where an existing library has a different style, internal consistency is preferred.""" The requirement for internal consistency (essential for readability in code of any size) alone justifies Raymond's wish to update it. On Wed, Jul 17, 2019 at 1:32 AM Brett Cannon <brett@python.org> wrote:

Brett Cannon wrote:
Rather than it being on a case-by-case basis, would it be reasonable to establish a universal standard across stdlib for defining modules as public to apply to older modules as well? I think that it would prove to be quite beneficial to create an explicit definition of what is considered public. If we don't, there is likely to be further confusion on this topic, particularly from users. There would be some overhead cost associated with ensuring that every non-public function is prefixed with an underscore, but adding public functions to __all__ could safely be automated with something like this (https://bugs.python.org/issue29446#msg287049):

    __all__ = [name for name, obj in globals().items()
               if not name.startswith('_') and not isinstance(obj, types.ModuleType)]

or a bit more lazily:

    __all__ = [name for name in globals() if not name.startswith('_')]

Personally, I think the benefit of avoiding confusion on this issue and providing consistency to users would far outweigh the cost of implementing it.
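To see what the first comprehension would actually put in __all__, here is a small sketch that applies the same filter to a hand-built dictionary standing in for a module's globals(). The helper `star_exports` and the dictionary `demo_globals` are hypothetical names used only for illustration.

```python
import os
import types
from math import sin

def star_exports(namespace):
    # The filter suggested above: keep names without a leading underscore,
    # skipping anything that is a module object.
    return [name for name, obj in namespace.items()
            if not name.startswith('_') and not isinstance(obj, types.ModuleType)]

# A made-up stand-in for a module's globals().
demo_globals = {
    '__name__': 'demo',
    'os': os,                 # imported module: filtered out
    'sin': sin,               # imported function: kept, though arguably an implementation detail
    'helper': lambda: None,   # the module's own function: kept
    '_cache': {},             # underscored: filtered out
}
print(star_exports(demo_globals))   # ['sin', 'helper']
```

Note that a function merely imported from math ends up exported, so "no leading underscore" and "public" still don't line up exactly.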

20.07.19 09:03, Kyle Stanley пише:
__all__ is not needed if we can make all public names non-underscored and all non-public names underscored. The problem in issue29446 is that we can't do this in the case of tkinter. We can't add an underscore to "wantobjects", because this name is part of the public interface, but we also do not want it to be imported by the star import. So we need an __all__ that includes all "normal" public names except "wantobjects".
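A sketch of the situation described for tkinter, using a throwaway in-memory module rather than tkinter itself (the module name `tk_like` and its contents are hypothetical): a name can be public yet deliberately left out of __all__, so star imports skip it while attribute access still works.

```python
import sys
import types

# Build a stand-in module in memory; "tk_like" is hypothetical.
tk_like = types.ModuleType('tk_like')
tk_like.Tk = object            # a "normal" public name
tk_like.wantobjects = 1        # public, but kept out of star imports
tk_like.__all__ = ['Tk']
sys.modules['tk_like'] = tk_like

namespace = {}
exec('from tk_like import *', namespace)

print('Tk' in namespace)            # True: listed in __all__
print('wantobjects' in namespace)   # False: not listed, so the star import skips it
print(tk_like.wantobjects)          # 1: still reachable as a module attribute
```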

On Sat, Jul 20, 2019 at 06:03:39AM -0000, Kyle Stanley wrote:
No, I don't think so. That would require code churn to "fix" modules which aren't currently broken, and may never be. It also requires meeting a standard that doesn't universally apply:

1. __all__ is optional, not mandatory.
2. __all__ is a list of names to import during star imports, not a list of public names; while there is some overlap, we should not assume that the two will always match.
3. Imported modules are considered private, regardless of whether they are named with a leading underscore or not;
4. Unless they are considered public, such as os.path.
5. While I don't know of any top-level examples of this, there are cases in the std lib where single-underscore names are considered public, such as the namedtuple interface. So in principle at least, a module might include a single-underscore name in its __all__.
6. Dunder names are not private, and could appear in __all__.
And you've just broken about a million scripts and applications that use os.path. As well as any modules which export public dunder names, for example sys.__stdout__ and friends, since your test for a private name may be overzealous. -- Steven
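Steven's two counterexamples can be checked directly against the filter proposed earlier. In this sketch `naive_is_public` is a hypothetical helper that reproduces that filter; os really does list "path" in its own __all__, and sys.__stdout__ is documented in the sys module.

```python
import os
import sys
import types

def naive_is_public(name, obj):
    # Same filter as the proposed comprehension: no leading underscore, not a module.
    return not name.startswith('_') and not isinstance(obj, types.ModuleType)

print('path' in os.__all__)                            # True: os exports path itself
print(naive_is_public('path', os.path))                # False: but os.path is a module
print(naive_is_public('__stdout__', sys.__stdout__))   # False: documented, yet starts with "_"
```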

17.07.19 03:26, Brett Cannon пише:
I agree with Raymond that if the calendar module was following the leading underscore practice (which we should probably encourage all new modules to follow for consistency going forward) then I think the module should be updated to keep the practice going.
But it was not.

14.07.19 05:09, Raymond Hettinger пише:
Run "help(somemodule)" or read the module documentation. dir() is not the proper tool for getting the public interface. https://docs.python.org/3/library/functions.html#dir says: "If the object is a module object, the list contains the names of the module’s attributes." It says nothing about whether those names are public.
In some modules, we've been careful to use both __all__ and to use an underscore prefix to indicate private variables and helper functions (collections and random for example). IMO, when a module has shown that care, future maintainers should stick with that practice.
Either we establish the rule that all non-public names must be underscored, and do mass renaming through the whole stdlib. Or allow to use non-underscored names for internal things and leave the sources in peace. Note also that underscored names can be a part of the public interface (for example namedtuple._replace).
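A quick illustration of an underscored name that is documented public API: namedtuple's helper methods start with an underscore so that they cannot clash with user-chosen field names (the `Point` type here is just an example).

```python
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])
p = Point(1, 2)

# _replace is part of the documented namedtuple API despite the underscore.
q = p._replace(x=10)
print(q)   # Point(x=10, y=2)
```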
The calendar module is an example of where that care was taken for many years and then a recent patch went against that practice. This came to my attention when an end-user questioned which functions were for internal use only and posted their question on Twitter. On the tracker, I then made a simple request to restore the module's convention but you seem steadfastly resistant to the suggestion.
There was never such a convention. Before those changes there were non-underscored non-public members in the module. In Python 3.6:
When we do have evidence of user confusion (as in the case with the calendar module), we should just fix it.
The main source of user confusion is not reading the documentation. Recent examples: https://bugs.python.org/issue37620, https://bugs.python.org/issue37623.
IMO, it would be an undue burden on the user to have to check every method in dir() against the contents of __all__ to determine what is public (see below).
Just do not use dir() for this. It returns the list of attributes of the object. Use __all__ or help().
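For anyone who does want to see the gap being argued about, the difference between what dir() shows and what __all__ lists can be computed directly. This is a sketch; the result depends on the Python version, so no output is shown, but expect it to mix names that calendar merely imports with its own module-level helpers.

```python
import calendar

# Names without a leading underscore that calendar's __all__ does not list.
unlisted = sorted(name for name in dir(calendar)
                  if not name.startswith('_') and name not in calendar.__all__)
print(unlisted)
```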
Also, as a maintainer of the module, I would not have found it obvious whether the functions were public or not. The non-public functions look just like the public ones.
As you said, public names are explicitly documented.

Serhiy Storchaka wrote:
Personally, I would be the most in favor of doing a mass renaming through stdlib, at least for any public facing modules (if they don't start with an underscore, as that already implies the entire module is internal). Otherwise, I have a feeling similar issues will be brought up repeatedly by confused end-users. This change would also follow the guideline of "Explicit is better than implicit" by explicitly defining any function in a public-facing module as private or public through the existence or lack of an underscore. There would be some cost associated with implementing this change, but it would definitely be worthwhile if it settled the public vs private misunderstandings.

Brett Cannon wrote:
Good point, this would probably have to be a gradual change if it did happen, rather than being at all once. If it were to happen with a proper deprecation cycle and clear communication, I think it would result in significantly less confusion overall and provide a higher degree of consistency across stdlib in the long term.

On Mon, Jul 22, 2019 at 10:02:12PM -0000, Kyle Stanley wrote about renaming non-public names with a leading underscore:
You say "significantly less confusion" but I don't think there's much confusion in the first place, so any benefit will be a marginal improvement in clarity, not solving a serious problem. In all my years of using Python, I can only recall one major incident where confusion between public/non-public naming conventions caused a serious problem -- and the details are fuzzy. One of the std lib modules had imported a function for internal use, and then removed it, breaking code that wrongly treated it as part of the public API. So it clearly *does* happen that people misuse private implementation details, but frankly that's going to happen even if we named them all "_private_implementation_detail_dont_use_this_you_have_been_warned" *wink*

So in my opinion, "fixing" the std lib to prefix all non-public names with a leading underscore is going to have only a minor benefit. To that we have to counter with the costs:

1. Somebody has to do the work, review the changes, etc. and that takes time and energy that might be better spent on other things.
2. Most abuses of non-public names are harmless; but by deprecating and then changing the name, we guarantee that we'll break people's code, even if it otherwise would have happily worked for years. (Even if it is *strictly speaking* wrong and bad for people to use non-public names in their code, "no harm, no foul" applies: we shouldn't break their code just to make a point.)
3. Leading underscores add a cost when reading code: against the clarity of "it's private" we have the physical cost of reading underscores and the question of how to pronounce them.
4. And the significant cost when writing code. Almost all imports will have to be written like these:

    import sys as _sys
    from math import sin as _sin, cos as _cos

as well as the mental discipline to ensure that every non-public name in the module is prefixed with an underscore.
Let me be frank: you want to shift responsibility of recognising public APIs from the user of the code to the author of the code. Regardless of whether it is worthwhile or not, that's a real cost for developers. That cost is why we have the rule "unless explicitly documented public, all imports are private even if not prefixed with an underscore". All these costs strongly suggest to me that we shouldn't try to "fix" the entire std lib. Let's fix issues when and as they come up, on a case-by-case basis, rather than pre-emptively churning a lot of code. -- Steven
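To make the writing cost in point 4 concrete, here is a sketch of a tiny hypothetical module ("styled", with a made-up function `polar_x`) written in the underscore-everything style, and what a reader scanning for non-underscored names would then see.

```python
import types

# A hypothetical module that aliases every import with a leading underscore.
source = '''
import sys as _sys
from math import sin as _sin, cos as _cos

def polar_x(r, theta):
    return r * _cos(theta)
'''
mod = types.ModuleType('styled')
exec(source, mod.__dict__)

public_looking = [n for n in vars(mod) if not n.startswith('_')]
print(public_looking)   # ['polar_x']
```

The namespace comes out clean, but only because every import line carries an alias.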

Upon further consideration and reading your response, I'm starting to think that the proposal to perform a mass renaming across stdlib might have been a bit too drastic, even if it was done over a longer period of time. Thanks for the detailed explanation of the costs; that significantly improved my understanding of the situation. My primary motivation was to provide more explicit declaration of public vs private, not only for the purpose of shifting the responsibility to the authors, but also to shift the liability of using private members to the user. From my perspective, if the communication is 100% clear that a particular function is not public, the developers are able to make changes to it more easily without being as concerned about the impact it will have on users. Nothing prevents the users from using it anyways, but if a change that occurs to a private function breaks their functionality, it's completely on them. With the current system, users can potentially make the argument that they weren't certain that the function or module in question was private. Being concerned about breaking the functionality for users on non-public functions seems to entirely defeat the purpose of them. I also dislike the idea of adding the underscores or dealing with it on a case-by-case basis, due to the inconsistency it would introduce across stdlib. In some cases the inconsistency might be necessary, but I'd rather avoid it if possible. Also, is the rule "unless explicitly documented public, all imports are private even if not prefixed with an underscore" officially stated anywhere, or is it mostly implied? Personally, I think that it should be explicitly stated in a public manner if it's the methodology being followed. A solid alternative proposal would also be Barry's public decorator proposal: https://public.readthedocs.io/en/latest/. I remember him saying that it was largely rejected by the community when it was proposed, but I'm not certain as to why. 
It would be far easier to implement something like this than it would be to rename all of the non-public functions.

On Tue, 23 Jul 2019 at 04:58, Kyle Stanley <aeros167@gmail.com> wrote:
My primary motivation was to provide more explicit declaration of public vs private, not only for the purpose of shifting the responsibility to the authors, but also to shift the liability of using private members to the user.
My view is that the current somewhat ad hoc, "consenting adults" approach has served us well for many years now. There have been a few cases where we've needed to address specific points of confusion, but mostly things have worked fine. With Python's increased popularity, there has been an influx of new users with less familiarity with Python's easy-going attitude, and consequently an increase in pressure for more "definite", or "explicit" rules. While it's great to see newcomers arrive with new ideas, and it's important to make their learning experience as pleasant as possible, we should also make sure that we don't lose the aspects of Python that *made* it popular in the process. And to my mind, that easy-going, "assume the users know what they are doing" attitude is a key part of Python's appeal. So I'm -1 on any global change of this nature, particularly if it is motivated by broad, general ideas of tightening up rules or making contracts more explicit rather than a specific issue. The key point about making changes on a "case by case" basis is *not* about doing bits of the fix when needed, but about having clear, practical issues that need addressing, to guide the decision on what particular fix is appropriate in any given situation. Paul

FWIW, I actually like the idea - though not strongly enough to really campaign for it. My reasoning is that I think that both the current "consenting adults" policy and possibly more importantly the fact that we are implicitly supporting private interfaces by our reluctance to changing them has harmed the ecosystem of Python interpreters. Because the lines between implementation details and deliberate functionality are very fuzzy, alternate implementations need to go out of their way to be something like "bug compatible" with CPython. Of course, there are all kinds of other psychological and practical reasons that are preventing a flourishing ecosystem of alternative Python implementations, but I do think that we could stand to be more strict about reliance on implementation details as a way of standing up for people who don't have the resources or market position to push people to write their code in a way that's compatible with multiple implementations. I'll note that I am basically neutral on the idea of consistency across the codebase as a goal - it would be nice but there are too many inconsistencies even in the public portion of the API for us to ever actually achieve it, so I don't think it's critical. The main reason I like the idea is that I /do/ think that there are a lot of people who use "does it start with an underscore" as their only heuristic for whether or not something is private (particularly since that is obvious to assess no matter how you access the function/method/attribute/class, whereas `__all__` is extra work and many people don't know its significance). Yes, they are just as wrong as people who we would be breaking by sweeping changes to the private interface, but the rename would prevent more /accidental/ reliance on implementation details. On 7/23/19 3:27 AM, Paul Moore wrote:

On 22Jul2019 2051, Kyle Stanley wrote:
Also, is the rule "unless explicitly documented public, all imports are private even if not prefixed with an underscore" officially stated anywhere, or is it mostly implied? Personally, I think that it should be explicitly stated in a public manner if it's the methodology being followed.
I'm not sure if it is, which probably means it isn't. But I agree this should be the rule as it implicitly gives us the minimal public API upon definition and it is very easy to add in anything else that ought to have been there.
A solid alternative proposal would also be Barry's public decorator proposal: https://public.readthedocs.io/en/latest/. I remember him saying that it was largely rejected by the community when it was proposed, but I'm not certain as to why. It would be far easier to implement something like this than it would be to rename all of the non-public functions.
The @public decorator is basically:

    def public(fn):
        __all__.append(fn.__name__)
        return fn

It's trivial, but it adds a runtime overhead that is also trivially avoided by putting the name in __all__ manually. And once it's public API, we shouldn't be making it too easy to rename the function anyway ;) Cheers, Steve
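As written, Steve's version only works when it lives in the same module as the __all__ it appends to. A reusable variant has to find the defining module; this minimal sketch does that through sys.modules[obj.__module__]. It is an assumption-laden illustration, not the implementation in Barry's "public" package, and it assumes __all__ is a list (it creates one if missing and skips duplicates).

```python
import sys
import types

def public(obj):
    """Append obj's name to its defining module's __all__ (hypothetical sketch)."""
    module = sys.modules[obj.__module__]
    names = module.__dict__.setdefault('__all__', [])
    if obj.__name__ not in names:
        names.append(obj.__name__)
    return obj

# Demo with a throwaway in-memory module.
demo_public = types.ModuleType('demo_public')
sys.modules['demo_public'] = demo_public
demo_public.public = public
exec('''
@public
def visible():
    return 1

def helper():
    return 2
''', demo_public.__dict__)

print(demo_public.__all__)   # ['visible']
```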

On 07/23/2019 08:44 AM, Steve Dower wrote:
The run-time overhead added by executing @public is trivially trivial. ;) But your argument about the too-easy change of a public API strikes home. I think a safer @public would be one that verifies the function is in `__all__` and raises if it is not. -- ~Ethan~
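Ethan's "safer" variant inverts the relationship: __all__ stays the source of truth, and the decorator only checks it. A sketch of that idea; the name `checked_public` and the choice of ValueError are arbitrary assumptions.

```python
import sys
import types

def checked_public(obj):
    """Hypothetical 'safer' @public: raise unless obj's name is already in __all__."""
    module = sys.modules[obj.__module__]
    if obj.__name__ not in getattr(module, '__all__', ()):
        raise ValueError(f'{obj.__name__!r} is marked @public but is missing '
                         f'from {module.__name__}.__all__')
    return obj

# Demo with a throwaway in-memory module.
demo_checked = types.ModuleType('demo_checked')
sys.modules['demo_checked'] = demo_checked
demo_checked.checked_public = checked_public
demo_checked.__all__ = ['area']

exec('@checked_public\ndef area(r):\n    return r * r\n', demo_checked.__dict__)

try:
    # Simulates renaming the function without updating __all__.
    exec('@checked_public\ndef compute_area(r):\n    return r * r\n', demo_checked.__dict__)
    error = None
except ValueError as exc:
    error = str(exc)
print(error)
```

Because the decorator runs before the name is bound, the renamed function never appears in the module at all.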

On Jul 23, 2019, at 09:20, Ethan Furman <ethan@stoneleaf.us> wrote:
That actually defeats the purpose of @public IMHO. There should be exactly one source of truth, and that ought to be the @public decorator. From https://public.readthedocs.io/en/latest/

"""
__all__ has two problems. First, it separates the declaration of a name’s public export semantics from the implementation of that name. Usually the __all__ is put at the top of the module, although this isn’t required, and in some cases it’s actively prohibited. So when you’re looking at the definition of a function or class in a module, you have to search for the __all__ definition to know whether the function or class is intended for public consumption. This leads to the second problem, which is that it’s too easy for the __all__ to get out of sync with the module’s contents. Often a function or class is renamed, removed, or added without the __all__ being updated. Then it’s difficult to know what the module author’s intent was, and it can lead to an exception when a string appearing in __all__ doesn’t match an existing name in the module. Some tools like Sphinx will complain when names that appear in __all__ don’t appear in the module. All of this points to the root problem; it should be easy to keep __all__ in sync!
"""

Think of it this way: __all__ is an implementation detail, and @public is the API for extending it. Cheers, -Barry
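The out-of-sync failure mode quoted above is easy to detect mechanically. This sketch (the helper `stale_all_entries` and the module `demo_stale` are hypothetical) lists __all__ entries with no matching attribute, and shows that a star import of such a module fails with AttributeError.

```python
import sys
import types

def stale_all_entries(module):
    """Return names listed in module.__all__ that the module does not define."""
    return [name for name in getattr(module, '__all__', ()) if not hasattr(module, name)]

# A module whose function was renamed without updating __all__.
demo_stale = types.ModuleType('demo_stale')
demo_stale.new_name = lambda: None
demo_stale.__all__ = ['old_name', 'new_name']
sys.modules['demo_stale'] = demo_stale

print(stale_all_entries(demo_stale))   # ['old_name']

try:
    exec('from demo_stale import *', {})
    star_error = None
except AttributeError as exc:
    star_error = exc
print(star_error)
```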

On 07/23/2019 10:21 AM, Barry Warsaw wrote:
On Jul 23, 2019, at 09:20, Ethan Furman wrote:
On 07/23/2019 08:44 AM, Steve Dower wrote:
In other words, we should be smart enough to not change the name of the function preceded by an `@public` ? Yeah, I can see that argument, too. ;-) -- ~Ethan~

Barry Warsaw wrote:
IMO, this seems to be the best part of the @public decorator, at least from a general user's perspective. Manually having to update __all__ anytime something is added, removed, or has its name modified adds an extra step that is easy to forget. Also, those coming from other languages would be far more likely to recognize the significance of @public; the primary purpose of the decorator is quite clear based on its name alone. Barry Warsaw wrote:
My package has a C version. If public() were a builtin (which I’ve implemented) it wouldn’t have that much import time overhead.
Is there a way we could estimate the approximate difference in overhead this would add using benchmarks? If it's incredibly trivial, I'd say it's worthwhile in order to make the public API members more explicitly declared, and provide a significant QoL improvement for maintaining __all__. While we should strive to optimize the language as much as possible, as far as I'm aware, Python's main priority has been readability and convenience rather than pure performance.
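One rough way to answer the benchmarking question is timeit, comparing a plain function definition with one wrapped by the trivial decorator Steve described. This is only a sketch: it measures the per-definition cost inside timeit's loop, not real module import time, and numbers vary by machine, so no output is shown.

```python
import timeit

# The trivial pure-Python decorator from the thread, defined in timeit's setup.
setup = '''
__all__ = []
def public(fn):
    __all__.append(fn.__name__)
    return fn
'''

plain = 'def f():\n    pass\n'
decorated = '@public\ndef f():\n    pass\n'

n = 100_000
t_plain = timeit.timeit(plain, setup=setup, number=n)
t_decorated = timeit.timeit(decorated, setup=setup, number=n)
print(f'plain def:   {t_plain / n * 1e9:.0f} ns per definition')
print(f'@public def: {t_decorated / n * 1e9:.0f} ns per definition')
```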

On 23Jul2019 1128, Kyle Stanley wrote:
Even if the performance impact is zero, commits that span the entire codebase for not-very-impactful changes have a negative impact on readability (for example, someone will suddenly become responsible for every single module as far as some blame tools are concerned - including github's suggested reviewers). I'm inclined to think this one would be primarily negative. We already maintain separate documentation from the source code, and this is the canonical reference for what is public or not. Until we make a new policy for __all__ to be the canonical reference, touching every file to use it is premature (let alone adding a builtin). So I apologise for mentioning that people care about import performance. Let's ignore them/that issue for now and worry instead about making sure people (including us!) know what the canonical reference for public/internal is. Cheers, Steve

Steve Dower wrote:
Good point, the discussion about __all__, adding the @public decorator, and worrying about performance impacts might be jumping too far ahead. For now, if most of the core devs are in agreement with the current unwritten rule of "unless explicitly documented public, all imports are private even if not prefixed with an underscore", I think the first priority should be to document it officially somewhere. That way, other developers and any potential confused users can be referred to it. It might not be the best long term solution, but it would require no code changes to be made and provide a written canonical reference for differentiating public vs private. I'm not certain as to where the most appropriate location for the rule would be, let me know if anyone has any suggestions.

On Wed, 24 Jul 2019 at 06:32, Kyle Stanley <aeros167@gmail.com> wrote:
It's not an unwritten rule, as it already has its own subsection in PEP 8: https://www.python.org/dev/peps/pep-0008/#public-and-internal-interfaces The main question in this thread is what to do about standard library modules that were written before those documented guidelines were put in place, and hence have multiple internal APIs that lack the leading underscore (and I don't think that's a question with a generic answer). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Nick Coghlan wrote:
Oh I see, thanks for the clarification. I've read over most of PEP 8 a few times at this point, but somehow I missed this part: "All undocumented interfaces should be assumed to be internal". Apologies for that. Personally, I think the stdlib modules which were written before the rule should be gradually updated, with each one being its own issue with the respective experts for each of the modules carefully monitoring the changes. It would also be appropriate to warn any user who imports a module that is going to be prefixed with an underscore, and to give them at least a couple of versions to update their code.

Kyle Stanley wrote:
Clarification: When I mentioned prepending a module with an underscore, I meant for functions and classes within the module, not the module itself. It might be difficult to implement this in a way which does not cause an excessive number of warnings, but I think it's definitely worthwhile to aim towards having a fully consistent standard for differentiating public and private interfaces across all of stdlib.
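One existing mechanism for such a transition is a module-level __getattr__ (PEP 562, Python 3.7+), which is consulted only when normal attribute lookup fails, so an old public-looking name can keep working while emitting a DeprecationWarning. A sketch with a hypothetical module and a hypothetical rename of `format_row` to `_format_row`:

```python
import types
import warnings

source = '''
import warnings

def _format_row(cells):
    return " | ".join(cells)

_RENAMED = {"format_row": "_format_row"}

def __getattr__(name):
    # Only called when normal lookup fails, i.e. for the old name.
    if name in _RENAMED:
        new_name = _RENAMED[name]
        warnings.warn(f"{name} is deprecated; it is now the internal {new_name}",
                      DeprecationWarning, stacklevel=2)
        return globals()[new_name]
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
'''

mod = types.ModuleType('demo_renamed')
exec(source, mod.__dict__)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    result = mod.format_row(['a', 'b'])

print(result)               # a | b
print(caught[0].category)   # <class 'DeprecationWarning'>
```

Warnings are only raised on access to the old name, which may help with the concern about an excessive number of warnings.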

On Jul 23, 2019, at 12:02, Steve Dower <steve.dower@python.org> wrote:
Even if the performance impact is zero, commits that span the entire codebase for not-very-impactful changes have a negative impact on readability (for example, someone will suddenly become responsible for every single module as far as some blame tools are concerned - including github's suggested reviewers). I'm inclined to think this one would be primarily negative.
If we were to adopt @public, its inclusion in the stdlib would follow the precedent we already have for non-functional changes (e.g. whitespace, code cleanup, etc.). It definitely shouldn’t be done willy-nilly but if the opportunity arises, e.g. because someone is already fixing bugs or modernizing a module, then it would be fair game to add @public decorators. Of course, you can’t do that if it’s not available. :)
We already maintain separate documentation from the source code, and this is the canonical reference for what is public or not. Until we make a new policy for __all__ to be the canonical reference, touching every file to use it is premature (let alone adding a builtin).
Agreed, sort of. We’ve had lots of cases of grey areas though, where the documentation doesn’t match the source. The question always becomes whether the source or the documentation is the source of truth. For any individual case, we don’t always come down on the same side of that question.
So I apologise for mentioning that people care about import performance. Let's ignore them/that issue for now and worry instead about making sure people (including us!) know what the canonical reference for public/internal is.
+1 -Barry

My apologies for not having read this very large thread before posting, but hopefully this small note won't be adding too much fuel to the fire: Earlier this year I created an extremely small project called "publication" (https://pypi.org/project/publication/, https://github.com/glyph/publication) which attempts to harmonize the lofty ideal of "only the names explicitly mentioned in __all__ and in the documentation!" with the realpolitik of "anything I can reach where I don't have to type a leading underscore is fair game". It simply removes the ability to externally invoke non-__all__-exported names without importing an explicitly named "._private" namespace. It does not add any new syntactic idiom like a @public decorator (despite the aesthetic benefits of doing something like that) so that existing IDEs, type checkers, refactoring tools, code browsers etc can use the existing __all__ idiom and not break. It intentionally doesn't try hard to hide the implementation; it's still Python and if you demonstrate that you know what you're doing you're welcome to all the fiddly internals, it just makes sure you know that that's what you're getting. While I am perhaps infamously a stdlib contrarian ;-) this project is a single module with extremely straightforward, stable semantics which I would definitely not mind being copy/pasted into the stdlib wholesale, either under a different (private, experimental) name, or even under its current one if folks like it. I'd be very pleased if this could solve the issue for the calendar module. Thanks for reading, -glyph
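To make the general idea concrete, here is a deliberately simplified sketch, not the "publication" project's actual code or API: the hypothetical `publish()` swaps a module in sys.modules for a facade that exposes only its __all__ names, with everything else reachable through an explicit `_private` attribute.

```python
import sys
import types

def publish(module_name):
    """Hypothetical sketch: replace a module with a facade exposing only __all__."""
    original = sys.modules[module_name]
    facade = types.ModuleType(module_name, original.__doc__)
    for name in original.__all__:
        setattr(facade, name, getattr(original, name))
    facade.__all__ = list(original.__all__)
    facade._private = original          # explicit escape hatch to the internals
    sys.modules[module_name] = facade
    return facade

# Demo with a throwaway in-memory module.
demo = types.ModuleType('demo_published')
exec('''
__all__ = ["api"]
def api():
    return _helper() + 1
def _helper():
    return 41
def undocumented():
    return 0
''', demo.__dict__)
sys.modules['demo_published'] = demo

facade = publish('demo_published')
print(facade.api())                      # 42
print(hasattr(facade, 'undocumented'))   # False
print(facade._private.undocumented())    # 0
```

Public functions keep working because they still resolve globals in the original module's namespace.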
Participants (15):
- Barry Warsaw
- Brett Cannon
- Christian Tismer
- Ethan Furman
- Glyph
- Ivan Pozdeev
- Kyle Stanley
- Nick Coghlan
- Paul Ganssle
- Paul Moore
- Raymond Hettinger
- Serhiy Storchaka
- Steve Dower
- Steve Holden
- Steven D'Aprano