Making the stdlib consistent again

Hi python-ideas,

As you all know, the Python stdlib can sometimes be a bit of an inconsistent mess that can be surprising in how it names things. This is mostly caused by the fact that several modules were developed before the introduction of PEP 8, and now we're stuck with the older naming within these modules.

It has been said and discussed in the past [1][2] that the stdlib is in fact inconsistent, but fixing this has almost always been disregarded as too painful (after all, we don't want a new Python 3 all over again). However, this way we will never move away from these inconsistencies. Perhaps this is fine, but I think we should at least consider providing function and class names that are unsurprising for developers.

While maintaining full backwards compatibility, my idea is that we should offer consistently named aliases in -eventually- all stdlib modules. For instance, with Python 2.6, the threading module received this treatment, but unfortunately this was not expanded to all modules.

What am I speaking of precisely? I have done a quick survey of the stdlib and found the following examples. Please note, this is a highly opinionated list; some names may have been chosen with a very good reason, and others are just a matter of taste. Hopefully you agree with at least some of them:

* The CamelCasing in some modules is the most obvious culprit, e.g. in logging and unittest. There is obviously an issue regarding subclasses and methods that are supposed to be overridden, but I feel we could make it work.
* All-lower-case class names, such as collections.defaultdict and collections.deque, should be CamelCased. Another example is datetime, which uses names such as timedelta instead of TimeDelta.
* Inconsistent names altogether, such as re.sub, which I feel should be re.replace (cf. str.replace). But also re.finditer and re.findall, with no re.find.
* Names that do not reflect actual usage, such as ssl.PROTOCOL_SSLv23, which can in fact not be used as a client for SSLv2.
* Underscore usage, such as tarfile.TarFile.gettarinfo (should it not be get_tar_info?), http.client.HTTPConnection.getresponse vs set_debuglevel, and pathlib.Path.samefile vs pathlib.Path.read_text. And is it pkgutil.iter_modules or is it pathlib.Path.iterdir (or re.finditer)?
* Usage of various abbreviations, such as in filecmp.cmp.
* Inconsistencies between similar modules, e.g. between tarfile.TarFile.add and zipfile.ZipFile.write.

These are just some examples of inconsistent and surprising naming I could find; other categories are probably also conceivable. Another subject for reconsideration would be attribute and argument names, but I haven't looked for those in my quick survey.

For all of these inconsistencies, I think we should add a 'consistently' named alternative and alias the original variant with it (or the other way around), without defining a deprecation timeline for the original names. This should make it possible to eventually make the stdlib consistent, Pythonic and unsurprising.

What would you think of such an effort?

Regards,
Ralph Broenink

[1] https://mail.python.org/pipermail/python-ideas/2010-January/006755.html
[2] https://mail.python.org/pipermail/python-dev/2009-March/086646.html
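A minimal sketch of the aliasing being proposed: bind a second, consistently spelled name to the same object as the existing one. These snake_case names are purely illustrative; none of them actually exist in the stdlib.

```python
import re
import tarfile

# Hypothetical aliases created by plain assignment; the old names remain.
re.replace = re.sub                                        # cf. str.replace
tarfile.TarFile.get_tar_info = tarfile.TarFile.gettarinfo

# Both spellings now refer to the same underlying object:
assert re.replace is re.sub
assert tarfile.TarFile.get_tar_info is tarfile.TarFile.gettarinfo

print(re.replace(r"\d+", "#", "a1b22"))  # → a#b#
```

The appeal of this approach is that it is fully backwards compatible: existing code keeps working unchanged, while new code can use the consistent spelling.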

On 25 July 2016 at 18:55, Ralph Broenink <ralph@ralphbroenink.net> wrote:
It sounds to me like rather a lot of effort, for limited benefit. I suspect few people will actually start using the new names (as they likely have to continue supporting older versions of Python for a long time yet) and so the benefit will actually be lower than we'd like. Also, you'd need to put a lot of effort into updating the tests - should they use the new or the old names, and how would they test that both names behave identically? So there's probably more effort than you'd think. Also, there's a lot of tutorial material - books, courses, websites - that would need to change. So overall, I think practicality beats purity here, and such a change is not worth the effort. Paul

On 26 July 2016 at 05:23, Paul Moore <p.f.moore@gmail.com> wrote:
So overall, I think practicality beats purity here, and such a change is not worth the effort.
We did go through the effort for the threading module, as we had the problem there that multiprocessing was PEP-8 compliant (with aliases for threading compatibility), but threading only had the old pre-PEP-8 names. A couple of the conclusions that came out of that were:

- it was probably worth it in threading's case due to the improved alignment with multiprocessing
- it isn't worth the disruption in the general case

The "isn't worth it" mainly comes from the fact that these APIs generally *were* compliant with the coding guidelines that existed at the time they were first written; it's just that the guidelines and community expectations have changed since then, so they represent things like "good Python design style circa 1999" (e.g. unittest, logging) rather than "good Python style for today". So if you say "let's update them to 2016 conventions" today, by 2026 you'll just have the same problem again.

It ends up being one of those cases where "invest development time now to reduce cognitive burden later" doesn't actually work in practice, as you have to keep the old API around anyway if you don't want to break working code, and that means future learners now have two APIs to learn instead of one (and if you refresh the API to the design du jour every decade or so, that number keeps going up).

Instead, for these older parts of the standard library, it's useful to view them as "interoperability plumbing" (e.g. logging for event stream reporting and management, unittest for test case reporting and management), and look for more modern version-independent 3rd party facades if you find the mix of API design eras in the standard library annoying (e.g. pytest as an alternative to raw unittest, structlog over direct use of the logging module).

Cheers, Nick.

P.S.
I sometimes wonder if you could analyse Python API designs over time and group them into eras the way people do with building architectural styles - "This API is a fine example of the stylistic leanings of Python's Java period, which ran from <year> to <year>. As the community moved into the subsequent Haskell period, composition came to be favoured over inheritance, leading to the emergence of alternative APIs like X, Y, and Z, of which, Y ultimately proved to be most popular" :) -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Nick Coghlan wrote:
Is that true? Did we really actively advise people to use Java-style APIs in some cases, or was it just that nobody told them otherwise?
You're assuming that the conventions will change just as much in the next 10 years as they did in the last 10. I don't think that's likely -- I would hope we're converging on a set of conventions that's good enough to endure.
It would be sad if Python's motto became "Batteries included (as long as you're happy with Leclanche cells; if you want anything more modern you'll have to look elsewhere)". -- Greg

On 26 July 2016 at 16:53, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Conventions can really only remain stable if their environment doesn't change, and the software development world isn't that much more stable now than it was 20 years ago.
New libraries can still make their way into the standard set, but it makes more sense to do that as *actual new libraries*, rather than simply PEP-8-ifying old ones. It's relatively easy to manage the cognitive burden of argparse vs optparse vs getopt, for example - you know which one people are using based on what they import, and they have different names so it's easy to talk about the trade-offs between them and tell people which one is recommended for new code. Ditto for asyncio vs asynchat and asyncore. We even have a section in the docs specifically for modules that are still around for backwards compatibility, but we don't recommend to new users: https://docs.python.org/3/library/superseded.html So evolving the standard library by introducing new ways of doing things as the community learns better alternatives, and requiring those new ways to be compliant with PEP 8 at the time they're standardised *absolutely* makes sense. The only part I don't think makes sense is modernising the APIs of existing modules purely for the sake of modernising them - there needs to be a stronger justification for expending that much effort (as there was with properly aligning the threading and multiprocessing APIs). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 7/26/16, Nick Coghlan <ncoghlan@gmail.com> wrote:
How could a snake survive without moulting regularly, when its old skin is outgrown? I am not arguing with you! Just thinking about the future of this language and seeing some analogies in biology. And sorry, I just could not resist sharing this little idea! :)

Ralph, You seem to be vastly underestimating the cost of making backwards incompatible changes like these, while sounding naively optimistic about the value of consistency. Please understand the second section heading of PEP 8. --Guido On Mon, Jul 25, 2016 at 10:55 AM, Ralph Broenink <ralph@ralphbroenink.net> wrote:
-- --Guido van Rossum (python.org/~guido)

I've pined for this (and feel a real mental pain every time I use one of those poorlyCased names)-- I end up using a lot of mental space remembering exactly HOW each stdlib isn't consistent. Aliasing consistent names in each case seems like a real win all around, personally. On Mon, Jul 25, 2016 at 10:55 AM, Ralph Broenink <ralph@ralphbroenink.net> wrote:

On Mon, 25 Jul 2016 at 13:03 Mark Mollineaux <bufordsharkley@gmail.com> wrote:
For those that want consistent names, you could create a PyPI package that is nothing more than the aliased names as suggested. Otherwise I get the desire for consistency, but as pointed out by a bunch of other core devs, we have thought about this many times and always reach the same conclusion that the amount of work and potential code breakage is too great. -Brett

I'm a bit sad that I'm clearly on the losing side of the argument. I now believe that I must be grossly underestimating the amount of effort and overestimating the potential gain. However, I still feel that we should strive for consistency in the long run. I do not propose to do this all at once, but I feel that at least some coordinated effort would be nice. (If only to avoid this kind of mail thread.) If I were to start an effort to - for instance - 'fix' some camelCased modules, and attempt to make it 100% backwards compatible, including tests, would there be any chance it could be merged at some point? Otherwise, I feel it would be a totally pointless effort ;). On Tue, 26 Jul 2016 at 18:29 Brett Cannon <brett@python.org> wrote:

I'm on Ralph's side here. "Why is this thing named the other way?" was one of the first questions I asked. And people whom I occasionally teach about Python ask the same question over and over again. Code breakage happens (PEP 3151 - didn't know about it till it almost bit my leg off), so we can't shy away from it completely. Is there any link to the previous thoughts of the core devs on the matter? Especially regarding the amount of potential code breakage. I'm genuinely interested, as I think that this amount is negligible if the new names are gradually introduced along with a deprecation notice on (and eventual removal of) the old ones. As far as I can see, it can only do some harm if someone uses a discouraged "import *" or monkey-patches some new methods into Python standard classes/modules, and then updates his Python installation. Regards, Eugene On Mon, Aug 1, 2016 at 1:46 AM, Ralph Broenink <ralph@ralphbroenink.net> wrote:

It was hardly all of the thoughts, or at least they came with little background information. E.g. "multiprocessing and threading" - I can't find any information on it. Was it done in one go, or gradually by introducing new aliases and deprecating old names, as was suggested by you (although it was 7 years ago)? The suggestion about aliases+deprecation has been made for other modules too, but sadly it ended up with you saying "let's do it this way" - with nothing following.

Or "you have to keep the old API around" - what is the exact motivation behind it? If it's "so the code doesn't break", then it's a strange motivation, because as I said, code gets [potentially] broken more often than it doesn't. Every alteration/addition/deletion is a potential code breakage a priori - aliasing+deprecation is in no way more dangerous than, let's say, the new zipapp module.

I in no way state that the changes should include a complete overhaul or be done in one go. But maybe we can at least start gradually introducing some naming changes to the oldest, most used and least changed (w.r.t. API) modules? The perfect example I think is the collections module, but that's only from personal experience - other modules may be better candidates.

I can't say that I surely do not underestimate the efforts required. But what if I want to invest my time in it? If, let's say, I succeed and do everything according to the developer's guide, and prove on a number of the most popular libraries that the changes indeed do not break anything - will the patch be considered, or just thrown away with a note "we already discussed it"? Regards, Eugene On Mon, Aug 1, 2016 at 7:39 AM, Guido van Rossum <guido@python.org> wrote:

A lot of people will not want to invest time in rewriting their code to use the consistent stdlib, because they will not see immediate or sufficient benefits. This means you will need to keep the aliases forever, creating more confusion for users because they will have two functions doing the same thing. Regards, Julien On Mon, Aug 1, 2016 at 12:05 PM Eugene Pakhomov <p1himik@gmail.com> wrote:

There are always a lot of people who don't want to make some changes. It doesn't mean that we have to keep every old piece in its place. We can keep deprecated names for a really long time, yes - maybe even until we introduce some required and definitely breaking change in the related module. But the names will still be deprecated, and their use hence discouraged - there's no confusion at all. Regards, Eugene On Mon, Aug 1, 2016 at 5:41 PM, Julien Duponchelle <julien@gns3.net> wrote:
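The "keep the old name, but deprecate it" scheme Eugene describes could look like the following sketch. The `getint`/`get_int` pair is a placeholder invented for illustration, not a real stdlib API.

```python
import warnings

def get_int(s):
    """New, consistently named API."""
    return int(s)

def getint(s):
    """Old name: kept working indefinitely, but its use is discouraged."""
    warnings.warn("getint() is deprecated; use get_int() instead",
                  DeprecationWarning, stacklevel=2)
    return get_int(s)

# Old callers keep working, but they now see a DeprecationWarning:
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = getint("42")

print(value)                                  # → 42
print(caught[0].category.__name__)            # → DeprecationWarning
```

Note that by default Python hides DeprecationWarning outside of `__main__` and test runners, which is exactly why the deprecation period can be made very long.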

On 1 August 2016 at 11:05, Eugene Pakhomov <p1himik@gmail.com> wrote:
The motivation here is that there are many, many people with deployed code that uses the existing APIs and names. You are suggesting that we ask those people to modify their working code to use the new names - certainly you're proposing a deprecation period, but ultimately the proposal is to just have the new names. What benefit do you offer to those people to justify that work? "It's more consistent" isn't going to be compelling.

Further, how will you support people such as library authors who need their code to work with multiple versions of Python? What timescale are you talking about here? Library authors are still having to support 2.7 at present, so until 2.7 is completely gone, *someone* will have to maintain code that uses both old and new names. So either your deprecation period is very long (with a consequent cost on the Python core developers, maintaining 2 names) or some library authors are left having to maintain their own compatibility code. Neither is attractive, so again, where's a practical, significant benefit?

What's the benefit to book authors or trainers who find that their books/course materials are now out of date, and they are under pressure to produce a new version that's "up to date"?

The Python core developers take their responsibility not to break their users' code without good reason very seriously. And from long experience, we know that we need to consider long timescales. That's not necessarily something we like (if we were writing things from scratch, we might well make different decisions) but it's part of the role of maintaining software that millions of people rely on, often for core aspects of their business.
You've had the answer a few times in this thread. The benefits have to outweigh the costs. Vague statements about "consistency" are not enough, you need to show concrete benefits and show how they improve things *for the people who have to change their code*. [From another post]
But the names still will be deprecated and their use hence will be discouraged - there's no confusion at all.
As long as both names are supported (even deprecated and discouraged names remain supported) the Python core developers will have to pay the cost - maintain compatibility wrappers, test both names, etc. How long do you expect the core devs to do that?

Here - consider this. We rename collections.defaultdict to collections.DefaultDict (for whatever reason). So now collections.defaultdict must act the same as collections.DefaultDict. OK, so someone has the following relatively standard mock object type of pattern in their test suite:

    # Patch defaultdict for some reason
    collections.defaultdict = MyMockClass
    # Now call the code under test
    test_some_external_function()
    # Now check the results
    MyMockClass.showcallpattern()

Now, suppose that the "external function" switches to use the name collections.DefaultDict. The above code will break, unless the two names defaultdict and DefaultDict are updated in parallel somehow to both point to MyMockClass. How do you propose we support that?

And if your answer is that the user is making incorrect use of the stdlib, then you just broke their code. You have what you feel is a justification, but who gets to handle the bug report? Who gets to support a user whose production code needs a rewrite to work with Python 3.6? Or to support the author of the external_function() library who has angry users saying the new version broke code that uses it, even though the library author was following the new guidelines given in the Python documentation (I assume you'll be including documentation updates in your proposal)?

Of course these are unlikely scenarios. I'm not trying to claim otherwise. But they are the sort of things that the core devs have to concern themselves with, and that's why the benefits need to justify a change like this. Paul

On 1 August 2016 at 13:45, Paul Moore <p.f.moore@gmail.com> wrote:
aliasing+deprecation is in no way more dangerous than, let's say, new zipapp module.
Maintaining double names is not really an issue in my opinion. Earlier in this thread there was a valid point for aliasing, which from my perspective needs little maintenance if any, with no cost to performance.
New books get written all the time, but 'getting it up-to-date' is not needed when aliases are put in place, with deprecation warnings added e.g. once support for 2.7 and the last not-renamed Python version are both discontinued. That gives both authors and users enough time to adapt their respective works.
Especially in the long run, people who come into contact with Python will again and again find out that the names in Python are not consistent. If this can be solved, why not do so?
Compatibility wrappers only have to be of one type: make the content of package variable 'a' the same as package variable 'A'. This can be done by allowing the containing structure of variables to be pointed to by multiple variable names, which would allow library maintainers to just insert the new name and point it at the correct variable container.
This would be fixed with the 'aliasing variable names'-solution.
Changing supported Python versions will always change the behaviour of someone's code. There's a nice statement about that on XKCD (https://xkcd.com/1172/). Although I know this is also an extreme example, that does not mean it is not correct: if there is a point at which we can make Python/the stdlib easier to work with, then maybe we should do so, to make things more consistent and easier to learn. -Matthias

On Mon, Aug 1, 2016 at 11:31 PM, Matthias welp <boekewurm@gmail.com> wrote:
Not sure I follow; are you proposing that module attributes be able to say "I'm the same as that guy over there"? That could be done with descriptor protocol (think @property, where you can write a getter/setter that has the actual value in a differently-named public attribute), but normally, modules don't allow that, as you need to mess with the class not the instance. But there have been numerous proposals to make that easier for module authors, one way or another. That would make at least some things easier - the mocking example would work that way - but it'd still mean people have to grok more than one name when reading code, and it'd most likely mess with people's expectations in tracebacks etc (the function isn't called what I thought it was called). Or is that not what you mean by aliasing? ChrisA

On 2 August 2016 at 00:23, Chris Angelico <rosuav@gmail.com> wrote:
One of them (making __class__ writable on module instances) was actually implemented in Python 3.5, but omitted from the What's New docs and given a relatively cryptic description in the NEWS file: http://bugs.python.org/issue27505

So if someone wanted to try their hand at documentation for that which is comprehensible to folks that aren't necessarily experts in:

- the import system;
- the metaclass machinery; and
- the descriptor protocol

I can review it. I'm just not currently sure where to start in writing it, as what mainly needs to be covered is how folks can *use* it to change the behaviour of module level attribute lookups, rather than the precise mechanics of how it works :)

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 1 August 2016 at 16:23, Chris Angelico <rosuav@gmail.com> wrote:
By aliasing I meant that the names of the functions/variables/classes (variables) all use the same value pointer/location. That could mean that in debugging the name of the variable is used as the name of the function: e.g. debugging get_int results in 'in get_int(), line 5', but its original getint results in 'in getint(), line 5'. Another option is 'in get_int(), line 5 of getint()' if you want to retain the source location and name. It could be descriptors for Python library implementations, but in C it could be implemented as a pointer to <VariableContents> instead of a struct containing <VariableContents>, or it could compile(?) to use the same reference. I am not familiar enough with the structure of CPython and how its variable lookups are built, but these are just a few ideas. -Matthias

On Mon, Aug 1, 2016, 08:36 Matthias welp <boekewurm@gmail.com> wrote:
And all of that requires work beyond simple aliasing by assignment. That means writing code to make this work, as well as tests to make sure nothing breaks (on top of the documentation). Multiple core devs have now said why this isn't a technical problem. Nick has pointed out that unless someone does a study showing the new names would be worth the effort, the situation will not change.

At this point the core devs have either muted this thread, and thus you're not reaching them anymore, or we are going to continue to give the same answer, and we feel like we're repeating ourselves and this is a drain on our time. I know everyone involved in this thread means well (including you, Matthias), but do realize that not letting this topic go is a drain on people's time. Every email you write is taking the time of hundreds of people: it's not just taking 10 minutes of my spare time to read and respond to this email while I'm on vacation (happy BC Day), but it's a minute for everyone else on this mailing list simply to decide what to do with it (you can't assume people can mute threads, thanks to the variety of email clients out there). So every email sent to this list literally takes a cumulative time of hours from people to deal with.

So when multiple core devs have given an answer and said what it will take to change the situation, please accept that answer. Otherwise you run the risk of frustrating the core devs by making us feel like we're not being listened to or trusted. And that is part of what leads to burnout for the core devs. -brett

At the risk of making a long thread even longer, one idea that would not involve any work on the part of core developers, but might still partially satisfy the proponents of this plan is to maintain your own shim modules. For example, nose is a (the de facto?) testing library for Python, and it provides pep8 aliases to most of unittest's commands. You could have your own similar project on pypi, e.g., "logging8" or something like that. I for one would use it instead of logging because I find the pep8 style "relaxing". Best, Neil On Monday, August 1, 2016 at 1:34:13 PM UTC-4, Brett Cannon wrote:
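Such a shim can be tiny. A sketch of what a hypothetical "logging8" module on PyPI might contain (the module name and the snake_case aliases are invented for illustration):

```python
# logging8.py — hypothetical PyPI shim exposing snake_case aliases
import logging
from logging import *        # the original names stay importable too

# snake_case aliases for a few of logging's camelCase callables
get_logger = logging.getLogger
basic_config = logging.basicConfig
capture_warnings = logging.captureWarnings

# Same logger registry underneath, so the shim interoperates with
# code that uses the stdlib names directly:
log = get_logger("demo")
assert log is logging.getLogger("demo")
```

Because the aliases are plain assignments, the shim needs essentially no maintenance, and nothing in the stdlib has to change.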

On 1 August 2016 at 20:05, Eugene Pakhomov <p1himik@gmail.com> wrote:
It was hardly all of the thoughts, or at least with little background information.
The most pertinent piece of background information is http://www.curiousefficiency.org/posts/2011/02/status-quo-wins-stalemate.htm... The number of changes we *could* make to Python as a language is theoretically unbounded. "Can you get current core developers sufficiently enthusiastic about your idea to encourage you to see it through to implementation?" is thus one of the main defenses the language has against churn for churn's sake (it's definitely not the only one, but it's still a hurdle that a lot of proposals fail to clear).
All at once, and the old names haven't been removed, and will likely remain supported indefinitely (there's nothing technically wrong with them, they just predate the introduction of the descriptor protocol and Python's adoption of snake_case as the preferred convention for method and attribute names, and instead use the older Java-style API with camelCase names and explicit getter and setter methods). The implementation issue is http://bugs.python.org/issue3042 but you'll need to click through to the actual applied patches to see the magnitude of the diffs (the patches Benjamin posted to the tracker were only partial ones to discuss the general approach). For a decent list of other renames that have largely been judged to have created more annoyance for existing users than was justified by any related increase in naming consistency, the six.moves compatibility module documentation includes a concise list of many of the renames that took place in the migration to Python 3: https://pythonhosted.org/six/#module-six.moves I've yet to hear a professional educator proclaim their delight at those renaming efforts, but I *have* received a concrete suggestion for improving the process next time we decide to start renaming things to improve "consistency": don't do it unless we have actual user experience testing results in hand to show that the new names are genuinely less confusing and easier to learn than the old ones.
That's the second great filter for language change proposals: is there at least one person (whether a current core developer or not) with the necessary time, energy and interest needed to implement the proposed change and present it for review?
Aside from adding new methods to existing classes (which may collide with added methods in third party subclasses) and the mock example Paul cited, additions to existing modules almost never break things. By contrast, removals almost *always* break things and hence are generally only done these days when the old ways of doing things are actively harmful (e.g. the way the misdesigned contextlib.nested API encouraged resource leaks in end-user applications, or the blurred distinction between text and binary data in Python 2 encouraged mishandling of text data).
If it's a programmatic deprecation (rather than merely a documented one), then deprecation absolutely *does* force folks that have a "no warnings" policy for their test suites to update their code immediately. It's the main reason we prefer to only deprecate things when they're actively harmful, rather than just because they're no longer fashionable. While standard library module additions can indeed pose a compatibility challenge (due to the way module name shadowing works), the typical worst case scenario there is just needing some explicit sys.path manipulation to override the stdlib version specifically for affected applications, and even that only arises if somebody has an existing internal package name that collides with the new stdlib one (we try to avoid colliding with names on PyPI).
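Nick's point about "no warnings" test policies is mechanical: once a test suite escalates warnings to errors, a programmatic deprecation forces an immediate code change. Placeholder names again (`getresponse`/`get_response` here are invented for the sketch, not the real http.client API):

```python
import warnings

def get_response():
    return "ok"

def getresponse():                        # old spelling, now deprecated
    warnings.warn("getresponse() is deprecated; use get_response()",
                  DeprecationWarning, stacklevel=2)
    return get_response()

# Equivalent to running the test suite under `python -W error`:
warnings.simplefilter("error", DeprecationWarning)
try:
    getresponse()
    outcome = "passed"
except DeprecationWarning:
    outcome = "failed until the caller is updated"

print(outcome)   # → failed until the caller is updated
```

This is why a merely documented deprecation is far less disruptive than a programmatic one: the former asks nothing of anyone's CI.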
That's a different proposal from wholesale name changes, as it allows each change to be discussed on its individual merits rather than attempting to establish a blanket change in development policy.
The perfect example I think is collections module, but it's only from the personal experience - other modules may be better candidates.
collections has the challenge that the normal PEP 8 guidelines don't apply to the builtins, so some collections types are named like builtins (deque, defaultdict), while others follow normal class naming conventions (e.g. OrderedDict). There are also type factories like namedtuple, which again follow the builtin convention of omitting underscores from typically snake_case names.
I can't say that I surely do not underestimate the efforts required. But what if I want to invest my time in it?
It's not contributor time or even core developer time that's the main problem in cases like this, it's the flow-on effect on books, tutorials, and other learning resources. When folks feel obliged to update those, we want them to be able to say to themselves "Yes, the new way of doing things is clearly better for my students and readers than the old way". When we ask other people to spend time on something, the onus is on us to make sure that at least we believe their time will be well spent in the long run (even if we're not 100% successful in convincing them of that in the near term). If even we don't think that's going to be the case, then the onus is on us to avoid wasting their time in the first place. We're never going to be 100% successful in that (we're always going to approve some non-zero number of changes that, with the benefit of hindsight, turn out to have been more trouble than they were worth), but it's essential that we keep the overall cost of change to the entire ecosystem in mind when reviewing proposals, rather than taking a purely local view of the costs and benefits at the initial implementation level.
It would depend on the specific change you propose, and the rationale you give for making it. If the only rationale given is "Make <API> more compliant with PEP 8", then it will almost certainly be knocked back. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Thank you very much for the links and the new insights, especially regarding actual user experience with new names. A good reason to start gathering actual statistics. You have made a very good point, I will not reason about naming further without a significant amount of data showing that it still may be worth it. Regards, Eugene On Mon, Aug 1, 2016 at 7:51 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:

On Mon, Aug 1, 2016 at 12:05 PM, Eugene Pakhomov <p1himik@gmail.com> wrote:
Python 3 already broke quite a lot of stuff "just to be better/more consistent", and we're still in a state where there are many people stuck with Python 2.7 because the migration cost is considered too high or just not worth it. Introducing such a change would increase this cost and make the two Python versions (2 and 3) even more different. It also has a mnemonic cost, because you double the number of API names you have to remember, for virtually zero additional value other than the fact that the names are more consistent. I understand the rationale, but I think any proposal making Python 3 more incompatible with Python 2 should have a *very* huge barrier in terms of acceptance.
-- Giampaolo - http://grodola.blogspot.com

On Mon, 25 Jul 2016 17:55:27 -0000, Ralph Broenink <ralph@ralphbroenink.net> wrote:
Right. We tried it. It didn't work out very well (really bad cost/benefit ratio), so we stopped.[*] --David [*] At least, that's my understanding, I wasn't actually around yet when then threading changes were made.

On 25 Jul 2016, at 19:55, Ralph Broenink <ralph@ralphbroenink.net> wrote:
* Names that do not reflect actual usage, such as ssl.PROTOCOL_SSLv23, which can in fact not be used as client for SSLv2.
PROTOCOL_SSLv23 *can* be used as a client for SSLv2, in some circumstances. If you want to fight about names, then we should talk about the fact that PROTOCOL_SSLv23 means “use a SSLv3 handshake and then negotiate the highest commonly supported TLS version”. However, this naming scheme comes from OpenSSL (what we call PROTOCOL_SSLv23 OpenSSL calls SSLv23_METHOD, and so on). This naming scheme *is* terrible, but the stdlib ssl module is really more appropriately called the openssl module: it exposes a substantial number of OpenSSL internals and names. Happily, OpenSSL is changing this name to TLS_METHOD, which is what it should have been called all along. Cory

On 25 July 2016 at 18:55, Ralph Broenink <ralph@ralphbroenink.net> wrote:
It sounds to me like rather a lot of effort, for limited benefit. I suspect few people will actually start using the new names (as they likely have to continue supporting older versions of Python for a long time yet) and so the benefit will actually be lower than we'd like. Also, you'd need to put a lot of effort into updating the tests - should they use the new or the old names, and how would they test that both names behave identically? So there's probably more effort than you'd think. Also, there's a lot of tutorial material - books, courses, websites - that would need to change. So overall, I think practicality beats purity here, and such a change is not worth the effort. Paul

On 26 July 2016 at 05:23, Paul Moore <p.f.moore@gmail.com> wrote:
So overall, I think practicality beats purity here, and such a change is not worth the effort.
We did go through the effort for the threading module, as we had the problem there that multiprocessing was PEP-8 compliant (with aliases for threading compatibility), but threading only had the old pre-PEP-8 names. A couple of the conclusions that came out of that were:

- it was probably worth it in threading's case due to the improved alignment with multiprocessing
- it isn't worth the disruption in the general case

The "isn't worth it" mainly comes from the fact that these APIs generally *were* compliant with the coding guidelines that existed at the time they were first written, it's just that the guidelines and community expectations have changed since then, so they represent things like "good Python design style circa 1999" (e.g. unittest, logging) rather than "good Python style for today". So if you say "let's update them to 2016 conventions" today, by 2026 you'll just have the same problem again. It ends up being one of those cases where "invest development time now to reduce cognitive burden later" doesn't actually work in practice, as you have to keep the old API around anyway if you don't want to break working code, and that means future learners now have two APIs to learn instead of one (and if you refresh the API to the design du jour every decade or so, that number keeps going up). Instead, for these older parts of the standard library, it's useful to view them as "interoperability plumbing" (e.g. logging for event stream reporting and management, unittest for test case reporting and management), and look for more modern version-independent 3rd party facades if you find the mix of API design eras in the standard library annoying (e.g. pytest as an alternative to raw unittest, structlog over direct use of the logging module). Cheers, Nick. P.S.
I sometimes wonder if you could analyse Python API designs over time and group them into eras the way people do with building architectural styles - "This API is a fine example of the stylistic leanings of Python's Java period, which ran from <year> to <year>. As the community moved into the subsequent Haskell period, composition came to be favoured over inheritance, leading to the emergence of alternative APIs like X, Y, and Z, of which, Y ultimately proved to be most popular" :) -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
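The threading precedent Nick describes is still observable: Python 2.6 added snake_case aliases alongside the original camelCase names, and both spellings then coexisted for well over a decade (the camelCase forms were only removed in Python 3.12). A quick sketch, guarded so it runs on either side of that removal:

```python
import threading

# Python 2.6 added snake_case aliases alongside the original camelCase
# names; both spellings then had to be maintained in parallel.
snake = threading.current_thread
camel = getattr(threading, "currentThread", None)  # gone in Python 3.12+

if camel is not None:
    # Two names, one function: both return the same thread object.
    assert camel() is snake()

# In the main thread, current_thread() is the main thread object.
assert snake() is threading.main_thread()
```

This is exactly the double-API cost discussed above: every caller, book, and test suite had to know that both spellings existed and meant the same thing.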

Nick Coghlan wrote:
Is that true? Did we really actively advise people to use Java-style APIs in some cases, or was it just that nobody told them otherwise?
You're assuming that the conventions will change just as much in the next 10 years as they did in the last 10. I don't think that's likely -- I would hope we're converging on a set of conventions that's good enough to endure.
It would be sad if Python's motto became "Batteries included (as long as you're happy with Leclanche cells; if you want anything more modern you'll have to look elsewhere)". -- Greg

On 26 July 2016 at 16:53, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Conventions can really only remain stable if their environment doesn't change, and the software development world isn't that much more stable now than it was 20 years ago.
New libraries can still make their way into the standard set, but it makes more sense to do that as *actual new libraries*, rather than simply PEP-8-ifying old ones. It's relatively easy to manage the cognitive burden of argparse vs optparse vs getopt, for example - you know which one people are using based on what they import, and they have different names so it's easy to talk about the trade-offs between them and tell people which one is recommended for new code. Ditto for asyncio vs asynchat and asyncore. We even have a section in the docs specifically for modules that are still around for backwards compatibility, but we don't recommend to new users: https://docs.python.org/3/library/superseded.html So evolving the standard library by introducing new ways of doing things as the community learns better alternatives, and requiring those new ways to be compliant with PEP 8 at the time they're standardised *absolutely* makes sense. The only part I don't think makes sense is modernising the APIs of existing modules purely for the sake of modernising them - there needs to be a stronger justification for expending that much effort (as there was with properly aligning the threading and multiprocessing APIs). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 7/26/16, Nick Coghlan <ncoghlan@gmail.com> wrote:
How could a snake survive without moulting regularly, once its old skin is outgrown? I am not arguing with you! Just thinking about the future of this language, and I see some analogies in biology. Sorry, I just could not resist sharing this little idea! :)

Ralph, You seem to be vastly underestimating the cost of making backwards incompatible changes like these, while sounding naively optimistic about the value of consistency. Please understand the second section heading of PEP 8. --Guido On Mon, Jul 25, 2016 at 10:55 AM, Ralph Broenink <ralph@ralphbroenink.net> wrote:
-- --Guido van Rossum (python.org/~guido)

I've pined for this (and feel a real mental pain every time I use one of those poorlyCased names)-- I end up using a lot of mental space remembering exactly HOW each stdlib isn't consistent. Aliasing consistent names in each case seems like a real win all around, personally. On Mon, Jul 25, 2016 at 10:55 AM, Ralph Broenink <ralph@ralphbroenink.net> wrote:

On Mon, 25 Jul 2016 at 13:03 Mark Mollineaux <bufordsharkley@gmail.com> wrote:
For those that want consistent names, you could create a PyPI package that is nothing more than the aliased names as suggested. Otherwise I get the desire for consistency, but as pointed out by a bunch of other core devs, we have thought about this many times and always reach the same conclusion that the amount of work and potential code breakage is too great. -Brett

I'm a bit sad that I'm clearly on the losing side of the argument. I now believe that I must be grossly underestimating the amount of effort and overestimating the potential gain. However, I still feel that we should strive for consistency in the long run. I do not propose to do this all at once, but I feel that at least some collaborative effort would be nice. (If only to avoid this kind of mail thread.) If I were to start an effort to - for instance - 'fix' some camelCased modules, and attempt to make it 100% backwards compatible, including tests, would there be any chance it could be merged at some point? Otherwise, I feel it would be a totally pointless effort ;). On Tue, 26 Jul 2016 at 18:29 Brett Cannon <brett@python.org> wrote:

I'm on Ralph's side here. "Why is this thing named the other way?" was one of the first questions I asked. And people whom I occasionally teach about Python ask the same question over and over again. Code breakage happens (PEP 3151 - didn't know about it till it almost bit my leg off), so we can't shy away from it completely. Is there any link to the previous thoughts of the core devs on the matter? Especially regarding the amount of potential code breakage. I'm genuinely interested, as I think that this amount is negligible if the new names are gradually introduced along with a deprecation notice on (and eventual removal of) the old ones. As far as I can see, it can only do some harm if someone uses a discouraged "import *" or monkey-patches some new methods into Python standard classes/modules, and then updates their Python installation. Regards, Eugene On Mon, Aug 1, 2016 at 1:46 AM, Ralph Broenink <ralph@ralphbroenink.net> wrote:

It was hardly all of the thoughts, or at least it came with little background information. E.g. "multiprocessing and threading" - I can't find any information on it. Was it done in one go, or gradually by introducing new aliases and deprecating old names, as has been suggested by you (although it was 7 years ago)? The suggestion about aliases+deprecation has been made for other modules, but sadly it ended up with you saying "let's do it this way" - with nothing following. Or "you have to keep the old API around" - what is the exact motivation behind it? If it's "so the code doesn't break", then it's a strange motivation, because as I said, code gets [potentially] broken more often than it doesn't. Every alteration/addition/deletion is a potential code breakage a priori - aliasing+deprecation is in no way more dangerous than, let's say, the new zipapp module. I in no way claim that the changes should include a complete overhaul or be done in one go. But maybe we can at least start gradually introducing some naming changes to the oldest, most used and least changed (w.r.t. API) modules? The perfect example I think is the collections module, but that's only from personal experience - other modules may be better candidates. I can't say that I surely do not underestimate the efforts required. But what if I want to invest my time in it? If, let's say, I succeed and do everything according to the developer's guide, and prove on a number of the most popular libraries that the changes indeed do not break anything - will the patch be considered, or just thrown away with a note "we already discussed it"? Regards, Eugene On Mon, Aug 1, 2016 at 7:39 AM, Guido van Rossum <guido@python.org> wrote:

A lot of people will not want to invest time rewriting their code to use the consistent stdlib because they will not see immediate/enough benefits. This means you will need to keep the aliases forever, creating more confusion for users because they will have two functions doing the same thing. Regards, Julien On Mon, Aug 1, 2016 at 12:05 PM Eugene Pakhomov <p1himik@gmail.com> wrote:

There's always a lot of people that don't want to make some changes. It doesn't mean that we have to keep every old piece at its place. We can keep deprecated names for a really long time, yes - maybe even until we introduce some required and definitely breaking change in the related module. But the names still will be deprecated and their use hence will be discouraged - there's no confusion at all. Regards, Eugene On Mon, Aug 1, 2016 at 5:41 PM, Julien Duponchelle <julien@gns3.net> wrote:

On 1 August 2016 at 11:05, Eugene Pakhomov <p1himik@gmail.com> wrote:
The motivation here is that there are many, many people with deployed code that uses the existing APIs and names. You are suggesting that we ask those people to modify their working code to use the new names - certainly you're proposing a deprecation period, but ultimately the proposal is to just have the new names. What benefit do you offer to those people to justify that work? "It's more consistent" isn't going to be compelling. Further, how will you support people such as library authors who need their code to work with multiple versions of Python? What timescale are you talking about here? Library authors are still having to support 2.7 at present, so until 2.7 is completely gone, *someone* will have to maintain code that uses both old and new names. So either your deprecation period is very long (with a consequent cost on the Python core developers, maintaining 2 names) or some library authors are left having to maintain their own compatibility code. Neither is attractive, so again, where's a practical, significant benefit? What's the benefit to book authors or trainers who find that their books/course materials are now out of date, and they are under pressure to produce a new version that's "up to date"? The Python core developers take their responsibility not to break their users' code without good reason very seriously. And from long experience, we know that we need to consider long timescales. That's not necessarily something we like (if we were writing things from scratch, we might well make different decisions) but it's part of the role of maintaining software that millions of people rely on, often for core aspects of their business.
You've had the answer a few times in this thread. The benefits have to outweigh the costs. Vague statements about "consistency" are not enough, you need to show concrete benefits and show how they improve things *for the people who have to change their code*. [From another post]
But the names still will be deprecated and their use hence will be discouraged - there's no confusion at all.
As long as both names are supported (even deprecated and discouraged names remain supported) the Python core developers will have to pay the cost - maintain compatibility wrappers, test both names, etc. How long do you expect the core devs to do that? Here - consider this. We rename collections.defaultdict to collections.DefaultDict (for whatever reason). So now collections.defaultdict must act the same as collections.DefaultDict. OK, so someone has the following relatively standard mock object type of pattern in their test suite:

    # Patch defaultdict for some reason
    collections.defaultdict = MyMockClass

    # Now call the code under test
    test_some_external_function()

    # Now check the results
    MyMockClass.showcallpattern()

Now, suppose that the "external function" switches to use the name collections.DefaultDict. The above code will break, unless the two names defaultdict and DefaultDict are updated in parallel somehow to both point to MyMockClass. How do you propose we support that? And if your answer is that the user is making incorrect use of the stdlib, then you just broke their code. You have what you feel is a justification, but who gets to handle the bug report? Who gets to support a user whose production code needs a rewrite to work with Python 3.6? Or to support the author of the external_function() library who has angry users saying the new version broke code that uses it, even though the library author was following the new guidelines given in the Python documentation (I assume you'll be including documentation updates in your proposal)? Of course these are unlikely scenarios. I'm not trying to claim otherwise. But they are the sort of things that the core devs have to concern themselves with, and that's why the benefits need to justify a change like this. Paul
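Paul's scenario can be made concrete. The DefaultDict alias below is hypothetical (no such name exists in the stdlib), but it shows how a plain-assignment alias lets a patched mock be silently bypassed:

```python
import collections

# Hypothetical: pretend the stdlib had added a PEP-8 alias by assignment.
collections.DefaultDict = collections.defaultdict

class MyMockClass(dict):
    """Stand-in that records every construction attempt."""
    calls = []
    def __init__(self, *args, **kwargs):
        MyMockClass.calls.append((args, kwargs))
        super().__init__()

# The test suite patches the name it has always known...
collections.defaultdict = MyMockClass

def external_function():
    # ...but the library under test migrated to the new spelling,
    # which still points at the *original* class.
    return collections.DefaultDict(list)

result = external_function()
assert not isinstance(result, MyMockClass)  # the mock was bypassed
assert MyMockClass.calls == []              # and never even invoked
```

Keeping both names honest would require them to be updated in lockstep, which plain assignment cannot provide.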

On 1 August 2016 at 13:45, Paul Moore <p.f.moore@gmail.com> wrote:
Double-name maintenance is not really an issue in my opinion. Earlier in this thread there was a valid point for aliasing, which from my perspective needs little maintenance if any, with no cost to performance.
New books get written all the time, but 'getting it up to date' is not needed when aliases are put in place, with deprecation warnings for example when support for 2.7 and <last not-renamed Python version> are both discontinued. That gives both authors and users enough time to adapt their respective works.
Especially in the long run, people who get in touch with Python will again and again find out that the names in Python are not consistent. If this can be solved, why not do so?
Compatibility wrappers only have to be of one type: make the content of package variable 'a' be the same as package variable 'A'. This can be done by allowing the containing structure of variables to be pointed to by multiple variable names, which would allow library maintainers to just insert the new name and point it to the correct variable container.
This would be fixed with the 'aliasing variable names'-solution.
Changing supported Python versions will always change behaviour of someone's code. There's a nice statement about that on XKCD: (https://xkcd.com/1172/). Although I know this is also an extreme example, this does not mean that it is not correct: If there is a point in which we can make Python/stdlib easier to work with, then maybe we should do so to make things more consistent and easier to learn. -Matthias

On Mon, Aug 1, 2016 at 11:31 PM, Matthias welp <boekewurm@gmail.com> wrote:
Not sure I follow; are you proposing that module attributes be able to say "I'm the same as that guy over there"? That could be done with descriptor protocol (think @property, where you can write a getter/setter that has the actual value in a differently-named public attribute), but normally, modules don't allow that, as you need to mess with the class not the instance. But there have been numerous proposals to make that easier for module authors, one way or another. That would make at least some things easier - the mocking example would work that way - but it'd still mean people have to grok more than one name when reading code, and it'd most likely mess with people's expectations in tracebacks etc (the function isn't called what I thought it was called). Or is that not what you mean by aliasing? ChrisA

On 2 August 2016 at 00:23, Chris Angelico <rosuav@gmail.com> wrote:
One of them (making __class__ writable on module instances) was actually implemented in Python 3.5, but omitted from the What's New docs and given a relatively cryptic description in the NEWS file: http://bugs.python.org/issue27505 So if someone wanted to try their hand at documentation for that which is comprehensible to folks that aren't necessarily experts in: - the import system; - the metaclass machinery; and - the descriptor protocol I can review it. I'm just not currently sure where to start in writing it, as what mainly needs to be covered is how folks can *use* it to change the behaviour of module level attribute lookups, rather than the precise mechanics of how it works :) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
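For the curious, here is a minimal sketch of what that Python 3.5 feature enables for the aliasing discussion above. The getint/get_int names are invented for illustration; the mechanism is assigning a types.ModuleType subclass to a module's __class__ so a property on the class can alias an attribute:

```python
import types

# Build a throwaway module with an old-style name on it.
mod = types.ModuleType("demo")
mod.getint = lambda s: int(s)

class AliasedModule(types.ModuleType):
    # A descriptor on the module's class: get_int always delegates
    # to whatever `getint` currently is, so the two names can never
    # drift apart (unlike a plain assignment alias).
    @property
    def get_int(self):
        return self.getint

mod.__class__ = AliasedModule  # permitted since Python 3.5

assert mod.get_int("42") == 42
mod.getint = lambda s: -int(s)   # rebinding the old name...
assert mod.get_int("42") == -42  # ...is seen through the alias too
```

This addresses the mock-patching breakage Paul described earlier, at the cost of each module needing a custom class and a descriptor per aliased name.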

On 1 August 2016 at 16:23, Chris Angelico <rosuav@gmail.com> wrote:
By aliasing I meant that the names of the functions/variables/classes (variables) all use the same value pointer/location. That could mean that in debugging the name of the variable is used as the name of the function, e.g. debugging get_int results in 'in get_int(), line 5', but its original getint results in 'in getint(), line 5'. Another option is 'in get_int(), line 5 of getint()' if you want to retain the source location and name. It could be descriptors for Python library implementations, but in C it could be implemented as a pointer to <VariableContents> instead of a struct containing <VariableContents>, or it could compile(?) to use the same reference. I am not familiar enough with the structure of CPython and how its variable lookups are built, but these are just a few ideas. -Matthias

On Mon, Aug 1, 2016, 08:36 Matthias welp <boekewurm@gmail.com> wrote:
And all of that requires work beyond simple aliasing by assignment. That means writing code to make this work as well as tests to make sure nothing breaks (on top of the documentation). Multiple core devs have now said why this isn't a technical problem. Nick has pointed out that unless someone does a study showing the new names would be worth the effort then the situation will not change. At this point the core devs have either muted this thread and thus you're not reaching them anymore, or we are going to continue to give the same answer and feel like we're repeating ourselves, which is a drain on our time. I know everyone involved on this thread means well (including you, Matthias), but do realize that not letting this topic go is a drain on people's time. Every email you write is taking the time of hundreds of people, so it's not just taking 10 minutes of my spare time to read and respond to this email while I'm on vacation (happy BC Day), but it's a minute for everyone else who is on this mailing list to simply decide what to do with it (you can't assume people can mute threads, thanks to the variety of email clients out there). So every email sent to this list literally takes a cumulative time of hours from people to deal with. So when multiple core devs have given an answer and said what it will take to change the situation, then please accept that answer. Otherwise you run the risk of frustrating the core devs by making us feel like we're not being listened to or trusted. And that is part of what leads to burnout for the core devs. -brett

At the risk of making a long thread even longer, one idea that would not involve any work on the part of core developers, but might still partially satisfy the proponents of this plan is to maintain your own shim modules. For example, nose is a (the de facto?) testing library for Python, and it provides pep8 aliases to most of unittest's commands. You could have your own similar project on pypi, e.g., "logging8" or something like that. I for one would use it instead of logging because I find the pep8 style "relaxing". Best, Neil On Monday, August 1, 2016 at 1:34:13 PM UTC-4, Brett Cannon wrote:
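A shim like the hypothetical "logging8" Neil describes could be as small as a module of plain assignments. A sketch (the module name and snake_case spellings are invented; the right-hand sides are the real logging APIs):

```python
# logging8.py (hypothetical third-party shim): PEP-8 spellings
# for the camelCase names in the stdlib logging module.
import logging

get_logger = logging.getLogger
basic_config = logging.basicConfig
capture_warnings = logging.captureWarnings

# Callers get the new names with zero wrapping overhead:
log = get_logger("demo")
assert log is logging.getLogger("demo")  # same logger object, new spelling
```

Because these are bare assignments, the aliases cost nothing at call time, but note they inherit the mock-patching caveat discussed earlier in the thread: rebinding logging.getLogger later would not update get_logger.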

On 1 August 2016 at 20:05, Eugene Pakhomov <p1himik@gmail.com> wrote:
It was hardly all of the thoughts, or at least with little background information.
The most pertinent piece of background information is http://www.curiousefficiency.org/posts/2011/02/status-quo-wins-stalemate.htm... The number of changes we *could* make to Python as a language is theoretically unbounded. "Can you get current core developers sufficiently enthusiastic about your idea to encourage you to see it through to implementation?" is thus one of the main defenses the language has against churn for churn's sake (it's definitely not the only one, but it's still a hurdle that a lot of proposals fail to clear).
All at once, and the old names haven't been removed, and will likely remain supported indefinitely (there's nothing technically wrong with them, they just predate the introduction of the descriptor protocol and Python's adoption of snake_case as the preferred convention for method and attribute names, and instead use the older Java-style API with camelCase names and explicit getter and setter methods). The implementation issue is http://bugs.python.org/issue3042 but you'll need to click through to the actual applied patches to see the magnitude of the diffs (the patches Benjamin posted to the tracker were only partial ones to discuss the general approach). For a decent list of other renames that have largely been judged to have created more annoyance for existing users than was justified by any related increase in naming consistency, the six.moves compatibility module documentation includes a concise list of many of the renames that took place in the migration to Python 3: https://pythonhosted.org/six/#module-six.moves I've yet to hear a professional educator proclaim their delight at those renaming efforts, but I *have* received a concrete suggestion for improving the process next time we decide to start renaming things to improve "consistency": don't do it unless we have actual user experience testing results in hand to show that the new names are genuinely less confusing and easier to learn than the old ones.
That's the second great filter for language change proposals: is there at least one person (whether a current core developer or not) with the necessary time, energy and interest needed to implement the proposed change and present it for review?
Aside from adding new methods to existing classes (which may collide with added methods in third party subclasses) and the text mock example Paul cited, additions to existing modules almost never break things. By contrast, removals almost *always* break things and hence are generally only done these days when the old ways of doing things are actively harmful (e.g. the way the misdesigned contextlib.nested API encouraged resource leaks in end-user applications, or the blurred distinction between text and binary data in Python 2 encouraged mishandling of text data).
If it's a programmatic deprecation (rather than merely a documented one), then deprecation absolutely *does* force folks that have a "no warnings" policy for their test suites to update their code immediately. It's the main reason we prefer to only deprecate things when they're actively harmful, rather than just because they're no longer fashionable. While standard library module additions can indeed pose a compatibility challenge (due to the way module name shadowing works), the typical worst case scenario there is just needing some explicit sys.path manipulation to override the stdlib version specifically for affected applications, and even that only arises if somebody has an existing internal package name that collides with the new stdlib one (we try to avoid colliding with names on PyPI).
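The "no warnings" point is easy to demonstrate: a test suite that escalates warnings to errors turns a programmatic deprecation into an immediate hard failure. The old_name/new_name functions here are invented for illustration:

```python
import warnings

def old_name():
    # What a programmatic deprecation of a renamed API looks like.
    warnings.warn("old_name() is deprecated; use new_name() instead",
                  DeprecationWarning, stacklevel=2)
    return 42

# A test suite with a "no warnings" policy:
with warnings.catch_warnings():
    warnings.simplefilter("error", DeprecationWarning)
    try:
        old_name()
        failed = False
    except DeprecationWarning:
        failed = True

assert failed  # previously working code now breaks the build immediately
```

So even a "soft" rename with a deprecation period forces immediate work on any project that treats warnings as errors.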
That's a different proposal from wholesale name changes, as it allows each change to be discussed on its individual merits rather than attempting to establish a blanket change in development policy.
The perfect example I think is collections module, but it's only from the personal experience - other modules may be better candidates.
collections has the challenge that the normal PEP 8 guidelines don't apply to the builtins, so some collections types are named like builtins (deque, defaultdict), while others follow normal class naming conventions (e.g. OrderedDict). There are also type factories like namedtuple, which again follow the builtin convention of omitting underscores from typically snake_case names.
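The mixed conventions Nick describes are easy to see side by side in the module itself:

```python
import collections

# Builtin-style lowercase classes (named like dict/list/set):
assert isinstance(collections.deque, type)
assert isinstance(collections.defaultdict, type)

# PEP-8 CamelCase classes living in the very same module:
assert isinstance(collections.OrderedDict, type)
assert isinstance(collections.Counter, type)

# And a lowercase factory *function* that manufactures classes:
Point = collections.namedtuple("Point", ["x", "y"])
assert not isinstance(collections.namedtuple, type)
assert isinstance(Point, type)
```

Any blanket "CamelCase all classes" rule would have to decide which of these three conventions wins, and each choice breaks a different set of expectations.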
I can't say that I surely do not underestimate the efforts required. But what if I want to invest my time in it?
It's not contributor time or even core developer time that's the main problem in cases like this, it's the flow on effect on books, tutorials, and other learning resources. When folks feel obliged to update those, we want them to able to say to themselves "Yes, the new way of doing things is clearly better for my students and readers than the old way". When we ask other people to spend time on something, the onus is on us to make sure that at least we believe their time will be well spent in the long run (even if we're not 100% successful in convincing them of that in the near term). If even we don't think that's going to be the case, then the onus is on us to avoid wasting their time in the first place. We're never going to be 100% successful in that (we're always going to approve some non-zero number of changes that, with the benefit of hindsight, turn out to have been more trouble than they were worth), but it's essential that we keep the overall cost of change to the entire ecosystem in mind when reviewing proposals, rather than taking a purely local view of the costs and benefits at the initial implementation level.
It would depend on the specific change you propose, and the rationale you give for making it. If the only rationale given is "Make <API> more compliant with PEP 8", then it will almost certainly be knocked back. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Thank you very much for the links and the new insights, especially regarding actual user experience with new names. A good reason to start gathering actual statistics. You have made a very good point, I will not reason about naming further without a significant amount of data showing that it still may be worth it. Regards, Eugene On Mon, Aug 1, 2016 at 7:51 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:

On Mon, Aug 1, 2016 at 12:05 PM, Eugene Pakhomov <p1himik@gmail.com> wrote:
Python 3 already broke quite a lot of stuff "just to be better/more-consistent" and we're still in a state where there are many people stuck with Python 2.7 because the migration cost is considered too high or just not worth it. Introducing such a change would increase this cost and make the two Python versions (2 and 3) even more different. It also has a mnemonic cost, because you double the number of API names you'll have to remember, for virtually zero additional value other than the fact that the names are more consistent. I understand the rationale, but I think any proposal making Python 3 more incompatible with Python 2 should have a *very* high barrier in terms of acceptance.
-- Giampaolo - http://grodola.blogspot.com

On Mon, 25 Jul 2016 17:55:27 -0000, Ralph Broenink <ralph@ralphbroenink.net> wrote:
Right. We tried it. It didn't work out very well (really bad cost/benefit ratio), so we stopped.[*] --David [*] At least, that's my understanding; I wasn't actually around yet when the threading changes were made.

On 25 Jul 2016, at 19:55, Ralph Broenink <ralph@ralphbroenink.net> wrote:
* Names that do not reflect actual usage, such as ssl.PROTOCOL_SSLv23, which can in fact not be used as client for SSLv2.
PROTOCOL_SSLv23 *can* be used as a client for SSLv2, in some circumstances. If you want to fight about names, then we should talk about the fact that PROTOCOL_SSLv23 means “use a SSLv3 handshake and then negotiate the highest commonly supported TLS version”. However, this naming scheme comes from OpenSSL (what we call PROTOCOL_SSLv23 OpenSSL calls SSLv23_METHOD, and so on). This naming scheme *is* terrible, but the stdlib ssl module is really more appropriately called the openssl module: it exposes a substantial number of OpenSSL internals and names. Happily, OpenSSL is changing this name to TLS_METHOD, which is what it should have been called all along. Cory
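Cory's point about the name being inherited from OpenSSL is visible in the module itself: in Python 3.6+ the same negotiate-the-best-version constant is also exposed under the more honest name PROTOCOL_TLS, and the two compare equal (they are members of the same enum). A quick check:

```python
import ssl

# Despite its name, PROTOCOL_SSLv23 means "negotiate the highest
# protocol version both peers support". Python 3.6+ exposes the same
# value under the more accurate name PROTOCOL_TLS:
assert ssl.PROTOCOL_SSLv23 == ssl.PROTOCOL_TLS

# A context built from the old name is a perfectly ordinary TLS context:
ctx = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
assert ctx.protocol == ssl.PROTOCOL_TLS
```

(Both spellings are deprecated in current Pythons in favour of the direction-specific PROTOCOL_TLS_CLIENT and PROTOCOL_TLS_SERVER, which is the renaming-by-addition approach discussed elsewhere in this thread.)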
participants (18)

- Brett Cannon
- Chris Angelico
- Cory Benfield
- Eugene Pakhomov
- Giampaolo Rodola'
- Greg Ewing
- Guido van Rossum
- Jelle Zijlstra
- Julien Duponchelle
- Mark Mollineaux
- Matthias welp
- Neil Girdhar
- Nick Coghlan
- Paul Moore
- Pavol Lisy
- R. David Murray
- Ralph Broenink
- Serhiy Storchaka