Defining an easily installable "Recommended baseline package set"

Over on python-dev, the question of recommending MRAB's "regex" module over the standard library's "re" module for more advanced regular expressions recently came up again.

Because of various logistical issues and backwards compatibility risks, it's highly unlikely that we'll ever be able to swap out the current _sre based re module implementation in the standard library for an implementation based on the regex module. At the same time, it would be beneficial to have a way to offer an even stronger recommendation to redistributors that we think full-featured general purpose Python scripting environments should offer the regex module as an opt-in alternative to the baseline re feature set, since that would also help with other currently difficult cases like the requests module.

What I'm thinking is that we could make some relatively simple additions to the `ensurepip` and `venv` modules to help with this:

1. Add an `ensurepip.RECOMMENDED_PACKAGES` mapping keyed by standard library module names containing dependency specifiers for recommended third party packages for particular tasks (e.g. "regex" as an enhanced alternative to "re", "requests" as an enhanced HTTPS-centric alternative to "urllib.request")
2. Add a new `install_recommended` boolean flag to `ensurepip.bootstrap`
3. Add a corresponding `--install-recommended` flag to the `python -m ensurepip` CLI
4. Add a corresponding `--install-recommended` flag to the `python -m venv` CLI (when combined with `--without-pip`, this would run pip directly from the bundled wheel file to do the installations)

We'd also need either a new informational PEP or else a section in the developer guide to say that the contents of `ensurepip.RECOMMENDED_PACKAGES` are up to the individual module maintainers (hence keying the mapping by standard library module name, rather than having a single flat list for the entire standard library).
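To make the shape of the idea concrete, here is a minimal sketch of what such a mapping might look like. This is purely illustrative: `ensurepip` has no `RECOMMENDED_PACKAGES` attribute today, and both the structure and the version pins below are assumptions, not a real API.

```python
# Hypothetical sketch only - not an actual ensurepip attribute.
# Keys are standard library module names; values are dependency
# specifiers for the recommended third party alternative.
RECOMMENDED_PACKAGES = {
    "re": "regex >= 2017.4.5",            # pin is a placeholder
    "urllib.request": "requests >= 2.18", # pin is a placeholder
}

def recommended_for(stdlib_module):
    """Look up the recommended third-party alternative, if any."""
    return RECOMMENDED_PACKAGES.get(stdlib_module)
```

Keying by stdlib module name (rather than a flat list) is what lets each module's maintainers own their own entry, as described above.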
For redistributors with weak dependency support, these reference interpreter level recommendations could become redistributor level recommendations. Redistributors without weak dependency support could still make a distinction between "default" installations (which would include them) and "minimal" installations (which would exclude them).

Folks writing scripts and example code for independent distribution (i.e. no explicitly declared dependencies) could then choose between relying on just the standard library (as now), or on the standard library plus independently versioned recommended packages.

Cheers, Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

At this point, I would punt to distutils-sig about curated packages, and pip tooling to support that, but they are bogged down as it stands with just getting Warehouse up and running.

I don’t support putting specialized tooling in Python itself to install curated packages, because that curation would be on the same release schedule as Python itself. Pip is an important special case, but it’s a special case to avoid further special cases. If there were excess manpower on the packaging side, that’s where it should be developed.

From: Python-ideas [mailto:python-ideas-bounces+tritium-list=sdamon.com@python.org] On Behalf Of Guido van Rossum
Sent: Sunday, October 29, 2017 1:17 AM
To: Nick Coghlan <ncoghlan@gmail.com>
Cc: python-ideas@python.org
Subject: Re: [Python-ideas] Defining an easily installable "Recommended baseline package set"

Why? What's wrong with pip install? Why complicate things? Your motivation is really weak here. "beneficial"? "difficult cases"?

On Sat, Oct 28, 2017 at 8:57 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
[...]
[...]

--
--Guido van Rossum (python.org/~guido)

There are already curated lists of Python packages, such as:

https://github.com/vinta/awesome-python

Even better, if you don't agree, you can just clone it and create your own ;-)

Stephan

2017-10-29 6:46 GMT+01:00 Alex Walters <tritium-list@sdamon.com>:

On 29 October 2017 at 15:16, Guido van Rossum <guido@python.org> wrote:
Why? What's wrong with pip install?
At a technical level, this would just be a really thin wrapper around 'pip install' (even thinner than ensurepip in general, since these libraries *wouldn't* be bundled for offline installation, only listed by name).
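As a rough illustration of just how thin that wrapper would be, something along these lines would cover it (the function name and `dry_run` parameter are inventions for this sketch, not a proposed API):

```python
import subprocess
import sys

def install_recommended(specifiers, dry_run=True):
    """Sketch of the thin 'pip install' wrapper described above.

    The recommended libraries would only be listed by name, not
    bundled, so this just hands the specifiers to the current
    interpreter's pip.
    """
    cmd = [sys.executable, "-m", "pip", "install", *specifiers]
    if dry_run:
        return cmd  # show what would run instead of running it
    subprocess.check_call(cmd)
    return cmd
```

Running it against the current interpreter (`sys.executable -m pip`) sidesteps the usual "which pip am I invoking?" confusion.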
Why complicate things? Your motivation is really weak here. "beneficial"? "difficult cases"?
The main recurring problems with "pip install" are a lack of discoverability and a potential lack of availability (depending on the environment). This then causes a couple of key undesirable outcomes:

- folks using Python as a teaching language have to choose between teaching with just the standard library APIs, requiring that learners restrict themselves to a particular preconfigured learning environment, or making a detour into package management tools in order to ensure learners have access to the APIs they actually want to use (this isn't hypothetical - I was a technical reviewer for a book that justified teaching XML-RPC over HTTPS+JSON on the basis that xmlrpc was in the standard library, and requests wasn't)
- folks using Python purely as a scripting language (i.e. without app level dependency management) may end up having to restrict themselves to the standard library API, even when there's a well-established frequently preferred alternative for what they're doing (e.g. requests for API management, regex for enhanced regular expressions)

The underlying problem is that our reasons for omitting these particular libraries from the standard library relate mainly to publisher side concerns like the logistics of ongoing bug fixing and support, *not* end user concerns like software reliability or API usability. This means that if educators aren't teaching them, or redistributors aren't providing them, then they're actively doing their users a disservice (as opposed to other cases like web frameworks and similar, where there are multiple competing options, you're only going to want one of them in any given application, and the relevant trade-offs between the available options depend greatly on exactly what you're doing).

Now, the Python-for-data-science community have taken a particular direction around handling this, and there's an additional library set beyond the standard library that's pretty much taken for granted in a data science context.
While conda has been the focal point for those efforts more recently, it started a long time ago with initiatives like Python(x, y) and the Enthought Python Distribution. Similarly, initiatives like Raspberry Pi are able to assume a particular learning environment (Raspbian in the Pi's case), rather than coping with arbitrary starting points.

Curated lists like the "awesome-python" one that Stephan linked don't really help that much with the discoverability problem, since they become just another thing for people to learn: How do they find out such lists exist in the first place? Given such a list, how do they determine if the recommendations it offers are actually relevant to their needs? Since assessing a published package API against your needs as a user is a skill that has to be learned like any other, it can be a lot easier to get started in a more prescriptive environment that says "This is what you have to work with for now, we'll explain more about your options for branching out later".

The proposal in this thread thus stems from asking the question "Who is going to be best positioned to offer authoritative advice on which third party modules may be preferable to their standard library counterparts for end users of Python?" and answering it with "The standard library module maintainers that are already responsible for deciding whether or not to place appropriate See Also links in the module documentation".

All the proposal does is to suggest taking those existing recommendations from the documentation and converting them into a more readily executable form. I'm not particularly wedded to any particular approach to making the recommendations available in a more machine-friendly form, though - it's just the "offer something more machine friendly than scraping the docs for recommendation links" aspect that I'm interested in.
For example, we could skip touching ensurepip or venv at all, and instead limit this to a documentation proposal to collect these recommendations from the documentation, and publish them within the `venv` module docs as a "recommended-libraries.txt" file (using pip's requirements.txt format). That would be sufficient to allow straightforward 3rd party automation, without necessarily committing to providing such automation ourselves.

Cheers, Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
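Such a "recommended-libraries.txt" file might look something like the following, using pip's requirements.txt syntax. The package names are the ones floated later in this thread; the version pins are placeholders, not vetted releases:

```text
# recommended-libraries.txt (illustrative sketch)
requests >= 2.18
regex >= 2017.4.5
six
setuptools
cffi
```

Any third party tool could then consume this with a plain `pip install -r recommended-libraries.txt`.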

On Sun, 29 Oct 2017 17:54:22 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:
They're both really. One important consequence of a library being in the stdlib is to tie it to the stdlib's release cycle, QA infrastructure and compatibility requirements -- which more or less solves many dependency and/or version pinning headaches.
Which redistributors do not provide the requests library, for example? regex is probably not as popular (mostly because re is good enough for most purposes), but it still appears to be available from Ubuntu and Anaconda.
I'm curious what such a list looks like :-) Regards Antoine.

On 29 October 2017 at 09:51, Antoine Pitrou <solipsis@pitrou.net> wrote:
I know it's not what you meant, but "the python.org installers" is the obvious answer here. On Windows, if you say to someone "install Python", they get the python.org distribution. Explicitly directing them to Anaconda is an option, but that gives them a distinctly different experience than "standard Python plus some best of breed packages like requests" does.
I am also. I'd put requests on it immediately, but that's the only thing I consider obvious. regex is what triggered this, but I'm not sure it adds *that* much - it's a trade off between people who need the extra features and people confused over why we have two regex libraries available. After that, you're almost immediately into domain-specific answers, and it becomes tricky fast. Paul

On Sun, Oct 29, 2017 at 1:44 PM, Paul Moore <p.f.moore@gmail.com> wrote:
If the list is otherwise not long enough, maybe it will contain something that is now in the stdlib?-)

-- Koos

--
+ Koos Zevenhoven + http://twitter.com/k7hoven +

On Sun, 29 Oct 2017 11:44:52 +0000 Paul Moore <p.f.moore@gmail.com> wrote:
Well, I'm not sure what Nick meant by the word exactly, but "to redistribute" means "to distribute again", so the word looked aimed at third-party vendors of Python rather than python.org :-) Regards Antoine.

On 29 October 2017 at 19:51, Antoine Pitrou <solipsis@pitrou.net> wrote:
For the QA & platform compatibility aspects, one of the things actually defining a specific extended package set would let us do is to amend the test suite with a new "third-party" resource, whereby we:

1. Named specific known-working versions in the recommended-packages.txt file (updating them for each maintenance release)
2. Added a new test case that installed and ran the test suites for these projects in a venv (guarded by "-uthird-party")
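Sketching the mechanical part of those two steps: pinned versions plus the `pip install` invocations a "-uthird-party" guarded test case would run inside a fresh venv. The pins here are placeholders, and the function is an illustration rather than actual test suite code:

```python
import sys

# Placeholder pins standing in for recommended-packages.txt entries;
# real values would be updated for each maintenance release.
PINNED_RECOMMENDATIONS = {"regex": "2017.11.9", "requests": "2.18.4"}

def install_commands():
    """Yield one 'pip install' invocation per pinned recommendation.

    A guarded test case would run each of these in a venv (skipping
    unless the third-party resource is enabled), then run the
    project's own test suite.
    """
    for name, version in PINNED_RECOMMENDATIONS.items():
        yield [sys.executable, "-m", "pip", "install", f"{name}=={version}"]
```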
The existing commercial redistributors have been doing this long enough now that they offer the most popular third party packages. Where folks can get into trouble is when they're putting their own bespoke environments together based directly on the python.org installers, and that quite often includes folks teaching themselves from a book or online tutorial (since book and tutorial authors are frequently reluctant to favour particular commercial vendors, and will hence direct readers to the python.org downloads instead).
regex and requests are the two cases I'm personally aware of that already have "You'll probably want to look at this 3rd party option instead" links at the beginning of the related standard library module documentation. ctypes should probably have a similar "Consider this alternative instead" pointer to cffi, but doesn't currently have one. All three of those (regex, requests, cffi) have received "in principle" approval for standard library inclusion at one point or another, but we haven't been able to devise a way to make the resulting maintenance logistics work (e.g. bringing in cffi would mean bringing in pycparser, which would mean bringing in PLY...)

six should be on the recommended packages list for as long as 2.7 is still supported (and potentially for a while after).

setuptools is currently brought in implicitly as a dependency of pip, but would also belong on a recommended packages list in its own right as an enhanced alternative to distutils (https://docs.python.org/3/distutils/ indirectly recommends that by pointing to the https://packaging.python.org/guides/tool-recommendations/ page).

Beyond those, I think things get significantly more debatable (for example, while datetime doesn't currently reference pytz as a regularly updated timezone database, most ad hoc scripts can also get away with working in either UTC or local time without worrying about arbitrary timezones, which means that use case is at least arguably already adequately covered by explicit dependency management).

Cheers, Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Perhaps slightly off-topic, but I have sometimes wondered if pip could not be made somewhat friendlier for the absolute newbie and the classroom context. Some concrete proposals:

1. Add a function `pip` to the interactive interpreter (similar to how `help` is available).

       def pip(args):
           import sys
           import subprocess
           subprocess.check_call([sys.executable, "-m", "pip"] + args.split())

   This allows people to install something using pip as long as they have a Python prompt open, and avoids instructors having to deal with platform-specific instructions for various shells. It also avoids confusion when multiple Python interpreters are available (it operates in the context of the current interpreter).

2. Add to PyPI package webpages a line like:

       To install, execute this line in your Python interpreter:
       pip("install my-package --user")

   Make this copyable with a button (like GitHub allows you to copy the repo name).

3. Add the notion of a "channel" to which one can "subscribe". This is actually nothing new, just a PyPI package with no code and just a bunch of dependencies. But it allows me to say: Just enter

       pip("install stephans-awesome-stuff --user")

   in your Python and you get a bunch of useful stuff.

Stephan

2017-10-29 8:54 GMT+01:00 Nick Coghlan <ncoghlan@gmail.com>:
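The "channel" package in (3) could plausibly be packaged as a setuptools meta-package whose only content is its dependency list - something like this setup.cfg sketch (the project name is the illustrative one from the proposal, and the dependency list is arbitrary):

```ini
[metadata]
name = stephans-awesome-stuff
version = 1.0
description = Meta-package that only pulls in a curated set of dependencies

[options]
# No code of its own - subscribing to the channel just means
# installing this package and letting pip resolve the dependencies.
install_requires =
    requests
    regex
```

Publishing a new version of the meta-package is then how the curator updates the channel for existing subscribers.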

On 29 October 2017 at 10:40, Stephan Houben <stephanh42@gmail.com> wrote:
There are subtle issues around whether newly installed/upgraded packages are visible in a running Python interpreter. It's possible that this would result in *more* confusion than the current situation. I can see the appeal of something like this, but it's not as simple as it looks. If you want to discuss this further, I'd definitely suggest making it a thread of its own. Personally, as a pip maintainer, I'm -0 on this (possibly even -1). Paul

On Sunday, October 29, 2017, Paul Moore <p.f.moore@gmail.com> wrote:
If the package is already imported, reload() isn't sufficient; but deepreload from IPython may be. IPython also supports shell commands prefixed with '!':

    ! pip install -U requests regex IPython
    ! pip install -U --user psfblessedset1
    %run pip install -U --user psfblessedset1
    %run?
    %autoreload?  # http://ipython.readthedocs.io/en/stable/config/extensions/autoreload.html

This:
It's possible that this would result in *more* confusion than the current situation.
Why isn't it upgrading over top of the preinstalled set? At least with bash, the shell command history is logged to .bash_history by default. IMO, it's easier to assume that environment-modifying commands are logged in the shell log; and that running a ``%logstart -o logged-cmds.py`` won't change the versions of packages it builds upon.

Is it bad form to put ``! pip install -U requests`` at the top of every jupyter notebook? ``! pip install requests==version`` is definitely more reproducible. (seeAlso: binder, jupyter/docker-stacks)

I'll just include these here unnecessarily from the other curated package set discussions:

https://github.com/jupyter/docker-stacks
https://github.com/Kaggle/docker-python/blob/master/Dockerfile
https://binderhub.readthedocs.io/en/latest/

I can see the appeal of something like this, but it's not as simple as
it looks. If you want to discuss this further, I'd definitely suggest making it a thread of its own.
Personally, as a pip maintainer, I'm -0 on this (possibly even -1).
Same.

On 29 October 2017 at 07:54, Nick Coghlan <ncoghlan@gmail.com> wrote:
I agree with Guido, this doesn't seem to add much as stated. (From an earlier message of Nick's)
This is key to me. The target area is *scripting*. Library and application developers have their own ways of managing the dependency handling problem, and they generally work fine. The people who don't currently have a good solution are those who just use the system install - i.e., people writing ad hoc scripts, people writing code that they want to share with other members of their organisation, via "here's this file, just run it", and not as a full application. For those people "not in a standard install" is a significantly higher barrier to usage than elsewhere. "Here's a Python file, just install Python and double click on the file" is a reasonable request. Running pip may be beyond the capabilities of the recipient.

In my view, the key problems in this area are:

1. What users writing one-off scripts can expect to be available.
2. Corporate environments where "adding extra software" is a major hurdle.
3. Offline environments, where PyPI isn't available.

The solutions to these issues aren't so much around how we let people know that they should do "pip install" (or "--install-recommended") but rather around initial installs. To that end I'd suggest:

1. The python.org installers gain an "install recommended 3rd party packages" option, that is true by default, which does "pip install <whatever we recommend>". This covers (1) and (2) above, as it makes the recommended package set the norm, and ensures that they are covered by an approval to "install Python".
2. For offline environments, we need to do a little more, but I'd imagine having the installer look for wheels in the same directory as the installer executable, and leave it to the user to put them there. If wheels aren't available the installer should warn but continue.

For 3rd party distributions (Linux, Homebrew, Conda, ...) this gives a clear message that Python users are entitled to expect these modules to be available. Handling that expectation is down to those distributions.
Things I haven't thought through:

1. System vs user site-packages. If we install (say) requests into the system site-packages, the user could potentially need admin rights to upgrade it. Or get into a situation where they have an older version in site-packages and a newer one in user-site (which I don't believe is a well-tested scenario).
2. Virtual environments. Should venv/virtualenv include the recommended packages by default? Probably not, but that does make the "in a virtual environment" experience different from the system Python experience.
3. The suggestion above for offline environments isn't perfect, by any means. Better solutions would be good here (but I don't think we want to go down the bundling route like we did with pip...?).

Paul
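The offline fallback sketched above - "look for wheels next to the installer, warn but continue if there are none" - is simple enough to express directly. The python.org installers have no such option today; this is purely an illustration of the suggested behaviour:

```python
from pathlib import Path

def bundled_wheels(installer_dir):
    """Find wheels the user placed next to the installer executable.

    Illustrative sketch of the suggested offline fallback: warn
    (rather than fail) when no recommended-package wheels are found.
    """
    wheels = sorted(Path(installer_dir).glob("*.whl"))
    if not wheels:
        print("warning: no recommended-package wheels found; continuing")
    return wheels
```

The installer would then feed any wheels found to pip instead of reaching out to PyPI.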

On 2017-10-29 00:54, Nick Coghlan wrote:
So it sounds like you're just saying that there should be one more "awesome-python" style list, except it should be official, with the seal of approval of Python itself. The details of exactly how that would be displayed to users are not so important as that it be easily accessible from python.org and clearly shown to be endorsed by the Python developers.

I think that is a good idea. The current situation is quite confusing in many cases. You see a lot of questions on StackOverflow where someone is tearing their hair out trying to use urllib to do heavy lifting and the answer is "use requests instead". Likewise people trying to implement matrix multiplication who should probably be using numpy. As for regex, to be honest, I have yet to find any way in which it is not completely superior to the existing re library, and I find it somewhat frustrating that the "publisher-side" concerns you mention continue to hold it back from replacing re.

The only problem I see with this sort of "Python seal of approval" library list is that it carries some of the same risks as incorporating such libraries into the stdlib. Not all, but some. For one thing, if people go to python.org and see a thing that says "You may also want to use these libraries that bear the Python Seal of Approval. . .", and then they download them, and something goes wrong (e.g., there is some kind of version conflict with another package), they will likely blame python.org (at least partially). Only now, since the libraries aren't in the stdlib, the Python devs won't really be able to do anything to fix that; all they could do is remove the offending package from the approved list.

In practice I think this is unlikely to happen, since the idea would be that Python devs would be judicious in awarding the seal of approval only to projects that are robust and not prone to breakage.
But it does change the nature of the approval from "we approve this *code* and we will fix it if it breaks" (which is how the existing stdlib works) to "we approve these *people* (the people working on requests or regex or whatever) and we will cease to do so if they break their code".

--
Brendan Barnwell

"Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown

The two use cases you describe (scripters and teachers) leave me luke-warm -- scripters live in the wild west and can just pip install whatever (that's what it means to be scripting) and teachers tend to want a customized bundle anyway -- let the edu world get together and create their own recommended bundle.

As long as it's not going to be bundled, i.e. there's just going to be some list of packages that we recommend to 3rd party repackagers, then I'm fine with it. But they must remain clearly marked as 3rd party packages in whatever docs we provide, and live in site-packages. I would really like to see what you'd add to the list besides requests -- I really don't see why the teaching use case would need the regex module (unless it's a class in regular expressions).

--Guido

On Sun, Oct 29, 2017 at 12:54 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
-- --Guido van Rossum (python.org/~guido)

On 29 October 2017 at 18:56, Guido van Rossum <guido@python.org> wrote:
In my experience, "scripting" *does* include people for whom "being in a standard Python distribution" is a requirement. Situations I have encountered:

1. Servers where developers need to write administrative or monitoring scripts, but they don't control what's on that server. The Python installation that comes by default is all that's available.
2. Developers working in environments with limited internet access. For example, my employer has a Windows (NTLM) proxy that pip can't work with. Getting internet access for pip involves installing a separate application, and that's not always possible/desirable.
3. Developers writing scripts to be shared with non-developers. On Unix, this means "must work with the system Python", and on Windows "download and install Python from python.org" is typically all you can expect (although in that case "install Anaconda" is a possible alternative, although not one I've tried telling people to do myself).

Nick's proposal doesn't actually help for (1) or (2), as the problem there is that "pip install" won't work. And bundling a script with its (pure Python) dependencies, for example as a zipapp, is always a solution - although it's nowhere near as easy as simply copying a single-file script to the destination where it's to be run. So these situations don't actually matter in terms of the value of the proposals being discussed here. But I did want to dispute the idea that "scripters can just pip install whatever" is inherent to the idea of being a scripter - my experience is the opposite, that scripters are the people *least* able to simply pip install things.

Paul

Writing scripts for non-developers, in an unmanaged environment (IT can't push a Python install to the system) on Windows means running pyinstaller et al. on your script, whether it has dependencies or not. It's not worth it to walk someone through a Python install to run a script, let alone installing optional dependencies.

For Linux, you are not limited to the standard library, but what the distros have packaged - it's easy enough to tell a typical non-developer Linux user "apt install python python-twisted, then run the script".

For deployment to restricted environments, optional dependencies suggested by python-dev are unlikely to be installed either. Not that this matters, or would matter for years, because IME those environments have ancient versions of Python that the developer is restricted to anyway. And environments with limited internet access... Bandersnatch on a portable hard drive.

Not to sound too crass, but there are only so many edge cases that a third party platform developer can be expected to care about (enough to bend over backwards to support). Putting a curated list of "These are good packages that solve a lot of problems that the stdlib doesn't, or makes complicated" on python.org is a great idea. Building tooling to make those part of a default installation is not.

On 29 October 2017 at 20:44, Alex Walters <tritium-list@sdamon.com> wrote:
Let's just say "not in the environments I work in", and leave it at that.
I never suggested otherwise. I just wanted to point out that "scripting in Python" covers a wider range of use cases than "can use pip install". I'm not asking python-dev to support those use cases (heck, I'm part of python-dev and *I* don't want to bend over backwards to support them), but I do ask that people be careful not to dismiss a group of users who are commonly under-represented in open source mailing lists and communities. Paul

On 30 October 2017 at 04:56, Guido van Rossum <guido@python.org> wrote:
For personal scripting, we can install whatever, but "institutional scripting" isn't the same thing - there we're scripting predefined "Standard Operating Environments", like those Mahmoud Hashemi describes for PayPal at the start of https://www.paypal-engineering.com/2016/09/07/python-packaging-at-paypal/

"Just use PyInstaller" isn't an adequate answer in large-scale environments because of the "zlib problem": you don't want to have to rebuild and redeploy the world to handle a security update in a low level frequently used component, you want to just update that component and have everything else pick it up dynamically.

While "Just use conda" is excellent advice nowadays for any organisation contemplating defining their own bespoke Python SOE (hence Mahmoud's post), if that isn't being driven by the folks that already maintain the SOE (as happened in PayPal's case), convincing an org to add a handful of python-dev endorsed libraries to an established SOE is going to be easier than rebasing their entire Python SOE on conda.
Yep, that was my intent, although it may not have been clear in my initial proposal. I've filed two separate RFEs in relation to that:

* Documentation only: https://bugs.python.org/issue31898
* Regression testing resource: https://bugs.python.org/issue31899

Cheers, Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

I just feel that when you're talking about an org like PayPal they can take care of themselves and don't need our help. They will likely have packages they want installed everywhere that would never make in on your list. So it feels this whole discussion is a distraction and a waste of time (yours, too). On Sun, Oct 29, 2017 at 10:24 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
-- --Guido van Rossum (python.org/~guido)

On 31 October 2017 at 00:28, Guido van Rossum <guido@python.org> wrote:
Just because companies are big doesn't mean they necessarily have anyone internally that's already up to speed on the specifics of recommended practices in a sprawling open source community like Python's. The genesis of this idea is that I think we can offer a more consistent initial experience for those folks than "Here's PyPI and Google, y'all have fun now" (and in so doing, help folks writing books and online tutorials to feel more comfortable with the idea of assuming that libraries like requests will be available in even the most restrictive institutional environments that still allow the use of Python).

One specific situation this idea is designed to help with is the one where:

- there's a centrally managed Standard Operating Environment that dictates what gets installed
- they've approved the python.org installers
- they *haven't* approved anything else yet

Now, a lot of large orgs simply won't get into that situation in the first place, since their own supplier management rules will push them towards a commercial redistributor, in which case they'll get their chosen redistributor's preferred package set, which will then typically cover at least a few hundred of the most popular PyPI packages. But some of them will start from the narrower "standard library only" baseline, and I spent enough time back at Boeing arguing for libraries to be added to our approved component list to appreciate the benefits of transitive declarations of trust ("we trust supplier X, they unambiguously state their trust in supplier Y, so that's an additional point in favour of our also trusting supplier Y") when it comes time to make your case to your supplier management organisation. Such declarations still aren't always sufficient, but they definitely don't hurt, and they sometimes help.

Cheers, Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

What's your proposed process to arrive at the list of recommended packages? And is it really just going to be a list of names, or is there going to be some documentation (about the vetting, not about the contents of the packages) for each name? On Mon, Oct 30, 2017 at 9:08 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
-- --Guido van Rossum (python.org/~guido)

On Mon, Oct 30, 2017 at 12:29 PM, Guido van Rossum <guido@python.org> wrote:
As I see it, the bootstrap would be a requirements.txt (requirements.pip) with the packages that more experienced developers prefer to use over the stdlib, pinned to package versions known to work well with the CPython release. I think this is a great idea, especially if it's an easy opt-in. For example, sometimes I go into a programming niche for a few years, and I'd very much appreciate a curated list of packages when I move to another niche. A pay-it-forward kind of thing. The downside to the idea is that the list of packages will need a curator (which could be python.org).

So what's your list? What would you put in requirements.txt? I'm fine with doing it in the form of requirements.txt -- I am worried that when push comes to shove, we won't be able to agree on what list of package names should go into that file. On Mon, Oct 30, 2017 at 12:08 PM, Juancarlo Añez <apalala@gmail.com> wrote:
-- --Guido van Rossum (python.org/~guido)

On 31 October 2017 at 02:29, Guido van Rossum <guido@python.org> wrote:
What's your proposed process to arrive at the list of recommended packages?
I'm thinking it makes the most sense to treat inclusion in the recommended packages list as a possible outcome of proposals for standard library inclusion, rather than being something we'd provide a way to propose specifically. We'd only use it in cases where a proposal would otherwise meet the criteria for stdlib inclusion, but the logistics of actually doing so don't work for some reason.

Running the initial 5 proposals through that filter:

* six: a cross-version compatibility layer clearly needs to be outside the standard library
* setuptools: we want to update this in line with the PyPA interop specs, not the Python language version
* cffi: updates may be needed for PyPA interop specs, Python implementation updates or C language definition updates
* requests: updates are more likely to be driven by changes in network protocols and client platform APIs than Python language changes
* regex: we don't want two regex engines in the stdlib, transparently replacing _sre would be difficult, and _sre is still good enough for most purposes

Of the 5, I'd suggest that regex is the only one that could potentially still make its way into the standard library some day - it would just require someone with both the time and inclination to create a CPython variant that used _regex instead of _sre as the default regex engine, and then gathered evidence to show that it was "compatible enough" with _sre to serve as the default engine for CPython. For the first four, there are compelling arguments that their drivers for new feature additions are such that their release cycles shouldn't ever be tied to the rate at which we update the Python language definition.
I'm thinking a new subsection in https://docs.python.org/devguide/stdlibchanges.html for "Recommended Third Party Packages" would make sense, covering what I wrote above. It also occurred to me that since the recommendations are independent of the Python version, they don't really belong in the version specific documentation. While the Developer's Guide isn't really the right place for the list either (except as an easier way to answer "Why isn't <X> in the standard library?" questions), it could be a good interim option until I get around to actually writing a first draft of https://github.com/python/redistributor-guide/ (which I was talking to Barry about at the dev sprint, but didn't end up actually creating any content for since I went down a signal handling rabbit hole instead). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tue, Oct 31, 2017 at 4:42 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I don't think that gets you off the hook for a process proposal. We need some criteria to explain why a module should be on the recommended list -- not just a ruling as to why it shouldn't be in the stdlib.
But that would exclude most of the modules you mention below, since one of the criteria is that their development speed be matched with Python's release cycle. I think there must be some form of "popularity" combined with "best of breed". In particular I'd like to have a rule that explains why flask and Django would never make the list. (I don't know what that rule is, or I would tell you -- my gut tells me it's something to do with having their own community *and* competing for the same spot.) Running the initial 5 proposals through that filter:
* six: a cross-version compatibility layer clearly needs to be outside the standard library
Hm... Does six still change regularly? If not I think it *would* be a candidate for actual stdlib inclusion. Just like we added u"..." literals back in Python 3.3.
* setuptools: we want to update this in line with the PyPA interop specs, not the Python language version
But does that exclude stdlib inclusion? Why would those specs change, and why couldn't they wait for a new Python release?
* cffi: updates may be needed for PyPA interop specs, Python implementation updates or C language definition updates
Hm, again, I don't recall that this was debated -- I think it's a failure that it's not in the stdlib.
* requests: updates are more likely to be driven by changes in network protocols and client platform APIs than Python language changes
Here I agree. There's no alternative (except aiohttp, but that's asyncio-based) and it can't be in the stdlib because it's actively being developed.
I think this needn't be recommended at all. For 99.9% of regular expression uses, re is just fine. Let's just work on a strategy for introducing regex into the stdlib.
As you can tell from my arguing, the reasons need to be written up in more detail.
That's too well hidden for my taste.
But that doesn't mean they can't (also) be listed there. (And each probably has its version dependencies.)
Hm, let's not put more arbitrary check boxes in the way of progress. Maybe it can be an informational PEP that's occasionally updated? -- --Guido van Rossum (python.org/~guido)

On 1 November 2017 at 00:53, Guido van Rossum <guido@python.org> wrote:
The developer guide already has couple of sections on this aspect: * https://devguide.python.org/stdlibchanges/#acceptable-types-of-modules * https://devguide.python.org/stdlibchanges/#requirements I don't think either of those sections is actually quite right (since we've approved new modules that wouldn't meet them), but they're not terrible as a starting point in general, and they're accurate for the recommended packages use case.
The developer guide words this as "The module needs to be considered best-of-breed.". In some categories (like WSGI frameworks), there are inherent trade-offs that mean there will *never* be a single best-of-breed solution, since projects like Django, Flask, and Pyramid occupy deliberately different points in the space of available design decisions. Running the initial 5 proposals through that filter:
It still changes as folks porting new projects discover additional discrepancies between the 2.x and 3.x standard library layouts and behaviour (e.g. we found recently that the 3.x subprocess module's emulation of the old commands module APIs actually bit shifts the status codes relative to the 2.7 versions). The rate of change has definitely slowed down a lot since the early days, but it isn't zero. In addition, the only folks that need it are those that already care about older versions of Python - if you can start with whatever the latest version of Python is, and don't have any reason to support users still running older version, you can happily pretend those older versions don't exist, and hence don't need a compatibility library like six. As a separate library, it can just gracefully fade away as folks stop depending on it as they drop Python 2.7 support. By contrast, if we were to bring it into the 3.x standard library, then we'd eventually have to figure out when we could deprecate and remove it again.
The specs mainly change when we want to offer publishers new capabilities while still maintaining compatibility with older installation clients (and vice-versa: we want folks still running Python 2.7 to be able to publish wheel files and use recently added metadata fields like Description-Content-Type). The reason we can't wait for new Python releases is because when we add such things, we need them to work on *all* supported Python releases (including 2.7 and security-release-only 3.x versions).

There are also other drivers for setuptools updates, which include:

- operating system build toolchain changes (e.g. finding new versions of Visual Studio or XCode)
- changes to PyPI's operations (e.g. the legacy upload API getting turned off due to persistent service stability problems, switching to HTTPS only access)

With setuptools as a separate project, a whole lot of package publication problems can be solved via "pip install --upgrade setuptools wheel" in a virtual environment, which is a luxury we don't have with plain distutils.
A couple of years ago I would have agreed with you, but I've spent enough time on packaging problems now to realise that cffi actually qualifies as a build tool due to the way it generates extension module wrappers when used in "out-of-line" mode. Being outside the standard library means that cffi still has significant flexibility to evolve how it separates its buildtime functionality from its runtime functionality, and may eventually be adjusted so that only "_cffi_backend" needs to be installed at runtime for the out-of-line compilation mode, without the full C header parsing and inline extension module compilation capabilities of CFFI itself (see https://cffi.readthedocs.io/en/latest/cdef.html#preparing-and-distributing-m... for the details). Being separate also means that cffi can be updated to generate more efficient code even for existing Python versions.
Given your informational PEP suggestion below, I'd probably still include it, but in a separate section from the others (e.g. the others might be listed as "Misaligned Feature Release Cycles", which is an inherent logistical problem that no amount of coding can fix, while regex would instead be categorised as "Technical Challenges").
If I'm correctly reading that as "Could the list of Recommended Third Party Packages be an informational PEP?", then I agree that's probably a good way to tackle it, since it will cover both the developer-centric "Why isn't this in the standard library yet?" aspect *and* the "Redistributors should probably provide this" aspect. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tuesday, October 31, 2017, Guido van Rossum <guido@python.org> wrote:
What about certifi (SSL bundles (from requests (?)) on PyPi) https://pypi.org/project/certifi/ ?

On Tue, Oct 31, 2017 at 11:41 AM, MRAB <python@mrabarnett.plus.com> wrote:
regex gets updated when the Unicode Consortium releases an update.
Is it a feature that that is more frequently than Python releases? There are other things in Python that must be updated whenever the UC releases an update, and they get treated as features (or perhaps as bugfixes, I'm not sure) but this means they generally don't get backported. -- --Guido van Rossum (python.org/~guido)

On 2017-10-31 18:44, Guido van Rossum wrote:
Here's a list of the updates to Unicode: https://www.unicode.org/versions/enumeratedversions.html Roughly yearly. Those still on Python 2.7, for example, are now 8 years behind re Unicode. At least Python 3.6 is only 1 year/release behind, which is fine!

On Tue, Oct 31, 2017 at 12:24 PM, MRAB <python@mrabarnett.plus.com> wrote:
Frankly I consider that a feature of the 2.7 end-of-life planning. :-)
At least Python 3.6 is only 1 year/release behind, which is fine!
OK, so presumably that argument doesn't preclude inclusion in the 3.7 (or later) stdlib. I'm beginning to warm up to the idea again... Maybe we should just bite the bullet. Nick, what do you think? Is it worth a small PEP? -- --Guido van Rossum (python.org/~guido)

On 1 November 2017 at 05:56, Guido van Rossum <guido@python.org> wrote:
I'm personally still in favour of swapping out the current _sre based implementation for a _regex based implementation (such that 3.7+ still only contained a single regex engine, and the stdlib re module and a PyPI regex backport module could still happily coexist), and a PEP + draft patch would be the way to do that. The framing of a PEP for that approach would be "Replace the regex engine backing the re module" rather than "Add regex to the standard library". The existing engine could then potentially be spun out as a new "sre" project on PyPI, such that folks that *were* depending on _sre internals had access to an upgrade path that didn't necessarily require them porting their code to a new regex engine (the PEP author wouldn't have to commit to doing that work - we'd just ask the PyPI admins to reserve the name in case there proved to be demand for such a project). However, I'm also not one of the folks that would be on the hook for handling any compatibility regressions that were subsequently reported against the 3.7 re module, so I'd also take my +1 with a rather large grain of salt - it's easy to be positive about a plan when the potential downsides don't affect me personally :) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Oct 31, 2017 4:42 AM, "Nick Coghlan" <ncoghlan@gmail.com> wrote: On 31 October 2017 at 02:29, Guido van Rossum <guido@python.org> wrote:
What's your proposed process to arrive at the list of recommended packages?
I'm thinking it makes the most sense to treat inclusion in the recommended packages list as a possible outcome of proposals for standard library inclusion, rather than being something we'd provide a way to propose specifically. We'd only use it in cases where a proposal would otherwise meet the criteria for stdlib inclusion, but the logistics of actually doing so don't work for some reason.

Running the initial 5 proposals through that filter:

* six: a cross-version compatibility layer clearly needs to be outside the standard library
* setuptools: we want to update this in line with the PyPA interop specs, not the Python language version
* cffi: updates may be needed for PyPA interop specs, Python implementation updates or C language definition updates
* requests: updates are more likely to be driven by changes in network protocols and client platform APIs than Python language changes
* regex: we don't want two regex engines in the stdlib, transparently replacing _sre would be difficult, and _sre is still good enough for most purposes

Some other packages that might meet these criteria, or at least be useful for honing them:

- lxml
- numpy
- cryptography
- idna

-n

On Wed, Nov 1, 2017 at 7:41 AM, Guido van Rossum <guido@python.org> wrote:
My impression (probably others are more knowledgeable) is that lxml has more or less replaced the stdlib 'xml' package as the de facto standard -- sort of similar to the urllib2/requests situation. AFAIK lxml has never been proposed for stdlib inclusion and I believe the fact that it's all in Cython would be a barrier even if the maintainers were amenable. But it might be helpful to our users to put a box at the top of the 'xml' docs suggesting people check out 'lxml', similar to the one on the urllib2 docs.
- numpy
Numpy's arrays are a foundational data structure and de facto standard, and would probably fit naturally in the stdlib semantically, but for a number of logistical/implementational reasons it doesn't make sense to merge. Probably doesn't make much difference whether python-dev "blesses" it or not in practice, since there aren't any real competitors inside or outside the stdlib; it'd more just be an acknowledgement of the status quo.
- cryptography
Conceptually, core cryptographic operations are the kind of functionality that you might expect to see in the stdlib, but the unique sensitivity of crypto code makes this a bad idea. Historically there have been a variety of basic crypto packages for Python, but at this point IIUC the other ones are all considered obsolete-and-potentially-dangerous and the consensus is everyone should move to 'cryptography', so documenting that in this PEP might help send people in the right direction.
- idna
This is a bit of a funny one. IDNA functionality is pretty fundamental -- you need it to do unicode<->bytes conversions on hostnames, so basically anyone doing networking needs it. Python ships with some built-in IDNA functionality (as the "idna" codec), but it's using an obsolete standard (IDNA2003, versus the current IDNA2008, see bpo-17305), and IIRC Christian thinks the whole codec-based design is the wrong approach... basically what we have in the stdlib has been broken for most of a decade and there doesn't seem to be anyone interested in fixing it. So... in the long run the stdlib support should either be fixed or deprecated. I'm not sure which is better. (The argument for deprecating it would be that IIUC you need to update the tables whenever a new unicode standard comes out, and since it's a networking thing you want to keep in sync with the rest of the world, which is easier with a standalone library. I don't know how much this matters in practice.) But right now, this library is just better than the stdlib functionality, and it wouldn't hurt to document that. -n -- Nathaniel J. Smith -- https://vorpus.org

Good write-ups. Though it sounds like idna should just be moved into the stdlib. (Agreed that for things like this codecs are not a great idea.) We have other things that must be updated whenever the Unicode standard changes and it seems that doing this in feature releases is typically fine, and treating it as a bugfix (so doing it in bugfix releases is acceptable) sounds fine to me too. Unless it were so security-critical that we'd want to enable users to do such updates to less-supported Pythons or on a tighter schedule (which is one of the reasons requests can't go into the stdlib -- they have their own cert store). On Wed, Nov 1, 2017 at 7:36 PM, Nathaniel Smith <njs@pobox.com> wrote:
-- --Guido van Rossum (python.org/~guido)

At this point, I would punt to distutils-sig about curated packages, and pip tooling to support that, but they are bogged down as it stands with just getting warehouse up and running. I don't support putting specialized tooling in python itself to install curated packages, because that curation would be on the same release schedule as python itself. Pip is an important special case, but it's a special case to avoid further special cases. If there was excess manpower on the packaging side, that's where it should be developed.

From: Python-ideas [mailto:python-ideas-bounces+tritium-list=sdamon.com@python.org] On Behalf Of Guido van Rossum
Sent: Sunday, October 29, 2017 1:17 AM
To: Nick Coghlan <ncoghlan@gmail.com>
Cc: python-ideas@python.org
Subject: Re: [Python-ideas] Defining an easily installable "Recommended baseline package set"

Why? What's wrong with pip install? Why complicate things? Your motivation is really weak here. "beneficial"? "difficult cases"?

On Sat, Oct 28, 2017 at 8:57 PM, Nick Coghlan <ncoghlan@gmail.com> wrote: Over on python-dev, the question of recommending MRAB's "regex" module over the standard library's "re" module for more advanced regular expressions recently came up again. Because of various logistical issues and backwards compatibility risks, it's highly unlikely that we'll ever be able to swap out the current _sre based re module implementation in the standard library for an implementation based on the regex module. At the same time, it would be beneficial to have a way to offer an even stronger recommendation to redistributors that we think full-featured general purpose Python scripting environments should offer the regex module as an opt-in alternative to the baseline re feature set, since that would also help with other currently difficult cases like the requests module.

What I'm thinking is that we could make some relatively simple additions to the `ensurepip` and `venv` modules to help with this:

1. Add an ensurepip.RECOMMENDED_PACKAGES mapping keyed by standard library module names containing dependency specifiers for recommended third party packages for particular tasks (e.g. "regex" as an enhanced alternative to "re", "requests" as an enhanced HTTPS-centric alternative to "urllib.request")
2. Add a new `install_recommended` boolean flag to ensurepip.bootstrap
3. Add a corresponding `--install-recommended` flag to the `python -m ensurepip` CLI
4. Add a corresponding `--install-recommended` flag to the `python -m venv` CLI (when combined with `--without-pip`, this would run pip directly from the bundled wheel file to do the installations)

We'd also need either a new informational PEP or else a section in the developer guide to say that the contents of `ensurepip.RECOMMENDED_PACKAGES` are up to the individual module maintainers (hence keying the mapping by standard library module name, rather than having a single flat list for the entire standard library).

For redistributors with weak dependency support, these reference interpreter level recommendations could become redistributor level recommendations. Redistributors without weak dependency support could still make a distinction between "default" installations (which would include them) and "minimal" installations (which would exclude them). Folks writing scripts and example code for independent distribution (i.e. no explicitly declared dependencies) could then choose between relying on just the standard library (as now), or on the standard library plus independently versioned recommended packages.

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

-- --Guido van Rossum (python.org/~guido)
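[Editorial sketch: the `RECOMMENDED_PACKAGES` mapping and `install_recommended` flag from the quoted proposal could look roughly like the following. The structure, entries, and helper are illustrative assumptions only; no such ensurepip API exists.]

```python
# Hypothetical sketch of the proposed ensurepip additions.
# The mapping is keyed by stdlib module name, as proposed, so that each
# entry would be owned by the corresponding module's maintainers.
import subprocess
import sys

RECOMMENDED_PACKAGES = {
    "re": ["regex"],
    "urllib.request": ["requests"],
}

def bootstrap(*, install_recommended=False):
    """Minimal stand-in for ensurepip.bootstrap(install_recommended=...)."""
    if not install_recommended:
        return []
    specifiers = [spec for specs in RECOMMENDED_PACKAGES.values()
                  for spec in specs]
    # A real implementation would run pip (from the bundled wheel when
    # combined with --without-pip), e.g.:
    # subprocess.check_call([sys.executable, "-m", "pip", "install", *specifiers])
    return specifiers

print(bootstrap(install_recommended=True))
```

In this sketch, `python -m ensurepip --install-recommended` would simply call `bootstrap(install_recommended=True)` and pass the collected specifiers to pip in a single invocation.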

There are already curated lists of Python packages, such as: https://github.com/vinta/awesome-python Even better, if you don't agree, you can just clone it and create your own ;-) Stephan 2017-10-29 6:46 GMT+01:00 Alex Walters <tritium-list@sdamon.com>:

On 29 October 2017 at 15:16, Guido van Rossum <guido@python.org> wrote:
Why? What's wrong with pip install?
At a technical level, this would just be a really thin wrapper around 'pip install' (even thinner than ensurepip in general, since these libraries *wouldn't* be bundled for offline installation, only listed by name).
Why complicate things? Your motivation is really weak here. "beneficial"? "difficult cases"?
The main recurring problems with "pip install" are a lack of discoverability and a potential lack of availability (depending on the environment). This then causes a couple of key undesirable outcomes:

- folks using Python as a teaching language have to choose between teaching with just the standard library APIs, requiring that learners restrict themselves to a particular preconfigured learning environment, or making a detour into package management tools in order to ensure learners have access to the APIs they actually want to use (this isn't hypothetical - I was a technical reviewer for a book that justified teaching XML-RPC over HTTPS+JSON on the basis that xmlrpc was in the standard library, and requests wasn't)
- folks using Python purely as a scripting language (i.e. without app level dependency management) may end up having to restrict themselves to the standard library API, even when there's a well-established frequently preferred alternative for what they're doing (e.g. requests for API management, regex for enhanced regular expressions)

The underlying problem is that our reasons for omitting these particular libraries from the standard library relate mainly to publisher side concerns like the logistics of ongoing bug fixing and support, *not* end user concerns like software reliability or API usability. This means that if educators aren't teaching them, or redistributors aren't providing them, then they're actively doing their users a disservice (as opposed to other cases like web frameworks and similar, where there are multiple competing options, you're only going to want one of them in any given application, and the relevant trade-offs between the available options depend greatly on exactly what you're doing).

Now, the Python-for-data-science community have taken a particular direction around handling this, and there's an additional library set beyond the standard library that's pretty much taken for granted in a data science context. 
While conda has been the focal point for those efforts more recently, it started a long time ago with initiatives like Python(x, y) and the Enthought Python Distribution. Similarly, initiatives like Raspberry Pi are able to assume a particular learning environment (Raspbian in the Pi's case), rather than coping with arbitrary starting points.

Curated lists like the "awesome-python" one that Stephan linked don't really help that much with the discoverability problem, since they become just another thing for people to learn: How do they find out such lists exist in the first place? Given such a list, how do they determine if the recommendations it offers are actually relevant to their needs? Since assessing a published package API against your needs as a user is a skill that has to be learned like any other, it can be a lot easier to get started in a more prescriptive environment that says "This is what you have to work with for now, we'll explain more about your options for branching out later".

The proposal in this thread thus stems from asking the question "Who is going to be best positioned to offer authoritative advice on which third party modules may be preferable to their standard library counterparts for end users of Python?" and answering it with "The standard library module maintainers that are already responsible for deciding whether or not to place appropriate See Also links in the module documentation". All the proposal does is to suggest taking those existing recommendations from the documentation and converting them into a more readily executable form.

I'm not particularly wedded to any particular approach to making the recommendations available in a more machine-friendly form, though - it's just the "offer something more machine friendly than scraping the docs for recommendation links" aspect that I'm interested in. 
For example, we could skip touching ensurepip or venv at all, and instead limit this to a documentation proposal to collect these recommendations from the documentation, and publish them within the `venv` module docs as a "recommended-libraries.txt" file (using pip's requirements.txt format). That would be sufficient to allow straightforward 3rd party automation, without necessarily committing to providing such automation ourselves. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
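[Editorial sketch: as a rough illustration of that documentation-only option, here is what a hypothetical "recommended-libraries.txt" in pip's requirements.txt format might contain, plus the few lines of third-party automation needed to consume it. The file name comes from the email above; its contents and pins are invented for the example.]

```python
# Illustrative contents for a "recommended-libraries.txt" published in the
# venv module docs, using pip's requirements.txt format (comments allowed).
import subprocess
import sys

RECOMMENDED_LIBRARIES_TXT = """\
# Hypothetical example entries; each maps back to a stdlib module's docs.
requests>=2.18    # urllib.request
regex             # re
six               # 2.x/3.x compatibility
"""

def parse_requirements(text):
    """Return the dependency specifiers, ignoring comments and blank lines."""
    specs = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            specs.append(line)
    return specs

specs = parse_requirements(RECOMMENDED_LIBRARIES_TXT)
print(specs)
# A third-party tool could then simply run:
# subprocess.check_call([sys.executable, "-m", "pip", "install", *specs])
# or, equivalently, pass the file straight to pip with "-r".
```

In practice the parsing step would be unnecessary, since pip accepts the file directly via `pip install -r recommended-libraries.txt`; the helper above just shows how little machinery "straightforward 3rd party automation" would actually need.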

On Sun, 29 Oct 2017 17:54:22 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:
They're both really. One important consequence of a library being in the stdlib is to tie it to the stdlib's release cycle, QA infrastructure and compatibility requirements -- which more or less solves many dependency and/or version pinning headaches.
Which redistributors do not provide the requests library, for example? regex is probably not as popular (mostly because re is good enough for most purposes), but it still appears to be available from Ubuntu and Anaconda.
I'm curious what such a list looks like :-) Regards Antoine.

On 29 October 2017 at 09:51, Antoine Pitrou <solipsis@pitrou.net> wrote:
I know it's not what you meant, but "the python.org installers" is the obvious answer here. On Windows, if you say to someone "install Python", they get the python.org distribution. Explicitly directing them to Anaconda is an option, but that gives them a distinctly different experience than "standard Python plus some best of breed packages like requests" does.
I am also. I'd put requests on it immediately, but that's the only thing I consider obvious. regex is what triggered this, but I'm not sure it adds *that* much - it's a trade off between people who need the extra features and people confused over why we have two regex libraries available. After that, you're almost immediately into domain-specific answers, and it becomes tricky fast. Paul

On Sun, Oct 29, 2017 at 1:44 PM, Paul Moore <p.f.moore@gmail.com> wrote:
If the list is otherwise not be long enough, maybe it will contain something that is now in the stdlib?-) -- Koos -- + Koos Zevenhoven + http://twitter.com/k7hoven +

On Sun, 29 Oct 2017 11:44:52 +0000 Paul Moore <p.f.moore@gmail.com> wrote:
Well, I'm not sure what Nick meant by the word exactly, but "to redistribute" means "to distribute again", so the word looked aimed at third-party vendors of Python rather than python.org :-) Regards Antoine.

On 29 October 2017 at 19:51, Antoine Pitrou <solipsis@pitrou.net> wrote:
For the QA & platform compatibility aspects, one of the things actually defining a specific extended package set would let us do is to amend the test suite with a new "third-party" resource, whereby we:

1. Named specific known-working versions in the recommended-packages.txt file (updating them for each maintenance release)
2. Added a new test case that installed and ran the test suites for these projects in a venv (guarded by "-uthird-party")
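[Editorial sketch: the two steps above could be driven by a helper that builds the command lines for a "-uthird-party" guarded test run. The package pins, the POSIX `bin/` venv layout, and the `pytest --pyargs` invocation are all illustrative assumptions, not actual CPython test-suite code.]

```python
# Build (but do not execute) the commands a hypothetical "-uthird-party"
# test resource might run: create a venv, install each pinned recommended
# package, then run that project's test suite inside the venv.
import os
import re
import sys

PINNED_RECOMMENDED = ["requests==2.18.4", "regex==2017.11.9"]  # hypothetical pins

def third_party_test_commands(env_dir, pinned):
    """Return the command lines for a third-party resource test run."""
    env_python = os.path.join(env_dir, "bin", "python")  # POSIX layout assumed
    commands = [[sys.executable, "-m", "venv", env_dir]]
    for spec in pinned:
        # Project name is the specifier up to the first version operator.
        name = re.split(r"[=<>!~]", spec, maxsplit=1)[0]
        commands.append([env_python, "-m", "pip", "install", spec])
        # Assumes the project ships tests runnable via "pytest --pyargs".
        commands.append([env_python, "-m", "pytest", "--pyargs", name])
    return commands

for cmd in third_party_test_commands("/tmp/thirdparty-venv", PINNED_RECOMMENDED):
    print(" ".join(cmd))
```

Keeping the command construction separate from execution makes the expensive network-dependent part easy to guard behind the test resource flag.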
The existing commercial redistributors have been doing this long enough now that they offer the most popular third party packages. Where folks can get into trouble is when they're putting their own bespoke environments together based directly on the python.org installers, and that quite often includes folks teaching themselves from a book or online tutorial (since book and tutorial authors are frequently reluctant to favour particular commercial vendors, and will hence direct readers to the python.org downloads instead).
regex and requests are the two cases I'm personally aware of that already have "You'll probably want to look at this 3rd party option instead" links at the beginning of the related standard library module documentation. ctypes should probably have a similar "Consider this alternative instead" pointer to cffi, but doesn't currently have one. All three of those (regex, requests, cffi) have received "in principle" approval for standard library inclusion at one point or another, but we haven't been able to devise a way to make the resulting maintenance logistics work (e.g. bringing in cffi would mean bringing in pycparser, which would mean bringing in PLY...)

six should be on the recommended packages list for as long as 2.7 is still supported (and potentially for a while after).

setuptools is currently brought in implicitly as a dependency of pip, but would also belong on a recommended packages list in its own right as an enhanced alternative to distutils (https://docs.python.org/3/distutils/ indirectly recommends that by pointing to the https://packaging.python.org/guides/tool-recommendations/ page).

Beyond those, I think things get significantly more debatable (for example, while datetime doesn't currently reference pytz as a regularly updated timezone database, most ad hoc scripts can also get away with working in either UTC or local time without worrying about arbitrary timezones, which means that use case is at least arguably already adequately covered by explicit dependency management).

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Perhaps slightly off-topic, but I have sometimes wondered if pip could not be made somewhat friendlier for the absolute newbie and the classroom context. Some concrete proposals:

1. Add a function `pip` to the interactive interpreter (similar to how `help` is available).

   def pip(args):
       import sys
       import subprocess
       subprocess.check_call([sys.executable, "-m", "pip"] + args.split())

   This allows people to install something using pip as long as they have a Python prompt open, and avoids instructors having to deal with platform-specific instructions for various shells. It also avoids confusion when multiple Python interpreters are available (it operates in the context of the current interpreter).

2. Add to PyPI package webpages a line like:

   To install, execute this line in your Python interpreter:

       pip("install my-package --user")

   Make this copyable with a button (like GitHub allows you to copy the repo name).

3. Add the notion of a "channel" to which one can "subscribe". This is actually nothing new, just a PyPI package with no code and just a bunch of dependencies. But it allows me to say: Just enter:

       pip("install stephans-awesome-stuff --user")

   in your Python and you get a bunch of useful stuff.

Stephan

2017-10-29 8:54 GMT+01:00 Nick Coghlan <ncoghlan@gmail.com>:

On 29 October 2017 at 10:40, Stephan Houben <stephanh42@gmail.com> wrote:
There are subtle issues around whether newly installed/upgraded packages are visible in a running Python interpreter. It's possible that this would result in *more* confusion than the current situation. I can see the appeal of something like this, but it's not as simple as it looks. If you want to discuss this further, I'd definitely suggest making it a thread of its own. Personally, as a pip maintainer, I'm -0 on this (possibly even -1). Paul
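[Editor's note: the visibility problem Paul alludes to can be demonstrated with the standard library alone - a module that has already been imported keeps its original contents until explicitly reloaded, which is exactly what would happen to a package upgraded by an in-process pip() helper. A minimal illustration (the module name demo_mod is made up for the demo):]

```python
# Demonstrates why packages installed/upgraded after import aren't
# automatically visible: Python caches imported modules in sys.modules.
import importlib
import sys
import tempfile
from pathlib import Path

sys.dont_write_bytecode = True  # keep the demo free of stale .pyc files

with tempfile.TemporaryDirectory() as tmp:
    sys.path.insert(0, tmp)
    mod_path = Path(tmp) / "demo_mod.py"

    mod_path.write_text("VERSION = '1.0'\n")
    import demo_mod
    print(demo_mod.VERSION)  # 1.0

    # Simulate "pip install --upgrade" rewriting the installed file.
    mod_path.write_text("VERSION = '2.0'\n")
    import demo_mod  # no-op: the first import is cached
    print(demo_mod.VERSION)  # still 1.0

    importlib.reload(demo_mod)
    print(demo_mod.VERSION)  # now 2.0

    sys.path.remove(tmp)
```

And reload() only fixes the module it is called on, not other already-imported modules holding references to the old objects - hence the "more confusion than the current situation" worry.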

On Sunday, October 29, 2017, Paul Moore <p.f.moore@gmail.com> wrote:
If the package is already imported, reload() isn't sufficient; but deepreload from IPython may be. IPython also supports shell commands prefixed with '!':

    ! pip install -U requests regex IPython
    ! pip install -U --user psfblessedset1
    %run pip install -U --user psfblessedset1
    %run?
    %autoreload?
    # http://ipython.readthedocs.io/en/stable/config/extensions/autoreload.html

This:
It's possible that this would result in *more* confusion than the current situation.
Why isn't it upgrading over top of the preinstalled set? At least with bash, the shell command history is logged to .bash_history by default. IMO, it's easier to assume that environment-modifying commands are logged in the shell log; and that running a ``%logstart -o logged-cmds.py`` won't change the versions of packages it builds upon. Is it bad form to put ``! pip install -U requests `` at the top of every jupyter notebook? ``! pip install requests==version `` is definitely more reproducible. (seeAlso: binder, jupyter/docker-stacks) I'll just include these here unnecessarily from the other curated package set discussions: https://github.com/jupyter/docker-stacks https://github.com/Kaggle/docker-python/blob/master/Dockerfile https://binderhub.readthedocs.io/en/latest/ I can see the appeal of something like this, but it's not as simple as
it looks. If you want to discuss this further, I'd definitely suggest making it a thread of its own.
Personally, as a pip maintainer, I'm -0 on this (possibly even -1).
Same.

On 29 October 2017 at 07:54, Nick Coghlan <ncoghlan@gmail.com> wrote:
I agree with Guido, this doesn't seem to add much as stated. (From an earlier message of Nick's)
This is key to me. The target area is *scripting*. Library and application developers have their own ways of managing the dependency handling problem, and they generally work fine. The people who don't currently have a good solution are those who just use the system install - i.e., people writing ad hoc scripts, people writing code that they want to share with other members of their organisation, via "here's this file, just run it", and not as a full application. For those people "not in a standard install" is a significantly higher barrier to usage than elsewhere. "Here's a Python file, just install Python and double click on the file" is a reasonable request. Running pip may be beyond the capabilities of the recipient.

In my view, the key problems in this area are:

1. What users writing one-off scripts can expect to be available.
2. Corporate environments where "adding extra software" is a major hurdle.
3. Offline environments, where PyPI isn't available.

The solutions to these issues aren't so much around how we let people know that they should do "pip install" (or "--install-recommended") but rather around initial installs. To that end I'd suggest:

1. The python.org installers gain an "install recommended 3rd party packages" option, that is true by default, which does "pip install <whatever we recommend>". This covers (1) and (2) above, as it makes the recommended package set the norm, and ensures that it is covered by an approval to "install Python".
2. For offline environments, we need to do a little more, but I'd imagine having the installer look for wheels in the same directory as the installer executable, and leaving it to the user to put them there. If wheels aren't available the installer should warn but continue.

For 3rd party distributions (Linux, Homebrew, Conda, ...) this gives a clear message that Python users are entitled to expect these modules to be available. Handling that expectation is down to those distributions.
Things I haven't thought through:

1. System vs user site-packages. If we install (say) requests into the system site-packages, the user could potentially need admin rights to upgrade it. Or get into a situation where they have an older version in site-packages and a newer one in user site (which I don't believe is a well-tested scenario).
2. Virtual environments. Should venv/virtualenv include the recommended packages by default? Probably not, but that does make the "in a virtual environment" experience different from the system Python experience.
3. The suggestion above for offline environments isn't perfect, by any means. Better solutions would be good here (but I don't think we want to go down the bundling route like we did with pip...?).

Paul

On 2017-10-29 00:54, Nick Coghlan wrote:
So it sounds like you're just saying that there should be one more "awesome-python" style list, except it should be official, with the seal of approval of Python itself. The details of exactly how that would be displayed to users are not so important as that it be easily accessible from python.org and clearly shown to be endorsed by the Python developers.

I think that is a good idea. The current situation is quite confusing in many cases. You see a lot of questions on StackOverflow where someone is tearing their hair out trying to use urllib to do heavy lifting and the answer is "use requests instead". Likewise people trying to implement matrix multiplication who should probably be using numpy. As for regex, to be honest, I have yet to find any way in which it is not completely superior to the existing re library, and I find it somewhat frustrating that the "publisher-side" concerns you mention continue to hold it back from replacing re.

The only problem I see with this sort of "Python seal of approval" library list is that it carries some of the same risks as incorporating such libraries into the stdlib. Not all, but some. For one thing, if people go to python.org and see a thing that says "You may also want to use these libraries that bear the Python Seal of Approval. . .", and then they download them, and something goes wrong (e.g., there is some kind of version conflict with another package), they will likely blame python.org (at least partially). Only now, since the libraries aren't in the stdlib, the Python devs won't really be able to do anything to fix that; all they could do is remove the offending package from the approved list. In practice I think this is unlikely to happen, since the idea would be that Python devs would be judicious in awarding the seal of approval only to projects that are robust and not prone to breakage.
But it does change the nature of the approval from "we approve this *code* and we will fix it if it breaks" (which is how the existing stdlib works) to "we approve these *people* (the people working on requests or regex or whatever) and we will cease to do if they break their code". -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown

The two use cases you describe (scripters and teachers) leave me luke-warm -- scripters live in the wild west and can just pip install whatever (that's what it means to be scripting) and teachers tend to want a customized bundle anyway -- let the edu world get together and create their own recommended bundle. As long as it's not going to be bundled, i.e. there's just going to be some list of packages that we recommend to 3rd party repackagers, then I'm fine with it. But they must remain clearly marked as 3rd party packages in whatever docs we provide, and live in site-packages. I would really like to see what you'd add to the list besides requests -- I really don't see why the teaching use case would need the regex module (unless it's a class in regular expressions). --Guido On Sun, Oct 29, 2017 at 12:54 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
-- --Guido van Rossum (python.org/~guido)

On 29 October 2017 at 18:56, Guido van Rossum <guido@python.org> wrote:
In my experience, "scripting" *does* include people for whom "being in a standard Python distribution" is a requirement. Situations I have encountered:

1. Servers where developers need to write administrative or monitoring scripts, but they don't control what's on that server. The Python installation that comes by default is all that's available.
2. Developers working in environments with limited internet access. For example, my employer has a Windows (NTLM) proxy that pip can't work with. Getting internet access for pip involves installing a separate application, and that's not always possible/desirable.
3. Developers writing scripts to be shared with non-developers. On Unix, this means "must work with the system Python", and on Windows "download and install Python from python.org" is typically all you can expect (although in that case "install Anaconda" is a possible alternative, although not one I've tried telling people to do myself).

Nick's proposal doesn't actually help for (1) or (2), as the problem there is that "pip install" won't work. And bundling a script with its (pure Python) dependencies, for example as a zipapp, is always a solution - although it's nowhere near as easy as simply copying a single-file script to the destination where it's to be run. So these situations don't actually matter in terms of the value of the proposals being discussed here. But I did want to dispute the idea that "scripters can just pip install whatever" is inherent to the idea of being a scripter - my experience is the opposite, that scripters are the people *least* able to simply pip install things.

Paul
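[Editor's note: the zipapp bundling option Paul mentions is entirely standard library; a minimal sketch of packaging a single-file script into a runnable .pyz archive (the app directory layout and script contents here are invented for the demo):]

```python
# Bundle a script directory into a runnable .pyz archive using the
# stdlib zipapp module. Pure Python dependencies would simply be
# copied into the same directory before archiving.
import subprocess
import sys
import tempfile
import zipapp
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    app_dir = Path(tmp) / "myapp"
    app_dir.mkdir()
    (app_dir / "__main__.py").write_text("print('hello from a zipapp')\n")

    target = Path(tmp) / "myapp.pyz"
    zipapp.create_archive(app_dir, target)

    # The recipient just needs a Python interpreter to run the archive.
    result = subprocess.run(
        [sys.executable, str(target)], capture_output=True, text=True
    )
    print(result.stdout.strip())  # hello from a zipapp
```

This works only for pure Python dependencies, which is Paul's caveat: anything with compiled extensions still needs a matching build for the target platform.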

Writing scripts for non-developers, in an unmanaged environment (IT can't push a python install to the system) on Windows means running pyinstaller et al. on your script, whether it has dependencies or not. It's not worth it to walk someone through a python install to run a script, let alone installing optional dependencies. For linux, you are not limited to the standard library, but to what the distros have packaged - it's easy enough to tell a typical non-developer linux user "apt install python python-twisted, then run the script". For deployment to restricted environments, optional dependencies suggested by python-dev are unlikely to be installed either. Not that this matters, or would matter for years, because IME those environments have ancient versions of python that the developer is restricted to anyway. And environments with limited internet access... Bandersnatch on a portable hard drive. Not to sound too crass, but there are only so many edge cases that a third party platform developer can be expected to care about (enough to bend over backwards to support). Putting a curated list of "These are good packages that solve a lot of problems that the stdlib doesn't or makes complicated" on python.org is a great idea. Building tooling to make those part of a default installation is not.

On 29 October 2017 at 20:44, Alex Walters <tritium-list@sdamon.com> wrote:
Let's just say "not in the environments I work in", and leave it at that.
I never suggested otherwise. I just wanted to point out that "scripting in Python" covers a wider range of use cases than "can use pip install". I'm not asking python-dev to support those use cases (heck, I'm part of python-dev and *I* don't want to bend over backwards to support them), but I do ask that people be careful not to dismiss a group of users who are commonly under-represented in open source mailing lists and communities. Paul

On 30 October 2017 at 04:56, Guido van Rossum <guido@python.org> wrote:
For personal scripting, we can install whatever, but "institutional scripting" isn't the same thing - there we're scripting predefined "Standard Operating Environments", like those Mahmoud Hashemi describes for PayPal at the start of https://www.paypal-engineering.com/2016/09/07/python-packaging-at-paypal/ "Just use PyInstaller" isn't an adequate answer in large-scale environments because of the "zlib problem": you don't want to have to rebuild and redeploy the world to handle a security update in a low level frequently used component, you want to just update that component and have everything else pick it up dynamically. While "Just use conda" is excellent advice nowadays for any organisation contemplating defining their own bespoke Python SOE (hence Mahmoud's post), if that isn't being driven by the folks that already maintain the SOE (as happened in PayPal's case) convincing an org to add a handful of python-dev endorsed libraries to an established SOE is going to be easier than rebasing their entire Python SOE on conda.
Yep, that was my intent, although it may not have been clear in my initial proposal. I've filed two separate RFEs in relation to that: * Documentation only: https://bugs.python.org/issue31898 * Regression testing resource: https://bugs.python.org/issue31899 Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

I just feel that when you're talking about an org like PayPal they can take care of themselves and don't need our help. They will likely have packages they want installed everywhere that would never make it onto your list. So it feels like this whole discussion is a distraction and a waste of time (yours, too). On Sun, Oct 29, 2017 at 10:24 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
-- --Guido van Rossum (python.org/~guido)

On 31 October 2017 at 00:28, Guido van Rossum <guido@python.org> wrote:
Just because companies are big doesn't mean they necessarily have anyone internally that's already up to speed on the specifics of recommended practices in a sprawling open source community like Python's. The genesis of this idea is that I think we can offer a more consistent initial experience for those folks than "Here's PyPI and Google, y'all have fun now" (and in so doing, help folks writing books and online tutorials to feel more comfortable with the idea of assuming that libraries like requests will be available in even the most restrictive institutional environments that still allow the use of Python).

One specific situation this idea is designed to help with is the one where:

- there's a centrally managed Standard Operating Environment that dictates what gets installed
- they've approved the python.org installers
- they *haven't* approved anything else yet

Now, a lot of large orgs simply won't get into that situation in the first place, since their own supplier management rules will push them towards a commercial redistributor, in which case they'll get their chosen redistributor's preferred package set, which will then typically cover at least a few hundred of the most popular PyPI packages. But some of them will start from the narrower "standard library only" baseline, and I spent enough time back at Boeing arguing for libraries to be added to our approved component list to appreciate the benefits of transitive declarations of trust ("we trust supplier X, they unambiguously state their trust in supplier Y, so that's an additional point in favour of our also trusting supplier Y") when it comes time to make your case to your supplier management organisation. Such declarations still aren't always sufficient, but they definitely don't hurt, and they sometimes help.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

What's your proposed process to arrive at the list of recommended packages? And is it really just going to be a list of names, or is there going to be some documentation (about the vetting, not about the contents of the packages) for each name? On Mon, Oct 30, 2017 at 9:08 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
-- --Guido van Rossum (python.org/~guido)

On Mon, Oct 30, 2017 at 12:29 PM, Guido van Rossum <guido@python.org> wrote:
As I see it, the bootstrap would be a requirements.txt (requirements.pip) with the packages that more experienced developers prefer to use over the stdlib, pinned to package versions known to work well with the CPython release. I think this is a great idea, especially if it's an easy opt-in. For example, sometimes I go into a programming niche for a few years, and I'd very much appreciate a curated list of packages when I move to another niche. A pay-forward kind of thing. The downside to the idea is that the list of packages will need a curator (which could be python.org).
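[Editor's note: drawing only on packages named elsewhere in this thread (requests, regex, six, setuptools, certifi), such a pinned bootstrap file might look like the following - the version pins are illustrative assumptions chosen to resemble releases current in late 2017, not vetted recommendations:]

```
# Hypothetical "recommended baseline" requirements.txt;
# the pins below are examples only.
requests==2.18.4
regex==2017.09.23
six==1.11.0
setuptools==36.6.0
certifi==2017.7.27.1
```

Pinning exact versions is what ties the file to a specific CPython release, as Juancarlo suggests; a looser file using `>=` specifiers would instead track the packages' own release cycles.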

So what's your list? What would you put in requirements.txt? I'm fine with doing it in the form of requirements.txt -- I am worried that when push comes to shove, we won't be able to agree on what list of package names should go into that file. On Mon, Oct 30, 2017 at 12:08 PM, Juancarlo Añez <apalala@gmail.com> wrote:
-- --Guido van Rossum (python.org/~guido)

On 31 October 2017 at 02:29, Guido van Rossum <guido@python.org> wrote:
What's your proposed process to arrive at the list of recommended packages?
I'm thinking it makes the most sense to treat inclusion in the recommended packages list as a possible outcome of proposals for standard library inclusion, rather than being something we'd provide a way to propose specifically. We'd only use it in cases where a proposal would otherwise meet the criteria for stdlib inclusion, but the logistics of actually doing so don't work for some reason.

Running the initial 5 proposals through that filter:

* six: a cross-version compatibility layer clearly needs to be outside the standard library
* setuptools: we want to update this in line with the PyPA interop specs, not the Python language version
* cffi: updates may be needed for PyPA interop specs, Python implementation updates or C language definition updates
* requests: updates are more likely to be driven by changes in network protocols and client platform APIs than Python language changes
* regex: we don't want two regex engines in the stdlib, transparently replacing _sre would be difficult, and _sre is still good enough for most purposes

Of the 5, I'd suggest that regex is the only one that could potentially still make its way into the standard library some day - it would just require someone with both the time and inclination to create a CPython variant that used _regex instead of _sre as the default regex engine, and then gathered evidence to show that it was "compatible enough" with _sre to serve as the default engine for CPython. For the first four, there are compelling arguments that their drivers for new feature additions are such that their release cycles shouldn't ever be tied to the rate at which we update the Python language definition.
I'm thinking a new subsection in https://docs.python.org/devguide/stdlibchanges.html for "Recommended Third Party Packages" would make sense, covering what I wrote above. It also occurred to me that since the recommendations are independent of the Python version, they don't really belong in the version specific documentation. While the Developer's Guide isn't really the right place for the list either (except as an easier way to answer "Why isn't <X> in the standard library?" questions), it could be a good interim option until I get around to actually writing a first draft of https://github.com/python/redistributor-guide/ (which I was talking to Barry about at the dev sprint, but didn't end up actually creating any content for since I went down a signal handling rabbit hole instead). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tue, Oct 31, 2017 at 4:42 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I don't think that gets you off the hook for a process proposal. We need some criteria to explain why a module should be on the recommended list -- not just a ruling as to why it shouldn't be in the stdlib.
But that would exclude most of the modules you mention below, since one of the criteria is that their development speed be matched with Python's release cycle. I think there must be some form of "popularity" combined with "best of breed". In particular I'd like to have a rule that explains why flask and Django would never make the list. (I don't know what that rule is, or I would tell you -- my gut tells me it's something to do with having their own community *and* competing for the same spot.) Running the initial 5 proposals through that filter:
* six: a cross-version compatibility layer clearly needs to be outside the standard library
Hm... Does six still change regularly? If not I think it *would* be a candidate for actual stdlib inclusion. Just like we added u"..." literals back in Python 3.3.
* setuptools: we want to update this in line with the PyPA interop specs, not the Python language version
But does that exclude stdlib inclusion? Why would those specs change, and why couldn't they wait for a new Python release?
* cffi: updates may be needed for PyPA interop specs, Python implementation updates or C language definition updates
Hm, again, I don't recall that this was debated -- I think it's a failure that it's not in the stdlib.
* requests: updates are more likely to be driven by changes in network protocols and client platform APIs than Python language changes
Here I agree. There's no alternative (except aiohttp, but that's asyncio-based) and it can't be in the stdlib because it's actively being developed.
I think this needn't be recommended at all. For 99.9% of regular expression uses, re is just fine. Let's just work on a strategy for introducing regex into the stdlib.
As you can tell from my arguing, the reasons need to be written up in more detail.
That's too well hidden for my taste.
But that doesn't mean they can't (also) be listed there. (And each probably has its version dependencies.)
Hm, let's not put more arbitrary check boxes in the way of progress. Maybe it can be an informational PEP that's occasionally updated? -- --Guido van Rossum (python.org/~guido)

On 1 November 2017 at 00:53, Guido van Rossum <guido@python.org> wrote:
The developer guide already has a couple of sections on this aspect:

* https://devguide.python.org/stdlibchanges/#acceptable-types-of-modules
* https://devguide.python.org/stdlibchanges/#requirements

I don't think either of those sections is actually quite right (since we've approved new modules that wouldn't meet them), but they're not terrible as a starting point in general, and they're accurate for the recommended packages use case.
The developer guide words this as "The module needs to be considered best-of-breed.". In some categories (like WSGI frameworks), there are inherent trade-offs that mean there will *never* be a single best-of-breed solution, since projects like Django, Flask, and Pyramid occupy deliberately different points in the space of available design decisions. Running the initial 5 proposals through that filter:
It still changes as folks porting new projects discover additional discrepancies between the 2.x and 3.x standard library layouts and behaviour (e.g. we found recently that the 3.x subprocess module's emulation of the old commands module APIs actually bit shifts the status codes relative to the 2.7 versions). The rate of change has definitely slowed down a lot since the early days, but it isn't zero.

In addition, the only folks that need it are those that already care about older versions of Python - if you can start with whatever the latest version of Python is, and don't have any reason to support users still running older versions, you can happily pretend those older versions don't exist, and hence don't need a compatibility library like six. As a separate library, it can just gracefully fade away as folks stop depending on it as they drop Python 2.7 support. By contrast, if we were to bring it into the 3.x standard library, then we'd eventually have to figure out when we could deprecate and remove it again.
The specs mainly change when we want to offer publishers new capabilities while still maintaining compatibility with older installation clients (and vice-versa: we want folks still running Python 2.7 to be able to publish wheel files and use recently added metadata fields like Description-Content-Type). The reason we can't wait for new Python releases is because when we add such things, we need them to work on *all* supported Python releases (including 2.7 and security-release-only 3.x versions).

There are also other drivers for setuptools updates, which include:

- operating system build toolchain changes (e.g. finding new versions of Visual Studio or XCode)
- changes to PyPI's operations (e.g. the legacy upload API getting turned off due to persistent service stability problems, switching to HTTPS only access)

With setuptools as a separate project, a whole lot of package publication problems can be solved via "pip install --upgrade setuptools wheel" in a virtual environment, which is a luxury we don't have with plain distutils.
A couple of years ago I would have agreed with you, but I've spent enough time on packaging problems now to realise that cffi actually qualifies as a build tool due to the way it generates extension module wrappers when used in "out-of-line" mode. Being outside the standard library means that cffi still has significant flexibility to evolve how it separates its buildtime functionality from its runtime functionality, and may eventually be adjusted so that only "_cffi_backend" needs to be installed at runtime for the out-of-line compilation mode, without the full C header parsing and inline extension module compilation capabilities of CFFI itself (see https://cffi.readthedocs.io/en/latest/cdef.html#preparing-and-distributing-m... for the details). Being separate also means that cffi can be updated to generate more efficient code even for existing Python versions.
Given your informational PEP suggestion below, I'd probably still include it, but in a separate section from the others (e.g. the others might be listed as "Misaligned Feature Release Cycles", which is an inherent logistical problem that no amount of coding can fix, while regex would instead be categorised as "Technical Challenges").
If I'm correctly reading that as "Could the list of Recommended Third Party Packages be an informational PEP?", then I agree that's probably a good way to tackle it, since it will cover both the developer-centric "Why isn't this in the standard library yet?" aspect *and* the "Redistributors should probably provide this" aspect. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tuesday, October 31, 2017, Guido van Rossum <guido@python.org> wrote:
What about certifi (SSL bundles (from requests (?)) on PyPI) https://pypi.org/project/certifi/ ?

On Tue, Oct 31, 2017 at 11:41 AM, MRAB <python@mrabarnett.plus.com> wrote:
regex gets updated when the Unicode Consortium releases an update.
Is it a feature that that is more frequently than Python releases? There are other things in Python that must be updated whenever the UC releases an update, and they get treated as features (or perhaps as bugfixes, I'm not sure) but this means they generally don't get backported. -- --Guido van Rossum (python.org/~guido)

On 2017-10-31 18:44, Guido van Rossum wrote:
Here's a list of the updates to Unicode: https://www.unicode.org/versions/enumeratedversions.html

Roughly yearly. Those still on Python 2.7, for example, are now 8 years behind re Unicode. At least Python 3.6 is only 1 year/release behind, which is fine!
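[Editor's note: the coupling MRAB describes is easy to observe directly - the Unicode database a given CPython build uses is frozen at build time, so upgrading it means upgrading Python itself, whereas a third-party regex release can ship newer tables at any time. The exact version string below varies by Python release:]

```python
# The stdlib's Unicode database (used by str methods and the re
# module) is fixed per CPython release.
import sys
import unicodedata

print(sys.version_info[:2])
print(unicodedata.unidata_version)  # e.g. "9.0.0" on Python 3.6

# A third-party package like regex bundles its own Unicode property
# data and can update it on its own release schedule.
```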

On Tue, Oct 31, 2017 at 12:24 PM, MRAB <python@mrabarnett.plus.com> wrote:
Frankly I consider that a feature of the 2.7 end-of-life planning. :-)
At least Python 3.6 is only 1 year/release behind, which is fine!
OK, so presumably that argument doesn't preclude inclusion in the 3.7 (or later) stdlib. I'm beginning to warm up to the idea again... Maybe we should just bite the bullet. Nick, what do you think? Is it worth a small PEP? -- --Guido van Rossum (python.org/~guido)

On 1 November 2017 at 05:56, Guido van Rossum <guido@python.org> wrote:
I'm personally still in favour of swapping out the current _sre based implementation for a _regex based implementation (such that 3.7+ still only contained a single regex engine, and the stdlib re module and a PyPI regex backport module could still happily coexist), and a PEP + draft patch would be the way to do that. The framing of a PEP for that approach would be "Replace the regex engine backing the re module" rather than "Add regex to the standard library".

The existing engine could then potentially be spun out as a new "sre" project on PyPI, such that folks that *were* depending on _sre internals had access to an upgrade path that didn't necessarily require them porting their code to a new regex engine (the PEP author wouldn't have to commit to doing that work - we'd just ask the PyPI admins to reserve the name in case there proved to be demand for such a project).

However, I'm also not one of the folks that would be on the hook for handling any compatibility regressions that were subsequently reported against the 3.7 re module, so I'd also take my +1 with a rather large grain of salt - it's easy to be positive about a plan when the potential downsides don't affect me personally :)

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Oct 31, 2017 4:42 AM, "Nick Coghlan" <ncoghlan@gmail.com> wrote:

On 31 October 2017 at 02:29, Guido van Rossum <guido@python.org> wrote:
What's your proposed process to arrive at the list of recommended packages?
I'm thinking it makes the most sense to treat inclusion in the recommended packages list as a possible outcome of proposals for standard library inclusion, rather than being something we'd provide a way to propose specifically. We'd only use it in cases where a proposal would otherwise meet the criteria for stdlib inclusion, but the logistics of actually doing so don't work for some reason.

Running the initial 5 proposals through that filter:

* six: a cross-version compatibility layer clearly needs to be outside the standard library
* setuptools: we want to update this in line with the PyPA interop specs, not the Python language version
* cffi: updates may be needed for PyPA interop specs, Python implementation updates or C language definition updates
* requests: updates are more likely to be driven by changes in network protocols and client platform APIs than Python language changes
* regex: we don't want two regex engines in the stdlib, transparently replacing _sre would be difficult, and _sre is still good enough for most purposes

Some other packages that might meet these criteria, or at least be useful for honing them:

- lxml
- numpy
- cryptography
- idna

-n

On Wed, Nov 1, 2017 at 7:41 AM, Guido van Rossum <guido@python.org> wrote:
My impression (probably others are more knowledgeable) is that lxml has more or less replaced the stdlib 'xml' package as the de facto standard -- sort of similar to the urllib2/requests situation. AFAIK lxml has never been proposed for stdlib inclusion and I believe the fact that it's all in Cython would be a barrier even if the maintainers were amenable. But it might be helpful to our users to put a box at the top of the 'xml' docs suggesting people check out 'lxml', similar to the one on the urllib2 docs.
- numpy
Numpy's arrays are a foundational data structure and de facto standard, and would probably fit naturally in the stdlib semantically, but for a number of logistical/implementational reasons it doesn't make sense to merge. Probably doesn't make much difference whether python-dev "blesses" it or not in practice, since there aren't any real competitors inside or outside the stdlib; it'd more just be an acknowledgement of the status quo.
- cryptography
Conceptually, core cryptographic operations are the kind of functionality that you might expect to see in the stdlib, but the unique sensitivity of crypto code makes this a bad idea. Historically there have been a variety of basic crypto packages for Python, but at this point IIUC the other ones are all considered obsolete-and-potentially-dangerous and the consensus is everyone should move to 'cryptography', so documenting that in this PEP might help send people in the right direction.
- idna
This is a bit of a funny one. IDNA functionality is pretty fundamental -- you need it to do unicode<->bytes conversions on hostnames, so basically anyone doing networking needs it. Python ships with some built-in IDNA functionality (as the "idna" codec), but it's using an obsolete standard (IDNA2003, versus the current IDNA2008, see bpo-17305), and IIRC Christian thinks the whole codec-based design is the wrong approach... basically what we have in the stdlib has been broken for most of a decade and there doesn't seem to be anyone interested in fixing it.

So in the long run the stdlib support should either be fixed or deprecated. I'm not sure which is better. (The argument for deprecating it would be that IIUC you need to update the tables whenever a new Unicode standard comes out, and since it's a networking thing you want to keep in sync with the rest of the world, which is easier with a standalone library. I don't know how much this matters in practice.) But right now, this library is just better than the stdlib functionality, and it wouldn't hurt to document that.

-n

-- Nathaniel J. Smith -- https://vorpus.org
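As a quick illustration of the built-in codec being discussed (a sketch using only the stdlib "idna" codec, which implements the older IDNA2003 rules; the PyPI idna package implements IDNA2008, and the two standards disagree for some inputs):

```python
# The built-in "idna" codec converts internationalized hostnames to
# their ASCII-compatible (punycode) form, one label at a time.
name = "bücher.example"
ascii_form = name.encode("idna")
print(ascii_form)  # b'xn--bcher-kva.example'

# Decoding reverses the transformation back to the Unicode form.
print(ascii_form.decode("idna"))  # bücher.example
```

This is exactly the unicode<->bytes hostname conversion that "basically anyone doing networking needs"; the problem is that it follows the obsolete IDNA2003 mapping rules rather than IDNA2008.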

Good write-ups. Though it sounds like idna should just be moved into the stdlib. (Agreed that for things like this codecs are not a great idea.) We have other things that must be updated whenever the Unicode standard changes, and it seems that doing this in feature releases is typically fine, and treating it as a bugfix (so doing it in bugfix releases is acceptable) sounds fine to me too. Unless it were so security-critical that we'd want to enable users to do such updates to less-supported Pythons or on a tighter schedule (which is one of the reasons requests can't go into the stdlib -- they have their own cert store).

On Wed, Nov 1, 2017 at 7:36 PM, Nathaniel Smith <njs@pobox.com> wrote:
-- --Guido van Rossum (python.org/~guido)
participants (13)
- Alex Walters
- Antoine Pitrou
- Brendan Barnwell
- Guido van Rossum
- Juancarlo Añez
- Koos Zevenhoven
- MRAB
- Nathaniel Smith
- Nick Coghlan
- Paul Moore
- Stefan Krah
- Stephan Houben
- Wes Turner