Better support for consuming vendored packages
I'd like to start a discussion around practices for vendoring package dependencies. I'm not sure python-dev is the appropriate venue for this discussion. If not, please point me to one and I'll gladly take it there.

I'll start with a problem statement. Not all consumers of Python packages wish to consume them in the common `pip install <package>` + `import <package>` manner. Some Python applications may wish to vendor their Python package dependencies so that known-compatible versions are always available. For example, a Python application targeting a general audience may not wish to expose the existence of Python, nor want its users to be concerned about Python packaging. This is good for the application because it reduces complexity and the surface area of things that can go wrong.

But at the same time, Python applications need to be aware that the Python environment may contain more than just the Python standard library and whatever packages are provided by the application. If using the system Python executable, other system packages may have installed Python packages in the system site-packages, and those packages would be visible to your application. A user could `pip install` a package and it would end up in the Python environment used by your application. In short, unless your application distributes its own copy of Python, all bets are off with regards to what packages are installed. (And even then, advanced users could muck with the bundled Python, but let's ignore that edge case.)

`import X` is often the wild west. For applications that want to "just work" without requiring end users to manage Python packages, `import X` is dangerous because `X` could come from anywhere and be anything - possibly even a separate code base providing the same package name! Since Python applications may not want to burden users with Python packaging, they may vendor their package dependencies such that a known compatible version is always available.
In most cases, a Python application can insert itself into `sys.path` to ensure its copies of packages are picked up first. This works a lot of the time. But the strategy can fall apart.

Some Python applications support loading plugins or extensions. When user-provided code can be executed, that code could have dependencies on additional Python packages. Or that custom code could perform `sys.path` modifications to provide its own package dependencies. What this means is that `import X` from the perspective of the main application becomes dangerous again. You want to pick up the packages that you provided, but you just aren't sure that those packages will actually be picked up. And to complicate matters even more, an extension may wish to use a *different* version of a package from what you distribute. e.g. it may want to adopt the latest version that you haven't ported to yet, or it may want to use an old version because it hasn't been ported forward yet. So now you have the requirement that multiple versions of packages be available. In Python's shared module namespace, that means having separate package names.

A partial solution to this quagmire is using relative - not absolute - imports. e.g. say you have a package named "knights." It has a dependency on a 3rd party package named "shrubbery." Let's assume you distribute your application with a copy of "shrubbery," which is installed at some packages root, alongside "knights":

    /
    /knights/__init__.py
    /knights/ni.py
    /shrubbery/__init__.py

If from `knights.ni` you `import shrubbery`, you /could/ get the copy of "shrubbery" distributed by your application. Or you could pick up some other random copy that is also installed somewhere in `sys.path`. Whereas if you vendor "shrubbery" into your package, e.g.
    /
    /knights/__init__.py
    /knights/ni.py
    /knights/vendored/__init__.py
    /knights/vendored/shrubbery/__init__.py

then if from `knights.ni` you do `from .vendored import shrubbery`, you are *guaranteed* to get your local copy of the "shrubbery" package. This reliable behavior is highly desired by Python applications.

But there are problems. What we've done is effectively rename the "shrubbery" package to "knights.vendored.shrubbery." If a module inside that package attempts an `import shrubbery.x`, this could fail because "shrubbery" is no longer the package name. Or worse, it could pick up a separate copy of "shrubbery" somewhere else in `sys.path` and you could have a Frankenstein package pulling its code from multiple installs. So for this to work, all package-local imports must use relative imports, e.g. `from . import x`.

The takeaway is that packages using relative imports for their own modules are much more flexible and therefore friendly to downstream consumers that may wish to vendor them under different names. Packages using relative imports can be dropped in and used, often without source modifications. This is a big deal, as downstream consumers don't want to be modifying/forking packages they don't maintain.

Because of the advantages of relative imports, *I've individually reached the conclusion that relative imports within packages should be considered a best practice.* I would encourage the Python community to discuss adopting that practice more formally (perhaps as a PEP or something).

But package-local relative imports aren't a cure-all. There is a major problem with nested dependencies. e.g. what if "shrubbery" depends on the "herring" package? There's no reasonable way of telling "shrubbery" that "herring" is actually provided by "knights.vendored." You might be tempted to convert non package-local imports to relative, e.g. `from .. import herring`.
But the importer doesn't allow relative imports outside the current top-level package, and this would break classic installs where "shrubbery" and "herring" are proper top-level packages rather than sub-packages in e.g. a "vendored" sub-package. For cases where this occurs, the easiest recourse today is to rewrite imported source code to use relative imports. That's annoying, but it works.

In summary, some Python applications may want to vendor and distribute Python package dependencies. Reliance on absolute imports is dangerous because the global Python environment is effectively undefined from the perspective of the application. The safest thing to do is use relative imports from within the application. But because many packages don't use relative imports themselves, vendoring a package can require rewriting source code so imports are relative. And even if relative imports are used within that package, relative imports can't be used for other top-level packages. So source code rewriting is required to handle these.

If you vendor your Python package dependencies, your world often consists of a lot of pain. It's better to absorb that pain than inflict it on the end-users of your application (who shouldn't need to care about Python packaging). But this is a pain that Python application developers must deal with. And I feel that pain undermines the health of the Python ecosystem because it makes Python a less attractive platform for standalone applications.

I would very much welcome a discussion and any ideas on improving the Python package dependency problem for standalone Python applications. I think encouraging the use of relative imports within packages is a solid first step. But it obviously isn't a complete solution.

Gregory
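[Editor's note: the "rewrite imported source code to use relative imports" recourse described above can be sketched roughly as follows. This is a hypothetical helper, not code from the thread; the `relativize` function and the package names ("herring", "shrubbery") are illustrative, and real vendoring scripts handle many more cases (multi-line imports, `import x as y`, string references to module names, etc.).]

```python
import re

# Hypothetical: the set of fellow-vendored top-level names whose
# imports should be rewritten to package-relative form.
VENDORED = ("herring", "shrubbery")

def relativize(source: str) -> str:
    """Rewrite top-level imports of vendored packages to relative form.

    Assumes the rewritten file lives one level below the vendored
    root, e.g. at knights/vendored/shrubbery/foo.py, so that `..`
    resolves to knights.vendored.
    """
    names = "|".join(VENDORED)
    # "import herring"            ->  "from .. import herring"
    source = re.sub(rf"^import ({names})$",
                    r"from .. import \1", source, flags=re.M)
    # "from herring.net import x" ->  "from ..herring.net import x"
    source = re.sub(rf"^from ({names})((?:\.\w+)*) import ",
                    r"from ..\1\2 import ", source, flags=re.M)
    return source

print(relativize("import herring\nfrom herring.net import fetch\n"))
```

Imports of packages outside the vendored set (the stdlib, for instance) are left untouched.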
Hi!
On Thu, Mar 22, 2018 at 09:58:07AM -0700, Gregory Szorc wrote:
Not all consumers of Python packages wish to consume Python packages in the common `pip install <package>`
IMO `pip` is for developers. To package and distribute end-user applications there are rpm, dpkg/deb, PyInstaller, cx_Freeze, py2exe (+ an installer like NSIS or InnoSetup), py2app, etc. Most of them pack a copy of the Python interpreter and the necessary parts of the stdlib, so there is no problem with `sys.path` and wrong imports.
Oleg.

-- 
Oleg Broytman http://phdru.name/ phd@phdru.name
Programmers don't die, they just GOSUB without RETURN.
On 3/22/2018 10:48 AM, Oleg Broytman wrote:
Hi!
On Thu, Mar 22, 2018 at 09:58:07AM -0700, Gregory Szorc wrote:
Not all consumers of Python packages wish to consume Python packages in the common `pip install <package>`
IMO `pip` is for developers. To package and distribute end-user applications there are rpm, dpkg/deb, PyInstaller, cx_Freeze, py2exe (+ installer like NSIS or InnoSetup), py2app, etc...
Most of them pack a copy of Python interpreter and necessary parts of stdlib, so there is no problem with `sys.path` and wrong imports.
Yes, there are tools to create standalone packages. Some even bundle a Python install so the execution environment is more deterministic. These are great ways to distribute Python applications!

However, if your Python application is packaged by your distribution's package manager, you pretty much must use a Python installed by the system package manager. And that leaves us with the original problem of an undefined execution environment. So packaging tools for standalone Python applications only work if you control the package distribution channel. If your Python application is successful enough to be packaged by a distro, you lose the ability to control your own destiny and must confront these problems for users who installed your application through their distro's package manager. i.e. the cost of success for your Python application is a lot of pain inflicted by the policies of downstream packagers.

Also, not vendoring dependencies puts the onus on downstream packagers to deal with those dependencies. There can be package version conflicts between various packaged Python applications ("dependency hell"). Vendoring dependencies under application-local package names removes the potential for version conflicts. It's worth noting that some downstream packagers do insist on unbundling dependencies, so they may get stuck with work regardless. But if you vendor dependencies, a downstream packager is at least capable of packaging a Python application without having to deal with "dependency hell."

Maybe what I'm asking for here is import machinery where an application can forcefully limit or influence import mechanisms for modules in a certain package. But this seems difficult to achieve given the constraint of a single, global modules namespace (`sys.modules`) per interpreter.
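[Editor's note: the "import machinery where an application can influence import mechanisms" idea above can be approximated today with a meta path finder. The following is only a sketch under assumptions: the package names ("knights", "shrubbery") are the thread's hypothetical ones, built on disk in a temp directory here so the snippet is self-contained, and the `AliasFinder`/`AliasLoader` classes are invented for illustration.]

```python
import importlib
import importlib.abc
import importlib.util
import pathlib
import sys
import tempfile

# Build a throwaway on-disk layout matching the thread's example.
base = pathlib.Path(tempfile.mkdtemp())
pkg = base / "knights" / "vendored" / "shrubbery"
pkg.mkdir(parents=True)
(base / "knights" / "__init__.py").write_text("")
(base / "knights" / "vendored" / "__init__.py").write_text("")
(pkg / "__init__.py").write_text("PLANT = 'looks nice'\n")
sys.path.insert(0, str(base))

class AliasLoader(importlib.abc.Loader):
    """Load an aliased name by reusing the vendored copy."""
    def __init__(self, vendored_name):
        self.vendored_name = vendored_name

    def create_module(self, spec):
        # Return the already-importable vendored module object,
        # so both names share one module instance.
        return importlib.import_module(self.vendored_name)

    def exec_module(self, module):
        pass  # create_module returned a fully initialized module

class AliasFinder(importlib.abc.MetaPathFinder):
    """Redirect selected top-level names to vendored copies."""
    def __init__(self, aliases):
        self.aliases = aliases  # {bare name: vendored dotted name}

    def find_spec(self, fullname, path=None, target=None):
        if fullname in self.aliases:
            return importlib.util.spec_from_loader(
                fullname, AliasLoader(self.aliases[fullname]))
        return None  # let the normal machinery handle everything else

sys.meta_path.insert(0, AliasFinder(
    {"shrubbery": "knights.vendored.shrubbery"}))

import shrubbery  # resolved by AliasFinder, not by scanning sys.path
print(shrubbery.PLANT)
```

Because the finder sits first on `sys.meta_path`, a bare `import shrubbery` anywhere in the process resolves to the vendored copy, regardless of what else is installed on `sys.path`. It cannot, however, restrict the redirect to imports performed *by* a particular package, which is the harder problem Nick raises later in the thread.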
On Mar 22, 2018, at 09:58, Gregory Szorc wrote:
Not all consumers of Python packages wish to consume Python packages in the common `pip install <package>` + `import <package>` manner. Some Python applications may wish to vendor Python package dependencies such that known compatible versions are always available.
It’s important to understand how painful vendoring is to some downstream consumers. Debian is a good example. There we often have to go through a lot of hoops to unvendor packages, both for policy reasons and for good distribution practices. The classic example is a security vulnerability in a library. It’s the distro’s responsibility to fix that, but in the face of vendored dependencies, you can’t just patch the system package. Now you also have to hunt down all the vendored versions and figure out if *they’re* vulnerable, etc. It certainly doesn’t help that you can easily have vendored libraries vendoring their own dependencies. I think I once found an application in Debian that had something like 4 or 5 versions of urllib3 inside it!

You mention dependency hell for downstream consumers like a Linux distro, but this type of integration work is exactly the job of a distro. They have to weigh the health and security of all the applications and libraries they support, so it doesn’t bother me that it’s sometimes challenging to work out the right versions of library dependencies. It bothers me a lot that I have to (sometimes heavily) modify packages to devendorize dependencies, especially because it’s not always clearly evident that that has happened.

That said, I completely understand the desire for application and library authors to pin their dependency versions. We’ve had some discussions in the past (not really leading anywhere) about how to satisfy both communities. I definitely don’t go so far as to discourage global imports, and I definitely don’t like vendoring all your dependencies. For applications distributed outside of a distro, there are lots of options, from zip apps (e.g. pex) to frozen binaries, etc. Developers are mostly going to use pip, and maybe a requirements.txt, so I think that use case is well covered.
Downstream consumers need to be able to easily devendorize, but I think ultimately, the need to vendorize just points to something more fundamental missing from Python’s distribution and import system. Cheers, -Barry
On Thu, Mar 22, 2018 at 12:30:02PM -0700, Barry Warsaw wrote:
Developers are mostly going to use pip, and maybe a requirements.txt,
+virtual envs to avoid problems with global site-packages. IMO virtualenv for development and frozen app for distribution solve the problem much better than vendoring.
Cheers, -Barry
Oleg.
On Mar 22, 2018, at 12:33, Oleg Broytman wrote:
On Thu, Mar 22, 2018 at 12:30:02PM -0700, Barry Warsaw wrote:
Developers are mostly going to use pip, and maybe a requirements.txt,
+virtual envs to avoid problems with global site-packages.
Yep, that was implied but of course it’s better to be explicit. :)
IMO virtualenv for development and frozen app for distribution solve the problem much better than vendoring.
+1 -Barry
FWIW, this is a topic I was planning to bring up at the language summit this year, so for those who are going to be there and want to toss around ideas (mine is nearly developed enough to present, but not quite yet), bring them.
That said, I don’t think relying on relative imports within a package should be at all controversial, but perhaps it needs an official endorsement somehow? PEP 8 is what people read for these recommendations, but I don’t know if it makes sense for the stdlib (maybe it could deal with some of the shadowing issues people run into, if they manage to import the top-level module before their own copy appears ahead of it on sys.path... thinking out loud here).
Cheers,
Steve
Top-posted from my Windows phone
On 23 March 2018 at 02:58, Gregory Szorc wrote:
I'd like to start a discussion around practices for vendoring package dependencies. I'm not sure python-dev is the appropriate venue for this discussion. If not, please point me to one and I'll gladly take it there.
Since you mainly seem interested in the import side of things (rather than the initial vendoring process), python-ideas is probably the most suitable location (we're not at the stage of a concrete design proposal that would be appropriate for python-dev, and this doesn't get far enough into import system arcana to really need to be an import-sig discussion rather than a python-ideas one).
What we've done is effectively rename the "shrubbery" package to "knights.vendored.shrubbery." If a module inside that package attempts an `import shrubbery.x`, this could fail because "shrubbery" is no longer the package name. Or worse, it could pick up a separate copy of "shrubbery" somewhere else in `sys.path` and you could have a Frankenstein package pulling its code from multiple installs. So for this to work, all package-local imports must be using relative imports. e.g. `from . import x`.
If it's the main application doing the vendoring, then the following kind of snippet can be helpful:

    from knights.vendored import shrubbery
    import sys
    sys.path["shrubbery"] = shrubbery

So doing that kind of aliasing on a process-wide basis is already possible, as long as you have a point where you can inject the alias (and by combining it with a lazy importer, you can defer the actual import until someone actually uses the module).

Limiting aliasing to a particular set of modules *doing* imports would be much harder though, since we don't pass that information along (although context variables would potentially give us a way to make it available without having to redefine all the protocol APIs).

Cheers, Nick.

-- 
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 24 March 2018 at 19:29, Nick Coghlan wrote:
If it's the main application doing the vendoring, then the following kind of snippet can be helpful:
    from knights.vendored import shrubbery
    import sys
    sys.path["shrubbery"] = shrubbery
Oops, s/path/modules/ :)

Cheers, Nick.
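[Editor's note: a self-contained version of the corrected aliasing trick. The "knights"/"shrubbery" names are the thread's hypothetical ones; here the vendored copy is faked in-memory with `types.ModuleType` so the snippet runs on its own, whereas in a real application it would be a package on disk.]

```python
import sys
import types

# Stand-in for the vendored copy that would normally live at
# knights/vendored/shrubbery/__init__.py on disk.
vendored = types.ModuleType("knights.vendored.shrubbery")
vendored.PLANT = "a nice one"
sys.modules["knights.vendored.shrubbery"] = vendored

# The corrected alias: the bare top-level name now resolves to the
# vendored copy before any sys.path search can happen.
sys.modules["shrubbery"] = sys.modules["knights.vendored.shrubbery"]

import shrubbery  # satisfied straight from the sys.modules cache
print(shrubbery.PLANT)
```

Because `import` consults `sys.modules` before running any finders, the alias wins over every other copy of "shrubbery" that might be installed, which is exactly the process-wide guarantee discussed above.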
On Sat, Mar 24, 2018 at 9:29 AM, Nick Coghlan wrote:
[...]
If it's the main application doing the vendoring, then the following kind of snippet can be helpful:
    from knights.vendored import shrubbery
    import sys
    sys.path["shrubbery"] = shrubbery
I suspect you meant
sys.modules["shrubbery"] = shrubbery
participants (6)
- Barry Warsaw
- Gregory Szorc
- Nick Coghlan
- Oleg Broytman
- Steve Dower
- Steve Holden