On Wednesday, May 23, 2018, Michael Sarahan <msarahan@anaconda.com> wrote:


On Wed, May 23, 2018 at 3:45 PM, Wes Turner <wes.turner@gmail.com> wrote:


On Wednesday, May 23, 2018, Michael Sarahan <msarahan@anaconda.com> wrote:
Thanks for starting this discussion, Victor!  This topic is something we're very interested in at Anaconda.

I'd like to generalize the problem statement to the question of "how can we make pip behave well when it is sharing package management with something else?  Similarly, how can we make the something else behave well with pip?"

We all share the pain of trying to have two package managers effectively manage the same space.  The alternate-folder-for-pip approach is a good idea, but ultimately has issues that you pointed out.

After several great conversations with many people at PyCon last week, I came to the conclusion that conda and pip probably won't ever interoperate very well.  In order for them to do so, conda must respect all of pip's constraints during its solving of dependencies, and likewise, pip must respect conda's constraints.  We are investigating the first of those options and having some promising initial results, but the inverse is not something that seems feasible.  It amounts to pip having a solver that is at least as good as the best supported package manager, and pip learning about *all* of any other package managers that it claims to be compatible with.  That doesn't seem like a viable project.
What's the status on this? AFAIU, depsolving setup.py packages is blocked because it's necessary to execute setup.py to get the conditional requirements?

Yes, executing setup.py is a blocker.  Wheels are improving this.  The main complicating factor is conditional or optional dependencies, as you say, and how to express or execute the branch logic required.  setup.py need not be executed all the time - only enough to gather the metadata to feed an index.  It's much more of a security concern than a time concern.  A first rough approach that didn't handle optional or conditional requirements seems reasonable as a proof of concept.
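
As a rough illustration of why wheels help here: a wheel carries a static METADATA file, so its requirements can be read without executing setup.py. The sketch below assumes a made-up METADATA body (the project name and its requirements are hypothetical) and only separates requirements with an environment marker from those without one; it does not evaluate the markers.

```python
from email.parser import Parser

# Hypothetical METADATA content in the core-metadata (RFC 822 style) format.
SAMPLE_METADATA = """\
Metadata-Version: 2.1
Name: example-pkg
Version: 1.0
Requires-Dist: requests (>=2.0)
Requires-Dist: pywin32 ; sys_platform == "win32"
Requires-Dist: pytest ; extra == "test"
Provides-Extra: test
"""


def split_requirements(metadata_text):
    """Return (unconditional, conditional) requirement lists.

    A requirement counts as conditional here when it carries an
    environment marker, i.e. anything after a ';'.
    """
    msg = Parser().parsestr(metadata_text)
    unconditional, conditional = [], []
    for req in msg.get_all("Requires-Dist") or []:
        name, sep, marker = req.partition(";")
        if sep:
            conditional.append((name.strip(), marker.strip()))
        else:
            unconditional.append(name.strip())
    return unconditional, conditional


unconditional, conditional = split_requirements(SAMPLE_METADATA)
print("always needed:", unconditional)
print("conditional:", conditional)
```

A proof-of-concept solver could start from the unconditional list alone and treat the conditional entries as a later refinement.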
 


Ultimately, I believe a better approach is for the PyPA to define a minimal set of functionality and interfaces to PyPI that any package manager claiming to manage python packages must implement.  Pip can be a reference implementation of that specification.  Any distributor (Red Hat, Canonical, Homebrew, Anaconda) could then have their own implementations that use their solvers, but also can install software from PyPI at user's request, or as a fall-through when a native package is unavailable.
 Metadata compatibility and adapter registration could help solve for this; though there's no money and not much demand. #PEP426JSONLD

With adapter registration, what would the adapters be?  I'm just not sure what pip's role should be.  If you define interfaces for solving and installing/uninstalling/upgrading the solved-for set of packages, maybe that's enough? 
 
Plugins/adapters/interface implementations.
http://zopeinterface.readthedocs.io/en/latest/adapter.html

Perhaps ironically and conveniently, the well-regarded pluggy system for plugins does not depend on setuptools entry_points:
https://pluggy.readthedocs.io/en/latest/

Some way to avoid adding plugin/adapter registration overhead in site.py would be necessary.


 #PEP426JSONLD looks great, but do you see that as a glue between PyPI and pip, pip and other tools, or other tools and PyPI?  I am impressed and encouraged by your research into the topic, and I'd like to know if there's a way that we can help with it if it would further our goals of having conda be able to either interop with pip happily or install directly from PyPI.

I piggybacked onto PEP426 because if we were going to substantially change metadata, we might as well make it linked data; ideally with a cross-language spec, though getting EVERY vendor to comply with it would unfortunately take years.

Metadata interoperability wouldn't be strictly necessary to achieve what I think you're describing; but otherwise there'd be such a disjoint dependency graph that determining what we've installed here and where we need to be would be frustratingly complex.
 
 

User interface could be unified by having "pip" on distributions be a wrapper around the native package manager, matching the exact minimal behavior of pip.
Would `sudo pip` then be the only way? 

Heavens, no. I'm only concerned about the myriad blog posts out there that tell people to [sudo] pip install something.  That is the user interface that exists and is most common.  It is the universal command that might not be the right option, but by golly it's always an option.  There is great value in having only one way to tell people to do things.  I'm proposing that we make the underlying implementation of that user interface be vendor-specific.  Vendors can and should keep their own interfaces to package management, too, because those are broader in scope than pip.

I'm not sure whether it'd be easier to debug interleaved calls to various package managers. Or to explain why `pip install` didn't work on my machine.

Though I have often wondered whether I need to run `conda skeleton pypi` (or fork the conda-forge template), and then manually merge version changes to keep things stable.

There would need to be a map between pypi package name, conda package names, (os, ver) package names; which I think I addressed in that smattering of notes on the PEP426JSONLD issue.
That catalog-to-catalog mapping data would need to be hosted somewhere. Warehouse can easily serve JSON if it's defined in the package.
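
A minimal sketch of what such a catalog-to-catalog map might look like when served as JSON. The record layout, the catalog keys, and every package name in it are assumptions made up for illustration; only the name normalization rule (from PEP 503) is an existing convention.

```python
import json
import re

# Hypothetical mapping record: PyPI project name -> names in other catalogs.
NAME_MAP_JSON = """
{
  "example-pkg": {
    "conda": "example-pkg",
    "fedora:28": "python3-example-pkg",
    "debian:9": "python3-example-pkg"
  }
}
"""


def normalize(name):
    """Normalize a project name the way PEP 503 describes."""
    return re.sub(r"[-_.]+", "-", name).lower()


def native_name(pypi_name, catalog, name_map):
    """Return the catalog's name for a PyPI project, or None if unmapped."""
    return name_map.get(normalize(pypi_name), {}).get(catalog)


name_map = json.loads(NAME_MAP_JSON)
print(native_name("Example_Pkg", "fedora:28", name_map))
print(native_name("unknown-project", "conda", name_map))
```

A `None` result is where a wrapper would fall through to installing from PyPI instead of the native catalog.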

Maybe a blockchain with per-project TUF signing keys, package checksums/signatures, and VCS GPG keyrings could also host package metadata and package name mappings someday.

 

The same kind of approach may also be good for virtual environments, but it seems like there's less contention there.  Conda is different enough from virtualenv that we get some friction, but I think and hope we can smooth that out over time.
`conda install pip` and `conda env export -f environment.yml` seem to work okay (for Python, R, and nodejs, but not yet npm)?

Solving dependencies in a container is still the correct way to avoid cruft, IMHO.

I agree, but sadly I can't force that on users.  We (Anaconda and other distribution managers) still have to support people who are happily shooting themselves in the foot with "sudo pip" even when sudo is in no way necessary for a normal conda installation.  Until we (the Python community) solve the package problems we have, we'll lose mindshare to less useful (IMHO) tools that are easier to manage packages with.

A warning would be good. Is checking for `uid=0` sufficient?

IDK what sort of timeline would be needed to require a CLI flag to bypass an error message when running pip as root.

Containers often do this without specifying `USER nonroot` first in their Dockerfiles. And then setting read-only or other-user permissions does require root and so is better handled by actual OS package managers like FPM, IMHO.
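
On whether checking for `uid=0` is sufficient: a rough sketch of one possible check follows. It assumes that a root install inside a virtual environment is acceptable and that a hypothetical `allow_root` opt-out (not an existing pip flag) bypasses the refusal. Both are illustration-only assumptions, not pip behavior.

```python
import os
import sys


def running_as_root():
    """True when the effective uid is 0 (os.geteuid is Unix-only)."""
    geteuid = getattr(os, "geteuid", None)
    return geteuid is not None and geteuid() == 0


def in_virtual_environment():
    """True inside a venv, where sys.prefix differs from sys.base_prefix."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def should_refuse(is_root, in_venv, allow_root):
    """Refuse only a root install that targets a system-wide location."""
    return is_root and not in_venv and not allow_root


if should_refuse(running_as_root(), in_virtual_environment(), allow_root=False):
    print("warning: running as root outside a virtual environment",
          file=sys.stderr)
```

The uid test alone would also fire in containers that run everything as root, which is why the sketch keeps the decision in a separate function that takes the opt-out into account.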
 
 


Best, 
Michael

On Wednesday, May 23, 2018, Victor Stinner <vstinner@redhat.com> wrote:
> Hi,
>
> pip is currently not well integrated on Linux: it conflicts with the
> system package manager, such as apt or rpm. When pip writes files into
> /usr, it can replace files written by the system package manager and
> so create different kinds of issues. For example, if you check the
> system integrity, you will likely see that some Python files have been
> modified.
>
> I would like to open a discussion to see how each Linux vendor handles
> the issue, and see if a common solution can be designed.
>
> Debian uses /usr for apt-get install and /usr/local for distutils and
> "sudo pip".
>
> Fedora decided to change pip to install files into /usr/local by
> default, instead of /usr, so "sudo pip install" doesn't replace files
> installed by dnf (Fedora package manager):
> https://fedoraproject.org/wiki/Changes/Making_sudo_pip_safe
>
> It gives you 3 main places to install Python code: /usr (managed by
> dnf), /usr/local (managed by sudo pip), $HOME/.local (managed by pip
> --user).
>
> Would it make sense to make the Fedora/Debian change upstream? At
> least, give an opt-in option for Linux vendors to use /usr/local?
>
> I propose to make the change upstream because there are still issues,
> and I don't want to have to fix them alone :-) It should be
> easier if we agree on a filesystem layout and an implementation, so
> we can collaborate on issues!
>
>
> Issues with the current Fedora implementation:
>
> (1) When Python is embedded in an application, there is an issue with
> the current heuristic to decide if /usr/local should be added to
> sys.path:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1532287
>
> (2) On Fedora, "sudo pip install -U" currently removes old code from
> /usr and installs the new one in /usr/local. We should leave /usr
> unchanged, since only dnf should touch /usr.
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1550368#c24
>
> The implementation is made of a single patch on the Python site module:
>
> https://src.fedoraproject.org/rpms/python3/blob/master/f/00251-change-user-install-location.patch
>
> --
>
> There are two issues related to the "sudo pip" change, but they
> already exist when pip is installed in $HOME/.local:
>
> (3) Priority issue between PATH and PYTHONPATH directories.
>
> When the user runs "pip", the pip binary may come from /usr,
> /usr/local or $HOME/.local/bin, but the Python pip module ("import
> pip") may come from a different path. Which binary and which module
> should be used?
>
> Obviously, users can replace these two environment variables...
>
> (4) Related to (3). Running "pip" may run the pip binary of one pip
> version, but pick the "pip" Python module of another pip version.
>
> For example, pip9 binary from /usr/bin/pip, but pip10 module from /usr/local.
>
>
> Fedora works around issue (4) with a downstream patch on pip:
>
> https://src.fedoraproject.org/rpms/python-pip/blob/master/f/pip9-allow-pip10-import.patch
>
> --
>
> I don't know very well how Linux distributions handle the issue with
> "sudo pip". So don't hesitate to correct me if I'm wrong :-) My goal is
> just to start a discussion about a common "upstream" solution.
>
> Victor
> --
> Distutils-SIG mailing list
> distutils-sig@python.org
> https://mail.python.org/mm3/mailman3/lists/distutils-sig.python.org/
> Message archived at https://mail.python.org/mm3/archives/list/distutils-sig@python.org/message/OLGLHTSHLEPLHUTTVNU6L5QFTMNFIB6Z/
>
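
Issues (3) and (4) in Victor's message can be observed with a short diagnostic. The sketch below is only an illustration: the prefix list uses a placeholder home directory (a real check would expand $HOME), and it inspects the interpreter running the sketch, which may not be the interpreter named in the pip script's shebang line.

```python
import importlib.util
import shutil
from pathlib import PurePosixPath

# Install prefixes named in the message, most specific first.
# "/home/user" is a placeholder for the real home directory.
PREFIXES = ["/home/user/.local", "/usr/local", "/usr"]


def owning_prefix(path, prefixes=PREFIXES):
    """Return the first prefix that contains path, or None."""
    parents = PurePosixPath(path).parents
    for prefix in prefixes:
        if PurePosixPath(prefix) in parents:
            return prefix
    return None


def locate_pip():
    """Return (script path, module path) as seen by this interpreter."""
    script = shutil.which("pip")  # resolved through PATH
    spec = importlib.util.find_spec("pip")  # resolved through sys.path
    module = spec.origin if spec is not None else None
    return script, module


script, module = locate_pip()
print("pip script:", script)
print("pip module:", module)
if script and module and owning_prefix(script) != owning_prefix(module):
    print("script and module come from different prefixes")
```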