On Wed, Feb 24, 2021 at 12:42 PM Paul Moore <p.f.moore@gmail.com> wrote:
On Wed, 24 Feb 2021 at 10:55, Stéfane Fermigier <sf@fermigier.com> wrote:
> There is probably a clever way to reuse common packages (probably via clever symlinking) and reduce the footprint of these installations.

Ultimately the problem is that a general tool can't deal with
conflicts (except by raising an error). If application A depends on
lib==1.0 and application B depends on lib==2.0, you simply can't have
a (consistent) environment that supports both A and B. But that's the
rare case - 99% of the time, there are no conflicts. One env per app
is a safe, but heavy handed, approach. Managing environments manually
isn't exactly *hard*, but it's annoying manual work that pipx does an
excellent job of automating, so it's a disk space vs admin time
trade-off.
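
To make the conflict case concrete, here is a minimal sketch that flags libraries pinned to different exact versions by different applications. The `find_conflicts` helper, the dict-based input format and the `otherlib` package are all made up for illustration; real requirement specifiers (ranges, extras, markers) are deliberately ignored.

```python
from collections import defaultdict


def find_conflicts(apps):
    """Return {lib: {version: [apps]}} for libs pinned to more than one version.

    `apps` maps an application name to a {lib: exact_version} dict
    (a hypothetical, simplified stand-in for real requirement files).
    """
    pins = defaultdict(lambda: defaultdict(list))
    for app, requirements in apps.items():
        for lib, version in requirements.items():
            pins[lib][version].append(app)
    # Keep only the libraries for which at least two distinct versions are requested.
    return {
        lib: dict(versions)
        for lib, versions in pins.items()
        if len(versions) > 1
    }


# Illustrative data only: A and B disagree on "lib" but agree on "otherlib".
apps = {
    "A": {"lib": "1.0", "otherlib": "3.2"},
    "B": {"lib": "2.0", "otherlib": "3.2"},
}
conflicts = find_conflicts(apps)
print(conflicts)
```

With this input only `lib` is reported, since `otherlib` could be shared by A and B without any problem.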

There are three ways to approach the question:

1) Fully isolated envs. The safest option but uses the most space.

2) Try to minimise the number of dependencies installed by interpreting the requirements specification in the loosest way possible. This is both algorithmically hard (see https://hal.archives-ouvertes.fr/hal-00149566/document for instance, or the more recent https://hal.archives-ouvertes.fr/hal-03005932/document ) and risky, as you've noted.

3) Compute dependencies for each virtualenv independently of the others, but still share the packages through some indirection mechanism (hard links, symlinks or some Python-specific construct) when the versions match exactly.

The 3rd solution is probably the best of the 3, but the sharing mechanism still needs to be specified (and, if needed, implemented) properly.

I've tried Christian's suggestion of using rdfind on my pipx installation, and it claims to reduce the footprint by 30% (nice, but less than I expected; this would however scale better with the number of installed packages).
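
For readers who haven't used this kind of tool, here is a rough sketch of the general idea (find byte-identical files and replace the duplicates with hard links). It is not rdfind's actual algorithm or interface, just an illustration; the `hardlink_duplicates` helper and the `env_a` / `env_b` layout are hypothetical, and it assumes a filesystem that supports hard links.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def file_digest(path):
    """SHA-256 of a file's content, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def hardlink_duplicates(root):
    """Hard-link byte-identical regular files under `root`; return bytes saved."""
    seen = {}   # (size, digest) -> first path found with that content
    saved = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            path = Path(dirpath) / name
            if path.is_symlink() or not path.is_file():
                continue
            key = (path.stat().st_size, file_digest(path))
            original = seen.get(key)
            if original is None:
                seen[key] = path
            elif not os.path.samefile(original, path):
                # Create the link under a temporary name, then swap it in,
                # so the duplicate is never missing from the tree.
                tmp = path.with_name(path.name + ".dedup-tmp")
                os.link(original, tmp)
                os.replace(tmp, path)
                saved += key[0]
    return saved


# Demo on a throwaway tree with two fake envs holding the same file.
with tempfile.TemporaryDirectory() as root:
    for env in ("env_a", "env_b"):
        pkg = Path(root, env, "lib")
        pkg.mkdir(parents=True)
        (pkg / "__init__.py").write_text("VERSION = '1.0'\n")
    saved = hardlink_duplicates(root)
    shared = os.path.samefile(
        Path(root, "env_a", "lib", "__init__.py"),
        Path(root, "env_b", "lib", "__init__.py"),
    )
print(saved, shared)
```

Note that once two envs share an inode, anything that modifies the file in place (instead of replacing it) changes it for both of them.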

I'm not sure this would be practical in reality, OTOH, because I think there is a serious risk of breakage each time I upgrade one of the packages (via 'pipx upgrade-all' for instance).

So IMHO the best way to implement solution 3 would be by using some variant of the approach popularized by Nix (repository of immutable packages + links to each virtualenv).
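
A very rough sketch of what I have in mind, under heavy assumptions: a store directory keyed by name and exact version, whose entries are written once and never modified, plus one symlink per package in each env. The function names and the flat `site-packages` layout are made up for illustration (this is neither how Nix nor how pipx lays things out), and it assumes a platform where creating symlinks is allowed.

```python
import os
import tempfile
from pathlib import Path


def store_path(store, name, version):
    """Location of one immutable package version in the shared store."""
    return Path(store) / f"{name}-{version}"


def add_to_store(store, name, version, files):
    """Write a package into the store once; existing entries are left untouched."""
    target = store_path(store, name, version)
    if not target.exists():
        target.mkdir(parents=True)
        for rel, content in files.items():
            (target / rel).write_text(content)
    return target


def link_into_env(env, store, name, version):
    """Point env/site-packages/<name> at the store entry for this exact version."""
    site = Path(env) / "site-packages"
    site.mkdir(parents=True, exist_ok=True)
    link = site / name
    tmp = site / (name + ".tmp-link")
    if tmp.is_symlink():
        tmp.unlink()
    os.symlink(store_path(store, name, version), tmp, target_is_directory=True)
    # Swap the link itself: an upgrade repoints this env only and
    # never touches the store content other envs rely on.
    os.replace(tmp, link)
    return link


with tempfile.TemporaryDirectory() as root:
    store = Path(root, "store")
    add_to_store(store, "lib", "1.0", {"__init__.py": "VERSION = '1.0'\n"})
    add_to_store(store, "lib", "2.0", {"__init__.py": "VERSION = '2.0'\n"})
    env_a, env_b = Path(root, "env_a"), Path(root, "env_b")
    link_into_env(env_a, store, "lib", "1.0")
    link_into_env(env_b, store, "lib", "1.0")   # shared with env_a
    link_into_env(env_b, store, "lib", "2.0")   # "upgrade" env_b only
    seen_a = (env_a / "site-packages" / "lib" / "__init__.py").read_text()
    seen_b = (env_b / "site-packages" / "lib" / "__init__.py").read_text()
print(seen_a.strip(), "|", seen_b.strip())
```

The point of the immutable store is that upgrading one env only changes a link, so the breakage risk I mentioned for in-place deduplication doesn't arise in the same way. Garbage-collecting store entries that no env links to any more is left out here.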

  S.

--
Stefane Fermigier - http://fermigier.com/ - http://twitter.com/sfermigier - http://linkedin.com/in/sfermigier
Founder & CEO, Abilian - Enterprise Social Software - http://www.abilian.com/
Chairman, National Council for Free & Open Source Software (CNLL) - http://cnll.fr/
Founder & Organiser, PyParis & PyData Paris - http://pyparis.org/ - http://pydata.fr/