On Fri, Jan 17, 2014 at 4:29 PM, Donald Stufft <donald@stufft.io> wrote:

On Jan 17, 2014, at 7:09 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:


On 18 Jan 2014 04:45, "Hannes Schmidt" <hannes@ucsc.edu> wrote:
>
> Thank you for your suggestions, Donald. They are both feasible but they feel like workarounds to me.
>
> Manually listing transitive dependencies seems like a step backwards to me. Isn't the whole point of setuptools/distutils that each component manages its own list of dependencies? How will this scale to dozens of transitive dependencies and busy development teams where dependencies are revved frequently? I really liked what Maven did for Java and therefore liked what pip is trying to do for Python.



I’ve never used Maven, but doesn’t it require you to run a Maven repository or otherwise point it at each individual location if they aren’t in a Maven repo?

Yes. But Java is a compiled language so you *need* to host the compiled artifacts somewhere. Python's interpreted nature is what makes the incredible convenience of installing from sources at hg:// and git:// URLs possible. 
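
Concretely, that convenience looks something like this (the hosts and project names below are made up for illustration):

```
# requirements.txt — repository URLs and egg names are illustrative
git+https://git.example.org/team/mylib.git@1.2#egg=mylib
hg+https://hg.example.org/team/othertool@0.9#egg=othertool
```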
 

Hannes, this is the role we consider to be best served by running a private index server and using the full dependency resolution mechanism. You may want to review PEP 426 to see the intended mode of operation.

However, the question at this point is whether there is a legitimate model of use where dependency links (or an equivalent) are used to create a more concrete dependency tree in a closed environment where the software developer/publisher and integrator/user are the *same* organisation.

PEP 426 focuses on the case where those are different organisations (since that case poses the greatest security risk, allowing publishers to compromise the installations of integrators and users, beyond just running the software they provide). While it does include the "meta requirements" field to handle cases of (effective) dependency bundling rather than normal versioning, that model is deliberately more restrictive than dependency links.

It seems to me that overall, it's much, much safer to simply avoid supporting this degree of transitive trust directly in the core installation toolchain. A more appropriate solution to handling the trusted internal distribution use case may be a dependency scanner that processes the dependency links metadata to generate a suitable requirements.txt file. From pip's perspective, trusting that generated file is the same as trusting any other explicitly specified set of packages, while it also provides an opportunity for the resolved dependency links to be audited before trusting them.
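
As a rough sketch of what such a generator might emit (purely illustrative — no such tool exists today, and every name and URL below is hypothetical), once the dependency links have been resolved into a mapping, the output step is just writing one pinned source URL per line:

```python
# Hypothetical sketch: turn an already-resolved set of dependency links
# into a requirements.txt that pip can consume, so pip itself never has
# to trust or process the links. A real tool would first extract the
# mapping from each package's metadata; that harder step is elided here.

def write_requirements(resolved, path="requirements.txt"):
    """Write one pinned source URL per line, sorted by project name,
    so the result can be audited before being handed to pip."""
    lines = [url for _name, url in sorted(resolved.items())]
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")
    return lines

# Example mapping, as a scanner might produce it (names are made up).
resolved = {
    "mylib": "git+https://git.example.org/team/mylib.git@1.2#egg=mylib",
    "othertool": "git+https://git.example.org/team/othertool.git@0.9#egg=othertool",
}
write_requirements(resolved)
```

The audit opportunity is then simply a review of the generated file before it is passed to `pip install -r`.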

Unfortunately, we can't actually do that easily in a secure fashion, because obtaining the metadata currently requires executing an untrusted setup.py. So trusting dependency links "just" to generate a requirements.txt file is almost as bad as trusting them for installing software. The main advantage is that you can always generate the dependency list as an unprivileged user or in a sandbox, even if operating in an untrusted environment. Installation, by contrast, is often going to involve elevated privileges.

Regardless, perhaps there's a case to be made that such a requirements.txt generator, with opt-in dependency links support, should remain available, with the feature being removed entirely only for actual package installation.

That approach is still far more secure by default than the status quo and strongly encourages the use of a private index to better align with the upstream distribution model, while restoring support for users like Hannes who currently rely on dependency links as a dependency resolution solution that doesn't require additional server components in a trusted environment.

> I haven't used --find-links yet so I may be wrong but tarring up packages and serving them by a web server is additional work that I'd rather avoid. It just creates yet another copy of the package and requires maintaining an additional server component.
>
> I loved the fact that I could point pip directly at my source repos for both my top-level projects as well as their dependencies. That seemed like a perfect fit for an interpreted language like Python where packages are distributed in source form.

In a trusted environment, it can be, but on the internet, it is intolerably insecure.

However, making it easy to generate a transitive requirements.txt for a trusted environment may be a usage model we can support without leaving upstream distribution and usage open to compromise.

Regards,
Nick.

Let me just be explicit that I totally get the fact that this is breaking your workflow. Nobody *ever* likes being told that the way they are doing something is no longer going to be supported. Processing dependency links by default is a security risk; in your case it's probably OK, but only because your use case is so tightly constrained.

One thing that pip *could* do is leave the --process-dependency-links flag in place instead of removing it as per the deprecation cycle, however I think that ultimately this would be a disservice to users. The new formats that we're designing and implementing are not going to support things like dependency links, so while this would allow things to continue for a while as we further improve the system, it will likely become increasingly difficult to maintain the functionality of dependency links.
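
For reference, the opt-in spelling as it stands in current pip looks like the following (the package name is illustrative; this is not an endorsement of relying on the flag long-term):

```
# pip 1.5: dependency-link processing is off by default and must be
# explicitly re-enabled per invocation (package name is illustrative).
pip install --process-dependency-links mypackage
```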

So again, I'm really, really sorry that this is affecting you, but I can't see a good way to keep this concept around that doesn't hurt the security of the other use case.


-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA




--
Hannes Schmidt
Software Application Developer
Data Migration Engineer
Cancer Genomics Hub
University of California, Santa Cruz

(206) 696-2316 (cell)
hannes@ucsc.edu