Please do not remove dependency_links
I read through the "Removing dependency_links" thread [1] and I beg you not to follow through with the deprecation and removal of dependency_links, and to rethink your approach.

The mentioned thread indicates that research was done to gauge the popularity of dependency_links in publicly hosted Python projects. That approach is fundamentally flawed: Publicly hosted projects are much more likely to also be available on PyPI than private, closed-source projects. Consequently, their dependencies are also more likely to be hosted on PyPI as well. Because of that, they are much less likely to rely on the dependency_links feature.

Another misconception seems to be that dependency_links is predominantly used for installing patched or customized versions of dependencies hosted on PyPI. I'm pretty sure the predominant use case for dependency_links is with projects that are hosted privately, e.g. for an organization's internal use. I represent such an organization, and removing dependency_links would impact us negatively. We host a set of internal projects and their dependencies on Bitbucket and we rely on dependency_links to install them directly from there.

I understand the motivation for this change – security – but there must be a smarter way to handle it. Could we fall back to dependency_links if a PyPI lookup isn't successful? Could we restrict dependency_links to links that share a prefix with the link from which the package is currently being installed? A combination of the two?

[1]: https://mail.python.org/pipermail/distutils-sig/2013-October/022937.html

-- Hannes Schmidt Software Application Developer Data Migration Engineer Cancer Genomics Hub University of California, Santa Cruz (206) 696-2316 (cell) hannes@ucsc.edu
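For illustration only: the "shared prefix" restriction proposed here could be as simple as the sketch below. Nothing like this exists in pip; the function name, the prefix heuristic and the URLs are made up.

    def allowed_dependency_link(link, install_url):
        # Hypothetical rule: only honour dependency links that live under the
        # same repository prefix as the URL the package itself is being
        # installed from.
        prefix = install_url.rsplit('/', 1)[0]  # e.g. hg+ssh://hg@bitbucket.org/myorg
        return link.startswith(prefix)

    # allowed_dependency_link(
    #     'hg+ssh://hg@bitbucket.org/myorg/mylib@default#egg=mylib',
    #     'hg+ssh://hg@bitbucket.org/myorg/myapp@default')  -> True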
On Jan 16, 2014, at 9:52 PM, Hannes Schmidt <hannes@ucsc.edu> wrote:
I read through the "Removing dependency_links" thread [1] and I beg you not to follow through with the deprecation and removal of dependency_links, and to rethink your approach.

The mentioned thread indicates that research was done to gauge the popularity of dependency_links in publicly hosted Python projects. That approach is fundamentally flawed: Publicly hosted projects are much more likely to also be available on PyPI than private, closed-source projects. Consequently, their dependencies are also more likely to be hosted on PyPI as well. Because of that, they are much less likely to rely on the dependency_links feature.

Another misconception seems to be that dependency_links is predominantly used for installing patched or customized versions of dependencies hosted on PyPI. I'm pretty sure the predominant use case for dependency_links is with projects that are hosted privately, e.g. for an organization's internal use. I represent such an organization, and removing dependency_links would impact us negatively. We host a set of internal projects and their dependencies on Bitbucket and we rely on dependency_links to install them directly from there.

I understand the motivation for this change – security – but there must be a smarter way to handle it. Could we fall back to dependency_links if a PyPI lookup isn't successful? Could we restrict dependency_links to links that share a prefix with the link from which the package is currently being installed? A combination of the two?
[1]: https://mail.python.org/pipermail/distutils-sig/2013-October/022937.html
Pip 1.5 has already been released, which turns dependency links off by default and adds a flag --process-dependency-links, and they are scheduled to be removed in 1.6. It’s possible we can “undeprecate” them and leave them off by default, however my question to you would be why you can’t use a requirements.txt file to handle installing things from Bitbucket?

Fundamentally a setup.py should not contain “concrete” locations but instead should contain abstract names/version specifiers, whereas a requirements.txt (or the pip invocation itself) takes those abstract names and makes them concrete by pairing them with an index.

So to be more explicit, is there any reason why one of:

1) Create a requirements.txt and point to Bitbucket in that, and install your environment using ``pip install -r requirements.txt``

2) Create tarballs of your packages and host them in a directory somewhere behind nginx and use --find-links https://example.com/that-directory/ as your install location.

won’t work just as well for you?

----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
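To make option 2 concrete, the workflow would look roughly like this (a sketch only; the directory URL is the placeholder from above and the package name is illustrative):

    # build an sdist for each internal project
    python setup.py sdist
    # copy dist/*.tar.gz into the directory served at https://example.com/that-directory/
    # then install against that directory instead of PyPI
    pip install --no-index --find-links https://example.com/that-directory/ my-internal-package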
Thank you for your suggestions, Donald. They are both feasible but they feel like workarounds to me.

Manually listing transitive dependencies seems like a step backwards to me. Isn't the whole point of setuptools/distutils that each component manages its own list of dependencies? How will this scale to dozens of transitive dependencies and busy development teams where dependencies are revved frequently? I really liked what Maven did for Java and therefore liked what pip is trying to do for Python.

I haven't used --find-links yet so I may be wrong, but tarring up packages and serving them by a web server is additional work that I'd rather avoid. It just creates yet another copy of the package and requires maintaining an additional server component.

I loved the fact that I could point pip directly at my source repos for both my top-level projects as well as their dependencies. That seemed like a perfect fit for an interpreted language like Python where packages are distributed in source form. Removing dependency_links makes that feature useless to me because it doesn't extend to dependencies, only top-level components.

On Fri, Jan 17, 2014 at 4:44 AM, Donald Stufft <donald@stufft.io> wrote:
On Jan 16, 2014, at 9:52 PM, Hannes Schmidt <hannes@ucsc.edu> wrote:
I read through the "Removing dependency_links" thread [1] and I beg you not to follow through with the deprecation and removal of dependency_links, and to rethink your approach.

The mentioned thread indicates that research was done to gauge the popularity of dependency_links in publicly hosted Python projects. That approach is fundamentally flawed: Publicly hosted projects are much more likely to also be available on PyPI than private, closed-source projects. Consequently, their dependencies are also more likely to be hosted on PyPI as well. Because of that, they are much less likely to rely on the dependency_links feature.

Another misconception seems to be that dependency_links is predominantly used for installing patched or customized versions of dependencies hosted on PyPI. I'm pretty sure the predominant use case for dependency_links is with projects that are hosted privately, e.g. for an organization's internal use. I represent such an organization, and removing dependency_links would impact us negatively. We host a set of internal projects and their dependencies on Bitbucket and we rely on dependency_links to install them directly from there.

I understand the motivation for this change – security – but there must be a smarter way to handle it. Could we fall back to dependency_links if a PyPI lookup isn't successful? Could we restrict dependency_links to links that share a prefix with the link from which the package is currently being installed? A combination of the two?
[1]: https://mail.python.org/pipermail/distutils-sig/2013-October/022937.html
Pip 1.5 has already been released, which turns dependency links off by default and adds a flag --process-dependency-links, and they are scheduled to be removed in 1.6. It’s possible we can “undeprecate” them and leave them off by default, however my question to you would be why you can’t use a requirements.txt file to handle installing things from Bitbucket?

Fundamentally a setup.py should not contain “concrete” locations but instead should contain abstract names/version specifiers, whereas a requirements.txt (or the pip invocation itself) takes those abstract names and makes them concrete by pairing them with an index.

So to be more explicit, is there any reason why one of:

1) Create a requirements.txt and point to Bitbucket in that, and install your environment using ``pip install -r requirements.txt``

2) Create tarballs of your packages and host them in a directory somewhere behind nginx and use --find-links https://example.com/that-directory/ as your install location.

won’t work just as well for you?
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
-- Hannes Schmidt Software Application Developer Data Migration Engineer Cancer Genomics Hub University of California, Santa Cruz (206) 696-2316 (cell) hannes@ucsc.edu
On Fri, Jan 17, 2014 at 1:45 PM, Hannes Schmidt <hannes@ucsc.edu> wrote:
Thank you for your suggestions, Donald. They are both feasible but they feel like workarounds to me.
Manually listing transitive dependencies seems like a step backwards to me. Isn't the whole point of setuptools/distutils that each component manages its own list of dependencies? How will this scale to dozens of transitive dependencies and busy development teams where dependencies are revved frequently? I really liked what Maven did for Java and therefore liked what pip is trying to do for Python.
I haven't used --find-links yet so I may be wrong but tarring up packages and serving them by a web server is additional work that I'd rather avoid. It just creates yet another copy of the package and requires maintaining an additional server component.
I loved the fact that I could point pip directly at my source repos for both my top-level projects as well as their dependencies. That seemed like a perfect fit for an interpreted language like Python where packages are distributed in source form. Removing dependency_links makes that feature useless to me because it doesn't extend to dependencies, only top level components.
IIUC a dependency link is the same as passing --find-links=x for that URL. I understand this doesn't exactly solve your problem. Your internal projects are in private bitbucket repositories?
On Fri, Jan 17, 2014 at 11:12 AM, Daniel Holth <dholth@gmail.com> wrote:
On Fri, Jan 17, 2014 at 1:45 PM, Hannes Schmidt <hannes@ucsc.edu> wrote:
Thank you for your suggestions, Donald. They are both feasible but they feel like workarounds to me.
Manually listing transitive dependencies seems like a step backwards to me. Isn't the whole point of setuptools/distutils that each component manages its own list of dependencies? How will this scale to dozens of transitive dependencies and busy development teams where dependencies are revved frequently? I really liked what Maven did for Java and therefore liked what pip is trying to do for Python.
I haven't used --find-links yet so I may be wrong but tarring up packages and serving them by a web server is additional work that I'd rather avoid. It just creates yet another copy of the package and requires maintaining an additional server component.
I loved the fact that I could point pip directly at my source repos for both my top-level projects as well as their dependencies. That seemed like a perfect fit for an interpreted language like Python where packages are distributed in source form. Removing dependency_links makes that feature useless to me because it doesn't extend to dependencies, only top level components.
IIUC a dependency link is the same as passing --find-links=x for that URL. I understand this doesn't exactly solve your problem.
I don't know. How do I translate

pip install hg+ssh://hg@bitbucket.org/cghub/cghub-cloud-agent@default

with cghub-cloud-agent's setup.py being

    setup(
        name="cghub-cloud-agent",
        version="1.0.dev1",
        install_requires=[
            'cghub-cloud-lib>=1.0.dev1',
            'cghub-python-lib>=1.4.dev1',
            ...
        ],
        dependency_links=[
            'hg+ssh://hg@bitbucket.org/cghub/cghub-python-lib@default#egg=cghub-python-lib-1.4.dev1',
            'hg+ssh://hg@bitbucket.org/cghub/cghub-cloud-lib@default#egg=cghub-cloud-lib-1.0.dev1',
        ],
    )

into a use of --find-links?
Your internal projects are in private bitbucket repositories?
Yep -- Hannes Schmidt Software Application Developer Data Migration Engineer Cancer Genomics Hub University of California, Santa Cruz (206) 696-2316 (cell) hannes@ucsc.edu
I don't know. How do I translate
pip install hg+ssh://hg@bitbucket.org/cghub/cghub-cloud-agent@default
with cghub-cloud-agent's setup.py being
    setup(
        name="cghub-cloud-agent",
        version="1.0.dev1",
        install_requires=[
            'cghub-cloud-lib>=1.0.dev1',
            'cghub-python-lib>=1.4.dev1',
            ...
        ],
        dependency_links=[
            'hg+ssh://hg@bitbucket.org/cghub/cghub-python-lib@default#egg=cghub-python-lib-1.4.dev1',
            'hg+ssh://hg@bitbucket.org/cghub/cghub-cloud-lib@default#egg=cghub-cloud-lib-1.0.dev1',
        ],
    )
into a use of --find-links?
--find-links assumes packaging. if you don't package, it doesn't help you. you're left putting all the vcs urls into a requirements file for your top-level app.
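For the example above, such a requirements file might look roughly like this (a sketch; the exact #egg fragments and revisions would need to match what the projects actually declare):

    # requirements.txt
    hg+ssh://hg@bitbucket.org/cghub/cghub-python-lib@default#egg=cghub-python-lib
    hg+ssh://hg@bitbucket.org/cghub/cghub-cloud-lib@default#egg=cghub-cloud-lib
    hg+ssh://hg@bitbucket.org/cghub/cghub-cloud-agent@default#egg=cghub-cloud-agent

and the whole environment is then installed with ``pip install -r requirements.txt``.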
On 18 Jan 2014 04:45, "Hannes Schmidt" <hannes@ucsc.edu> wrote:
Thank you for your suggestions, Donald. They are both feasible but they feel like workarounds to me.

Manually listing transitive dependencies seems like a step backwards to me. Isn't the whole point of setuptools/distutils that each component manages its own list of dependencies? How will this scale to dozens of transitive dependencies and busy development teams where dependencies are revved frequently? I really liked what Maven did for Java and therefore liked what pip is trying to do for Python.

Hannes, this is the role we consider to be best served by running a private index server and using the full dependency resolution mechanism. You may want to review PEP 426 to see the intended mode of operation.

However, the question at this point is whether there is a legitimate model of use where dependency links (or an equivalent) are used to create a more concrete dependency tree in a closed environment where the software developer/publisher and integrator/user are the *same* organisation.

PEP 426 focuses on the case where those are different organisations (since that case poses the greatest security risk, allowing publishers to compromise the installations of integrators and users, beyond just running the software they provide). While it does include the "meta requirements" field to handle cases of (effective) dependency bundling rather than normal versioning, that model is deliberately more restrictive than dependency links.

It seems to me that overall, it's much, much safer to simply avoid supporting this degree of transitive trust directly in the core installation toolchain. A more appropriate solution to handling the trusted internal distribution use case may be a dependency scanner that processes the dependency links metadata to generate a suitable requirements.txt file. From pip's perspective, trusting that generated file is the same as trusting any other explicitly specified set of packages, while it also provides an opportunity for the resolved dependency links to be audited before trusting them.

Unfortunately, we can't actually do that easily in a secure fashion, because obtaining the metadata currently requires executing an untrusted setup.py. So trusting dependency links "just" to generate a requirements.txt file is almost as bad as trusting them for installing software. The main advantage is that you can always generate the dependency list as an unprivileged user or in a sandbox, even if operating in an untrusted environment. Installation, by contrast, is often going to involve elevated privileges.

Regardless, perhaps there's a case to be made that such a requirements.txt generator, with opt-in dependency links support, should remain available, with the feature being removed entirely only for actual package installation.

That approach is still far more secure by default than the status quo, strongly encourages the use of a private index to better align with the upstream distribution model, but restores support for users like Hannes that are currently relying on dependency links as a dependency resolution solution that doesn't require additional server components in a trusted environment.
I haven't used --find-links yet so I may be wrong but tarring up packages and serving them by a web server is additional work that I'd rather avoid. It just creates yet another copy of the package and requires maintaining an additional server component.
I loved the fact that I could point pip directly at my source repos for both my top-level projects as well as their dependencies. That seemed like a perfect fit for an interpreted language like Python where packages are distributed in source form.
In a trusted environment, it can be, but on the internet, it is intolerably insecure. However, making it easy to generate a transitive requirements.txt for a trusted environment may be a usage model we can support without leaving upstream distribution and usage open to compromise. Regards, Nick.
On Jan 17, 2014, at 7:09 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 18 Jan 2014 04:45, "Hannes Schmidt" <hannes@ucsc.edu> wrote:
Thank you for your suggestions, Donald. They are both feasible but they feel like workarounds to me.
Manually listing transitive dependencies seems like a step backwards to me. Isn't the whole point of setuptools/distutils that each component manages its own list of dependencies? How will this scale to dozens of transitive dependencies and busy development teams where dependencies are revved frequently? I really liked what Maven did for Java and therefore liked what pip is trying to do for Python.
I’ve never used Maven, but doesn’t it require you to run a Maven repository or otherwise point it at each individual location if they aren’t in a Maven repo?
Hannes, this is the role we consider to be best served by running a private index server and using the full dependency resolution mechanism. You may want to review PEP 426 to see the intended mode of operation.
However, the question at this point is whether there is a legitimate model of use where dependency links (or an equivalent) are used to create a more concrete dependency tree in a closed environment where the software developer/publisher and integrator/user are the *same* organisation.
PEP 426 focuses on the case where those are different organisations (since that case poses the greatest security risk, allowing publishers to compromise the installations of integrators and users, beyond just running the software they provide). While it does include the "meta requirements" field to handle cases of (effective) dependency bundling rather than normal versioning, that model is deliberately more restrictive than dependency links.
It seems to me that overall, it's much, much safer to simply avoid supporting this degree of transitive trust directly in the core installation toolchain. A more appropriate solution to handling the trusted internal distribution use case may be a dependency scanner that processes the dependency links metadata to generate a suitable requirements.txt file. From pip's perspective, trusting that generated file is the same as trusting any other explicitly specified set of packages, while it also provides an opportunity for the resolved dependency links to be audited before trusting them.
Unfortunately, we can't actually do that easily in a secure fashion, because obtaining the metadata currently requires executing an untrusted setup.py. So trusting dependency links "just" to generate a requirements.txt file is almost as bad as trusting them for installing software. The main advantage is that you can always generate the dependency list as an unprivileged user or in a sandbox, even if operating in an untrusted environment. Installation, by contrast, is often going to involve elevated privileges.
Regardless, perhaps there's a case to be made that such a requirements.txt generator, with opt-in dependency links support, should remain available, with the feature being removed entirely only for actual package installation.
That approach is still far more secure by default than the status quo, strongly encourages the use of a private index to better align with the upstream distribution model, but restores support for users like Hannes that are currently relying on dependency links as a dependency resolution solution that doesn't require additional server components in a trusted environment.
I haven't used --find-links yet so I may be wrong but tarring up packages and serving them by a web server is additional work that I'd rather avoid. It just creates yet another copy of the package and requires maintaining an additional server component.
I loved the fact that I could point pip directly at my source repos for both my top-level projects as well as their dependencies. That seemed like a perfect fit for an interpreted language like Python where packages are distributed in source form.
In a trusted environment, it can be, but on the internet, it is intolerably insecure.
However, making it easy to generate a transitive requirements.txt for a trusted environment may be a usage model we can support without leaving upstream distribution and usage open to compromise.
Regards, Nick.

Let me just be explicit that I totally get the fact that this is breaking your workflow. Nobody *ever* likes being told that the way they are doing something is no longer going to be supported. Processing dependency links by default is a security risk, however in your constrained use case it's probably OK, but only because you have a tightly constrained use case.

One thing that pip *could* do is leave the --process-dependency-links flag in place instead of removing it as per the deprecation cycle, however I think that ultimately this will be a disservice to users. The new formats that we're designing and implementing are not going to support things like dependency links, so while this would allow things to continue for a while as we further improve the system, it will likely be increasingly difficult to maintain the functionality of dependency links.

So again I'm really really sorry that this is affecting you, but I can't see a good way to keep this concept around that doesn't hurt the security for the other use case.

----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On 18 Jan 2014 10:29, "Donald Stufft" <donald@stufft.io> wrote:

I haven't used --find-links yet so I may be wrong, but tarring up packages and serving them by a web server is additional work that I'd rather avoid. It just creates yet another copy of the package and requires maintaining an additional server component.

I loved the fact that I could point pip directly at my source repos for both my top-level projects as well as their dependencies. That seemed like a perfect fit for an interpreted language like Python where packages are distributed in source form.

In a trusted environment, it can be, but on the internet, it is intolerably insecure.

However, making it easy to generate a transitive requirements.txt for a trusted environment may be a usage model we can support without leaving upstream distribution and usage open to compromise.

Regards, Nick.

Let me just be explicit that I totally get the fact that this is breaking your workflow. Nobody *ever* likes being told that the way they are doing something is no longer going to be supported. Processing dependency links by default is a security risk, however in your constrained use case it's probably OK, but only because you have a tightly constrained use case.

One thing that pip *could* do is leave the --process-dependency-links flag in place instead of removing it as per the deprecation cycle, however I think that ultimately this will be a disservice to users. The new formats that we're designing and implementing are not going to support things like dependency links, so while this would allow things to continue for a while as we further improve the system, it will likely be increasingly difficult to maintain the functionality of dependency links.

We don't know that for sure - dependency links is entirely supportable even under PEP 426 through an appropriate metadata extension, it would just require someone to write the code to generate and process it. Alternatively, affected users could just stay with setuptools style metadata for internal use.

So again I'm really really sorry that this is affecting you, but I can't see a good way to keep this concept around that doesn't hurt the security for the other use case.

That's where I see a separate "scan-dependency-links" tool potentially fitting in. The concept would still be completely gone from the core installation toolchain in pip so it isn't available by default, but users in a situation like Hannes could use it to generate a suitable requirements.txt file (perhaps even dynamically) and feed *that* to pip. PyPI itself, however, could still change to block uploading of *new* packages that define dependency links (similar to what we did for external link scanning).

Imposing backwards incompatible changes like this one, however well justified, always brings with it a certain responsibility to help users in managing the transition. In this case, I suspect a *separate* link scanning tool (which we can put as many security warnings on as you like) that generates a requirements.txt file may be a more appropriate approach than just telling users to use built packages on a private HTTP or PyPI server, since spinning up and maintaining new server infrastructure is often a much more challenging prospect in user environments than installing and using a new client tool.

Cheers, Nick.
Imposing backwards incompatible changes like this one, however well justified, always brings with it a certain responsibility to help users in managing the transition. In this case, I suspect a *separate* link scanning tool (which we can put as many security warnings on as you like) that generates a requirements.txt file may be a more appropriate approach than just telling users to use built packages on a private HTTP or PyPI server, since spinning up and maintaining new server infrastructure is often a much more challenging prospect in user environments than installing and using a new client tool.
so, instead of:

pip install git+https://myrepo@master#egg=toplevelapp

it's this:

deplinks_scantool git+https://myrepo@master#egg=toplevelapp > requirements.txt
pip install -r requirements.txt

the scan tool step is pretty hefty in that it's downloading and running egg_info, but yes, I can see that being easier for some users than dealing with packaging
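A very rough sketch of what the dependency-links half of such a scan tool could do for a single source checkout ("deplinks_scantool" above is a made-up name, and a real tool would also have to fetch each dependency and recurse; this just runs egg_info and reads the metadata files setuptools writes out):

    import glob
    import subprocess
    import sys

    def scan_checkout(path):
        # Ask setuptools to write out the *.egg-info metadata for this checkout.
        subprocess.check_call([sys.executable, 'setup.py', 'egg_info'], cwd=path)
        links = []
        # dependency_links.txt is only written when the project declares dependency_links.
        for name in glob.glob(path + '/*.egg-info/dependency_links.txt'):
            with open(name) as f:
                links.extend(line.strip() for line in f if line.strip())
        return links  # candidate lines for a generated requirements.txt

    if __name__ == '__main__':
        for link in scan_checkout(sys.argv[1]):
            print(link)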
Drawing from my Maven experience, there is no direct equivalent to dependency_links in Maven. In pom.xml (the equivalent of setup.py) one specifies dependencies and version constraints. This is the "what". In ~/.m2/settings.xml one specifies the repositories to be scanned for those dependencies. That's the "where". You *can* specify repositories in pom.xml, too, but I always considered that an artificial use case, and if I remember correctly, there were controversies about that.

I do actually think that dependency_links conflates the "what" and the "where" concerns, which is one reason I sort of agree with its removal. I'm just not convinced it's a good idea to remove it without offering an equally convenient alternative. By convenient I mean that it facilitates installation directly from a source repository without the need of going through a separate and mostly redundant (due to Python being an interpreted language) package repository.

I'd be completely happy with specifying the mapping from "what" to "where" in a configuration file, outside of setup.py. That would be even more convenient since it would be more DRY than having dependency_links in every package's setup.py. The mapping could be Python code consisting of lines like:

map( 'foo<=1.5', 'git+https://myrepo/foo@master#egg=1.5' )

or with patterns

map_re( '(bar-.*)<=(.*)', 'git+https://myrepo/$1@master#egg=$1-$2' )

This is probably a bit naive, especially with respect to how it translates the version constraint into a concrete version.

On Fri, Jan 17, 2014 at 6:46 PM, Marcus Smith <qwcode@gmail.com> wrote:
Imposing backwards incompatible changes like this one, however well justified, always brings with it a certain responsibility to help users in managing the transition. In this case, I suspect a *separate* link scanning tool (which we can put as many security warnings on as you like) that generates a requirements.txt file may be a more appropriate approach than just telling users to use built packages on a private HTTP or PyPI server, since spinning up and maintaining new server infrastructure is often a much more challenging prospect in user environments than installing and using a new client tool.
so, instead of
pip install git+https://myrepo@master#egg=toplevelapp
it's this:
deplinks_scantool git+https://myrepo@master#egg=toplevelapp > requirements.txt
pip install -r requirements.txt
the scan tool step is pretty hefty in that it's downloading and running egg_info, but yes, I can see that being easier for some users than dealing with packaging
-- Hannes Schmidt Software Application Developer Data Migration Engineer Cancer Genomics Hub University of California, Santa Cruz (206) 696-2316 (cell) hannes@ucsc.edu
I'd be completely happy with specifying the mapping from "what" to "where" in a configuration file, outside of setup.py. That would be even more convenient since it would be more DRY than having dependency_links in every package's setup.py. The mapping could be Python code consisting of lines like:
map( 'foo<=1.5', 'git+https://myrepo/foo@master#egg=1.5' )
or with patterns
map_re( '(bar-.*)<=(.*)', 'git+https://myrepo/$1@master#egg=$1-$2' )
This is probably a bit naive, especially with respect to how it translates the version constraint into a concrete version.
I think that's an interesting idea, although I think your $2 would need to alter the @ part as well, which is the tag or branch you're cloning.

If the mappings could be constructed as option values, then the mappings could just live in the requirements file, e.g. a mapping for "myproject" that has myproject==version map to a tag in the clone equal to "version":

--vcs-req-map myproject:git+https://myrepo/myproject@${version}#egg=myproject
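Neither map_re nor --vcs-req-map exists today; as a purely hypothetical illustration, expanding such a mapping into a concrete VCS URL once the version is known could be as simple as:

    from string import Template

    # Hypothetical mapping template in the spirit of --vcs-req-map above.
    TEMPLATE = 'git+https://myrepo/${name}@${version}#egg=${name}'

    def expand(name, version, template=TEMPLATE):
        # Fill the ${name} and ${version} placeholders to get a pip-installable URL.
        return Template(template).substitute(name=name, version=version)

    print(expand('myproject', '1.2'))
    # git+https://myrepo/myproject@1.2#egg=myproject

The hard part, as noted, is turning a version *constraint* into the concrete version or tag to substitute.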
On Fri, Jan 17, 2014 at 4:29 PM, Donald Stufft <donald@stufft.io> wrote:
On Jan 17, 2014, at 7:09 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 18 Jan 2014 04:45, "Hannes Schmidt" <hannes@ucsc.edu> wrote:
Thank you for your suggestions, Donald. They are both feasible but they feel like workarounds to me.

Manually listing transitive dependencies seems like a step backwards to me. Isn't the whole point of setuptools/distutils that each component manages its own list of dependencies? How will this scale to dozens of transitive dependencies and busy development teams where dependencies are revved frequently? I really liked what Maven did for Java and therefore liked what pip is trying to do for Python.
I’ve never used Maven, but doesn’t it require you to run a Maven repository or otherwise point it at each individual location if they aren’t in a Maven repo?
Yes. But Java is a compiled language so you *need* to host the compiled artifacts somewhere. Python's interpreted nature is what makes the incredible convenience of installing from sources at hg:// and git:// URLs possible.
Hannes, this is the role we consider to be best served by running a private index server and using the full dependency resolution mechanism. You may want to review PEP 426 to see the intended mode of operation.
However, the question at this point is whether there is a legitimate model of use where dependency links (or an equivalent) are used to create a more concrete dependency tree in a closed environment where the software developer/publisher and integrator/user are the *same* organisation.
PEP 426 focuses on the case where those are different organisations (since that case poses the greatest security risk, allowing publishers to compromise the installations of integrators and users, beyond just running the software they provide). While it does include the "meta requirements" field to handle cases of (effective) dependency bundling rather than normal versioning, that model is deliberately more restrictive than dependency links.
It seems to me that overall, it's much, much safer to simply avoid supporting this degree of transitive trust directly in the core installation toolchain. A more appropriate solution to handling the trusted internal distribution use case may be a dependency scanner that processes the dependency links metadata to generate a suitable requirements.txt file. From pip's perspective, trusting that generated file is the same as trusting any other explicitly specified set of packages, while it also provides an opportunity for the resolved dependency links to be audited before trusting them.
Unfortunately, we can't actually do that easily in a secure fashion, because obtaining the metadata currently requires executing an untrusted setup.py. So trusting dependency links "just" to generate a requirements.txt file is almost as bad as trusting them for installing software. The main advantage is that you can always generate the dependency list as an unprivileged user or in a sandbox, even if operating in an untrusted environment. Installation, by contrast, is often going to involve elevated privileges.
Regardless, perhaps there's a case to be made that such a requirements.txt generator, with opt-in dependency links support, should remain available, with the feature being removed entirely only for actual package installation.
That approach is still far more secure by default than the status quo, strongly encourages the use of a private index to better align with the upstream distribution model, but restores support for users like Hannes that are currently relying on dependency links as a dependency resolution solution that doesn't require additional server components in a trusted environment.
I haven't used --find-links yet so I may be wrong but tarring up packages and serving them by a web server is additional work that I'd rather avoid. It just creates yet another copy of the package and requires maintaining an additional server component.
I loved the fact that I could point pip directly at my source repos for both my top-level projects as well as their dependencies. That seemed like a perfect fit for an interpreted language like Python where packages are distributed in source form.
In a trusted environment, it can be, but on the internet, it is intolerably insecure.
However, making it easy to generate a transitive requirements.txt for a trusted environment may be a usage model we can support without leaving upstream distribution and usage open to compromise.
Regards, Nick.
Let me just be explicit that I totally get the fact that this is breaking your workflow. Nobody *ever* likes being told that the way they are doing something is no longer going to be supported. Processing dependency links by default is a security risk, however in your constrained use case it's probably OK, but only because you have a tightly constrained use case.

One thing that pip *could* do is leave the --process-dependency-links flag in place instead of removing it as per the deprecation cycle, however I think that ultimately this will be a disservice to users. The new formats that we're designing and implementing are not going to support things like dependency links, so while this would allow things to continue for a while as we further improve the system, it will likely be increasingly difficult to maintain the functionality of dependency links.
So again I’m really really sorry that this is affecting you, but I can’t see a good way to keep this concept around that doesn’t hurt the security for the other use case.
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
-- Hannes Schmidt Software Application Developer Data Migration Engineer Cancer Genomics Hub University of California, Santa Cruz (206) 696-2316 (cell) hannes@ucsc.edu
I haven't used --find-links yet so I may be wrong but tarring up packages and serving them by a web server is additional work that I'd rather avoid. It just creates yet another copy of the package and requires maintaining an additional server component.
--find-links can point to a local directory of packages, if that helps you. also, in case it's not clear, you don't manually tar up packages, you run "python setup.py sdist" to generate them.
On Fri, Jan 17, 2014 at 5:58 PM, Marcus Smith <qwcode@gmail.com> wrote:
I haven't used --find-links yet so I may be wrong but tarring up packages and serving them by a web server is additional work that I'd rather avoid. It just creates yet another copy of the package and requires maintaining an additional server component.
--find-links can point to a local directory of packages, if that helps you. also, in case it's not clear, you don't manually tar up packages, you run "python setup.py sdist" to generate them.
Sure, scratch the "manual" part. But don't I'd still have to run this for every package after every commit? I really like the fact that I can install bleeding edge using @master (or @default) as well as a tag, e.g. @1.5 or a stable branch, e.g. @stable. -- Hannes Schmidt Software Application Developer Data Migration Engineer Cancer Genomics Hub University of California, Santa Cruz (206) 696-2316 (cell) hannes@ucsc.edu
Manually listing transitive dependencies seems like a step backwards to me. Isn't the whole point of setuptools/distutils that each component manages its own list of dependencies? How will this scale to dozens of transitive dependencies and busy development teams where dependencies are revved frequently? I really liked what Maven did for Java and therefore liked what pip is trying to do for Python.
just in case it's not clear, if you package everything and dump the tars into a directory, then there's no listing everything, just maintain the install_requires listings for each project. it's just this:

pip install --find-links=/local/packages my-top-level-app==X.Y

but if you don't package, and pip stops processing dependency links, then yes, you will end up putting all the vcs urls into one requirements file per top-level app.
On 17/01/14 13:52, Hannes Schmidt wrote:
I read through the "Removing dependency_links" thread [1] and I beg you not to follow through with the deprecation and removal of dependency_links, and to rethink your approach.

The mentioned thread indicates that research was done to gauge the popularity of dependency_links in publicly hosted Python projects. That approach is fundamentally flawed: Publicly hosted projects are much more likely to also be available on PyPI than private, closed-source projects. Consequently, their dependencies are also more likely to be hosted on PyPI as well. Because of that, they are much less likely to rely on the dependency_links feature.

Another misconception seems to be that dependency_links is predominantly used for installing patched or customized versions of dependencies hosted on PyPI. I'm pretty sure the predominant use case for dependency_links is with projects that are hosted privately, e.g. for an organization's internal use. I represent such an organization and removing dependency_links would impact us negatively. We host a set of internal projects and their dependencies on Bitbucket and we rely on dependency_links to install them directly from there.

I understand the motivation for this change -- security -- but there must be a smarter way to handle it. Could we fall back to dependency_links if a PyPI lookup isn't successful? Could we restrict dependency_links to links that share a prefix with the link from which the package is currently being installed? A combination of the two?
[1]: https://mail.python.org/pipermail/distutils-sig/2013-October/022937.html
-- Hannes Schmidt Software Application Developer Data Migration Engineer Cancer Genomics Hub University of California, Santa Cruz (206) 696-2316 (cell) hannes@ucsc.edu
Are you aware that you can also provide a solid infrastructure for a "proprietary environment of packages" by hosting your own private PyPI mirror? There are quite a few projects that enable this, such as:

https://pypi.python.org/pypi/pypiserver
https://pypi.python.org/pypi/mypypi
https://github.com/steiza/simplepypi
http://djangopypi2.readthedocs.org/en/latest/

You can use the -i flag, or a pip config file, to tell pip to look at your own private index instead of the official one, or direct it to use the official PyPI only after looking at your private PyPI. You can also host a network-available folder of wheels for pip to find, or simply an HTTP-accessible folder of packages as Donald suggested.

-- Matt Iversen PGP: 0xc046e8a874522973 // 2F04 3DCC D6E6 D5AC D262 2E0B C046 E8A8 7452 2973
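Concretely, pointing pip at a private index is just (the host and package names here are placeholders):

    pip install -i https://pypi.internal.example.org/simple/ my-internal-package

    # or keep the official PyPI and add the private index on top:
    pip install --extra-index-url https://pypi.internal.example.org/simple/ my-internal-package

The same --index-url / --extra-index-url options can also live in a pip config file or at the top of a requirements.txt.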
On 17 Jan 2014 23:22, "Matthew Iversen" <teh.ivo@gmail.com> wrote:
Are you aware that you can also provide a solid infrastructure for a "proprietary environment of packages" by hosting your own private PyPI mirror?
There are quite a few projects that enable this, such as
https://pypi.python.org/pypi/pypiserver
https://pypi.python.org/pypi/mypypi
https://github.com/steiza/simplepypi
http://djangopypi2.readthedocs.org/en/latest/
You can use the -i flag, or a pip config file, to tell pip to look at your own private index instead of the official one; or direct it to use the official PyPI only after looking at your private PyPI.

You can also host a network-available folder of wheels for pip to find, or simply an HTTP-accessible folder of packages as Donald suggested.
Private PyPI mirrors and internal hosting options would be a suitable advanced topic for the PUG. I don't recall whether or not Marcus already added a placeholder for it, though. Cheers, Nick.
participants (6)

- Daniel Holth
- Donald Stufft
- Hannes Schmidt
- Marcus Smith
- Matthew Iversen
- Nick Coghlan