On 22 April 2017 at 06:25, Wayne Werner <waynejwerner@gmail.com> wrote:
On Fri, 21 Apr 2017, Jannis Gebauer wrote:
They could, of course, fix this very easily by running their own PyPI mirrors.
And now they have two problems.
On the one hand, I agree that there is potential for some abuse and vulnerabilities... but I'd argue that if you're in a position where you're worried about that attack vector and you're using pypi.python.org, then *you're doing it wrong!*
On systems where I'm worried about pypi as an attack vector, I've downloaded the packages, built wheels, and stuck them in an S3 bucket, and I install with `--no-index --find-links=/path/to/my/wheelhouse`.
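That offline workflow can be sketched as a helper that builds the pip invocation (the wheelhouse path and requirements file name here are placeholders, not anything Wayne specified):

```python
def offline_install_cmd(wheelhouse, requirements="requirements.txt"):
    """Build a pip command that resolves only from a local wheelhouse.

    --no-index: never contact PyPI (or any other index)
    --find-links: resolve packages solely from the given local directory
    """
    return [
        "pip", "install",
        "--no-index",
        "--find-links=" + wheelhouse,
        "-r", requirements,
    ]

# e.g. offline_install_cmd("/path/to/my/wheelhouse") produces the flags above
```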
Right, package whitelists with artifact caches under organisational control are the typical answer here, since they protect you from a range of potential threats, and ensure you always have the ability to reproduce builds if necessary, even if the original publisher decides to delete all their previous releases.

Pinterest put together a decent write-up about the approach back when they first published pinrepo as an open source project: https://medium.com/@Pinterest_Engineering/open-sourcing-pinrepo-the-pinteres...

The devpi devs have also granted in-principle approval for package whitelisting support in their PyPI caching proxy configuration settings: https://github.com/devpi/devpi/issues/198
I'm not sure there are many improvements you could make to the security of pip/PyPI beyond that, but I'm not a security expert :)
Pinning dependencies protects against implicit upgrade attacks, and checking for expected hashes helps avoid substitution attacks. The latter was added to baseline pip in 8.0.0, and was available through `peep` before that. It's also a native part of the way Pipenv's Pipfile.lock files work.

For an approach that's less reliant on being part of an automated build pipeline, the yum/dnf solution to this at the operating system layer is repository priorities: https://wiki.centos.org/PackageManagement/Yum/Priorities

conda similarly supports priorities at the channel level: https://conda.io/docs/channels.html#after-conda-4-1-0

The gist of that approach is that it lets you assign priority orders to particular repositories, such that the higher priority repo will always win in the event of a name clash. In this kind of context, it's used to ensure that 3rd party repositories can only ever add *new* packages, without allowing them to ever replace or upgrade ones that are available from the private repository. My understanding is that Debian's equivalent system is broadly similar in principle, but has a few additional features beyond the basic relative priority support.

Nobody has been motivated to implement that capability for the Python-specific tooling so far, as it competes against two alternatives that will often make more architectural sense:

- automated build pipelines using dependency pinning, hash checks, and pre-filtered artifact repositories
- relying on build & deployment formats that already offer repository priority support (e.g. native Linux packages or conda packages)

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
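[For illustration, the repository-priority resolution described above can be sketched as follows. All names and data structures here are invented for the example; this is not the API of yum, dnf, or conda, just the name-clash rule they implement.]

```python
def pick_repo(package, repos):
    """Return the repo that wins for `package`: the candidate with the
    best (lowest-numbered) priority, as in yum/dnf repository priorities."""
    candidates = [r for r in repos if package in r["packages"]]
    if not candidates:
        return None
    return min(candidates, key=lambda r: r["priority"])

# A private repo at high priority, plus a lower-priority 3rd party mirror:
private = {"name": "internal", "priority": 1, "packages": {"requests"}}
third_party = {"name": "pypi-mirror", "priority": 99,
               "packages": {"requests", "leftpad"}}

# On a name clash ("requests"), the private repo always wins, so the
# 3rd party repo can only ever supply genuinely *new* names ("leftpad").
```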