[Distutils] The sad and insecure state of commercial private package indexes

Nick Coghlan ncoghlan at gmail.com
Sat Apr 22 03:13:12 EDT 2017


On 22 April 2017 at 06:25, Wayne Werner <waynejwerner at gmail.com> wrote:
> On Fri, 21 Apr 2017, Jannis Gebauer wrote:
>
>> They could, of course, fix this very easily by running their own PyPI
>> mirrors.
>
>
> And now they have two problems.
>
>
> On the one hand, I agree that there is a potential for some abuse and
> vulnerabilities... but I'd argue that if you're in a position where
> you're worried about that attack vector and you're using
> pypi.python.org then *you're doing it wrong!*
>
> On systems where I'm worried about pypi as an attack vector, I've
> downloaded the packages, built wheels, and stuck them in an S3 bucket,
> and I install with `--no-index --find-links=/path/to/my/wheelhouse`.

Right, package whitelists with artifact caches under organisational
control are the typical answer here, since they protect you from a
range of potential threats, and ensure you always have the ability to
reproduce builds if necessary, even if the original publisher decides
to delete all their previous releases.
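For anyone who hasn't set that up before, the workflow Wayne describes
looks roughly like this (the directory and file names are illustrative):

    # Build wheels for an audited set of dependencies
    pip wheel --wheel-dir=./wheelhouse -r requirements.txt

    # Install strictly from the local wheelhouse, never touching PyPI
    pip install --no-index --find-links=./wheelhouse -r requirements.txt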

Pinterest put together a decent write-up about the approach back when
they first published pinrepo as an open source project:
https://medium.com/@Pinterest_Engineering/open-sourcing-pinrepo-the-pinterest-artifact-repository-6ebfe5917bcb

The devpi devs have also granted in-principle approval for package
whitelisting support in their PyPI caching proxy configuration
settings: https://github.com/devpi/devpi/issues/198

> I'm not sure there are any improvements you could make to the security
> of pip/pypi that would be much better than this, but I'm not a security
> expert :)

Pinning dependencies protects against implicit upgrade attacks, and
checking for expected hashes helps avoid substitution attacks. The
latter was added to baseline pip in 8.0.0, and was available through
`peep` before that. It's also a native part of the way that
Pipfile.lock files work.
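As a concrete sketch, a hash-checked requirements file looks like this
(the project name and digest below are illustrative placeholders, not
real values):

    # requirements.txt: every entry pinned and hashed
    requests==2.13.0 \
        --hash=sha256:<expected sha256 of the downloaded archive>

    # pip 8.0+ verifies each download against the listed hashes
    pip install --require-hashes -r requirements.txt

Note that once any hash is supplied, pip requires *all* entries in the
file to be pinned and hashed.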

For an approach that's less reliant on being part of an automated
build pipeline, the yum/dnf solution to this at the operating system
layer is repository priorities:
https://wiki.centos.org/PackageManagement/Yum/Priorities
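With the yum-plugin-priorities plugin installed, that amounts to giving
each repo definition a priority, where lower numbers win (the repo names
and URLs here are illustrative):

    [internal]
    name=Internal package repository
    baseurl=https://repo.internal.example.com/el7/
    priority=1

    [thirdparty]
    name=Third party repository
    baseurl=https://mirror.example.org/el7/
    priority=90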

conda similarly supports priorities at the channel level:
https://conda.io/docs/channels.html#after-conda-4-1-0
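In .condarc terms, channels listed earlier take precedence over those
listed later (the channel name is illustrative):

    channels:
      - internal          # private channel, always wins on name clashes
      - defaults          # only adds packages the private channel lacks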

The gist of that approach is that it lets you assign priority orders
to particular repositories, such that the higher priority repo will
always win in the event of a name clash. In this kind of context, it's
used to ensure that 3rd party repositories can only ever add *new*
packages, without allowing them to ever replace or upgrade ones that
are available from the private repository.

My understanding is that Debian's equivalent system is broadly similar
in principle, but has a few additional features beyond the basic
relative priority support.
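For reference, the Debian/Ubuntu mechanism is apt pinning via
/etc/apt/preferences; a minimal sketch (the origin is illustrative):

    # /etc/apt/preferences.d/internal-repo
    Package: *
    Pin: origin "repo.internal.example.com"
    Pin-Priority: 1001

A pin priority above 1000 means packages from that origin win even over
newer versions available from other repositories.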

Nobody has been motivated to implement that capability for the
Python-specific tooling so far, as it competes against two
alternatives that will often make more architectural sense:

- automated build pipelines using dependency pinning, hash checks, and
pre-filtered artifact repositories
- relying on build & deployment formats that already offer repository
priority support (e.g. native Linux packages or conda packages)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

