[Distutils] continuous integration options (was Re: Travis-CI is not open source, except in fact it *is* open source)

Nick Coghlan ncoghlan at gmail.com
Sun Nov 6 22:28:37 EST 2016


On 7 November 2016 at 07:20, Chris Barker <chris.barker at noaa.gov> wrote:
> So how is allowing anyone to push something to PyPi that will run arbitrary
> code on a CI server, that will push arbitrary code to PyPi that will then
> get run by anyone that pip installs it?

PyPI currently has the ability to impersonate any PyPI publisher,
which makes it an enormous security threat in and of itself, so we
need to limit the attack surfaces that it exposes.

> Essentially, we have already said that there is no such thing as "trusting
> PyPi" -- you need to trust each individual package. So how is any sort of
> auto-build system going to change that??

Currently we're reasonably confident that the only folks that can
compromise Django users (for example) are the Django devs and the PyPI
service administrators. The former is an inherent problem in trusting
any software publisher, while the latter we currently mitigate by
tightly controlling admin access to the production PyPI service, and
strictly limiting the server-side processing that PyPI performs on
uploaded files to reduce the opportunities for privilege escalation
attacks.

Once you start providing a server-side build service, however, you're
opening up additional attack vectors on the core publishing system,
and getting any aspect of that wrong may lead to publishers being able
to impersonate *each other*. Unfortunately, offering secure
multi-tenancy in software services when you allow tenants to run
arbitrary code is a really hard problem - it's the main reason that
OpenShift v3 hasn't fully displaced OpenShift v2 yet, and that's with
the likes of Red Hat, Google, CoreOS and Deis collaborating on the
underlying Kubernetes infrastructure.

Linux distros and conda-forge duck that multi-tenancy problem by
treating the build system itself as the publisher, with everyone with
access to it being a relatively trusted co-tenant (think "share house
with no locks on interior doors" rather than "apartment complex").
That approach works OK at smaller scales, but the gatekeeping involved
in approving new co-publishers introduces off-putting friction for
potential participants (hence both app developers and data analysts
finding ways to bypass the sysadmin- and OS-developer-dominated Linux
packaging ecosystems).
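
As a rough illustration of the difference between the two trust models,
here is a minimal Python sketch. It is not how PyPI, any Linux distro, or
conda-forge is actually implemented; every class, method, and account
name below is hypothetical and exists only to make the contrast concrete.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class PerPublisherIndex:
    """'Apartment complex': each project has its own set of keys."""

    # Hypothetical mapping: project name -> accounts allowed to release it
    owners: Dict[str, Set[str]] = field(default_factory=dict)

    def can_publish(self, account: str, project: str) -> bool:
        # An account can only release projects it is listed as owning.
        return account in self.owners.get(project, set())


@dataclass
class SharedBuildSystem:
    """'Share house': the build system is the publisher for everything."""

    # Hypothetical set of approved co-publishers
    members: Set[str] = field(default_factory=set)

    def can_publish(self, account: str, project: str) -> bool:
        # No per-project check: any approved member can affect any project.
        return account in self.members


index = PerPublisherIndex(
    owners={"django": {"django-dev"}, "flask": {"flask-dev"}}
)
shared = SharedBuildSystem(members={"django-dev", "flask-dev"})

print(index.can_publish("flask-dev", "django"))   # per-project gate
print(shared.can_publish("flask-dev", "django"))  # membership is the only gate
```

In the second model the only gate is membership itself, which is why
approving new co-publishers has to be done carefully, and where the
friction comes from.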

For PyPI, we can mitigate the difficulty by getting the builds to
happen somewhere else (like external CI services), but even then you
still have a non-trivial service integration problem to manage,
especially if you decide to tackle it through a "bring your own build
service" approach (à la GitHub CI integration).

Whichever way you go, though (native build service, or integration with
external build services), you're signing up for a major ongoing
maintenance task, as you're either now responsible for a shared build
system serving tens of thousands of software publishers [1], or else
you're responsible for maintaining a coherent publisher UX while also
maintaining compatibility with multiple external systems that you
don't directly control.

Cheers,
Nick.

[1] There were ~35k distinct publisher accounts on PyPI when Donald
last checked in August: https://github.com/pypa/warehouse/issues/1428

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

