[Distutils] PEP 470 discussion, part 3

Thu Jul 24 17:57:37 CEST 2014

FYI: PEP 458 provides a way to address most of the security issues with
this as well. (We call these "provides-everything" attacks in some of our
prior work: https://isis.poly.edu/~jcappos/papers/cappos_pmsec_tr08-02.pdf)

One way of handling this is for whoever registers the name to choose
which other packages may be registered as meeting that dependency. Another
is for PyPI to manage the metadata for this automatically. Clearly,
someone has to be responsible for making sure that this is 'off-by-default',
so that a malicious party cannot claim to provide a popular package and get
their software installed instead. What do you think makes the most sense?
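
To make the 'off-by-default' idea concrete, here is a minimal sketch of the
first option, where the owner of a name keeps an allow-list of projects that
may satisfy it. Everything in it (ALLOWED_PROVIDERS, candidates_for, and the
"somedep" name) is hypothetical and made up for illustration; it is not how
PyPI, pip, or PEP 458 represent this.

```python
# Hypothetical sketch: none of these names come from PyPI, pip, or PEP 458.

# Allow-list kept by whoever registered each name. A name with no entry,
# or an empty entry, accepts no substitutes: that is "off-by-default".
ALLOWED_PROVIDERS = {
    "Django": set(),            # owner has approved no substitutes
    "somedep": {"zap", "bar"},  # owner has approved two substitutes
}


def candidates_for(requirement, provides_claims):
    """Return the projects allowed to satisfy `requirement`.

    `provides_claims` maps a project name to the set of names it claims
    to provide. Claims the name's owner has not approved are ignored.
    """
    approved = ALLOWED_PROVIDERS.get(requirement, set())
    result = [requirement]  # the project itself always qualifies
    for project, claimed in sorted(provides_claims.items()):
        if requirement in claimed and project in approved:
            result.append(project)
    return result


claims = {
    "FakeDjango": {"Django"},  # unapproved claim, must be ignored
    "zap": {"somedep"},
    "bar": {"somedep"},
}
print(candidates_for("Django", claims))
print(candidates_for("somedep", claims))
```

With this shape, a project like FakeDjango can claim whatever it likes, but
the claim has no effect until the owner of the name opts in.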

Even if only "the right" projects can create trusted packages for a
dependency, there is still a security question about which package should
be trusted. Suppose projects zap and bar can both be chosen to meet a
dependency. Which should be used?

With TUF we currently support them choosing a fixed project (zap or bar),
but supporting the most recent upload is also possible. We had an
explicit tag and type of delegation in Stork for this case (the timestamp
tag), but I think we can get equivalent functionality with threshold
signatures in TUF.
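
As a rough illustration of the two selection policies (a fixed project
versus the most recent upload), here is a small sketch. It only models the
choice itself; it does not use TUF's API or metadata format, and the
function names and upload dates are invented for the example.

```python
from datetime import datetime

# Hypothetical upload times for two projects that may both meet a dependency.
uploads = {
    "zap": datetime(2014, 7, 1),
    "bar": datetime(2014, 7, 20),
}


def choose_fixed(pinned, uploads):
    """Policy 1: always use the one project that was chosen in advance."""
    return pinned if pinned in uploads else None


def choose_most_recent(uploads):
    """Policy 2: use whichever trusted project uploaded most recently."""
    if not uploads:
        return None
    return max(uploads, key=uploads.get)


print(choose_fixed("zap", uploads))
print(choose_most_recent(uploads))
```

The second policy is the one that needs care: the answer changes whenever
any trusted project uploads, so the set of trusted projects matters more.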

Once we understand more about how people would like to use it, we can make
sure PEP 458 explains how this is supported in a clean way while minimizing
the security impact.

Thanks,
Justin

On Thu, Jul 24, 2014 at 11:41 AM, Donald Stufft <donald at stufft.io> wrote:

> On July 24, 2014 at 7:26:11 AM, Richard Jones (r1chardj0n3s at gmail.com)
> wrote:
>
> Even ignoring the malicious possibility, there is probably a greater chance
> of accidental mistakes:
>
> - company sets up internal index using pip's multi-index support and hosts
> various modules
> - someone quite innocently uploads something with the same name, newer
> version, to pypi
> - company installs now use that unknown code
>
> devpi avoids this (I would recommend it over multi-index for companies
> anyway) by having a white list system for packages that might be pulled
> from upstream that would clash with internal packages.
>
> As Nick's mentioned, a signing infrastructure - tied to the index
> registration of a name - could solve this problem.
>
> Yes, those are two solutions. Another solution is for PyPI to allow
> registering a namespace, like dstufft.*, and companies simply name all their
> packages under that. This problem isn’t unique to this PEP, though. It
> exists any time a company has an internal package that they do not want on
> PyPI. It’s unlikely that any of those companies are using the external link
> feature if that package is internal.
>
>
>
> There still remains the usability issue of unsophisticated users running
> into external indexes and needing to cope with that in one of a myriad of
> ways as evidenced by the PEP. One solution proposed and refined at the
> EuroPython gathering today has PyPI caching packages from external indexes
> *for packages registered with PyPI*. That is: a requirement of registering
> your package (and external index URL) with PyPI is that you grant PyPI
> permission to cache packages from your index in the central index - a
> scenario that is ideal for users. Organisations not wishing to do that
> understand that they're the ones causing the pain for users.
>
> We can’t cache the packages which aren’t currently hosted on PyPI. Not in
> an automatic fashion anyways. We’d need to ensure that their license allows
> us to do so. The PyPI ToS ensures this when they upload but if they never
> upload then they’ve never agreed to the ToS for that artifact.
>
>
>
> An extension of this proposal is quite elegant; to reduce the pain of
> migration from the current approach to the new, we implement that caching
> right now, using the current simple index scraping. This ensures the
> packages are available to all clients throughout the transition period.
>
> As said above, we can’t legally do this automatically, we’d need to ensure
> that there is a license that grants us distribution rights.
>
>
>
> The transition issue was enough for those at the meeting today to urge me
> to reject the PEP.
>
> To be clear, there are really three issues at play:
>
> 1) Should we continue to support scraping external urls *at all*? This is
> a cause of a lot of problems in pip, and it infects our architecture with
> things that cause confusing error messages that we cannot really get away
> from. It’s also super slow and grossly insecure.
>
> 2) Should we continue to support direct links from a project’s /simple/
> page to a downloadable file which isn’t hosted on PyPI?
>
> 3) If we allow direct links to a downloadable file from a project’s
> /simple/ page, do we mandate that they include a hash (and thus are safe)
> or do we also allow ones without a checksum (and thus are unsafe)?
>
> For me, 1 is absolutely not. It is terrible and it is the cause of
> horrible UX issues as well as performance issues. However, 1 is also the
> most useful one. Eliminating 1 eliminates PIL, and that is more than 90% of
> the /simple/ traffic for the projects on which this will have any impact.
>
> For me, 2 is a question of whether the relatively small usage (in both
> traffic and number of packages) is worth the extra cognitive overhead of
> users having to understand that there are *two* ways for something to be
> installed from somewhere other than PyPI. Additionally, is it worth
> removing the ability for people to legally mirror the actual *files*
> without manually white listing the ones that they’ve vetted and found the
> license to allow them to do so (and even then, in the future a project
> could switch to a license which doesn’t allow that)? For me this is again
> no, it’s not worth it. Additional concepts to learn, each with their own
> quirks, and pain for people wanting to mirror their installs are not
> worth it just to keep things working for a tiny fraction of things.
>
> For me 3 is no just because 2 is no, but assuming 2 is “yes”, I still
> think 3 is no because the external vs unverified split is confusing to
> users. Additionally the impact of this one, if I recall correctly, is
> almost zero.
>
>
>
>
>       Richard
>
>
> On 24 July 2014 12:40, Vladimir Diaz <vladimir.v.diaz at gmail.com> wrote:
>
>> In metadata 2.0 even with package signing you end up where I can have you
>> install “django-foobar” which depends on “FakeDjango”, which provides
>> “Django”, and then for all intents and purposes you have a “Django” package
>> installed.
>>
>> Can you go into more detail?  Particularly, the part where "FakeDjango"
>> provides Django.
>>
>> Richard Jones mentions the case where an external index provides an
>> "updated release" and tricks the updater into installing a compromised
>> "Django."  Is this the same thing?
>>
>>
>> On Thu, Jul 24, 2014 at 4:55 AM, Richard Jones <r1chardj0n3s at gmail.com>
>> wrote:
>>
>>> Thanks for responding, even from your sick bed.
>>>
>>> This message about users having to view and understand /simple/ indexes
>>> is repeated many times. I didn't have to do that in the case of PIL. The
>>> tool told me "use --allow-external PIL to allow" and then when that failed
>>> it told me "use --allow-unverified PIL to allow". There was no needing to
>>> understand why, nor any reading of /simple/ indexes.
>>> Currently most users (I'm thinking of people who install PIL once or
>>> twice) don't need to edit configuration files, and with a modification we
>>> could make the above process interactive. Those ~3000 packages that have
>>> internal and external packages would be slow, yes.
>>>
>>> This PEP proposes a potentially confusing break for both users and
>>> packagers. In particular, during the transition there will be packages
>>> which just disappear as far as users are concerned. In those cases users
>>> will indeed need to learn that there is a /simple/ page and they will need
>>> to view it in order to find the URL to add to their installation invocation
>>> in some manner. Even once install tools start supporting the new mechanism,
>>> users who lag (which as we all know are the vast majority) will run into
>>> this.
>>>
>>> On the devpi front: indeed it doesn't use the mirroring protocol because
>>> it is not a mirror. It is a caching proxy that uses the same protocols as
>>> the install tools to obtain, and then cache the files for install. Those
>>> files are then presented in a single index for the user to use. There is no
>>> need for multi-index support, even in the case of having multiple staging
>>> indexes. There is a need for devpi to be able to behave just like an
>>> installer without needing intervention, which I believe will be possible in
>>> this proposal as it can automatically add external indexes as it needs to.
>>>
>>> I talked to a number of people last night and I believe the package
>>> spoofing concept is also a vulnerability in the Linux multi-index model
>>> (where an external index provides an "updated release" of some core package
>>> like libssl on Linux, or perhaps requests in Python land). As I understand
>>> it, there is no protection against this. Happy to be told why I'm wrong, of
>>> course :)
>>>
>>>
>>>       Richard
>>>
>>> _______________________________________________
>>> Distutils-SIG maillist  -  Distutils-SIG at python.org
>>> https://mail.python.org/mailman/listinfo/distutils-sig
>>>
>>>
>>
>
>
>
> --
> Donald Stufft
> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372
> DCFA
>