[Distutils] Need for respect (was: PEP 438, pip and --allow-external)

Donald Stufft donald at stufft.io
Tue May 13 14:38:30 CEST 2014


On May 13, 2014, at 8:16 AM, Paul Moore <p.f.moore at gmail.com> wrote:

>> External and verifiable packages have the same security as uploaded files
>> (though I would like to use sha256 instead of md5 in the URL).
> 
> Correct (I think it might even be correct for indirectly linked files
> where each link has a hash, which PEP 438 doesn't consider verifiable
> - although I'm not a security expert so don't quote me). The PEP is
> open to sha256 in place of md5, although I don't know if pip supports
> it.

So the answer is “maybe”.

The thing you need to be able to do is verify each “hop” that pip has to take
in order to find that file.

Here are some cases which PEP 438 explicitly supports (and so does pip):

File Hosted on PyPI (safe)
--------------------------

1. pip fetches /simple/foobar/ and verifies that the contents of that page
   are correct by verifying the TLS connection. pip finds a link to
   a file hosted on PyPI and it includes the #md5=hash. We know this hash is
   accurate because we've verified the contents of the page.
2. pip downloads the file from PyPI and verifies it against the md5 hash.
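
The hash check in step 2 can be sketched in Python roughly as follows. This is
a minimal illustration and not pip's actual code: the helper names link_hash
and verify_download and the example URL are made up for this sketch, and it
assumes sha256 is accepted alongside md5.

```python
import hashlib
from urllib.parse import urlsplit

# Hash names this sketch accepts in a link fragment (an assumption, not pip's list).
ALLOWED = ("md5", "sha256")


def link_hash(url):
    """Return (algorithm, hexdigest) from a '#md5=<hex>' style fragment, or None."""
    name, sep, digest = urlsplit(url).fragment.partition("=")
    if not sep or name not in ALLOWED or not digest:
        return None
    return name, digest.lower()


def verify_download(url, data):
    """Return True only if the link carries a hash and the bytes match it."""
    parsed = link_hash(url)
    if parsed is None:
        return False  # no hash on the link: nothing to verify against
    name, expected = parsed
    return hashlib.new(name, data).hexdigest() == expected


if __name__ == "__main__":
    payload = b"example sdist bytes"
    # Hypothetical file URL; the hash is computed here only for the demo.
    url = ("https://pypi.python.org/packages/foobar-1.0.tar.gz"
           "#sha256=" + hashlib.sha256(payload).hexdigest())
    print(verify_download(url, payload))         # True
    print(verify_download(url, payload + b"x"))  # False
```

The check is only as trustworthy as the page the link came from, which is why
step 1 matters.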


File Hosted Externally, but Directly Linked with Hash (safe)
------------------------------------------------------------

1. pip fetches /simple/foobar/ and verifies that the contents of that page
   are correct by verifying the TLS connection. pip finds a link to a file
   hosted on downloads.example.com and it includes the #md5=hash. We know
   this hash is accurate because we've verified the contents of the page.
2. pip downloads the file from downloads.example.com and verifies it against
   the md5 hash.


File Hosted Externally, but Directly Linked without a Hash (unsafe)
-------------------------------------------------------------------

1. pip fetches /simple/foobar/ and verifies that the contents of that page
   are correct by verifying the TLS connection. pip finds a link to a file
   hosted on downloads.example.com and it does not include the #md5=hash.
2. pip downloads the file from downloads.example.com and does not verify it
   against anything because we don't have a safely acquired hash to verify
   it against.


File Hosted Externally, Indirectly Linked with a Hash on the Indirect Page (unsafe)
-----------------------------------------------------------------------------------

1. pip fetches /simple/foobar/ and verifies that the contents of that page
   are correct by verifying the TLS connection. pip finds a link to another
   page at http://downloads.example.com/.
2. pip fetches http://downloads.example.com/ but does no verification because
   we have no safely acquired means to do so. It finds a link to a file hosted
   on downloads.example.com with a #md5=hash; however, we cannot know the hash
   is accurate because we have no way to verify the contents of this page.
3. pip downloads the file from downloads.example.com and does not verify it
   against anything because we don't have a safely acquired hash to verify it
   against.


File Hosted Externally, Indirectly Linked No Hash on the Indirect Page (unsafe)
-------------------------------------------------------------------------------

1. pip fetches /simple/foobar/ and verifies that the contents of that page
   are correct by verifying the TLS connection. pip finds a link to another
   page at http://downloads.example.com/.
2. pip fetches http://downloads.example.com/ but does no verification because
   we have no safely acquired means to do so. It finds a link to a file hosted
   on downloads.example.com without a #md5=hash.
3. pip downloads the file from downloads.example.com and does not verify it
   against anything because we don't have a safely acquired hash to verify it
   against.


Marc-Andre has suggested an additional method which is currently supported
neither by the PEP nor by pip:

Marc-Andre Proposal (safe)
--------------------------

1. pip fetches /simple/foobar/ and verifies that the contents of that page
   are correct by verifying the TLS connection. pip finds a link to another
   page at http://downloads.example.com/ and notices that this link has a
   #md5=hash. We know this hash is accurate because we've verified the contents
   of the page.
2. pip fetches http://downloads.example.com/ and verifies the contents of that
   page are correct by verifying the hash of that content against the hash found
   in step 1. It finds a link to a file hosted on downloads.example.com which
   includes a #md5=hash. We know this hash is accurate because we've
   verified the contents of this page.
3. pip downloads the file from downloads.example.com and verifies it using the
   md5 hash.
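
As a rough sketch of how such a chained check could look (neither the PEP nor
pip implements this, so everything here is hypothetical): the names check and
fetch_via_indirect_page are invented, fetch stands for any callable that
returns the bytes at a URL, sha256 is assumed to be accepted alongside md5,
and the page is assumed to contain a single absolute href.

```python
import hashlib
import re
from urllib.parse import urldefrag


def check(link, body):
    """Raise ValueError unless body matches the '#<algo>=<hex>' fragment of link."""
    name, sep, expected = urldefrag(link).fragment.partition("=")
    if not sep or name not in ("md5", "sha256"):
        raise ValueError("no usable hash on link: " + link)
    if hashlib.new(name, body).hexdigest() != expected.lower():
        raise ValueError("hash mismatch for " + link)


def fetch_via_indirect_page(page_link, fetch):
    """Follow index -> hashed page -> hashed file, verifying each hop.

    page_link is the link found on the TLS-verified /simple/ page; fetch is
    any callable mapping a URL (without fragment) to bytes.
    """
    page = fetch(urldefrag(page_link).url)
    check(page_link, page)  # step 2: the page itself is covered by a hash
    match = re.search(rb'href="([^"]+)"', page)
    if match is None:
        raise ValueError("no file link on page")
    file_link = match.group(1).decode("ascii")
    data = fetch(urldefrag(file_link).url)
    check(file_link, data)  # step 3: the file is covered by a trusted hash
    return data
```

If the file link on the verified page carries no hash, check raises and the
download is rejected, since nothing trustworthy covers the file itself.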


Marc-Andre Proposal - Other Outcome (unsafe)
--------------------------------------------

1. pip fetches /simple/foobar/ and verifies that the contents of that page
   are correct by verifying the TLS connection. pip finds a link to another
   page at http://downloads.example.com/ and notices that this link has a
   #md5=hash. We know this hash is accurate because we've verified the contents
   of the page.
2. pip fetches http://downloads.example.com/ and verifies the contents of that
   page are correct by verifying the hash of that content against the hash found
   in step 1. It finds a link to a file hosted on downloads.example.com which
   does not include a #md5=hash.
3. pip downloads the file from downloads.example.com and does not verify it
   because we have no hash to verify it against.


An aside about TLS
------------------

Technically you can replace any spot where we use a hash with a TLS connection
and still be safe. This means that: https://pypi.python.org/simple/foobar/ ->
https://downloads.example.com/ -> https://downloads.example.com/foobar-1.0.tar.gz
is secure even if there are no hashes involved at all. You can apply TLS and
hashes in any combination as long as there is no step in the chain that cannot
be verified with either a hash or TLS.

PEP 438 does not support the idea that a file can be safely hosted on TLS without
a hash and neither does pip. I only mention it here because I want to be clear
that some of the above examples could actually be "safe" if TLS were in use
but that we won't consider it safe for a variety of reasons.
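
The "every hop must be verifiable" rule can be modelled in a few lines of
Python. This is only a sketch of the theoretical argument, not of what PEP 438
or pip accept: the function names are invented, and each entry in the list is
assumed to be the link exactly as it appeared on the previous hop's page,
fragment included.

```python
from urllib.parse import urlsplit


def hop_is_verifiable(link):
    """A hop is verifiable if it uses TLS or its link carries a hash."""
    parts = urlsplit(link)
    has_hash = parts.fragment.startswith(("md5=", "sha256="))
    return parts.scheme == "https" or has_hash


def chain_is_safe(links):
    """links[0] is the index URL; each later entry is the link to the next hop."""
    return all(hop_is_verifiable(link) for link in links)


if __name__ == "__main__":
    tls_only = [
        "https://pypi.python.org/simple/foobar/",
        "https://downloads.example.com/",
        "https://downloads.example.com/foobar-1.0.tar.gz",
    ]
    unhashed_page = [
        "https://pypi.python.org/simple/foobar/",
        "http://downloads.example.com/",
        "http://downloads.example.com/foobar-1.0.tar.gz#md5=0123",
    ]
    print(chain_is_safe(tls_only))       # True
    print(chain_is_safe(unhashed_page))  # False
```

A single unverifiable hop makes the whole chain unsafe, because every link
found after that hop could have been tampered with.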

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
