[Catalog-sig] Deprecate External Links

Donald Stufft donald.stufft at gmail.com
Wed Feb 27 19:21:40 CET 2013


On Wednesday, February 27, 2013 at 1:11 PM, M.-A. Lemburg wrote:
> On 27.02.2013 18:37, Donald Stufft wrote:
> > On Wednesday, February 27, 2013 at 12:10 PM, M.-A. Lemburg wrote:
> > > Package installers only need access to the static files in
> > > the /simple/ index. Those can be put behind a CDN to increase
> > > uptime.
> > > 
> > > PyPI itself doesn't have to be up and running if you just want
> > > to download the files (unfortunately, that's not true at the
> > > moment, because the /simple/ index is dynamically generated,
> > > but that can be changed).
> > > 
> > > See http://wiki.python.org/moin/CloudPyPI for details.
> > 
> > I'm aware of that, but that doesn't change the statement. If /simple/
> > is down you cannot determine the external urls. There is no way
> > to increase uptime by adding more points of failure. 
> > 
> 
> 
> Please reread the proposal. The /simple/ index would
> get hosted on a separate domain which then points to the CDN.
> 
> 

It. Does. Not. Matter. You are simply moving the SPOF, which is
/simple/. If /simple/ is how you discover the CDN and/or external
URLs, then the things it points to can have 100% uptime and the
entire system is still down whenever /simple/ is down.
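
To put rough numbers on that, here is a minimal sketch. The uptime
figures are made up purely for illustration, and it assumes failures
are independent. An installer has to reach /simple/ first and the file
host second, so the availabilities multiply and the result can never
exceed that of /simple/ itself.

```python
def chain_availability(*availabilities):
    """Availability of a chain in which every component must be up.

    Assumes independent failures, so the probabilities multiply.
    """
    total = 1.0
    for a in availabilities:
        total *= a
    return total


# Hypothetical figures, for illustration only.
simple_index = 0.99    # the /simple/ index
perfect_cdn = 1.0      # a file host that never goes down
external_host = 0.98   # some third-party download site

# Even a perfect file host cannot lift the chain above /simple/.
index_plus_cdn = chain_availability(simple_index, perfect_cdn)

# A less reliable external host only drags the chain down further.
index_plus_external = chain_availability(simple_index, external_host)
```

Whatever sits behind /simple/, the first factor caps the product, which
is the whole point about the SPOF.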
> 
> > > > > Another reason for not uploading files to PyPI are the license
> > > > > terms you have to agree to on PyPI and the fact that you can no
> > > > > longer control where your distribution files are made available
> > > > > by agreeing to them. This could be changed as well, but we'd need
> > > > > to add more legalese to the PyPI mirror setup for this to work...
> > > > > not sure whether people providing the mirrors would like this.
> > > > > 
> > > > 
> > > > 
> > > > 
> > > > The legalese doesn't particularly give any more rights than any
> > > > free/OSS license does. There's not a requirement currently that
> > > > packages on PyPI be free/OSS but this change would only actually
> > > > affect people who want to upload non free code to PyPI.
> > > > 
> > > 
> > > 
> > > 
> > > It does affect any package author, regardless of the license.
> > > Some examples:
> > > 
> > > * you may be forced to remove a distribution from the net (think DMCA,
> > > patents, trademarks, etc)
> > > 
> > 
> > 
> > IANAL but I'm pretty sure if any of those things occur you didn't have
> > the legal right to grant that license to the PSF and the PSF would be
> > required to take them down anyways. 
> > 
> 
> 
> Exactly and taking down those files becomes impossible once
> you've uploaded them to PyPI (see the terms that you agree to
> when uploading to PyPI).
> 
> 

PyPI would have to take them down because you didn't have a right
to release them. PyPI would also respond to the DMCA process etc. 
> 
> > > * the distribution may contain a serious bug that you don't want to
> > > spread
> > > 
> > 
> > 
> > This is a completely separate issue. PyPI supports (and always will)
> > a method of saying "delete and/or don't install this". This is really just
> > a strawman.
> > 
> 
> 
> The "delete" part works on PyPI itself. It doesn't on
> some of the mirrors and once you've pinned a version with
> such a bug in, say, zc.buildout configured to use one of
> those mirrors, your system will continue to use the broken
> release.
> 
> 

Then we need to fix the mirroring protocol. 
> 
> > > * you may want to keep more accurate statistics of the reach of
> > > your project
> > > 
> > 
> > 
> > What statistics do you want? Let's have PyPI produce them and
> > properly anonymize them instead of leaking data. 
> > 
> 
> 
> The number of downloads broken down by country is the most
> important one, I guess. But others will likely want other details
> as well.
> 
> > > > > Security can be had by having installers check the GPG signatures
> > > > > of distribution files. You don't need to trust the download
> > > > > site for that.
> > > > > 
> > > > 
> > > > 
> > > > 
> > > > GPG signatures are good, we don't have them yet. And when we do
> > > > it's only 1 layer of defense, not the final solution.
> > > > 
> > > 
> > > 
> > > 
> > > Sure, you still have to trust the author :-)
> > 
> > But do I need to trust his host? Do I need to trust that his laptop didn't
> > get swiped and with it his GPG key? Ideally I don't *need* to trust the
> > author either. I download his package from PyPI and I can review it,
> > then I know it's fine and I can download that version and use it. PyPI
> > isn't at the point where you can make that assumption, but it should
> > get there.
> > 
> 
> 
> Hmm, if you went through the hassle of reviewing the
> already downloaded file manually, why would you want to
> download it again ?
> 
> 

Because I have automated deployment systems, which currently cannot
use PyPI because of issues such as these. However, other people don't
realize this, or are unable to work around it, and expect Foo==1.0 to
be reasonably secure, which it isn't.
> 
> In such a setup, getting the file locally would appear
> to be the more natural choice, IMO, bypassing all the
> issues we're discussing here.
> 
> > > > > I'm not sure what you meant with privacy in this context.
> > > > 
> > > > If I download something from server there is a certain amount
> > > > of information that by nature of HTTP and networking gets
> > > > leaked to that host. Additionally if it's done via non TLS connections
> > > > it also gets leaked to anyone who has a MITM on my connection.
> > > > 
> > > > This is especially important in countries where the government
> > > > actively surveils or modifies the traffic of their citizens.
> > > > 
> > > 
> > > 
> > > 
> > > I can see an issue with e.g. trying to download code that
> > > is illegal to use in a country (e.g. crypto code, exploits,
> > > hacks, etc.), but the country officials would probably just
> > > block the complete PyPI site rather than bother with filtering single
> > > requests.
> > > 
> > > IMO, that's beyond the scope of what we're discussing
> > > here, though.
> > > 
> > 
> > 
> > It's not just "crypto code", "exploits", "hacks" it's also things like
> > https://ooni.torproject.org/ and Tor itself which are *good* projects
> > that certain governments might not particularly like.
> > 
> 
> 
> Well, yes, but it's still beyond the scope of what we're
> discussing, right ?
> 
> 

One of the arguments against external URLs is the privacy implications.
People don't want someone spying on what they download, which is
exactly why they'd want their downloads to be private.
> 
> > > > > Something that would work even with older Python versions is
> > > > > letting the download URL point to a meta-file which contains
> > > > > the links to the other distribution files. That way you
> > > > > avoid having the installers trying to parse arbitrary
> > > > > websites and you can add more security to the downloads
> > > > > by providing hash values, etc. in those meta-files.
> > > > > 
> > > > > Since installers already know how to parse the /simple/
> > > > > (HTML) index files, we might use that same format
> > > > > for those meta-files.
> > > > > 
> > > > 
> > > 
> > > 
> > > 
> > > So what do you think of the above idea ?
> > If the hashes are on the external system, then they are as good as
> > useless. If I'm able to do something nefarious with the packages
> > that are hosted I can do something nefarious with the metadata file.
> > 
> > Putting the hashes on PyPI fixes the security issue (because we
> > have a real SSL cert, and tools are starting to validate it) but doesn't
> > fix the other issues.
> > 
> 
> 
> Ok, so how about we follow the same approach as used for
> PyPI downloads: you add a hash to the download URL pointing to
> the meta file, e.g.
> 
> http://example.com/files/pkg-1.0-distribution-files.html#bb4536e6...
> 
> That way the hash would get stored on PyPI and installers
> can verify that the data in the meta file hasn't been tampered
> with.
> 
> This can be supported by all distutils versions, since the
> download URL is a free format field.
> 
> The hashes of the files and download URLs would still be in
> the meta file, but now protected by the hash stored on PyPI.
> 
> In a way, you'd be setting up a remote PyPI /simple/ index
> for which the installers already have parsers in place.
> 
> 

I'd need to think about this. Offhand, it fixes the security
implications, but it doesn't solve any of the other issues, which are
just as important.

In particular, I'd want to see how this would play into the future of
PyPI wrt the latest PEPs.
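
For concreteness, a minimal sketch of the check an installer would make
under the scheme quoted above. The function name, the
"<algorithm>=<hexdigest>" fragment format, and the sha256 default are
all assumptions made for this sketch; the proposal only says that a
hash goes in the URL fragment. The body is assumed to have been fetched
already.

```python
import hashlib
import hmac
from urllib.parse import urldefrag


def verify_meta_file(download_url, body, default_algorithm="sha256"):
    """Return True if `body` matches the digest in the URL fragment.

    The fragment is read either as "<algorithm>=<hexdigest>" or as a
    bare hexdigest, in which case `default_algorithm` is used. Both the
    format and the default are assumptions, not part of the proposal.
    """
    _, fragment = urldefrag(download_url)
    if not fragment:
        return False  # nothing to check against: treat as unverified
    algorithm, sep, expected = fragment.partition("=")
    if not sep:
        algorithm, expected = default_algorithm, fragment
    actual = hashlib.new(algorithm, body).hexdigest()
    # Constant-time comparison of the two hex strings.
    return hmac.compare_digest(actual, expected.lower())


# Hypothetical meta file and URL, built here so the digest is known.
body = b"<a href='pkg-1.0.tar.gz'>pkg-1.0.tar.gz</a>"
digest = hashlib.sha256(body).hexdigest()
url = ("http://example.com/files/pkg-1.0-distribution-files.html"
       "#sha256=" + digest)

ok = verify_meta_file(url, body)
tampered = verify_meta_file(url, body + b"<!-- changed -->")
```

The check only means something because the URL, and with it the
digest, comes from PyPI. If the digest were served by the external host
as well, whoever tampers with the file could tamper with the digest.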
> 
> -- 
> Marc-Andre Lemburg
> eGenix.com (http://eGenix.com)
> 
