[Catalog-sig] Deprecate External Links

M.-A. Lemburg mal at egenix.com
Wed Feb 27 19:11:09 CET 2013


On 27.02.2013 18:37, Donald Stufft wrote:
> On Wednesday, February 27, 2013 at 12:10 PM, M.-A. Lemburg wrote:
>> Package installers only need access to the static files in
>> the /simple/ index. Those can be put behind a CDN to increase
>> uptime.
>>
>> PyPI itself doesn't have to be up and running if you just want
>> to download the files (unfortunately, that's not true at the
>> moment, because the /simple/ index is dynamically generated,
>> but that can be changed).
>>
>> See http://wiki.python.org/moin/CloudPyPI for details.
>
> I'm aware of that, but that doesn't change the statement. If /simple/
> is down you cannot determine the external urls. There is no way
> to increase uptime by adding more points of failure. 

Please reread the proposal. The /simple/ index would
get hosted on a separate domain which then points to the CDN.
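
For illustration, pre-rendering the /simple/ index as static files that a CDN can serve might look roughly like the sketch below. This is only a sketch under assumptions: the function names, the `../../packages/` link layout and the input mapping of file names to MD5 digests are hypothetical and not taken from the CloudPyPI proposal; only the `#md5=` fragment mirrors what PyPI puts on its /simple/ links.

```python
import html
import os
from urllib.parse import quote


def render_simple_page(project, digests):
    """Return static HTML for one project page.

    `digests` maps file name -> MD5 hex digest (hypothetical input format).
    """
    rows = []
    for filename in sorted(digests):
        # Assumed layout: files live under /packages/ next to /simple/.
        href = "../../packages/%s#md5=%s" % (quote(filename), digests[filename])
        rows.append('<a href="%s">%s</a><br/>'
                    % (html.escape(href, quote=True), html.escape(filename)))
    return ("<html><head><title>Links for {0}</title></head><body>"
            "<h1>Links for {0}</h1>\n{1}\n</body></html>\n"
            ).format(html.escape(project), "\n".join(rows))


def write_static_index(root, projects):
    """Write <root>/simple/<project>/index.html for every project."""
    for project, digests in projects.items():
        target = os.path.join(root, "simple", project)
        os.makedirs(target, exist_ok=True)
        with open(os.path.join(target, "index.html"), "w", encoding="utf-8") as fh:
            fh.write(render_simple_page(project, digests))
```

The resulting directory tree contains nothing dynamic, so it can be synced to any static host or CDN origin and keeps working while the PyPI application itself is down.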

>>>> Another reason for not uploading files to PyPI are the license
>>>> terms you have to agree to on PyPI and the fact that you can no
>>>> longer control where your distribution files are made available
>>>> by agreeing to them. This could be changed as well, but we'd need
>>>> to add more legalese to the PyPI mirror setup for this to work...
>>>> not sure whether people providing the mirrors would like this.
>>>>
>>>
>>>
>>> The legalese doesn't particularly give any more rights than any
>>> free/OSS license does. There's not a requirement currently that
>>> packages on PyPI be free/OSS but this change would only actually
>>> affect people who want to upload non-free code to PyPI.
>>>
>>
>>
>> It does affect any package author, regardless of the license.
>> Some examples:
>>
>> * you may be forced to remove a distribution from the net (think DMCA,
>> patents, trademarks, etc.)
>>
>>
> 
> IANAL but I'm pretty sure if any of those things occur you didn't have
> the legal right to grant that license to the PSF and the PSF would be
> required to take them down anyways. 

Exactly, and taking down those files becomes impossible once
you've uploaded them to PyPI (see the terms that you agree to
when uploading to PyPI).

>> * the distribution may contain a serious bug that you don't want to
>> spread
>>
>>
> 
> This is a completely separate issue. PyPI supports (and always will)
> a method of saying "delete and/or don't install this". This is really just
> a strawman.

The "delete" part works on PyPI itself. It doesn't on
some of the mirrors and once you've pinned a version with
such a bug in, say, zc.buildout configured to use one of
those mirrors, your system will continue to use the broken
release.

>> * you may want to keep more accurate statistics of the reach of
>> your project
> 
> What statistics do you want? Let's have PyPI produce them and
> properly anonymize them instead of leaking data. 

The number of downloads broken down by country is the most important
one, I guess. But others will likely want other details as well.

>>>> Security can be had by having installers check the GPG signatures
>>>> of distribution files. You don't need to trust the download
>>>> site for that.
>>>>
>>>
>>>
>>> GPG signatures are good, we don't have them yet. And when we do
>>> it's only 1 layer of defense, not the final solution.
>>>
>>
>>
>> Sure, you still have to trust the author :-)
>
> But do I need to trust his host? Do I need to trust that his laptop didn't
> get swiped and with it his GPG key? Ideally I don't *need* to trust the
> author either. I download his package from PyPI and I can review it,
> then I know it's fine and I can download that version and use it. PyPI
> isn't at the point where you can make that assumption, but it should get there.

Hmm, if you went through the hassle of reviewing the
already downloaded file manually, why would you want to
download it again ?

In such a setup, getting the file locally would appear
to be the more natural choice, IMO, bypassing all the
issues we're discussing here.

>>>> I'm not sure what you meant with privacy in this context.
>>>
>>> If I download something from a server there is a certain amount
>>> of information that by nature of HTTP and networking gets
>>> leaked to that host. Additionally if it's done via non-TLS connections
>>> it also gets leaked to anyone who has a MITM on my connection.
>>>
>>> This is especially important in countries where the government
>>> actively surveils or modifies the traffic of their citizens.
>>>
>>
>>
>> I can see an issue with e.g. trying to download code that
>> is illegal to use in a country (e.g. crypto code, exploits,
>> hacks, etc.), but the country officials would probably just
>> block the complete PyPI site rather than bother with filtering
>> single requests.
>>
>> IMO, that's beyond the scope of what we're discussing
>> here, though.
>>
>>
> 
> It's not just "crypto code", "exploits", "hacks" it's also things like
> https://ooni.torproject.org/ and Tor itself which are *good* projects
> that certain governments might not particularly like.

Well, yes, but it's still beyond the scope of what we're
discussing, right ?

>>>> Something that would work even with older Python versions is
>>>> letting the download URL point to a meta-file which contains
>>>> the links to the other distribution files. That way you
>>>> avoid having the installers trying to parse arbitrary
>>>> websites and you can add more security to the downloads
>>>> by providing hash values, etc. in those meta-files.
>>>>
>>>> Since installers already know how to parse the /simple/
>>>> (HTML) index files, we might use that same format
>>>> for those meta-files.
>>>>
>>>
>>
>>
>> So what do you think of the above idea ?
> If the hashes are on the external system then they are as good as
> useless. If I'm able to do something nefarious with the packages
> that are hosted I can do something nefarious with the metadata file.
> 
> Putting the hashes on PyPI fixes the security issue (because we
> have a real SSL cert, and tools are starting to validate it) but doesn't
> fix the other issues.

Ok, so how about we follow the same approach as used for
PyPI downloads: you add a hash to the download URL pointing to
the meta file, e.g.

http://example.com/files/pkg-1.0-distribution-files.html#bb4536e6...

That way the hash would get stored on PyPI and installers
can verify that the data in the meta file hasn't been tampered
with.

This can be supported by all distutils versions, since the
download URL is a free format field.

The hashes of the files and download URLs would still be in
the meta file, but now protected by the hash stored on PyPI.

In a way, you'd be setting up a remote PyPI /simple/ index
for which the installers already have parsers in place.
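
A rough sketch of what the installer side of this could look like, in Python. It rests on a few assumptions: the fragment is taken to have the form `algo=hexdigest`, borrowed from the `#md5=...` fragments on PyPI's /simple/ links (the example URL above only shows a truncated hex value), the meta file is assumed to be UTF-8 HTML with plain `<a href>` links, and all function and class names are made up for illustration. Fetching the meta file is left out; the bytes are passed in.

```python
import hashlib
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


def split_hashed_url(url):
    """Split 'http://host/path#algo=hexdigest' into (url, algo, hexdigest)."""
    base, fragment = urldefrag(url)
    algo, sep, digest = fragment.partition("=")
    if not sep or not digest:
        raise ValueError("URL carries no 'algo=digest' fragment: %r" % url)
    return base, algo, digest.lower()


def verify(data, algo, expected_hexdigest):
    """Return True if the bytes in `data` hash to the expected digest."""
    return hashlib.new(algo, data).hexdigest() == expected_hexdigest


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags, resolved against the meta file URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def distribution_links(download_url, meta_bytes):
    """Check the meta file against the hash stored on PyPI, then list its links.

    Returns a list of (file_url, algo, hexdigest) tuples. Raises ValueError
    if the meta file was tampered with or a link carries no hash fragment.
    """
    base, algo, digest = split_hashed_url(download_url)
    if not verify(meta_bytes, algo, digest):
        raise ValueError("meta file does not match the hash recorded on PyPI")
    collector = LinkCollector(base)
    collector.feed(meta_bytes.decode("utf-8"))
    return [split_hashed_url(link) for link in collector.links]
```

Each downloaded distribution file would then be checked with `verify()` against the digest taken from the meta file, so the chain of trust starts at the one hash stored on PyPI.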

-- 
Marc-Andre Lemburg
eGenix.com
