[Distutils] [Python-checkins] peps: PEP 470: use external indexes over link spidering

Donald Stufft donald at stufft.io
Thu May 15 19:09:06 CEST 2014


On May 15, 2014, at 11:13 AM, Stefan Krah <stefan-usenet at bytereef.org> wrote:

> I'm not opposed to the Custom Additional Index if downloads can still be
> protected by a checksum stored on PyPI.
> 
> What I find disturbing is that even after all these discussions the document
> contains many personal preferences, heavily biased statements about external
> servers and legal advice.
> 
> 
> If all these things weren't present and checksum facilities were added,
> I'd simply be +1 on the PEP.  Instead, I'm forced to vote -1 on the
> current draft and go through it point by point.
> 
> 
> nick.coghlan <python-checkins at python.org> wrote:
>> +Author: Donald Stufft <donald at stufft.io>,
>> +
>> +Custom Additional Index
>> +-----------------------
>> +
>> +Compared to the complex rules which a project must be aware of to prevent
>> +themselves from being considered unsafely hosted setting up an index is fairly
>> +trivial and in the simplest case does not require anything more than a
>> +filesystem and a standard web server such as Nginx.
> 
> It's trivial to add the explicit URL with the checksum compared to configuring
> a new subdomain, so I don't like the reason given.
> 
> However, if it is still possible to deposit a checksum on PyPI instead
> of using an SSL certificate to secure the download, I don't mind the
> custom index.
> 

Right now they’ll require SSL certificates (which can be obtained for free), or
you can upload a file to pythonhosted.org and use that as your custom index,
since you can upload pretty much any file you want there and it already has TLS
enabled.
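
To make the "filesystem plus a standard web server" case concrete, here is a
minimal sketch of generating a static page for one project from a directory of
release files. The function names and the page layout are my own illustration,
not something the PEP specifies. It assumes the installer checks a hash given
in the link's URL fragment (``#sha256=...``), which is how a checksum can travel
with the link.

```python
import hashlib
import html
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_project_page(project: str, files_dir: Path) -> str:
    """Build a static HTML page linking every file in files_dir.

    Hypothetical helper: each link carries the file's checksum in the
    URL fragment so that an installer can verify the download.
    """
    links = []
    for path in sorted(files_dir.iterdir()):
        if not path.is_file():
            continue
        href = f"{path.name}#sha256={sha256_of(path)}"
        links.append(
            f'<a href="{html.escape(href)}">{html.escape(path.name)}</a><br/>'
        )
    body = "\n".join(links)
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>Links for {html.escape(project)}</title></head>\n"
        f"<body>\n{body}\n</body></html>\n"
    )
```

Writing the returned string to an ``index.html`` next to the files and serving
that directory over TLS would be enough for the simplest case.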

Eventually we’ll be switching to a package signing scheme and we won’t require
TLS to secure things anymore.

> 
> 
>> +External Links on the Simple Installer API
>> +------------------------------------------
> 
> I think the existing scheme could have been simplified:
> 
>   https://mail.python.org/pipermail/distutils-sig/2014-May/024275.html
> 
> But I'm not going to insist, since the new scheme is no worse if checksums
> can be added.
> 
> 
> 
>> +Deprecation and Removal of Link Spidering
>> +=========================================
>> +After that switch, an email will be sent to projects which rely on hosting
>> +external to PyPI. This email will warn these projects that externally hosted
>> +files have been deprecated on PyPI and that in 6 months from the time of that
>> +email that all external links will be removed from the installer APIs. This
>> +email *must* include instructions for converting their projects to be hosted
>> +on PyPI and *must* include links to a script or package that will enable them
>> +to enter their PyPI credentials and package name and have it automatically
>> +download and re-host all of their files on PyPI. This email *must also*
>> +include instructions for setting up their own index page and registering that
>> +with PyPI.
> 
> Please add: The email *must* include the PyPI terms and conditions.

Well, users have to explicitly agree to the terms when they sign up for an
account (and those terms are presented to them). However, perhaps they signed
up a long time ago and have forgotten them. I don’t see anything wrong with
including a link to the legal section.

> 
> 
> 
>> +Five months after the initial email, another email must be sent to any projects
>> +still relying on external hosting. This email will include all of the same
>> +information that the first email contained, except that the removal date will
>> +be one month away instead of six.
> 
> Also here, please include the terms and conditions in the mail.
> 
> 
> 
>> +* People are generally surprised that PyPI allows externally linking to files
>> +  and doesn't require people to host on PyPI. In contrast most of them are
>> +  familiar with the concept of multiple software repositories such as is in
>> +  use by many OSs.
> 
> This is speculation and should be deleted.

It’s not speculation. It’s a conclusion based on evidence.

> 
> 
> 
>> +* PyPI is fronted by a globally distributed CDN which has improved the
>> +  reliability and speed for end users. It is unlikely that any particular
>> +  external host has something comparable. This can lead to extremely bad
>> +  performance for end users when the external host is located in different
>> +  parts of the world or does not generally have good connectivity.
> 
> This is editorializing. In fact recently a former Debian project leader has
> called PyPI "a bit flaky":
> 
>    https://mail.python.org/pipermail/distutils-sig/2014-May/024275.html
> 
> Compare that to launchpad.net and code.google.com. These kinds of statements
> should not be in a PEP as they will just lead to ongoing friction.

I don’t think you linked to what you meant to link to here.

> 
> 
> 
>> +  As a data point, many users reported sub DSL speeds and latency when
>> +  accessing PyPI from parts of Europe and Asia prior to the use of the CDN.
> 
> Irrelevant. Because the old PyPI server was overloaded, all external servers
> have to be, too?

It wasn’t overloaded; the slowness was just the result of users being on the
opposite side of the world from the server.

> 
> 
> 
>> +* PyPI has monitoring and an on-call rotation of sysadmins whom can respond to
>> +  downtime quickly, thus enabling a quicker response to downtime. Again it is
>> +  unlikely that any particular external host will have this.
> 
> That is quite general and irrelevant to the topic of this PEP (replacing one
> scheme of accommodating external hosts with another).

The idea is that users explicitly opt in to the external hosts they rely on.
Thus they make the decision to rely on a host that doesn’t have this, instead of
it being made for them. Explicit vs. implicit.

> 
> 
>> +* PyPI supports mirroring, both for private organizations and public mirrors.
>> +  The legal terms of uploading to PyPI ensure that mirror operators, both
>> +  public and private, have the right to distribute the software found on PyPI.
> 
> I don't think so:
> 
> https://mail.python.org/pipermail/distutils-sig/2014-May/024275.html
> 
>    "People have also said that this overrules the licenses on their packages.
>     That is not so! The licenses in this case run in parallel, and distribution
>     needs to satisfy both licenses or it cannot be done at all."
> 
> So the above paragraph should read:
> 
>    "The legal terms of uploading to PyPI entail that mirror operators, both
>     public and private, have the obligation to check if distribution satisfies
>     both licenses for each software package found on PyPI."
> 
> Best to delete it entirely.

No, the licenses do run in parallel, but what that means is that if a person
licenses their software in a way that it can’t be distributed under the terms
of the PyPI license, then they can’t upload it. It doesn’t *overrule*, it
*prevents*. At least this is my understanding of it; IANAL.

The ability for people to mirror automatically and without having to inspect
each individual package was an explicit goal of the license terms, as stated
by VanL, the lawyer who wrote them:

https://mail.python.org/pipermail/python-legal-sig/2013-March/000003.html

> 
> 
>> +  However software that is hosted externally does not have this, causing
>> +  private organizations to need to investigate each package individually and
>> +  manually to determine if the license allows them to mirror it.
> 
> Software hosted externally has a single license, which is far simpler to
> handle.

See above.

> 
> 
>> +  For public mirrors this essentially means that these externally hosted
>> +  packages *cannot* be reasonably mirrored. This is particularly troublesome
>> +  in countries such as China where the bandwidth to outside of China is
>> +  highly congested making a mirror within China often times a massively better
>> +  experience.
> 
> Not true, see above. Best to delete it.

It is true, see above.

> 
>> +* Installers have no method to determine if they should expect any particular
>> +  URL to be available or not. It is not unusual for the simple API to reference
>> +  old packages and URLs which have long since stopped working. This causes
>> +  installers to have to assume that it is OK for any particular URL to not be
>> +  accessible. This causes problems where an URL is temporarily down or
>> +  otherwise unavailable (a common cause of this is using a copy of Python
>> +  linked against a really ancient copy of OpenSSL which is unable to verify
>> +  the SSL certificate on PyPI) but it *should* be expected to be up. In this
>> +  case installers will typically silently ignore this URL and later the user
>> +  will get a confusing error stating that the installer couldn't find any
>> +  versions instead of getting the real error message indicating that the URL
>> +  was unavailable.
> 
> I do not understand this paragraph (honest!).

I can try to reword it. Essentially it means external hosts are statistically
unreliable. Some are OK uptime-wise, but it’s obvious when scanning PyPI that a
significant portion of them are unavailable. So we have to assume that any
given URL may legitimately be unavailable, and we can’t hard fail when it is
down. As a result we can’t provide good error messages when a URL that is
supposed to be available isn’t, because we have no way of knowing that.
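
A rough sketch of the difference, with made-up names (this is not pip’s actual
code): when links come from spidering, a failure has to be swallowed, whereas
an index the user explicitly configured can fail loudly with the real error.
The ``fetch`` callable is a stand-in for whatever retrieves the links found at a
URL and is assumed to raise ``OSError`` on network failure.

```python
class IndexUnavailable(Exception):
    """Raised when an explicitly configured index cannot be reached."""


def collect_links(urls, fetch, explicit):
    """Gather candidate links from each URL.

    urls     -- locations to look at
    fetch    -- hypothetical callable returning a list of links for a URL,
                raising OSError when the URL cannot be reached
    explicit -- True if the user configured these URLs on purpose
    """
    found = []
    for url in urls:
        try:
            found.extend(fetch(url))
        except OSError as exc:
            if explicit:
                # The user asked for this index, so it is expected to be up:
                # report the real problem instead of hiding it.
                raise IndexUnavailable(f"{url} is unavailable: {exc}") from exc
            # Spidered link: it may have been dead for years, so the failure
            # is skipped and the user only sees "no versions found" later.
            continue
    return found
```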

> 
> 
>> +* In the long run, global opt in flags like ``--allow-all-external`` will
>> +  become little annoyances that developers cargo cult around in order to make
>> +  their installer work. When they run into a project that requires it they
>> +  will most likely simply add it to their configuration file for that installer
>> +  and continue on with whatever they were actually trying to do. This will
>> +  continue until they try to install their requirements on another computer
>> +  or attempt to deploy to a server where their install will fail again until
>> +  they add the "make it work" flag in their configuration file.
> 
> It seems to me that this will happen with the new flag, too.

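The difference is in the understanding and the scope: the user adds a specific
index they have chosen to rely on, rather than a global flag that allows every
external host.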

> 
> 
> Stefan Krah


-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
