[Distutils] [Python-checkins] peps: PEP 470: use external indexes over link spidering

Stefan Krah stefan-usenet at bytereef.org
Thu May 15 17:13:24 CEST 2014


I'm not opposed to the Custom Additional Index if downloads can still be
protected by a checksum stored on PyPI.

What I find disturbing is that even after all these discussions the document
contains many personal preferences, heavily biased statements about external
servers and legal advice.


If all these things weren't present and checksum facilities were added,
I'd simply be +1 on the PEP.  Instead, I'm forced to vote -1 on the
current draft and go through it point by point.


nick.coghlan <python-checkins at python.org> wrote:
> +Author: Donald Stufft <donald at stufft.io>,
> +
> +Custom Additional Index
> +-----------------------
> +
> +Compared to the complex rules which a project must be aware of to prevent
> +themselves from being considered unsafely hosted setting up an index is fairly
> +trivial and in the simplest case does not require anything more than a
> +filesystem and a standard web server such as Nginx.

Adding an explicit URL with a checksum is trivial compared to configuring
a new subdomain, so I don't find the stated reason convincing.

However, if it is still possible to deposit a checksum on PyPI instead
of using an SSL certificate to secure the download, I don't mind the
custom index.
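
To illustrate the kind of protection meant here, a minimal Python sketch
follows. It assumes that the page on PyPI lists the external file as a URL
whose fragment has the form <hashname>=<hexdigest>, and that the installer
compares the downloaded bytes against that digest. The function names and
the example host are hypothetical; this is not pip's actual code.

```python
import hashlib
from urllib.parse import urlsplit


def expected_digest(link):
    """Split 'URL#<hashname>=<hexdigest>' into (url, hashname, hexdigest)."""
    parts = urlsplit(link)
    algo, sep, digest = parts.fragment.partition("=")
    # Refuse links without a checksum or with an unknown hash algorithm.
    if not sep or algo not in hashlib.algorithms_available:
        raise ValueError("link carries no usable checksum: %r" % link)
    return parts._replace(fragment="").geturl(), algo, digest.lower()


def verify(link, payload):
    """Return True if payload (bytes) matches the checksum in the link."""
    _, algo, digest = expected_digest(link)
    return hashlib.new(algo, payload).hexdigest() == digest


# Hypothetical usage: the digest is stored on PyPI, the file lives elsewhere.
payload = b"contents of the externally hosted archive"
link = ("https://files.example.org/pkg-1.0.tar.gz#sha256="
        + hashlib.sha256(payload).hexdigest())
print(verify(link, payload))
```

With a scheme like this the external host does not need to be trusted for
integrity, only the page that carries the digest.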



> +External Links on the Simple Installer API
> +------------------------------------------

I think the existing scheme could have been simplified:

   https://mail.python.org/pipermail/distutils-sig/2014-May/024275.html

But I'm not going to insist, since the new scheme is no worse if checksums
can be added.



> +Deprecation and Removal of Link Spidering
> +=========================================
> +After that switch, an email will be sent to projects which rely on hosting
> +external to PyPI. This email will warn these projects that externally hosted
> +files have been deprecated on PyPI and that in 6 months from the time of that
> +email that all external links will be removed from the installer APIs. This
> +email *must* include instructions for converting their projects to be hosted
> +on PyPI and *must* include links to a script or package that will enable them
> +to enter their PyPI credentials and package name and have it automatically
> +download and re-host all of their files on PyPI. This email *must also*
> +include instructions for setting up their own index page and registering that
> +with PyPI.

Please add: The email *must* include the PyPI terms and conditions.



> +Five months after the initial email, another email must be sent to any projects
> +still relying on external hosting. This email will include all of the same
> +information that the first email contained, except that the removal date will
> +be one month away instead of six.

Here too, please include the terms and conditions in the email.



> +* People are generally surprised that PyPI allows externally linking to files
> +  and doesn't require people to host on PyPI. In contrast most of them are
> +  familiar with the concept of multiple software repositories such as is in
> +  use by many OSs.

This is speculation and should be deleted.



> +* PyPI is fronted by a globally distributed CDN which has improved the
> +  reliability and speed for end users. It is unlikely that any particular
> +  external host has something comparable. This can lead to extremely bad
> +  performance for end users when the external host is located in different
> +  parts of the world or does not generally have good connectivity.

This is editorializing. In fact, a former Debian project leader recently
called PyPI "a bit flaky":

    https://mail.python.org/pipermail/distutils-sig/2014-May/024275.html

Compare that to launchpad.net and code.google.com. These kinds of statements
should not be in a PEP as they will just lead to ongoing friction.



> +  As a data point, many users reported sub DSL speeds and latency when
> +  accessing PyPI from parts of Europe and Asia prior to the use of the CDN.

Irrelevant. Does it follow that, because the old PyPI server was overloaded,
all external servers must be overloaded too?



> +* PyPI has monitoring and an on-call rotation of sysadmins whom can respond to
> +  downtime quickly, thus enabling a quicker response to downtime. Again it is
> +  unlikely that any particular external host will have this.

That is quite general and irrelevant to the topic of this PEP (replacing one
scheme of accommodating external hosts with another).


> +* PyPI supports mirroring, both for private organizations and public mirrors.
> +  The legal terms of uploading to PyPI ensure that mirror operators, both
> +  public and private, have the right to distribute the software found on PyPI.

I don't think so:

https://mail.python.org/pipermail/distutils-sig/2014-May/024275.html

    "People have also said that this overrules the licenses on their packages.
     That is not so! The licenses in this case run in parallel, and distribution
     needs to satisfy both licenses or it cannot be done at all."

So the above paragraph should read:

    "The legal terms of uploading to PyPI entail that mirror operators, both
     public and private, have the obligation to check if distribution satisfies
     both licenses for each software package found on PyPI."

Best to delete it entirely.


> +  However software that is hosted externally does not have this, causing
> +  private organizations to need to investigate each package individually and
> +  manually to determine if the license allows them to mirror it.

Software hosted externally has a single license, which is far simpler to
handle.


> +  For public mirrors this essentially means that these externally hosted
> +  packages *cannot* be reasonably mirrored. This is particularly troublesome
> +  in countries such as China where the bandwidth to outside of China is
> +  highly congested making a mirror within China often times a massively better
> +  experience.

Not true, see above. Best to delete it.

> +* Installers have no method to determine if they should expect any particular
> +  URL to be available or not. It is not unusual for the simple API to reference
> +  old packages and URLs which have long since stopped working. This causes
> +  installers to have to assume that it is OK for any particular URL to not be
> +  accessible. This causes problems where an URL is temporarily down or
> +  otherwise unavailable (a common cause of this is using a copy of Python
> +  linked against a really ancient copy of OpenSSL which is unable to verify
> +  the SSL certificate on PyPI) but it *should* be expected to be up. In this
> +  case installers will typically silently ignore this URL and later the user
> +  will get a confusing error stating that the installer couldn't find any
> +  versions instead of getting the real error message indicating that the URL
> +  was unavailable.

I do not understand this paragraph (honest!).
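
One possible reading of the quoted paragraph, sketched in Python under
assumptions: an installer that tolerates dead links has to swallow every
fetch failure, so when the one link that matters is down, the user only sees
a generic "no versions" error. All names below are hypothetical and do not
correspond to any real installer's internals.

```python
class LinkUnavailable(Exception):
    """Raised by the (hypothetical) fetcher when a URL cannot be retrieved."""


def find_versions(links, fetch):
    """Collect versions from every link, silently skipping links that fail."""
    found = []
    for link in links:
        try:
            found.extend(fetch(link))
        except LinkUnavailable:
            # Dead links are assumed to be normal, so the real cause
            # (timeout, SSL failure, ...) is dropped here.
            continue
    if not found:
        # The user sees this message instead of the underlying error.
        raise LookupError("could not find any versions")
    return found
```

If the index declared which URLs are expected to be reachable, the installer
could report the original failure instead of continuing.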


> +* In the long run, global opt in flags like ``--allow-all-external`` will
> +  become little annoyances that developers cargo cult around in order to make
> +  their installer work. When they run into a project that requires it they
> +  will most likely simply add it to their configuration file for that installer
> +  and continue on with whatever they were actually trying to do. This will
> +  continue until they try to install their requirements on another computer
> +  or attempt to deploy to a server where their install will fail again until
> +  they add the "make it work" flag in their configuration file.

It seems to me that this will happen with the new flag, too.


Stefan Krah





