[Distutils] [Python-checkins] peps: PEP 470: use external indexes over link spidering

Ian Cordasco graffatcolmingov at gmail.com
Thu May 15 18:50:46 CEST 2014


On Thu, May 15, 2014 at 10:13 AM, Stefan Krah
<stefan-usenet at bytereef.org> wrote:
> I'm not opposed to the Custom Additional Index if downloads can still be
> protected by a checksum stored on PyPI.
>
> What I find disturbing is that even after all these discussions the document
> contains many personal preferences, heavily biased statements about external
> servers and legal advice.

PEPs frequently include personal preferences and reasoning. PEP-0466
is an example of this.

> nick.coghlan <python-checkins at python.org> wrote:
>> +Author: Donald Stufft <donald at stufft.io>,
>> +
>> +Custom Additional Index
>> +-----------------------
>> +
>> +Compared to the complex rules which a project must be aware of to prevent
>> +themselves from being considered unsafely hosted setting up an index is fairly
>> +trivial and in the simplest case does not require anything more than a
>> +filesystem and a standard web server such as Nginx.
>
> It's trivial to add the explicit URL with the checksum compared to configuring
> a new subdomain, so I don't like the reason given.

If we're arguing about how trivial things are, it's trivial to just
upload your package to PyPI, so I don't like the reason you gave. Then
again, my dislike of your reason doesn't invalidate it, just as your
dislike of the PEP's reason doesn't invalidate that one.
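
For anyone following along, the "explicit URL with the checksum" idea
can be sketched roughly as below. This is only an illustration, not
pip's or PyPI's actual code: the function names are made up, and I'm
assuming the checksum travels in the URL fragment as `#sha256=<hex>`
and that only sha256 and sha512 are accepted.

```python
import hashlib
from urllib.parse import urlsplit

# Assumption for this sketch: only these algorithms are accepted.
ALLOWED_ALGORITHMS = ("sha256", "sha512")


def expected_digest(url):
    """Return (algorithm, hex digest) from a '#algo=hex' fragment, or None."""
    fragment = urlsplit(url).fragment
    algo, sep, digest = fragment.partition("=")
    if not sep or algo not in ALLOWED_ALGORITHMS or not digest:
        return None
    return algo, digest.lower()


def verify_download(url, data):
    """Check downloaded bytes against the checksum embedded in the URL.

    Returns False when the URL carries no usable checksum, so an
    unverifiable download is never treated as verified.
    """
    expected = expected_digest(url)
    if expected is None:
        return False
    algo, digest = expected
    return hashlib.new(algo, data).hexdigest() == digest
```

The point of the sketch is that the trusted side (the index) only has
to publish the URL with its fragment; the bytes can then come from any
host and still be checked before use.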

>> +Deprecation and Removal of Link Spidering
>> +=========================================
>> +After that switch, an email will be sent to projects which rely on hosting
>> +external to PyPI. This email will warn these projects that externally hosted
>> +files have been deprecated on PyPI and that in 6 months from the time of that
>> +email that all external links will be removed from the installer APIs. This
>> +email *must* include instructions for converting their projects to be hosted
>> +on PyPI and *must* include links to a script or package that will enable them
>> +to enter their PyPI credentials and package name and have it automatically
>> +download and re-host all of their files on PyPI. This email *must also*
>> +include instructions for setting up their own index page and registering that
>> +with PyPI.
>
> Please add: The email *must* include the PyPI terms and conditions.

I think a link to the Terms and Conditions is far more reasonable.

>> +Five months after the initial email, another email must be sent to any projects
>> +still relying on external hosting. This email will include all of the same
>> +information that the first email contained, except that the removal date will
>> +be one month away instead of six.
>
> Also here, please include the terms and conditions in the mail.

Again, a link to them should suffice.

>> +* People are generally surprised that PyPI allows externally linking to files
>> +  and doesn't require people to host on PyPI. In contrast most of them are
>> +  familiar with the concept of multiple software repositories such as is in
>> +  use by many OSs.
>
> This is speculation and should be deleted.

This is not speculation. If you poll #python on Freenode, you'll see
that this is the consensus opinion.

>> +* PyPI is fronted by a globally distributed CDN which has improved the
>> +  reliability and speed for end users. It is unlikely that any particular
>> +  external host has something comparable. This can lead to extremely bad
>> +  performance for end users when the external host is located in different
>> +  parts of the world or does not generally have good connectivity.
>
> This is editorializing. In fact recently a former Debian project leader has
> called PyPI "a bit flaky":
>
>     https://mail.python.org/pipermail/distutils-sig/2014-May/024275.html

Either this hasn't happened and you're intentionally linking to the
wrong message, assuming people won't actually read it, or you linked
to the wrong message by mistake. I'd like to believe the latter, but
apparently there are grand conspiracies in the Python Packaging
community, and if there are, I'm not about to just trust you; you
could be one of the conspirators.

>> +  As a data point, many users reported sub DSL speeds and latency when
>> +  accessing PyPI from parts of Europe and Asia prior to the use of the CDN.
>
> Irrelevant. Because the old PyPI server was overloaded, all external servers
> have to be, too?

No, but link spidering can seriously degrade performance. Perhaps this
should be reworded to be more explicit, but the point stands.

>> +* PyPI has monitoring and an on-call rotation of sysadmins whom can respond to
>> +  downtime quickly, thus enabling a quicker response to downtime. Again it is
>> +  unlikely that any particular external host will have this.
>
> That is quite general and irrelevant to the topic of this PEP (replacing one
> scheme of accommodating external hosts with another).

It isn't irrelevant. PyPI's availability and speed are superior to its
past and to those of many other services and hosts. If anything
happens to the server you host cdecimal on, you're responsible for
noticing that and handling it. As a developer who uploads their
software to PyPI, though, you wouldn't have to worry about that. The
downtime would be minute.

>> +* PyPI supports mirroring, both for private organizations and public mirrors.
>> +  The legal terms of uploading to PyPI ensure that mirror operators, both
>> +  public and private, have the right to distribute the software found on PyPI.
>
> I don't think so:
>
> https://mail.python.org/pipermail/distutils-sig/2014-May/024275.html

There must be a conspiracy. All your mail links point to the same message.

>> +  However software that is hosted externally does not have this, causing
>> +  private organizations to need to investigate each package individually and
>> +  manually to determine if the license allows them to mirror it.
>
> Software hosted externally has a single license, which is far simpler to
> handle.

Software hosted on PyPI has n+1 license determinations: PyPI's terms
and conditions, and then each package's license. I find it hard to
believe that one extra lookup will make a significant difference.

>> +* Installers have no method to determine if they should expect any particular
>> +  URL to be available or not. It is not unusual for the simple API to reference
>> +  old packages and URLs which have long since stopped working. This causes
>> +  installers to have to assume that it is OK for any particular URL to not be
>> +  accessible. This causes problems where an URL is temporarily down or
>> +  otherwise unavailable (a common cause of this is using a copy of Python
>> +  linked against a really ancient copy of OpenSSL which is unable to verify
>> +  the SSL certificate on PyPI) but it *should* be expected to be up. In this
>> +  case installers will typically silently ignore this URL and later the user
>> +  will get a confusing error stating that the installer couldn't find any
>> +  versions instead of getting the real error message indicating that the URL
>> +  was unavailable.
>
> I do not understand this paragraph (honest!).

Since links are frequently broken, installers can never know whether
or not they should expect any single URL to be available. If a package
is hosted externally and none of the URLs respond in a way that pip
can use to extract the desired package, it has no choice but to tell
the user that no versions of the package were available to download.
pip cannot know which URL it should expect to be available and if
there are 15 separate URLs, telling the user that every single one is
broken is of no use to the user; they still cannot download your
package. Does that help?
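
If it helps, here is a rough sketch of that behaviour. It is not pip's
real code: the names are hypothetical, `fetch` stands in for whatever
retrieves and parses a page, and versions are plain tuples to keep the
example short.

```python
class NoVersionsFound(Exception):
    """Raised when no candidate versions could be collected at all."""


def find_candidates(urls, fetch):
    """Collect versions from every URL, skipping the ones that fail.

    The installer cannot tell a long-dead link from a host that is
    only temporarily down, so every failure is treated the same way.
    """
    candidates = []
    skipped = []
    for url in urls:
        try:
            candidates.extend(fetch(url))
        except OSError as exc:
            # Silently ignored: the real cause is never shown to the user.
            skipped.append((url, exc))
    return candidates, skipped


def pick_version(project, urls, fetch):
    """Return the highest version found, or raise a generic error."""
    candidates, _skipped = find_candidates(urls, fetch)
    if not candidates:
        # All the user sees is this message, not the per-URL failures.
        raise NoVersionsFound("no versions found for %s" % project)
    return max(candidates)
```

With 15 URLs that all raise, `pick_version` ends in the same generic
error as for a project that never published anything, which is the
confusion the PEP paragraph is describing.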

