The PyPI Twitter feed stopped updating a couple of weeks back -- I'm kind of
missing it; I found all sorts of fun things through it that I wouldn't have
otherwise noticed. Does anyone know if this is a temporary thing or was it
done away with? Thanks.
Hi Donald, Nick, Richard, all,
finally got around to reading and thinking about the issues discussed in PEP470.
First of all thanks for going through the effort of trying to
advance the overall situation with a focus on making it easier
for our wonderful and beloved "end users" :)
However, I think PEP470 needs to achieve stronger backward compatibility for
end-users because, as is typical for the 99%, they like to see change
but hate to be forced to change themselves.
Allow me to remind you of how PEP438 worked in this regard: all
end users always remained able to install all projects, including those
with ancient tools and they all benefitted from the changes PEP438
brought: 90% of the projects were automatically switched to
"pypi-explicit" mode, speeding up and making more reliable installs for
everyone across the board. Let me thank specifically and once
again our grand tooler Donald here who implemented most of it.
However, PEP470 does not achieve this level of backward compatibility yet.
Let's look at its current procedure leading up to the final switch:
"After that switch, an email will be sent to projects which rely on
hosting external to PyPI. This email will warn these projects that
externally hosted files have been deprecated on PyPI and that in 6
months from the time of that email that all external links will be
removed from the installer APIs. (...)
Five months after the initial email, another email must be sent to
any projects still relying on external hosting. (...)
Finally a month later all projects will be switched to the pypa-only
mode and PyPI will be modified to remove the externally linked files
functionality."
This process tries to trigger changes from those 2974 project maintainers
who are today operating in pypi-crawl* modes. If we are left with 1000
stale project maintainers at final-switch time, and speculate about just 100
downloads for each of their projects, this final switch may get
us 100000 failing installation interactions the day after. The number
might be higher or lower, but I hope we agree that we'll very likely
have a significant "stale project maintainer" problem affecting
many end users and existing CI installations etc.
Even for those maintainers who switch to using an external index
as currently advertised by the PEP, and with their release files also
being downloaded 100 times each, we'll have another 50000 interactions
from end users who need to re-configure their tool usage to switch to
using an external index. Granted, those using a new pip version would get
a useful hint on how to do that. Others, using older versions, would have
to discover the project's PyPI website to hopefully understand how to
make their stuff work again.
In any case, we'd likely get a ton of end-user-side installation issues,
and I think PEP470 needs to be modified to try to minimize this number.
It could take the ball where PEP438 dropped it:
"Thus the hope is that eventually all projects on PyPI can be migrated to
the pypi-explicit mode, while preserving the ability to install release
files hosted externally via installer tools. Deprecation of hosting
modes to eventually only allow the pypi-explicit mode is NOT REGULATED
by this PEP but is expected to become feasible some time after
successful implementation of the transition phases described in this
PEP. It is expected that deprecation requires a new process to deal with
abandoned packages because of unreachable maintainers for still popular
packages."
PEP470 could be this successor, cleaning up and simplifying the situation.
But how do we maintain full backward compatibility and get rid of crawling?
Here is a sketch of a process by which we could get rid of the pypi-crawl* modes:
- send a warning note to maintainers a month before their pypi-crawl*
hosted projects are converted (informing them about the process; see next points).
Advertise a tool to convert pypi-crawl* hosting modes to pypi-explicit.
This tool automates the crawling, registering all found release files
either as explicit references with MD5s or uploading them to become
pypi-hosted files, at the option of the maintainer. It will also switch
the hosting mode on the pypi site automatically.
At warning time we'll also disallow pypi-crawl* modes on pypi for new
projects, and disallow switching to them from other modes.
- a month later a pypi admin (guess who!) uses the same conversion tool,
but with his admin superpowers, to convert any remaining
pypi-crawl* hosting-mode projects automatically with one addition:
all those admin-converted projects will get a "stale" flag
because the maintainer did not react and perform the conversion himself.
This "stale" status will be shown on the web page and new tool releases
can maybe learn to read this flag from the simple page so that they can warn
the end users they are installing a project with a known-to-be stale
maintainer.
The admin-driven conversion can be done incrementally in bunches,
to make it even more unlikely that we are going to face storms
of unhappy end users at any one point and to iron out issues as we go.
The result of this process is that we have only one hosting mode:
pypi-explicit, which PEP438 already introduced and specified.
And pypi's simple pages will continue to present two kinds of links:
- rel="internal": release files directly uploaded to pypi
- other external links will be direct URLs with hash checksums to external
release files. Tools can already recognize them and inform the user.
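To make concrete why such checksummed direct links give the same integrity as pypi-hosted files, here is a minimal sketch (a hypothetical helper, not an existing pip function) of how a tool can verify a download against the ``#md5=...`` fragment on the link:

```python
import hashlib
from urllib.parse import urlparse

def verify_download(url, data):
    """Check downloaded file bytes against the #md5=... fragment
    carried by an explicit external link."""
    fragment = urlparse(url).fragment
    if not fragment.startswith("md5="):
        return False  # no checksum: the link would count as unverified
    return hashlib.md5(data).hexdigest() == fragment[len("md5="):]

# A link whose fragment matches the file contents verifies; any
# tampering with the downloaded bytes makes verification fail.
good_url = ("https://example.com/pkg-1.0.tar.gz#md5="
            + hashlib.md5(b"payload").hexdigest())
```

A link without the fragment can only be treated as unverified, which is exactly the internal/external/unsafe distinction the simple pages already express.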
sidenote: if people have a PIP_DOWNLOAD_CACHE they will
only depend on the reachability of pypi after they first installed
an external dependency. So it's operationally a good situation, given
that using "--allow-external" provides exactly the same
file installation integrity as pypi-hosted files themselves do.
After we complete the automated admin-pypi transition there is no external
scraping and no unverified links, and tools could drop support for them over
time. There remain two ways to release files: upload them
to pypi or register a checksummed link. In addition, we will have
a clear list of "stale"-marked projects and can work
with it further.
Note that with this proposed process, 93% of maintainers, most toolers,
and all end users can remain ignorant of this PEP and will not be
bothered: everything just continues to work unmodified. Some end users
will experience a speed-up because the client side will not need
to download/crawl additional external simple pages. There is nothing new
for people to learn, except for the "crawl" maintainers, to whom
we nicely and empathically send a message: "switch or be switched" :)
You'll note that the process proposed here does not require
pypi.python.org to manage "external additional indexes" information or
tools to learn to recognize them. At this point, I am not sure it's
really needed for the cleanup and simplification issues PEP470 tries to
address.
backward-compat-is-a-thing'ly yours,
holger
Hi all,
(I hope that this hasn't been discussed previously)
so I've been trying to find out whether there's an explicit recommendation for creating and naming scripts/entry points depending on the Python version they're built with, but I didn't find any. As an example, setuptools' easy_install uses "easy_install-MAJOR.MINOR" (with a dash), while pip uses "pipMAJOR.MINOR" (without a dash). Also, some projects only create "foo-MAJOR.MINOR", while others also create "foo-MAJOR" (and most also create "foo" without any version).
It may seem like overkill, but wouldn't it be best to standardize:
- which form is preferred (with or without a dash)
- which of these three variants (foo-MAJOR.MINOR, foo-MAJOR, foo) should be created by default
Or better yet, I think it'd make sense to provide setuptools facilities to create these variants in a sensible default way and provide installation flags to alter this behaviour. Right now, it seems to me that every project is doing this on its own, which is not only inconsistent, but also duplicates lots of effort and is more error-prone than providing one centralized solution (e.g. a function in distutils/setuptools).
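To make the inconsistency concrete, here is a small sketch of the naming variants under discussion (a hypothetical helper, not an existing setuptools API):

```python
def script_name_variants(base, major, minor, dash=True):
    """Return the three variants discussed above: the bare name, the
    MAJOR-only name, and the MAJOR.MINOR name, with or without a dash."""
    sep = "-" if dash else ""
    return [base, f"{base}{sep}{major}", f"{base}{sep}{major}.{minor}"]

# easy_install style (with dash) vs pip style (without dash):
easy = script_name_variants("easy_install", 2, 7, dash=True)
pip_style = script_name_variants("pip", 2, 7, dash=False)
```

A centralized function along these lines, with the default variant set chosen by the standard, would let every project produce the same script names.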
Thoughts/comments?
Thanks!
--
Regards,
Bohuslav "Slavek" Kabrda.
Currently PEP440 has a version specifier syntax like
``foo (2,~=2,==2,!=2,>=2,<=2,>2,<2)``. This is a holdover from PEP 345,
for which I cannot locate a rationale.
I believe that we should revert this syntax back to the setuptools style of
``foo~=2,==2,!=2,>=2,<=2,>2,<2``. This change represents a backwards-incompatible
change to how dependencies are specified, for dubious benefits.
* It requires that users learn a new syntax for little/no benefit to them.
* It requires the use of quoting if you use this syntax on the shell.
We are depending on the space + parentheses in order to enable:
* A default comparison operator. This is ~= if the leading version is < 1980
or >= if the leading version is >= 1980.
* The direct reference syntax, which is ``foo (from https://...)``.
On these, I think that we should also remove the default comparison idea. It
originally started out as a shorthand for ~=, but it was realized that this is
going to do the wrong thing for date-based releases, so it was later changed so
that it does ~= or >= depending on the leading version. However, it's still going
to do the wrong thing for a wide variety of projects. The current selector for
which you get (~= or >=) is based off of the leading version; however, there are
a lot of projects for which this detection simply won't work. One instance of
a project where it won't is Twisted, which has date-based releases but instead
of using 2014.0 uses 14.0.
While we could mandate to Twisted (and anyone else) that if they want to do
date-based releases they need to use YYYY and not YY as their leading version,
it'll still do the wrong thing for any rolling release which does not use a
date-based release scheme. For instance, a scheme that simply uses an
incrementing version counter.
I think that the default operator is born out of an attempt to be prescriptive
about what meanings people put in their versions. I believe that the inability
to provide a default that is always going to be correct with all sane schemes
points to the idea that guessing in the face of ambiguity is still a bad idea
and we should just require that people be explicit.
If we assume that we're going to ditch the default comparison operator, the only
thing left that _requires_ the ``foo (==2.0)`` syntax is the direct reference
syntax (``foo (from https://...)``). For this I think the downsides of the new
syntax outweigh the minor benefits in syntax. I would suggest that we just
define an operator that means direct reference. Something like
``foo@https://...`` could be reasonable and even has a decent verbal
representation in the form of "foo at https://...". This does have the downside
that it might be somewhat confusing if there is an "@" in the URL we are
referencing.
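For what it's worth, the ``@``-in-URL ambiguity is manageable if tools split on the first ``@`` only, since distribution names cannot contain one. A sketch (hypothetical helper, assuming the ``name@url`` form proposed above):

```python
def split_direct_reference(requirement):
    """Split a "name@url" direct reference on the first "@"; any later
    "@" (e.g. userinfo in the URL) stays part of the URL."""
    name, sep, url = requirement.partition("@")
    if not sep or not url:
        raise ValueError("not a direct reference: %r" % requirement)
    return name.strip(), url.strip()

# An "@" inside the URL does not confuse the split:
name, url = split_direct_reference("foo@https://user@example.com/foo-1.0.whl")
```

So the downside is mostly cosmetic confusion for human readers rather than a parsing problem.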
So what do people think? Drop the default comparison operator idea? Drop the
new syntax and continue using the old?
-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
I've just released version 0.1.9 of distlib on PyPI [1]. For newcomers,
distlib is a library of packaging functionality which is intended to be
usable as the basis for third-party packaging tools.
The main changes in this release are as follows:
Fixed issue #47: Updated binary launchers to fix double-quoting bug
where script executable paths have spaces.
Added ``keystore`` keyword argument to signing and verification APIs.
A more detailed change log is available at [2].
Please try it out, and if you find any problems or have any suggestions for
improvements, please give some feedback using the issue tracker! [3]
Regards,
Vinay Sajip
[1] https://pypi.python.org/pypi/distlib/0.1.9
[2] http://pythonhosted.org/distlib/overview.html#change-log-for-distlib
[3] https://bitbucket.org/pypa/distlib/issues/new
I've had all kinds of troubles getting lxml to buildout on OSX 10.9, as
per
http://stackoverflow.com/questions/22752332/cannot-install-lxml-3-3-3-on-os….
If you can help me with that, that would be awesome.
But assuming you can't: I can get lxml to install using pip, but I can't
get buildout 2 to see and use that.
After the pip install it ends up in
/usr/local/lib/python2.7/site-packages/lxml:
And the interpreter sees it:
-----------
cat bin/buildout:
#!/usr/local/opt/python/bin/python2.7
import sys
sys.path[0:0] = [
    '/Users/brad/Development/python/eggs/setuptools-3.5.1-py2.7.egg',
    '/Users/brad/Development/python/eggs/zc.buildout-2.2.1-py2.7.egg',
]
import zc.buildout.buildout
if __name__ == '__main__':
    sys.exit(zc.buildout.buildout.main())
-----------
/usr/local/opt/python/bin/python2.7
Python 2.7.6 (default, Apr 9 2014, 11:48:52)
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.38)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import lxml
>>>
-----------
But buildout doesn't pull it in:
minitage.recipe: We have no distributions for lxml that satisfies
'lxml==3.3.5'.
minitage.recipe: Trying to get distribution for 'lxml'
So it tries installing it and I get the header complaints as per the SO
thread.
So am I missing something to get buildout 2 to pull it in from my
site-packages?
Cheers
Brad
So there's an ongoing debate over pip's behaviour around disallowing
external hosting by default (see thread "pip: cdecimal an externally
hosted file and may be unreliable" over on python-dev for the latest
round).
It appears that the reason for disallowing external hosting (as
opposed to unverifiable downloads) is purely about reliability - we
can't be sure that an external host provides the same level of uptime
as PyPI[1]. Given that, it seems to me that the situation is, for an
externally hosted package foo:
`pip install foo` - fails immediately, 100% of the time
`pip install --allow-external foo foo` - works in all but a few
cases where foo's host is down[2]
I cannot understand how guaranteed failure is ever better than
"occasional but rare" failure.
For situations where it is critical to minimise the risk of an
external host outage causing a deployment to fail, the only answer is
to not use foo, or to host foo on your own private index. In both
cases, all you need is to know that foo is externally hosted to do
that - you certainly don't need pip to fail.
As a concrete proposal:
1. Remove --allow-external/--allow-all-external and make it the
default behaviour.
2. Add a new command to pip, maybe something like `pip check-external`,
which checks a set of requirements and reports the requirements that
are externally hosted and which hosts they rely on. That gives users
who need 100% reliability the information they need to implement the
appropriate solution, without causing pain for users who don't.
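As a rough sketch of what the reporting half of such a `pip check-external` could do (a hypothetical helper; the real command would fetch each project's /simple/ page to collect its file URLs first):

```python
from urllib.parse import urlparse

# Hosts considered "internal" for the purposes of this sketch.
PYPI_HOSTS = {"pypi.python.org"}

def external_hosts(file_urls):
    """Given the file URLs found for a requirement, return the set of
    hosts external to PyPI that it relies on (empty set = PyPI-only)."""
    return {urlparse(url).netloc for url in file_urls} - PYPI_HOSTS

urls = [
    "https://pypi.python.org/packages/source/f/foo/foo-1.0.tar.gz",
    "https://downloads.example.com/foo-1.1.tar.gz#md5=abc123",
]
# external_hosts(urls) -> {"downloads.example.com"}
```

A report built from this is exactly the information someone needs in order to decide whether to vendor the package or mirror it on a private index.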
Note that the above is based on the fact[3] that *there are no
security risks to --allow-external*. I am not arguing for a reduction
in security, or a change to any sort of threat model.
Comments?
Paul
[1] Donald explicitly stated that this is the case in the earlier
thread (https://mail.python.org/pipermail/python-dev/2014-May/134454.html).
I think Nick confirmed this (although I don't have a reference to
hand). If it's not true then we need to be a lot clearer and more
explicit about *why* ignoring external hosting by default is needed,
because it seems nobody knows :-(
[2] BTW, the syntax of `--allow-external` is hideous, but that's a
side issue I don't want to debate right now.
[3] See the first note.
I'm not opposed to the Custom Additional Index if downloads can still be
protected by a checksum stored on PyPI.
What I find disturbing is that even after all these discussions the document
contains many personal preferences, heavily biased statements about external
servers and legal advice.
If all these things weren't present and checksum facilities were added,
I'd simply be +1 on the PEP. Instead, I'm forced to vote -1 on the
current draft and go through it point by point.
nick.coghlan <python-checkins(a)python.org> wrote:
> +Author: Donald Stufft <donald(a)stufft.io>,
> +
> +Custom Additional Index
> +-----------------------
> +
> +Compared to the complex rules which a project must be aware of to prevent
> +themselves from being considered unsafely hosted setting up an index is fairly
> +trivial and in the simplest case does not require anything more than a
> +filesystem and a standard web server such as Nginx.
It's trivial to add the explicit URL with the checksum compared to configuring
a new subdomain, so I don't like the reason given.
However, if it is still possible to deposit a checksum on PyPI instead
of using an SSL certificate to secure the download, I don't mind the
custom index.
> +External Links on the Simple Installer API
> +------------------------------------------
I think the existing scheme could have been simplified:
https://mail.python.org/pipermail/distutils-sig/2014-May/024275.html
But I'm not going to insist, since the new scheme is no worse if checksums
can be added.
> +Deprecation and Removal of Link Spidering
> +=========================================
> +After that switch, an email will be sent to projects which rely on hosting
> +external to PyPI. This email will warn these projects that externally hosted
> +files have been deprecated on PyPI and that in 6 months from the time of that
> +email that all external links will be removed from the installer APIs. This
> +email *must* include instructions for converting their projects to be hosted
> +on PyPI and *must* include links to a script or package that will enable them
> +to enter their PyPI credentials and package name and have it automatically
> +download and re-host all of their files on PyPI. This email *must also*
> +include instructions for setting up their own index page and registering that
> +with PyPI.
Please add: The email *must* include the PyPI terms and conditions.
> +Five months after the initial email, another email must be sent to any projects
> +still relying on external hosting. This email will include all of the same
> +information that the first email contained, except that the removal date will
> +be one month away instead of six.
Also here, please include the terms and conditions in the mail.
> +* People are generally surprised that PyPI allows externally linking to files
> + and doesn't require people to host on PyPI. In contrast most of them are
> + familiar with the concept of multiple software repositories such as is in
> + use by many OSs.
This is speculation and should be deleted.
> +* PyPI is fronted by a globally distributed CDN which has improved the
> + reliability and speed for end users. It is unlikely that any particular
> + external host has something comparable. This can lead to extremely bad
> + performance for end users when the external host is located in different
> + parts of the world or does not generally have good connectivity.
This is editorializing. In fact recently a former Debian project leader has
called PyPI "a bit flaky":
https://mail.python.org/pipermail/distutils-sig/2014-May/024275.html
Compare that to launchpad.net and code.google.com. These kinds of statements
should not be in a PEP as they will just lead to ongoing friction.
> + As a data point, many users reported sub DSL speeds and latency when
> + accessing PyPI from parts of Europe and Asia prior to the use of the CDN.
Irrelevant. Because the old PyPI server was overloaded, all external servers
have to be, too?
> +* PyPI has monitoring and an on-call rotation of sysadmins whom can respond to
> + downtime quickly, thus enabling a quicker response to downtime. Again it is
> + unlikely that any particular external host will have this.
That is quite general and irrelevant to the topic of this PEP (replacing one
scheme of accommodating external hosts with another).
> +* PyPI supports mirroring, both for private organizations and public mirrors.
> + The legal terms of uploading to PyPI ensure that mirror operators, both
> + public and private, have the right to distribute the software found on PyPI.
I don't think so:
https://mail.python.org/pipermail/distutils-sig/2014-May/024275.html
"People have also said that this overrules the licenses on their packages.
That is not so! The licenses in this case run in parallel, and distribution
needs to satisfy both licenses or it cannot be done at all."
So the above paragraph should read:
"The legal terms of uploading to PyPI entail that mirror operators, both
public and private, have the obligation to check if distribution satisfies
both licenses for each software package found on PyPI."
Best to delete it entirely.
> + However software that is hosted externally does not have this, causing
> + private organizations to need to investigate each package individually and
> + manually to determine if the license allows them to mirror it.
Software hosted externally has a single license, which is far simpler to
handle.
> + For public mirrors this essentially means that these externally hosted
> + packages *cannot* be reasonably mirrored. This is particularly troublesome
> + in countries such as China where the bandwidth to outside of China is
> + highly congested making a mirror within China often times a massively better
> + experience.
Not true, see above. Best to delete it.
> +* Installers have no method to determine if they should expect any particular
> + URL to be available or not. It is not unusual for the simple API to reference
> + old packages and URLs which have long since stopped working. This causes
> + installers to have to assume that it is OK for any particular URL to not be
> + accessible. This causes problems where an URL is temporarily down or
> + otherwise unavailable (a common cause of this is using a copy of Python
> + linked against a really ancient copy of OpenSSL which is unable to verify
> + the SSL certificate on PyPI) but it *should* be expected to be up. In this
> + case installers will typically silently ignore this URL and later the user
> + will get a confusing error stating that the installer couldn't find any
> + versions instead of getting the real error message indicating that the URL
> + was unavailable.
I do not understand this paragraph (honest!).
> +* In the long run, global opt in flags like ``--allow-all-external`` will
> + become little annoyances that developers cargo cult around in order to make
> + their installer work. When they run into a project that requires it they
> + will most likely simply add it to their configuration file for that installer
> + and continue on with whatever they were actually trying to do. This will
> + continue until they try to install their requirements on another computer
> + or attempt to deploy to a server where their install will fail again until
> + they add the "make it work" flag in their configuration file.
It seems to me that this will happen with the new flag, too.
Stefan Krah
I’ve just published a draft of PEP 470 - Using Multi Index Support for External to PyPI Package File Hosting
You can see this online at http://legacy.python.org/dev/peps/pep-0470/ or read below
-------------------------------------------------------------------------------
PEP: 470
Title: Using Multi Index Support for External to PyPI Package File Hosting
Version: $Revision$
Last-Modified: $Date$
Author: Donald Stufft <donald(a)stufft.io>,
BDFL-Delegate: Richard Jones <richard(a)python.org>
Discussions-To: distutils-sig(a)python.org
Status: Draft
Type: Process
Content-Type: text/x-rst
Created: 12-May-2014
Post-History: 14-May-2014
Abstract
========
This PEP proposes that the official means of having an installer locate and
find package files which are hosted externally to PyPI become the use of
multi index support instead of the practice of using external links on the
simple installer API.
It is important to remember that this is **not** about forcing anyone to host
their files on PyPI. If someone does not wish to do so they will never be under
any obligation to. They can still list their project in PyPI as an index, and
the tooling will still allow them to host it elsewhere.
Rationale
=========
There is a long history documented in PEP 438 that explains why externally
hosted files exist today in the state that they do on PyPI. For the sake of
brevity I will not duplicate that and instead urge readers to first take a look
at PEP 438 for background.
There are currently two primary ways for a project to make itself available
without directly hosting the package files on PyPI. They can either include
links to the package files in the simple installer API or they can publish
a custom package index which contains their project.
Custom Additional Index
-----------------------
Each installer which speaks to PyPI offers a mechanism for the user invoking
that installer to provide additional custom locations to search for files
during the dependency resolution phase. For pip these locations can be
configured per invocation, per shell environment, per requirements file, per
virtual environment, and per user.
The use of additional indexes instead of external links on the simple
installer API provides a simple clean interface which is consistent with the
way most Linux package systems work (apt-get, yum, etc). More importantly it
works the same even for projects which are commercial or otherwise have their
access restricted in some form (private networks, password, IP ACLs etc)
while the external links method only realistically works for projects which
do not have their access restricted.
Compared to the complex rules which a project must be aware of to prevent
themselves from being considered unsafely hosted, setting up an index is fairly
trivial and in the simplest case does not require anything more than a
filesystem and a standard web server such as Nginx or Twisted Web. Even if
using simple static hosting without autoindexing support, it is still
straightforward to generate appropriate index pages as static HTML.
Example Index with Twisted Web
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1. Create a root directory for your index, for the purposes of the example
I'll assume you've chosen ``/var/www/index.example.com/``.
2. Inside of this root directory, create a directory for each project such
as ``mkdir -p /var/www/index.example.com/{foo,bar,other}/``.
3. Place the package files for each project in their respective folder,
creating paths like ``/var/www/index.example.com/foo/foo-1.0.tar.gz``.
4. Configure Twisted Web to serve the root directory, ideally with TLS.
::
$ twistd -n web --path /var/www/index.example.com/
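For the static-hosting case mentioned above, generating the index pages can itself be a few lines of Python. This is a sketch assuming the directory layout from the example (the function name and layout are illustrative, not part of any tool):

```python
import os
from html import escape

def write_index_pages(root):
    """Write a minimal index.html into each project directory under
    ``root``, linking every package file it contains."""
    for project in sorted(os.listdir(root)):
        pdir = os.path.join(root, project)
        if not os.path.isdir(pdir):
            continue
        links = "".join(
            '<a href="%s">%s</a><br/>\n' % (escape(name), escape(name))
            for name in sorted(os.listdir(pdir))
            if name != "index.html"
        )
        with open(os.path.join(pdir, "index.html"), "w") as fh:
            fh.write("<html><body>\n%s</body></html>\n" % links)
```

Run against ``/var/www/index.example.com/`` from the example, this produces one page per project that any installer can consume as an additional index.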
Examples of Additional indexes with pip
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
**Invocation:**
::
$ pip install --extra-index-url https://pypi.example.com/ foobar
**Shell Environment:**
::
$ export PIP_EXTRA_INDEX_URL=https://pypi.example.com/
$ pip install foobar
**Requirements File:**
::
$ printf '%s\n' '--extra-index-url https://pypi.example.com/' 'foobar' > requirements.txt
$ pip install -r requirements.txt
**Virtual Environment:**
::
$ python -m venv myvenv
$ printf '%s\n' '[global]' 'extra-index-url = https://pypi.example.com/' > myvenv/pip.conf
$ myvenv/bin/pip install foobar
**User:**
::
$ printf '%s\n' '[global]' 'extra-index-url = https://pypi.example.com/' > ~/.pip/pip.conf
$ pip install foobar
External Links on the Simple Installer API
------------------------------------------
PEP438 proposed a system of classifying file links as either internal,
external, or unsafe. It recommended that by default only internal links would
be installed by an installer; however, users could opt into external links on
either a global or a per-package basis. Additionally, they could also opt into
unsafe links on a per-package basis.
This system has turned out to be *extremely* unfriendly towards end users,
and it is the position of this PEP that the situation has become untenable. The
situation as provided by PEP438 requires an end user to be aware not only of
the difference between internal, external, and unsafe, but also to be aware of
what hosting mode the package they are trying to install is in, what links are
available on that project's /simple/ page, whether or not those links have
a properly formatted hash fragment, and what links are available from pages
linked to from that project's /simple/ page.
There are a number of common confusion/pain points with this system that I
have witnessed:
* Users are unaware of what the simple installer API is at all, or of how an
installer locates installable files.
* Users are unaware that even if the simple API links to a file, if it does
not include a ``#md5=...`` fragment it will be counted as unsafe.
* Users are unaware that an installer can look at pages linked from the
simple API to determine additional links, or that any links found in this
fashion are considered unsafe.
* Users are unaware and often surprised that PyPI supports hosting your files
someplace other than PyPI at all.
In addition to that, the information that an installer is able to provide
when an installation fails is pretty minimal. We are able to detect if there
are externally hosted files directly linked from the simple installer API;
however, we cannot detect if there are files hosted on a linked page without
fetching that page, and doing so would cause a massive performance hit just to
see if there might be a file there so that a better error message could be
provided.
Finally, very few projects have properly linked to their external files so that
they can be safely downloaded and verified. At the time of this writing there
are a total of 65 projects which have files that are only available externally
and are safely hosted.
The end result of all of this is that, with PEP 438, when a user attempts to
install a file that is not hosted on PyPI, the steps they typically follow are:
1. First, they attempt to install it normally, using ``pip install foobar``.
This fails because the file is not hosted on PyPI and PEP 438 defaults to
PyPI-hosted files only. If pip detected any externally hosted files, or other
pages where we *could* have attempted to find other files, it will give an
error message suggesting that they try ``--allow-external foobar``.
2. They then attempt to install their package using
``pip install --allow-external foobar foobar``. If they are lucky foobar is
one of the packages which is hosted externally and safely and this will
succeed. If they are unlucky they will get a different error message
suggesting that they *also* try ``--allow-unverified foobar``.
3. They then attempt to install their package using
``pip install --allow-external foobar --allow-unverified foobar foobar``
and this finally works.
These are the same basic steps that practically everyone goes through every time
they try to install something that is not hosted on PyPI. If they are lucky it'll
only take them two steps, but typically it requires three. Worse, there is
no real indication to these people why one package might install after two steps
while most require three. Even worse, most of them will never encounter an
externally hosted package that does not take three steps, so they will be
increasingly annoyed and frustrated at the intermediate step and will likely
eventually just start skipping it.
External Index Discovery
========================
One of the problems with using an additional index is discovery. Users
will not generally be aware that an additional index is required at all, much
less where that index can be found. Projects can attempt to convey this
information in their description on the PyPI page, however that excludes
people who discover their project organically through ``pip search``.
To support projects that wish to externally host their files and to enable
users to easily discover what additional indexes are required, PyPI will gain
the ability for projects to register external index URLs and additionally an
associated comment for each. These URLs will be made available on the simple
page; however, they will not be linked, nor provided in a form that older
installers would automatically search.
When an installer fetches the simple page for a project, if it finds this
additional metadata and cannot find any files for that project in its
configured URLs, then it should use this data to tell the user how to add one
or more of the additional URLs to search. This message should include any
comments that the project has included, enabling the project to communicate
with the user and to provide hints as to which URL they might want when some
are only useful or compatible with certain platforms or situations.
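One way an installer might implement this check is sketched below. The meta
tag name (``external-index``) and its attributes are assumptions made for
illustration; the PEP does not fix the exact markup:

```python
from html.parser import HTMLParser

class ExternalIndexParser(HTMLParser):
    """Collect hypothetical external-index declarations from a project's
    simple page.  Tag and attribute names here are illustrative only."""
    def __init__(self):
        super().__init__()
        self.indexes = []  # list of (url, comment) pairs

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") == "external-index":
            self.indexes.append(
                (attrs.get("content", ""), attrs.get("data-comment", "")))

def hint_for_user(simple_page_html, files_found):
    """If no installable files were found, tell the user which extra
    index URLs the project has registered, with the project's comments."""
    if files_found:
        return None
    parser = ExternalIndexParser()
    parser.feed(simple_page_html)
    lines = ["No files found on your configured indexes.",
             "The project registered these additional indexes:"]
    lines += ["  --extra-index-url %s  (%s)" % pair for pair in parser.indexes]
    return "\n".join(lines)
```

The key design point is that the installer never searches these URLs itself;
it only surfaces them, leaving the user in control of which indexes to trust.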
This feature *must* be added to PyPI prior to starting the deprecation and
removal process for link spidering.
Deprecation and Removal of Link Spidering
=========================================
A new hosting mode will be added to PyPI. This hosting mode will be called
``pypi-only`` and will be in addition to the three that PEP 438 has already
given us, which are ``pypi-explicit``, ``pypi-scrape``, and
``pypi-scrape-crawl``. This
new hosting mode will modify a project's simple api page so that it only lists
the files which are directly hosted on PyPI and will not link to anything else.
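For illustration, a simple page in ``pypi-only`` mode would contain nothing
but direct links to files hosted on PyPI — roughly like the sketch below,
where the file name and hash are made up:

```html
<!-- /simple/foobar/ in pypi-only mode: only files hosted on PyPI,
     no scraped home pages, download links, or external references -->
<html><body>
<a href="../../packages/source/f/foobar/foobar-1.0.tar.gz#md5=d41d8cd9">foobar-1.0.tar.gz</a>
</body></html>
```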
Upon acceptance of this PEP and the addition of the ``pypi-only`` mode, all
new projects will default to the ``pypi-only`` mode and will be locked to it,
unable to change this particular setting. ``pypi-only`` projects
will still be able to register external index URLs as described above - the
"pypi-only" refers only to the download links that are published directly on
PyPI.
An email will then be sent out to all of the projects which are hosted only on
PyPI informing them that in one month their project will be automatically
converted to the ``pypi-only`` mode. A month after these emails have been
sent, any of the emailed projects which are still hosted only on PyPI will
have their mode set to ``pypi-only``.
After that switch, an email will be sent to projects which rely on hosting
external to PyPI. This email will warn these projects that externally hosted
files have been deprecated on PyPI and that, six months from the time of that
email, all external links will be removed from the installer APIs. This
email *must* include instructions for converting their projects to be hosted
on PyPI and *must* include links to a script or package that will enable them
to enter their PyPI credentials and package name and have it automatically
download and re-host all of their files on PyPI. This email *must also*
include instructions for setting up their own index page and registering that
with PyPI.
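A re-hosting helper of the kind the email should link to might work roughly as
follows. This is a hypothetical sketch: it only *plans* the work, returning
shell commands (here using ``curl`` and ``twine``, one plausible tool choice)
so the migration can be reviewed before anything is uploaded:

```python
def migration_commands(external_urls):
    """Sketch of a re-hosting plan: fetch each externally hosted file,
    then upload it to PyPI.  Returns the commands rather than running
    them, so a maintainer can inspect the plan first."""
    cmds = []
    for url in external_urls:
        filename = url.rsplit("/", 1)[-1]
        cmds.append("curl -LO %s" % url)        # download the release file
        cmds.append("twine upload %s" % filename)  # re-host it on PyPI
    return cmds
```

A real tool would also need to handle authentication, verify checksums of the
downloaded files, and skip releases already present on PyPI.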
Five months after the initial email, another email must be sent to any projects
still relying on external hosting. This email will include all of the same
information that the first email contained, except that the removal date will
be one month away instead of six.
Finally a month later all projects will be switched to the ``pypi-only`` mode
and PyPI will be modified to remove the externally linked files functionality.
Impact
======
============ ======= ========== =======
\            PyPI    External   Total
============ ======= ========== =======
**Safe**     37779   65         37844
**Unsafe**   0       2974       2974
**Total**    37779   3039       40818
============ ======= ========== =======
Rejected Proposals
==================
Keep the current classification system but adjust the options
-------------------------------------------------------------
This PEP rejects several related proposals which attempt to fix some of the
usability problems with the current system while still keeping the general
gist of PEP 438.
This includes:
* Default to allowing safely externally hosted files, but disallow unsafely
hosted.
* Default to disallowing safely externally hosted files with only a global
flag to enable them, but disallow unsafely hosted.
These proposals are rejected because:
* The classification "system" is complex, hard to explain, and requires an
intimate knowledge of how the simple API works in order to be able to reason
about which classification is required. This is reflected in the fact that
the code to implement it is complicated and hard to understand as well.
* People are generally surprised that PyPI allows externally linking to files
and doesn't require people to host on PyPI. In contrast, most of them are
familiar with the concept of multiple software repositories, such as those
used by many OSs.
* PyPI is fronted by a globally distributed CDN which has improved the
reliability and speed for end users. It is unlikely that any particular
external host has something comparable. This can lead to extremely bad
performance for end users when the external host is located in different
parts of the world or does not generally have good connectivity.
As a data point, many users reported sub-DSL speeds and latency when
accessing PyPI from parts of Europe and Asia prior to the use of the CDN.
* PyPI has monitoring and an on-call rotation of sysadmins who can respond to
  downtime quickly. Again, it is unlikely that any particular external host
  will have this. This can lead
to single packages in a dependency chain being un-installable. This will
often confuse users, who oftentimes have no idea that the package relies
on an external host, and who cannot figure out why PyPI appears to be up
but the installer cannot find the package.
* PyPI supports mirroring, both for private organizations and public mirrors.
The legal terms of uploading to PyPI ensure that mirror operators, both
public and private, have the right to distribute the software found on PyPI.
However software that is hosted externally does not have this, causing
private organizations to need to investigate each package individually and
manually to determine if the license allows them to mirror it.
For public mirrors this essentially means that these externally hosted
packages *cannot* be reasonably mirrored. This is particularly troublesome
in countries such as China, where the bandwidth to outside of China is
highly congested, making a mirror within China oftentimes a massively better
experience.
* Installers have no method to determine if they should expect any particular
URL to be available or not. It is not unusual for the simple API to reference
old packages and URLs which have long since stopped working. This causes
installers to have to assume that it is OK for any particular URL to be
inaccessible. This causes problems when a URL is temporarily down or
otherwise unavailable (a common cause is a copy of Python linked against a
very old OpenSSL that is unable to verify the SSL certificate on PyPI) even
though it *should* be expected to be up. In this
case installers will typically silently ignore this URL and later the user
will get a confusing error stating that the installer couldn't find any
versions instead of getting the real error message indicating that the URL
was unavailable.
* In the long run, global opt in flags like ``--allow-all-external`` will
become little annoyances that developers cargo cult around in order to make
their installer work. When they run into a project that requires it they
will most likely simply add it to their configuration file for that installer
and continue on with whatever they were actually trying to do. This will
continue until they try to install their requirements on another computer
or attempt to deploy to a server where their install will fail again until
they add the "make it work" flag in their configuration file.
Copyright
=========
This document has been placed in the public domain.
-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA