[Distutils] some questions about PEP470

Donald Stufft donald at stufft.io
Sat Oct 11 20:27:16 CEST 2014

> On Oct 11, 2014, at 2:31 AM, holger krekel <holger at merlinux.eu> wrote:
> Hi Donald,
> many thanks for answering.  A few follow up questions inline.
> On Thu, Oct 09, 2014 at 13:40 -0400, Donald Stufft wrote:
>>> On Oct 9, 2014, at 12:41 PM, holger krekel <holger at merlinux.eu> wrote:
>>> Numbers of users affected
>>> ---------------------------------
>>> Do i see it right that the PEP470 changes would mean about 6-7 thousand 
>>> users (per day) need to change their installation options to use
>>> "--extra-index-url"?  If not, how many?  Is there a monthly figure?
>> It’s impossible to couch this in terms of “users” because we have no way
>> of correlating what we see on the PyPI side with users. On the single day
>> I selected to look at the logs (which was more or less the day before the
>> day I was compounding numbers) there were 6.6k total unique IP addresses
>> that hit a /simple/ page which belonged to one of the affected projects.
>> Beyond knowing how many IP addresses it’s difficult to determine how that
>> correlates into users, that could be a single user with 6.6k different EC2
>> machines, or it could be 6.6k individual users (or even more than that if
>> there is a transparent proxy at play). In all likelihood it is not a single
>> user and it is not 6.6k users but somewhere in between.
>> Important to point out that this number also includes people spinning up
>> bandersnatch mirrors, devpi mirrors, or any other automated fetching of
>> the /simple/ page for reasons other than “I’d like to install this project”.
> I understand it's hard to get to somewhat sensible numbers and the
> number of unique IPs is probably only an upper bound.  devpi and bandersnatch
> make even that fuzzy because more than one user may be behind each such
> instance.  Anyway, can you provide a monthly number of unique IP addresses on
> the simple pages on projects with external links?

I can do that, it’ll take a little while because the scripts I have to do it
aren’t really designed to handle more than a single file and the files are
quite large once decompressed (30 days is roughly 120GB of log files). 
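
A minimal sketch of how a multi-file count could be streamed without first
decompressing the logs to disk. The log layout (client IP as the first
whitespace-separated field, request path as the third), the gzip compression,
and the function name are assumptions for illustration only; they are not the
actual scripts or the real PyPI log format.

```python
import gzip
import re

# Hypothetical log line: "<ip> <method> <path> ..." with one request per line.
SIMPLE_RE = re.compile(r"/simple/([^/\s]+)/?")


def unique_ips(log_paths, affected_projects):
    """Return the set of client IPs that requested an affected /simple/ page."""
    affected = {name.lower() for name in affected_projects}
    seen = set()
    for path in log_paths:
        # gzip.open streams the file, so only one line is held at a time.
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
            for line in fh:
                fields = line.split()
                if len(fields) < 3:
                    continue  # skip malformed lines
                match = SIMPLE_RE.search(fields[2])
                if match and match.group(1).lower() in affected:
                    seen.add(fields[0])
    return seen
```

Because a single set is shared across all files, an IP address seen on several
days is counted once, which is what a monthly unique figure needs.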

>>> And that the affected users can only do that if the respective
>>> maintainers of the projects offer an external index (or re-upload to PyPI)?
>> No and Yes.
>> Wherever pip/easy_install are currently finding the download from can serve
>> as the external index. This likely won’t be the most efficient repository
>> since often times these are regular web pages which have other content and
>> the like but it won’t be any worse than it is currently. For instance you
>> can take a look at https://bpaste.net/show/5a83985ad2e6 to see using the
>> current page as a find-links repository with pip.
> How can affected users discover they need to use this particular option
> and URL if they use today's pip/easy_install versions and a post PEP470 PyPI?

I plan to put the external repositories (and the commands needed to use them)
in the UI for PyPI. I suppose I should put that in the PEP as well, I was more
focused on defining the API differences and the changes.

>>> Do i see it right that up to 1000 maintainers need to act and offer an
>>> external index if they want to keep their projects properly installable?
>> If their project is already installable, then they already have something
>> which is usable as either a simple or a find-links repository. The only
>> action required on their part is if they want the discovery affordances
>> in this PEP they would need to tell PyPI that.
> Is this true also for the (small but still) set of maintainers who
> registered external links with checksums?

A find-links repository can be as simple as a direct link to a file, yes. It’s
not the best idea, since you can never change the location of that file without
breaking it for people, but it can be done.
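
To illustrate why an ordinary web page can double as a find-links repository,
here is a rough sketch of pulling candidate archive links out of a page. It is
not pip's implementation; the suffix list, class name, and function name are
assumptions made for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed set of archive suffixes; a real installer recognises more formats.
ARCHIVE_SUFFIXES = (".tar.gz", ".tar.bz2", ".zip", ".whl")


class _LinkCollector(HTMLParser):
    """Collect the absolute target of every <a href=...> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def archive_links(base_url, html):
    """Return links that look like installable archives, ignoring the rest."""
    collector = _LinkCollector(base_url)
    collector.feed(html)
    # Strip any "#md5=..." style fragment before checking the file suffix.
    return [
        url for url in collector.links
        if url.split("#")[0].endswith(ARCHIVE_SUFFIXES)
    ]
```

Everything on the page that is not an archive link is simply skipped, which is
why such a page works but is less efficient than a dedicated index.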

> If maintainers don't act, will using a post PEP470 released pip help
> the users in any way?


A common complaint that I’ve seen from users is that when faced with network
errors pip’s output is extremely unhelpful. For instance here’s an issue about
how TLS errors are presented - https://github.com/pypa/pip/issues/1511. The
primary reason for this is that because we don’t know if a particular link
*should* be up, or if it’s expected to be down (xfail, in testing lingo) we have
to assume that it should be down. Sometimes there can be many of these links so
to avoid having lots of extraneous “error” looking messages in the output we just
silently hide them. However if none of the links can be found they’ll just get
a confusing “Sorry can’t find anything to install for that”. Worse yet, if we
can find some things but not everything then we might install an older insecure
version. This could actually be used to prevent people from getting a security
update for a particular project if they’ve switched from hosting on PyPI to hosting
externally and a MITM attacker is blocking their attempts to reach the external

A post PEP 470 pip will be able to treat all URLs it fetches as mandatory, so it
can raise proper error messages when it can’t locate a URL to communicate that
to the end user and ideally make things far less confusing.
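
The difference between the two behaviours can be sketched as follows. This is
not pip's code: `FetchError`, `collect_pages`, and the injected `fetch`
callable are hypothetical names, and the fetcher is passed in so the example
needs no network access.

```python
class FetchError(Exception):
    """Raised by the (hypothetical) fetcher when a URL cannot be retrieved."""


def collect_pages(urls, fetch, mandatory):
    """Fetch every URL, either hiding failures or reporting them.

    mandatory=False mimics the behaviour described above: a failing URL may
    be expected to be down, so it is dropped without telling the user.
    mandatory=True mimics the post PEP 470 idea: every URL is expected to
    work, so a failure is reported together with the URL and the reason.
    """
    pages = []
    for url in urls:
        try:
            pages.append(fetch(url))
        except FetchError as exc:
            if mandatory:
                raise RuntimeError(f"could not fetch {url}: {exc}") from exc
            # Optional link: swallow the error, which may hide newer releases.
    return pages
```

With mandatory=False a blocked external host silently disappears from the
result, which is the situation that lets an older version be installed.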

>>> I've understood you made these two statements during the discussion:
>>> - PEP438 caused bad UI for dealing with pypi-external links -- 
>>> many people are confused by it and we thus need to fix it.
>>> - PEP470 breaking backward compatibility for pypi-external links is
>>> not a big deal because it affects only a tiny fraction of the users.
>>> Could you choose which one of them you consider is true?
>> I consider them both to be true.
>> The PEP 438 UX is confusing, out of the people who have had to use it I
>> have seen a fairly high percentage of those completely confused by it. It,
>> especially right when pip 1.5 was released, was one of our most reported
>> issues. The total number of people who need to use it has gone down over
>> time, however I still believe that percentage wise most people who need to
>> use it are confused by it.
>> I do not believe PEP 470 breaking backwards compatibility for pypi-external
>> links to be a terrible burden because it only affects a small percentage of the
>> total users of PyPI.
>> I think perhaps the reason you think both of them can’t be true is you’re
>> assuming that I’m talking about percentages of the same total population?
> Yes, i was assuming that for both statements the same basis group was used.
> So i understand now you are saying overall very few people depend on
> external links but out of those who do, many are confused and annoyed
> about how it works.

Yes exactly.

> Will the people who suffered from the current external linking options
> be the same ones who could be affected by backward compatibility issues
> (i.e. commands which now work can fail with a post-PEP470 PyPI server)?

Yes-ish. A lot of those people switched away from external hosting altogether
or chose to rely on different projects that didn’t need that. It would likely
be a subset of those users.

> personal side question: do i remember correctly that when we discussed
> PEP438 you pushed for the current set of behaviours wrt external
> links while i tried to keep it simpler because you put higher priority
> on protection against MITM attacks?

I have very little recollection of the discussion around PEP 438, I have
a bad memory for details like that. It’s entirely possible and sounds like
me to be security focused.

>>> Recommendation of "--extra-index-url"
>>> --------------------------------------
>>> In your mind and forgetting about PEP470, in what situations exactly is
>>> "pip install --extra-index-url" a safe option for users?
>> The answer to this isn’t really related to --extra-index-url, ``pip install foo``
>> is “safe” (given the threat model we operate under) if, and only if, you trust
>> the operators of all of the repositories you have configured (by default, via
>> --index-url, via --extra-index-url, via --find-links, and via --process-dependency-links),
>> to give you the correct files for “foo”. How the repositories have come to be
>> configured isn’t particularly meaningful.
> I understand that as a fairly generic security statement.  But I was trying to
> rather ask about use cases and scenarios where precisely the
> --extra-index-url option is useful and to be recommended.
> I'd be grateful if Nick or you could still describe use cases,
> especially outside PEP470 external links context (the option existed 
> before so i presume there must be some use cases).

I’m not sure exactly what you want here. It’s hard to come up with an exhaustive
list of scenarios where it’s useful. One I’ve used in the past was:

-i https://index.example.com/base/ --extra-index-url https://index.example.com/project-foo-overrides/
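
A simplified sketch of why every configured repository has to be trusted: to my
understanding pip gives no index priority over another and takes the best
candidate from the combined results. The data layout (versions as tuples, a
dict per index) and the function name are assumptions for illustration; real
version comparison is more involved than tuple ordering.

```python
def best_candidate(indexes, project):
    """Pick the highest version of `project` offered by any configured index.

    `indexes` maps an index URL to {project name: [version tuples]}.
    """
    candidates = [
        (version, index_url)
        for index_url, projects in indexes.items()
        for version in projects.get(project, [])
    ]
    if not candidates:
        raise LookupError(f"no candidates found for {project}")
    # The highest version wins regardless of which index offered it.
    return max(candidates)


indexes = {
    "https://index.example.com/base/": {"foo": [(1, 0), (1, 1)]},
    "https://index.example.com/project-foo-overrides/": {"foo": [(1, 1, 1)]},
}
version, source = best_candidate(indexes, "foo")
```

Here the overrides index wins only because it offers the higher version, so
any configured index could supply the file that ends up installed.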

>>> Interpretation of external link usage
>>> --------------------------------------------
>>> In the main rationale you say:
>>>   "While a large number of projects did ultimately decide to upload to
>>>   PyPI, some of them did so only because the UX around what PEP 438 was so
>>>   bad that they felt forced to do so."
>>> Could you provide some tractable background (not just your strong opinion)
>>> for this interpretation?  Why can it not be that people nowadays just
>>> prefer to upload to PyPI without even considering alternative options?
>> Well Stefan had voiced that complaint last time that he felt we were trying
>> to force him to upload to PyPI by making the UX so bad. I’ve had a few other
>> people say similar things to me in private.
> I can sympathize.  In fact, I think we didn't deliver the upload tools
> that we outlined with PEP438, particularly registration of externally 
> verified links.  My bad as well.
> best,
> holger

Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
