[Catalog-sig] pre-PEP: transition to release-file hosting at pypi site

Monty Taylor mordred at inaugust.com
Tue Mar 12 19:51:15 CET 2013



On 03/12/2013 11:00 AM, M.-A. Lemburg wrote:
> On 12.03.2013 18:33, Jesse Noller wrote:
>>
>>>
>>> And I've put multiple compromise proposals out there to begin
>>> mitigating the problem *now* (i.e. for non-updated versions of
>>> setuptools), and every time, the objection is, "no, we need to ban it
>>> all now, no discussion, no re-evaluation, no personal choice, everyone
>>> must do as we say, no argument".
>>>
>>> And I don't understand that, at all.
>>
>> There's not much to understand: external hosting of packages is *actively harmful*, period. End users of easy_install and pip *don't even realize* 99% of the time that these tools are following links off of PyPi and installing packages from random, probably insecure/non https locations all over the internet. Once they realize it they recoil in terror if they have any understanding of the implications.
>>
>> Let me put this in different terms: out of the packages using external hosting: can you prove to me that 100% of them aren't compromised machines serving malware, performing MITM attacks, etc? The fact that the end user tools support this is a bug, but one from history. The fact that PyPI continues to support external links on simple/ is inexcusable given that we know that they are an attack vector. 
>>
>> A simple proof of concept on a popular package hosted off site deployed during PyCon would be terrible, it was bad enough that last year people were trying to MITM due to lack of SSL. 
> 
> Let's please not exaggerate all this. It's not like PyPI is
> the only server out there implementing HTTPS, ye know ;-)
> 
> A single package uploaded on PyPI with os.system('rm -rf')
> in its setup.py could easily ruin all this and no HTTPS in this
> world would stop it from showing its ugly face.
> 
> The whole Python package eco-system works based on trust and
> injecting fear into this system is not helpful, IMO.
> 
> People need to understand the possible issues, we need to make
> things safer from both the client and the server side and
> improve the tool chain. There's really nothing new here.

External hosting of packages isn't just about security. It's about
reliability of the service. PyPI as it is right now with externally
hosted packages is 100% unusable in automated systems for reasons having
nothing to do with security. For better or for worse, PyPI _IS_ the
place where python packages are expected to exist and be uploaded.
However, attempting to hang on to a feature which undermines the ability
of the service to be used is absolutely mind-blowing to me.

Why, you ask, is it broken?

a) it's massively unreliable, because reliability is now dependent on
the availability of ALL of the external link hosting sites combined.
It's not even just the packages - version information lookups, which
should take 0.1 second and be the most reliable thing ever, have to
spider a billion web pages.

b) It's massively slow. All that spidering of lycos and altavista and
some random trac site? Slow. Guess what - that spidering is happening on
my LAPTOP - so while sitting here on this plane, if I want to install a
package that's on PyPI, it has to go web-spider other things.

c) It's aggressive about being both of the above. Even if packages are
hosted on PyPI, my local client will STILL spider external sites that
are listed.
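
To put rough numbers on (a), here is a minimal sketch. It assumes every
external host fails independently and has the same uptime; the 99% figure
and the site counts are made up for illustration, not measurements of any
real hosts.

```python
def combined_availability(per_site_availability, n_sites):
    """Chance that n independent sites are all reachable at the same time."""
    return per_site_availability ** n_sites


if __name__ == "__main__":
    # Hypothetical: each external host is up 99% of the time.
    for n_sites in (1, 5, 20, 50):
        chance = combined_availability(0.99, n_sites)
        print("%2d sites -> all reachable %.1f%% of the time"
              % (n_sites, chance * 100))
```

Under those assumptions, an install that has to touch 50 external hosts
succeeds noticeably less often than one that only talks to PyPI, even
though every individual host looks fine on its own.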

The funny part is, if you remove the externally hosted packages, PyPI is
a wonderfully elegant system that is super easy to scale. A PyPI can be
completely static, which is how we run the partial-mirror that OpenStack
is forced to run due to the instability of homepages stored on Apple
IIe's of various random people who decided that "python setup.py sdist
upload" is too hard to run. It's great. We love it. It works for just
about everything.
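
As a rough illustration of what "completely static" can mean, here is a
minimal sketch that renders a simple/-style index page for one project
from a list of release files. The function name, the page layout and the
file names are assumptions made for this example; this is not the actual
tooling behind the OpenStack mirror.

```python
import html


def render_simple_page(project, filenames):
    """Return a static HTML page linking to each release file of a project."""
    links = "\n".join(
        '<a href="{0}">{0}</a><br/>'.format(html.escape(name, quote=True))
        for name in sorted(filenames)
    )
    return (
        "<html><head><title>Links for {0}</title></head><body>\n"
        "{1}\n"
        "</body></html>\n"
    ).format(html.escape(project), links)


if __name__ == "__main__":
    # Hypothetical release files for a hypothetical project "foo".
    print(render_simple_page("foo", ["foo-1.1.tar.gz", "foo-1.0.tar.gz"]))
```

Every link on such a page points at a file sitting next to it, so any
plain web server can serve the result and the client never has to leave
the mirror.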

Except for those darned external links.

Why are we persisting in trying to make this super complex? Can we
revisit PEP20 here? Specifically:

Explicit is better than implicit.
Simple is better than complex.
Flat is better than nested.
...
There should be one-- and preferably only one --obvious way to do it

If I run:

pip install foo

I am EXPLICITLY asking for a package from PyPI, not from launchpad.
There is a URL option, which would allow me to request a package from
somewhere that is not PyPI should I want to do that.

Having to spider out to external sites is more complex than not doing that.

External sites are effectively needless nesting.

Most importantly - PyPI is there - it's where we upload packages. What
benefit do we gain from subverting that?

Nothing.

Remove the external links. Please.

Monty

