[Catalog-sig] an immutable mirror of PyPI

Martijn Faassen faassen at startifact.com
Fri Jul 15 18:20:53 CEST 2011

Hi there,

On Fri, Jul 15, 2011 at 5:59 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>> Yes, allowing this follows from the "developers should have total
>> freedom" goal that is apparently the main driver behind PyPI's use
>> cases, but there are also "security please" and "repeatability" use cases.
> There's no security on PyPI and repeatability is a myth as well :-)

Yes, there is no security and repeatability *now*. That's why I
started this whole discussion in the first place, right? You need to
argue why these are *not use cases* for PyPI, not that they're not
there now.

It seems that if PyPI doesn't care about those, PyPI is really not
usable for automated downloading of packages at all. And isn't that
a use case of PyPI?

> Just because we don't have malicious packages on PyPI, doesn't
> mean it's going to stay like this forever and for repeatability
> you're better off relying on files you've already downloaded and
> tested, since it is possible to reupload a package release file
> with different content, or to change the meta data of a release
> after its initial upload.

Yes, I talked about reuploading packages and the risks involved in
that before, but at least that's not possible by someone else who has
nothing to do with the removed package in the first place. Removing a
package entirely and letting the name be reused opens up the
possibility for completely unrelated people to come in and do nasty
things.
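
To make the repeatability worry concrete: a client that downloads
automatically could keep a digest of each release file it has tested
and refuse anything that differs. The following is only a rough sketch
under my own assumptions. The use of SHA-256, the `pinned` mapping and
the `verify_release` helper are all hypothetical and are not something
PyPI or any existing tool provides in this form.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of a release file's bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_release(filename: str, data: bytes, pinned: dict) -> bool:
    """Check downloaded bytes against a previously recorded digest.

    `pinned` maps a release filename to the digest recorded when the
    file was first downloaded and tested (a hypothetical local record).
    """
    expected = pinned.get(filename)
    if expected is None:
        raise KeyError("no pinned digest for %s" % filename)
    return sha256_hex(data) == expected


# Hypothetical example: record the digest of the file we tested ...
original = b"release 1.0 contents"
pinned = {"example-1.0.tar.gz": sha256_hex(original)}

# ... then compare a later download of the same filename against it.
same_ok = verify_release("example-1.0.tar.gz", original, pinned)
reupload_ok = verify_release(
    "example-1.0.tar.gz", b"re-uploaded different contents", pinned
)
```

A check like this only detects that a file changed. It does nothing
about a file that was removed, which is the other half of the problem.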

> I'd suggest you build a read-only PyPI mirror tool that
> people can download and then use on their private net as
> they see fit. This solves your use case and that of the other
> proponents of the read-only PyPI idea, while leaving the
> mainland PyPI index unchanged.

I will explain again why this doesn't solve my use case.

I don't work in a vacuum. I share code with others. This code has
dependencies on other code. So how do people obtain this other code?

With a private mirror, I'd need to first run all that infrastructure,
and then give people access to my private mirror. How private is it
going to stay then? How many people are using my code? What
maintenance burden does that entail for me?

Or I need to give them a possibly giant tarball of all the packages
I'm using. Is that really the only way forward for sharing code? Even
for libraries with dependencies?

I thought PyPI was, among other things, a central place that people
can download and install packages from so that they can resolve
dependencies, but you seem to be arguing against doing that. At most
it would be some kind of showcase of packages for people to take
into consideration. Taking this point to the extreme, it's *never*
something that you can automate downloading from. Instead you should
be giving a giant tarball of packages to everybody, always, if they
use your code at all.

Then this will need to be explained to the people who *are* using PyPI
for this purpose. I.e. some significant fraction of all Python
programmers out there.
